INTERNETARCHIVE.BAK/iabak-sharp implementation

From Archiveteam
Revision as of 21:48, 3 July 2020 by Antiufo (talk | contribs) (Add iabak-sharp implementation for IA.BAK)

iabak-sharp is an experimental implementation of INTERNETARCHIVE.BAK.

It is a command-line tool, available for Windows and Linux, that can be left running in the background.

A central server coordinates which Internet Archive items each client should back up. Each item can be given a priority score (currently, priorities are assigned based on the size and "uniqueness"[1] of the item type).
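The "uniqueness" part of the score can be illustrated with a small Python sketch. This is not the scoring code used by iabak-sharp: the function names, the rule for detecting a series (a trailing "-YYYYMMDD" date), and the 1/N formula are all assumptions made for illustration. The size component is left out because the page does not say how size affects priority.

```python
import re
from collections import Counter

# Assumption: items in the same series differ only by a trailing -YYYYMMDD date.
DATE_SUFFIX = re.compile(r"-\d{8}$")


def series_key(identifier):
    """Strip a trailing date so that dated snapshots share one key."""
    return DATE_SUFFIX.sub("", identifier)


def uniqueness_scores(identifiers):
    """Hypothetical score: 1 / (number of items in the same series)."""
    counts = Counter(series_key(i) for i in identifiers)
    return {i: 1.0 / counts[series_key(i)] for i in identifiers}
```

With the identifiers from the note at the bottom of this page, "warc-example1.com" is alone in its series and gets the highest score, while the three dated "warc-example2-..." items share theirs.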

Currently implemented features

  • User registration (optional)
  • Retrieval of items from IA
  • Hash consistency checks
  • Disk space checks
  • Coordination server and job assignment
  • Self-update
  • Download resume (file granularity)
  • Run on startup (Windows only)
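The hash consistency and disk space checks listed above could look roughly like the following Python sketch. It is an illustration only, not code from iabak-sharp: the function names, the choice of SHA-1 as the default algorithm, and the `reserve` parameter are assumptions.

```python
import hashlib
import shutil


def file_digest(path, algorithm="sha1", chunk_size=1024 * 1024):
    """Hash a file in chunks so that large files are never fully in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def is_consistent(path, expected_hex, algorithm="sha1"):
    """Compare the local digest with the one published for the file."""
    return file_digest(path, algorithm) == expected_hex.lower()


def has_room_for(directory, item_size, reserve=0):
    """True if the volume holding `directory` can fit the item plus a reserve."""
    return shutil.disk_usage(directory).free >= item_size + reserve
```

A client would typically call `has_room_for` before accepting a job and `is_consistent` after each file has been downloaded.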

GitHub repository

More info on the GitHub page: iabak-sharp

Comparison with git-annex implementation

  • Written in a more maintainable language (as opposed to bash)
  • No concept of shards: because we're not constrained by git repository size limits, each client only has to worry about the metadata of the files that it actually stores on its drive. The server only stores a minimal amount of metadata (identifier, total size, and the users holding that item).
  • We're free to implement features that don't perfectly match the git use cases (e.g. remote verification/challenges, encryption support, and alternate distribution mechanisms such as IPFS)
  • Supports Windows (in addition to Linux)
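The minimal server-side metadata described above (identifier, total size, and the users holding the item) can be pictured with the following Python sketch. It is not the actual iabak-sharp server code: the `ItemRecord` class, the `next_item_for` function, and the "fewest copies first" selection rule are hypothetical and ignore the priority score.

```python
from dataclasses import dataclass, field


@dataclass
class ItemRecord:
    """The three fields the server is said to keep per item."""
    identifier: str
    total_size: int  # bytes
    users: set = field(default_factory=set)


def next_item_for(user, records, free_bytes):
    """Assumed rule: hand out the fitting item that has the fewest copies."""
    candidates = [
        r for r in records
        if user not in r.users and r.total_size <= free_bytes
    ]
    if not candidates:
        return None
    chosen = min(candidates, key=lambda r: len(r.users))
    chosen.users.add(user)  # record that this user now holds the item
    return chosen
```

Because per-file metadata stays on the clients, a record like this remains small no matter how many files an item contains.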

Notes

  1. For example, "warc-example1.com" has higher priority than each of "warc-example2-20200623", "warc-example2-20200624", "warc-example2-20200625", etc.