Internet Archive main page on 2010-12-21
Archiving status: Saved by itself
The Internet Archive is a non-profit digital library with the stated mission "universal access to all knowledge". It stores several billion web pages, captured at different dates and times for historical purposes and available through the Wayback Machine, arguably an archivist's wet dream. The archive.org website also archives books, music and videos.
There are currently two mirrors of the Internet Archive collection: the official mirror available at archive.org, and a second mirror at Bibliotheca Alexandrina. Both seem to be up and stable.
Raw Numbers as of December 2010
- 4 data centers, 1,300 nodes, 11,000 spinning disks
- Wayback Machine: 2.4 PetaBytes
- Books/Music/Video Collections: 1.7 PetaBytes
- Total used storage: 5.8 PetaBytes
Uploading to archive.org
Upload any content you manage to preserve to archive.org! Registering takes a minute. Tools:
- s3 interface (for direct usage with curl, or indirect with the tool of your choice);
- the internetarchive Python tool is one such tool;
- handy script for mass upload with automatic error checking and retry;
- torrent upload, useful if you need to resume an upload (for huge files, or when your bandwidth is insufficient to upload in one go):
- just create the item, make a torrent with your files in it, name it after the item and upload it to the item;
- archive.org will connect to you and other peers and keep downloading all the contents till done;
- for a command line tool you can use e.g. mktorrent or buildtorrent, example:
mktorrent -a udp://tracker.publicbt.com:80/announce -a udp://tracker.openbittorrent.com:80 -a udp://tracker.ccc.de:80 -a udp://tracker.istole.it:80 -a http://tracker.publicbt.com:80/announce -a http://tracker.openbittorrent.com/announce "DIRECTORYTOUPLOAD";
- you can then seed the torrent with one of the many graphical clients (e.g. transmission) or on the command line (transmission and rtorrent are the most popular; btdownloadcurses reportedly doesn't work with udp trackers).
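As an illustration of the S3 interface mentioned in the tool list above, here is a minimal sketch of how an upload request could be assembled with the Python standard library alone. It assumes the endpoint `s3.us.archive.org`, the `LOW accesskey:secret` authorization scheme, and the `x-amz-auto-make-bucket` and `x-archive-meta-*` headers of archive.org's S3-like API; the function name, the item identifier and the keys are hypothetical placeholders. The request is only built here, not sent, and metadata values are assumed to be plain ASCII.

```python
from urllib.parse import quote
from urllib.request import Request

# Assumed endpoint of archive.org's S3-like API.
S3_ENDPOINT = "https://s3.us.archive.org"


def build_upload_request(identifier, filename, data,
                         access_key, secret_key, metadata=None):
    """Build (but do not send) a PUT request that uploads one file to an item.

    identifier -- the item name (hypothetical example: "my-test-item")
    metadata   -- optional dict of ASCII item metadata, e.g. {"title": "..."}
    """
    url = f"{S3_ENDPOINT}/{quote(identifier)}/{quote(filename)}"
    headers = {
        # archive.org keys are passed in the "LOW" authorization scheme.
        "Authorization": f"LOW {access_key}:{secret_key}",
        # Ask the server to create the item if it does not exist yet.
        "x-amz-auto-make-bucket": "1",
    }
    # Each metadata field travels as its own x-archive-meta-<field> header.
    for key, value in (metadata or {}).items():
        headers[f"x-archive-meta-{key}"] = value
    return Request(url, data=data, headers=headers, method="PUT")
```

To actually send the request, it can be passed to `urllib.request.urlopen(req)`; for large files a streaming client (or curl, as suggested above) is likely a better fit than holding the whole file in memory.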
Don't use FTP upload; try to keep your items below 400 GiB in size, and add plenty of metadata.
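To check the size guideline before uploading, a small helper along these lines may be useful. It is only a sketch: the 400 GiB figure is taken from the advice above, while the function names are hypothetical and the check simply sums the sizes of regular files in a local directory.

```python
import os

# 400 GiB, the suggested upper bound for a single item.
MAX_ITEM_BYTES = 400 * 1024**3


def directory_size(path):
    """Return the total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            # Skip symlinks so linked files are not counted.
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def fits_in_one_item(path, limit=MAX_ITEM_BYTES):
    """Return True if the directory is within the suggested item size."""
    return directory_size(path) <= limit
```

If `fits_in_one_item` returns `False`, splitting the content across several items is the obvious fallback.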
Formats: anything, but:
- sites should be uploaded in WARC format;
- audio, video, books and other prints are supported in a number of formats;
- for .tar and .zip files archive.org offers an online browser to search and download the specific files one needs, so you probably want to use one of the two unless you have a good reason not to (e.g. if 7z or bzip2 reduces the size tenfold).
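Since plain .tar files are browsable on archive.org, one way to package a directory before upload is Python's standard `tarfile` module. The sketch below writes an uncompressed archive and returns the member names as a quick sanity check; the function name and paths are hypothetical.

```python
import os
import tarfile


def pack_directory(src_dir, tar_path):
    """Write src_dir into an uncompressed .tar and return its member names."""
    # Store members under the directory's own name rather than its full path.
    arcname = os.path.basename(os.path.normpath(src_dir))
    with tarfile.open(tar_path, "w") as tar:  # mode "w" means no compression
        tar.add(src_dir, arcname=arcname)
    # Re-open the archive to confirm what was actually written.
    with tarfile.open(tar_path, "r") as tar:
        return tar.getnames()
```

Keep `tar_path` outside `src_dir`, otherwise the archive would try to include itself.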