Difference between revisions of "ArchiveTeam Warrior"

From Archiveteam
m (Projects: update on zoocasa and toshiba)
(Projects: NEW PROJECTS PAGES LAYOUT. See: user:bzc6p/Restructuring projects pages.)
Line 240: Line 240:
 
== Projects ==
 
Previous and current warrior projects:
+
See: [[Warrior projects]].
  
{| class="wikitable sortable"
+
== Are you a coder? ==
! Project
 
! Status
 
! Began
 
! Finished
 
! Result
 
! class="unsortable" | Archive Location
 
|-
 
| [[MobileMe]] || '''Archive Posted''' || April 3, 2012 || August 8, 2012 || Success || [http://archive.org/details/archiveteam-mobileme-hero archive], [http://archive.org/download/archiveteam-mobileme-index/mobileme-20120817.html user lookup], [http://archive.org/details/archiveteam-mobileme-index index]
 
|-
 
| [[FortuneCity]] || '''Archive Posted''' || April 4, 2012 || April 11, 2012 || Qualified Success || [http://archive.org/details/archiveteam-fortunecity archive], [http://archive.org/download/test-memac-index-test/fortunecity.html user lookup]
 
|-
 
| [[Tabblo]] || '''Archive Posted''' || May 23, 2012 || May 26, 2012 || Success || [http://archive.org/details/tabblo-archive archive], [http://archive.org/download/test-memac-index-test/tabblo.html user lookup]
 
|-
 
| [[Picplz]] || '''Archive Posted''' || June 3, 2012 || June 15, 2012 || || [http://archive.org/details/archiveteam-picplz archive], [http://archive.org/download/archiveteam-picplz-index/picplz-20120823.html user lookup], [http://archive.org/details/archiveteam-picplz-index index]
 
|-
 
| [[Tumblr]] (test project) || '''Archive Posted''' || August 9, 2012 || August 19, 2012 || || [http://archive.org/details/archiveteam-tumblr-test archive (tar)], [http://archive.org/details/archiveteam-tumblr-test-warc archive (warc)]
 
|-
 
| [[Cinch]].FM || '''Archive Posted''' || August 20, 2012 || August 22, 2012 || Success || [http://archive.org/details/archiveteam-cinch archive]
 
|-
 
| [[City of Heroes]] || '''Archive Posted''' || September 3, 2012 || December 1, 2012 || Success || [https://archive.org/search.php?query=collection%3Aarchiveteam%20city%20of%20heroes archive]
 
|-
 
| [[Webshots]] || '''Archive Posted''' || October 4, 2012 || November 18, 2012 || || [https://archive.org/details/webshots-freeze-frame archive], [http://archive.org/download/webshots-freeze-frame-index/index.html user lookup]
 
|-
 
| [[BT Internet]] || '''Archive Posted''' || October 10, 2012 || November 2, 2012 || Success || [http://archive.org/details/archiveteam-btinternet archive]
 
|-
 
| [[DailyBooth| Daily Booth]] || '''Archive Posted''' || November 19, 2012 || December 29, 2012 || || [http://archive.org/details/archiveteam_dailybooth archive], [http://archive.org/download/dailybooth-freeze-frame-index/index.html user lookup]
 
|-
 
| [[GitHub Downloads]] || '''Archive Posted''' || December 13, 2012 || December 17, 2012 || Success || [http://archive.org/details/github-downloads-2012-12 archive], [http://archive.org/details/archiveteam-github-repository-index-201212 index]
 
|-
 
| [[Yahoo! Blog]] || '''Archive Posted''' || January 8, 2013 || January 19, 2013 || || [http://archive.org/details/yahoo_korea_blogs archive]
 
|-
 
| [[weblog.nl]] || '''Archive Posted''' || January 19, 2013 || February 2, 2013 || || [http://archive.org/details/archiveteam_weblognl archive], [http://archive.org/download/archiveteam_weblognl-index/ user lookup]
 
|-
 
| [[URLTeam]] || Active || || || || [http://urlte.am/releases/ all releases]
 
|-
 
| [[Punchfork]] || '''Archive Posted''' || January 11, 2013 || March 6, 2013 || || [http://archive.org/details/archiveteam_punchfork archive], [http://archive.org/download/archiveteam_punchfork_index/ user lookup]
 
|-
 
| [[Xanga]] || Downloads Paused || January 22, 2013 || February 16, 2013 || || [http://archive.org/details/archiveteam_xanga archive], [http://archive.org/download/archiveteam_xanga_index/ user lookup], [http://archive.org/details/archiveteam-xanga-userlist-20130142 user list]
 
|-
 
| [[Posterous]] || '''Archive Posted''' || February 23, 2013 || June 29, 2013 || || [http://archive.org/details/archiveteam_posterous archive]
 
|-
 
| [[Storylane]] || '''Archive Posted''' || March 8, 2013 || March 15, 2013 || || [https://archive.org/search.php?query=storylane%20warc archive]
 
|-
 
| [[Yahoo! Messages]] || '''Archive Posted''' || March 20, 2013 || March 31, 2013 || || [http://archive.org/details/archiveteam_yahoo_messages archive]
 
|-
 
| [[Formspring]] || '''Archive Posted''' || March 24, 2013 || September 19, 2013 || Success || [http://archive.org/details/archiveteam_formspring archive]
 
|-
 
| [[Yahoo! Upcoming]] || '''Archive Posted''' || April 20, 2013 || April 25, 2013 || || [http://archive.org/details/archiveteam archive]
 
|-
 
| [[Streetfiles|Streetfiles.org]] || '''Archive Posted''' || April 28, 2013 || April 30, 2013 || Qualified Success || [https://archive.org/search.php?query=collection%3Aarchiveteam%20streetfiles archive]
 
|-
 
| [[Xanga]] || Downloads Paused || June 21, 2013 || August 31, 2013 || || [http://archive.org/details/archiveteam_xanga archive]
 
|-
 
| [[Zapd]] || '''Archive Posted''' || October 1, 2013 || October 8, 2013 || Success || [https://archive.org/details/archiveteam_zapd archive]
 
|-
 
| [[Blip.tv]] || Hiatus || October 11, 2013 || ||  || [https://archive.org/details/bliptv archive]
 
|-
 
| [[Hyves]] || '''Archive Posted''' || November 10, 2013 || December 2, 2013 || Success ||  [http://archive.org/details/hyves archive]
 
|-
 
| [[Wretch]] & [[Yahoo! Blog]] || '''Archive Posted''' || December 17, 2013 || January 9, 2014 || Qualified Success || archives: [https://archive.org/details/archiveteam_wretch Wretch], [https://archive.org/details/archiveteam_yahooblogs Yahoo Blog]
 
|-
 
| [[Dogster]] || '''Archive Posted''' || February 7, 2014 || February 16, 2014 || Success || [https://archive.org/details/archiveteam_dogster archive]
 
|-
 
| [[My Opera]] || '''Archive Posted''' || February 16, 2014 || March 3, 2014 || Success || [https://archive.org/details/archiveteam_myopera archive]
 
|-
 
| [[Bebo]] || Hiatus || February 18, 2014 ||  ||  || [https://archive.org/details/archiveteam_bebo archive]
 
|-
 
| [[Viddler]] || Cancelled || February 21, 2014 || February 27, 2014 || Qualified Success || [https://archive.org/details/archiveteam_viddler archive]
 
|-
 
| [[Justin.tv]] || '''Archive Posted''' || June 5, 2014 || June 15, 2014 || Success || [https://archive.org/details/justintv archive]
 
|-
 
| [[Yahoo! Voices]] || '''Archive Posted''' || July 28, 2014 || July 31, 2014  || Success || [https://archive.org/details/archiveteam_yahoovoices archive]
 
|-
 
| [[Fotopedia]] || '''Archive Posted''' || August 5, 2014 || August 7, 2014 || Success || [https://archive.org/details/archiveteam_fotopedia archive]
 
|-
 
| [[Twitch.tv]] || '''Archive Posted''' || August 9, 2014 || August 24, 2014 || Qualified Success || [https://archive.org/details/archiveteam_twitchtv archive]
 
|-
 
| [[Canv.as]] || '''Archive Posted''' || August 11, 2014 || August 12, 2014|| Success || [https://archive.org/details/archiveteam_canvas archive]
 
|-
 
| [[Swipnet]] || '''Archive Posted''' || August 19, 2014 || September 1, 2014 || Success || [https://archive.org/details/archiveteam_swipnet archive]
 
|-
 
| [[Verizon Personal Web Space]] || '''Archive Posted''' || September 2, 2014 || October 1, 2014 || Qualified Success || [https://archive.org/details/archiveteam_verizon archive]
 
|-
 
| [[TwitPic]] || '''Archive Posted''' || September 4, 2014 || January 2, 2015 || Qualified Success || [https://archive.org/details/archiveteam_twitpic archive]
 
|-
 
| [[Ancestry.com]] || '''Archive Posted''' || September 19, 2014 || November 5, 2014 || Success || [https://archive.org/details/archiveteam_ancestry archive]
 
|-
 
| [[Quizilla]] || '''Archive Posted''' || September 4, 2014 || October 1, 2014 || Success || [https://archive.org/details/archiveteam_quizilla archive]
 
|-
 
| [[Qwiki]] || '''Archive Posted''' || September 28, 2014 || November 1, 2014 || Qualified Success || [https://archive.org/details/archiveteam_qwiki archive]
 
|-
 
| [[Panoramio]] || In Development || October 4, 2014 || || ||
 
|-
 
| [[GameMaker Sandbox]] || '''Archive Posted''' || October 15, 2014 || October 19, 2014 || Success || [https://archive.org/details/archiveteam_gamemaker archive]
 
|-
 
| [[Halo]] || Active || November 6, 2014 || || || [https://archive.org/details/archiveteam_halo archive]
 
|-
 
| [[Viddy]] || '''Archive Posted''' || December 2, 2014 || December 15, 2014 || Success || [https://archive.org/details/archiveteam_viddy archive]
 
|-
 
| [[ZipList]] || '''Archive Posted''' || December 2, 2014 || December 4, 2014 || Success || [https://archive.org/search.php?query=collection%3Aarchiveteam%20ziplist archive]
 
|-
 
| [[Roon]] || '''Archive Posted''' || December 20, 2014 || December 21, 2014 || Success || [https://archive.org/details/archiveteam_roon archive]
 
|-
 
| [[Microsoft Clip Art]] || '''Archive Posted''' || December 23, 2014 || December 29, 2014 || Success || [https://archive.org/details/archiveteam_msclipart archive]
 
|-
 
| [[Nokia Memories]] || Downloads Finished || December 30, 2014 || December 30, 2014 || Success ||
 
|-
 
| [[Vstreamers]] || Downloads Finished || January 6, 2015 || January 10, 2015 || Success ||
 
|-
 
| [[Brace.io]] || '''Archive Posted''' || January 12, 2015 || January 18, 2015 || Success || [https://archive.org/details/brace_io_panic_2015_01 archive]
 
|-
 
| [[Inkblazers]] || '''Archive Posted''' || January 18, 2015 || January 31, 2015 || Success || [https://archive.org/details/archiveteam_inkblazers archive]
 
|-
 
| [[Ovi Store]] || '''Archive Posted''' || February 3, 2015 || February 15, 2015 || Qualified Success || [https://archive.org/search.php?query=archiveteam%20ovi%20store archive]
 
|-
 
| [[Cobook]] || Downloads Finished || February 9, 2015 || February 11, 2015 || Success ||
 
|-
 
| [[TestFlight]] || '''Archive Posted''' || February 13, 2015 || February 25, 2015 || Success || [https://archive.org/details/archiveteam_testflight archive]
 
|-
 
| [[Blogger]] || Active || February 25, 2015 || || ||
 
|-
 
| [[Google Business Sitebuilder]] || Downloads Finished || March 9, 2015 || March 10, 2015 || Success ||
 
|-
 
| [[Trovebox]] || Active || March 14, 2015 || || ||
 
|-
 
| [[RapidShare]] || Downloads Finished || March 20, 2015 || March 29, 2015 || Qualified Success ||
 
|-
 
| [[Madden GIFERATOR]] || '''Archive Posted''' || March 21, 2015 || March 23, 2015 || Success || [https://archive.org/details/archiveteam_madden archive]
 
|-
 
| [[FurAffinity]] || Active || March 26, 2015 || || ||
 
|-
 
| [[Last.fm]] || Downloads Finished || March 30, 2015 || May 13, 2015 || ||
 
|-
 
| [[FriendFeed]] || Downloads Finished || April 2, 2015 || April 9, 2015 || Qualified Success ||
 
|-
 
| [[LayerVault]] || Downloads Finished || April 6, 2015 || April 11, 2015 || Success ||
 
|-
 
| [[Google Helpouts]] || Downloads Finished || April 16, 2015 || April 21, 2015 || Success ||
 
|-
 
| [[Google Baraza]] || '''Archive Posted''' || April 28, 2015 || May 7, 2015 || Success || [https://archive.org/details/googlebaraza archive]
 
|-
 
| [[Pomf.se]] || '''Archive Posted''' || June 9, 2015 || June 17, 2015 || Success || [https://archive.org/details/archiveteam_pomf archive]
 
|-
 
| [[SourceForge]] || Active || June 17, 2015 || || ||
 
|-
 
| [[Zoocasa]] || Downloads Finished || June 18, 2015 || June 25, 2015 || Success ||
 
|-
 
| [[Xfire|Xfire Social Website]] || Active || June 19, 2015 || || ||
 
|-
 
| [[Toshiba Support]] || Active || June 24, 2015 || || ||
 
|}
 
 
 
=== Status ===
 
:; In Development : a future project
 
:; Active : start up a Warrior and join the fun; this one is in progress right now
 
:; Downloads Finished : we've finished downloading the data
 
:; Archived : the collected data has been properly archived
 
:; Archive Posted : the archive is available for download
 
 
 
=== Result ===
 
:; Success : downloaded all of the data and posted the archive publicly
 
:; Qualified Success :  either we couldn't get all of the data, or the archive can't be made public
 
:; Failure : the site closed before we could download anything
 
 
 
=== Are you a coder? ===
 
  
 
Like the warrior? Interested in how it works under the hood? Got software skills? '''[[Dev|Help us improve it!]]'''
 
 
{{Navigation box}}
 

Revision as of 18:08, 28 June 2015

What is the Archive Team Warrior?


The Archive Team Warrior is a virtual archiving appliance. You can run it to help with the ArchiveTeam archiving efforts. It will download sites and upload them to our archive — and it’s really easy to do!

The warrior is a virtual machine, so there is no risk to your computer. The warrior will only use your bandwidth and some of your disk space. It will get tasks from and report progress to the Tracker.

Basic usage

The warrior runs on Windows, OS X and Linux using a virtual machine. You'll need one of:

Quick start instructions for VirtualBox

  1. Download the appliance (174MB).
  2. Launch VirtualBox
  3. In VirtualBox, click File > Import Appliance and open the file.
  4. Start the virtual machine.
    • It will fetch the latest updates and will eventually tell you to start your web browser.
  5. Using your regular web browser, visit http://localhost:8001/
  6. On the left, click "Your settings".
  7. Choose a username - we'll show your progress on the leaderboard.
  8. On the left, click the "Available projects" tab and pick a project to work on.
    • Even better: select "ArchiveTeam's Choice" to let your warrior work on the most urgent project.


Alternative virtual machines

Thanks to user effort, there are alternatives:

Please note that these alternatives are not in widespread use by our warriors, so we may not be able to help with either issues or advanced usage.

Warrior FAQ

Can I use any internet connection for the warrior?

No. We need "clean" connections. Please ensure the following:

  • No OpenDNS. No ISP DNS that redirects to a search page. Use non-captive DNS servers.
  • No ISP connections that inject advertisements into web pages.
  • No proxies. Proxies can return bad data. The original HTTP headers and IP address are needed for the WARC file.
  • No content-filtering firewalls.
  • No censorship. If you believe your country implements censorship, do not run a warrior.
  • No Tor. The server may return an error page instead of content if it bans exit nodes.
  • No free cafe wifi. Archiving your cafe's wifi service agreement repeatedly is not helpful.
  • We prefer connections from many public IP addresses if possible. (For example, if your apartment building uses a single IP address, we don't want your apartment banned.)
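One quick way to test the DNS point above is to look up a hostname that should not exist: a clean resolver reports failure, while a redirecting one hands back an address anyway. A rough sketch, assuming a Linux host where getent is available. The function name dns_looks_clean and the probe hostname are made up for illustration, and the probe name is only assumed to be unregistered:

```shell
#!/bin/sh
# Return 0 if a made-up hostname fails to resolve (expected on a clean connection),
# 1 if the resolver answers anyway (a hint that DNS is being redirected).
# $1: resolver command to use; defaults to "getent hosts" (Linux).
dns_looks_clean() {
    resolver=${1:-getent hosts}
    # Hypothetical probe name, assumed not to be registered by anyone.
    probe="warrior-dns-probe-zq7x1k9v3m.com"
    # Word splitting of $resolver is intentional so "getent hosts" runs as two words.
    if $resolver "$probe" >/dev/null 2>&1; then
        return 1
    fi
    return 0
}

# Usage on the host:
#   dns_looks_clean && echo "DNS looks clean" || echo "DNS may be redirected"
```

A pass here only covers the DNS point; it says nothing about proxies, ad injection, or filtering.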

Why am I seeing a message that no item was received?

It means that there is no work available. This happens for several reasons:

  • The project has just finished and someone is inspecting the work done. If a problem is discovered, items may be re-queued and more work will become available.
  • You have checked out / claimed too many items. Reduce your concurrency and let others do some of the work too.
  • In a rare case, you have been banned by a tracker administrator because you were requesting too much work, you were tampering with the scripts, a malfunction has occurred, or your internet connection is "unclean".

Why am I seeing a message about rate limiting?

Keep in mind that although downloading the internet for digital preservation and fun are the primary goals of all Archive Team activities, serious stress on the target's server may occur. The rate limit is imposed by a tracker administrator and should not be subverted.

(In other words, we don't want to DDoS the servers.)

Why am I seeing a message about code being out of date?

The warrior will update its code every hour. If you are impatient, please restart the warrior and it will download the latest code and resume work.

Help! The warrior is eating all my bandwidth!

You can limit the warrior's bandwidth quite easily in VirtualBox, as long as you are running a relatively recent version. The option is not available in the GUI, however.

The command

VBoxManage bandwidthctl archiveteam-warrior-2 add limit --type network --limit 3m

will limit the warrior instance called archiveteam-warrior-2 (the default name of the warrior vm currently) to 3Mb/s. Adjust as needed.

(limit units: k=kilobit, m=megabit, g=gigabit, K=kilobyte, M=megabyte, G=gigabyte)


In the latest version of VirtualBox on Windows, the syntax appears to have changed. The correct command now seems to be:

VBoxManage bandwidthctl archiveteam-warrior-2 add netlimit --type network --limit 3

For more info, consult the VirtualBox manual (Chapter 6, Section 9).
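As far as the VirtualBox manual describes it, the word after add (limit or netlimit above) is just the name of a bandwidth group, and the group only takes effect once it is attached to the VM's network adapter. A bare number such as 3 appears to be read as megabytes per second, so add a suffix if you mean megabits. A minimal sketch, assuming the warrior uses adapter 1. The function name warrior_limit_cmds and the group name WarriorLimit are hypothetical. It only prints the commands so you can review them before running them:

```shell
#!/bin/sh
# Print the two VBoxManage commands that throttle a VM's first network adapter.
# Arguments: VM name, bandwidth group name (any label), limit (e.g. 3m = 3 megabit/s).
warrior_limit_cmds() {
    vm=$1
    group=$2
    limit=$3
    # Step 1: create a network bandwidth group with the given limit.
    printf 'VBoxManage bandwidthctl "%s" add %s --type network --limit %s\n' "$vm" "$group" "$limit"
    # Step 2: attach adapter 1 to that group (assumes the warrior uses adapter 1).
    printf 'VBoxManage modifyvm "%s" --nicbandwidthgroup1 %s\n' "$vm" "$group"
}

# Example: print the commands for the default VM name with a 3 megabit/s limit.
warrior_limit_cmds archiveteam-warrior-2 WarriorLimit 3m
```

Run the printed commands while the VM is powered off, since modifyvm does not change a running machine.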

NAT sucks! I want directly-bridged networking!

Simples! (If you're running Linux, that is.)

VBoxManage modifyvm "archiveteam-warrior-2" --nic1 bridged
VBoxManage modifyvm "archiveteam-warrior-2" --bridgeadapter1 eth0

(We presume you want to bind to eth0. Adjust as required. :))

I turned my warrior VM appliance off. Will those tasks be lost?

If you've killed your warrior VM instances, the work your warrior did has been lost. However, the tasks will be returned to the pool after a period of time. If you want, you can tell the admins on IRC what happened, and they can clear the claims your username may have made. This isn't very important on most projects, though.

I closed my browser or tab with the warrior's web interface. Will those tasks be lost?

No, the web browser interface just provides, well, a user interface to the warrior. As long as the VM is not stopped, it will continue normally.

I need to disconnect my internet / reboot my PC, but I don't want to lose work.

If you pause/suspend the warrior instance, most projects will allow resuming of work in progress when you unsuspend the warrior instance.

If you decide to use the suspend feature in VirtualBox, please note that if you keep the warrior suspended for too long (more than a few hours), the admins will assume that the item is lost and it will be re-queued. Using the suspend feature so that you can reboot your computer is perfectly fine.

I told the warrior to shutdown from the interface but nothing has changed! What gives?

The warrior will attempt to finish the currently running tasks before shutting down. If you need to shut down right away, go ahead. Your progress will be lost, but the jobs will eventually cycle out to another user.

How much disk space will the warrior use?

Short answer: it depends on the project.

Long answer: because each project defines an item differently, the warrior may be downloading a small file or a whole subsection of a website. The virtual machine is configured by default to use 60GB as an absolute maximum. Any unused virtual machine disk space is not used on the host computer. You may, however, run the virtual machine on less than 60GB if you like to live dangerously. We're downloading the internet after all!

The secondary disk is using up space even though it's not running a project.

Virtual machine disk images do not behave like regular files. There are several ways to reclaim space:

  • Delete the second disk and put back an empty disk. The warrior should reformat the second disk.
  • Delete the entire warrior appliance and re-import it.
  • Use the zerofree program and then clone the disk image. Reattach the cloned disk image.

I can't connect to localhost.

The appliance includes a configuration that sets up port forwarding to the guest machine on port 8001 so you can access the interface through your web browser. If this does not work, you may need to double-check your machine's network settings.
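To see which host port is actually forwarded, you can inspect the VM's settings from the command line. A small sketch, assuming that VBoxManage showvminfo with --machinereadable prints NAT rules in the form shown in the comment. The function name warrior_host_ports is hypothetical:

```shell
#!/bin/sh
# Read `VBoxManage showvminfo <vm> --machinereadable` output on stdin and print
# the host port of every NAT rule that forwards to guest port 8001.
warrior_host_ports() {
    # Rule lines are assumed to look like:
    #   Forwarding(0)="name,tcp,hostip,hostport,guestip,guestport"
    grep '^Forwarding(' | tr -d '"' | awk -F, '$6 == "8001" { print $4 }'
}

# Usage on the host:
#   VBoxManage showvminfo archiveteam-warrior-2 --machinereadable | warrior_host_ports
```

If nothing is printed, no rule points at the warrior's web interface, and you can add one under Settings → Network → Adapter 1.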

The warrior can't connect to the internet.

The virtual machine may have picked up the address of a local DNS cache on your computer, which the virtual machine cannot reach.

If you experience this on VirtualBox, see this question and answer.

I'm looking at the text scrolling by and I notice some errors. rsync is not working.

Uh-oh! Something is not right. Notify us immediately in the appropriate IRC channel.

The item I'm working on is downloading thousands of URLs and it's taking hours.

See the above question and reboot the warrior as appropriate.

I'm looking at the leaderboard. What's that icon beside the username?

That's just the warrior logo. It means that the person is using the warrior. Those without the icon are running the scripts manually.

What's that guy doing in the logo?

The place is on fire! But don't worry, he safely escaped with the rescued data in his arms.

I want to log in to the virtual machine. How do I do this?

Unless you know what you are doing, you should not need to do this. But if you want to, the username is root and the password is archiveteam. Then, you can execute sudo -u warrior -i to log in as the warrior user.

Press ALT+F3 to switch to virtual console number 3. Use ALT+Left or ALT+Right to switch between virtual consoles. There are 6 virtual consoles in total. Consoles 1 and 2 are reserved for the warrior.

Can I run multiple virtual machines at the same time?

Yes, but you'll need to adjust the networking settings.

On the machine, open Settings → Network → Adapter 1 → Port Forwarding and adjust the Host Port. For example, make sure the row reads TCP | 127.0.0.1 | 8123 | | 8001. You can then visit http://localhost:8123/, because host port 8123 is mapped to port 8001, which the warrior uses.
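The same mapping can be set from the command line with the --natpf1 option of VBoxManage modifyvm. A minimal sketch: the function name warrior_natpf_rule, the default rule name, and the VM name in the usage comment are hypothetical. The rule that ships with the appliance is not shown here, so remove or edit it in the GUI first if it would clash with another VM on port 8001:

```shell
#!/bin/sh
# Build a VirtualBox NAT port-forwarding rule that maps a host port to the
# warrior's web interface on guest port 8001.
# Arguments: host port, optional rule name (the default is a made-up label).
warrior_natpf_rule() {
    host_port=$1
    rule_name=${2:-warrior-web-$host_port}
    # Format: name,protocol,host IP,host port,guest IP (left empty),guest port
    printf '%s,tcp,127.0.0.1,%s,,8001\n' "$rule_name" "$host_port"
}

# Usage for a second VM (hypothetical name), while it is powered off:
#   VBoxManage modifyvm "archiveteam-warrior-2-copy" --natpf1 "$(warrior_natpf_rule 8123)"
```

Pick a different host port for each additional VM, for example 8123, 8124, and so on.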

The warrior seems to have too much overhead. I can't run a VM in a VPS!

You don't need to run a virtual machine.

One option is to run the warrior in a Docker container. Because Docker is based on LXC, the overhead is far lower than running a full VM on a VPS. If you plan to run the warrior-dockerfile image, remember to publish the port so that you can reach the web interface.

 docker run -d -p 8001:8001 archiveteam/warrior-dockerfile 

(The command above maps the VPS port directly to the same container port. If you wanted, say, port 38001 instead, it would be docker run -d -p 38001:8001 archiveteam/warrior-dockerfile. Adjust as required. :P)


If you are managing a VPS, it's likely you are comfortable with some Linux stuff. Projects can be run manually. Consult the project wiki page or the source code repository readme file.

(Note that multiple projects can also be run in isolated environments (containers) for rapid deployment using at-as-dockerfile.)

Why a virtual machine in the first place?

The virtual machine is a quick, safe, and easy way for newcomers to help us out. It offers many features:

  • Graphical interface
  • Automatically selects which project is important to run
  • Self-updating software infrastructure
  • Allows for unattended use
  • In case of software faults, your machine is not ruined
  • Restarts itself in case of runaway programs
  • Runs on Windows, Mac, and Linux painlessly
  • Ensures consistency in the archived data regardless of your machine's quirks

If you have suggestions for improving this system, please talk to us as described below.

I'm running the scripts manually in a VPS but it says the code is out of date a while later

It happens when a bug in the scripts is discovered. Bugs are unavoidable especially when the server is out of our control.

Try the --auto-update option available in Seesaw version 0.8. However, please be aware that you are now executing code automatically. Be sure to run the scripts in a separate user account for safety.

I just imported the ova image and the warrior is stuck on "Preparing the data partition"

This issue has cropped up before and we do not know what causes it. It is recommended to just delete the warrior image and import the ova again. Testing shows that such a reimport works in the majority of cases.

Why is the default project not working? / Why is a manual project not in the Warrior yet?

Sorry. Sometimes the administrators are too busy...

Why are there no projects?

If there are no projects showing, you can help us write one. No projects does not mean there is nothing left to archive!

The instructions to run the software/scripts are awful and they are difficult to set up.

Well, excuuuuse me, princess!

We're not a professional support team, so help us help you help us all. See below for how to file bug reports, make suggestions, or contribute code.

Where can I file a bug, suggestion, or a feature request?

If the issue is related to the warrior's web interface or the library that grab scripts are using, see seesaw-kit issues. Other issues should be filed into their own repositories.

I'd like to help write code. Where can I find more info?

Check out the Dev documentation for details on the infrastructure and the source code layout.

I still have a question!

Check out the general FAQ page. Talk to us on IRC. Use #warrior for specific warrior questions or #archiveteam for general questions.

Projects

See: Warrior projects.

Are you a coder?

Like the warrior? Interested in how it works under the hood? Got software skills? Help us improve it!

