Alive... OR ARE THEY

From Archiveteam
(Difference between revisions)
m
(Sites: +AMV.org)
(22 intermediate revisions by 8 users not shown)
Line 3: Line 3:
 
Archive Team considers these sites specifically of interest because they solicit so much content, contain so many works and projects by a wide group of people, or have the internet particularly dependent on them. Consider this a fire drill: know what you can do to get your data off these sites and back it up for later.
 
  
=== Sites ===
+
== Sites ==
 +
<!--
  
 +
Sorted alphabetically
  
* '''[[Wikia]] ([http://www.wikia.com/wiki/Wikia www.wikia.com])''', the for-pay arm of Wikipedia (just kidding, it's a different company, but shares a lot of people) is a repository of directed, unsubject-to-wikipolitics wikis, many of them intense and completist. It'd be bad for them to go away.
+
-->
* '''[http://www.fanfiction.net/ fanfiction.net]''' represents many thousands of user-generated stories, essays and huge amounts of work.  
+
* '''[[Academic Earth]]''' ({{url|1=http://academicearth.org/}}) has been worryingly unloved for a while, and holds a mountain of free education that's invaluable to the world.
* '''[http://pixiv.net/ Pixiv]''' and '''[http://deviantart.com/ deviantArt]''' are the largest Japanese and American (respectively) fanart collections on the internet.
+
* '''[[AnimeMusicVideos.org]]''' ({{url|1=http://www.animemusicvideos.org/}}) is fine right now, but they rely on donations and host vast amounts of user-edited music videos on their server (presumably without mirrors). It is hard to download from, as you have to be a member to get all the download links, and after downloading a handful you have to vote before you can d/l again (or you can donate, which presumably gives you 1 year of free d/l access). Also, this site might be a grey area, copyright-wise, as the videos are all cut together from copyrighted material.
* '''[http://www.sourceforge.net SourceForge]''' is a critical repository of open source code, information, and webpages. It is mirrored and maintained, but there are sure to be parts that are neither.
+
* '''[[Delicious]]''' ({{url|1=http://www.delicious.com/}}) loves to change their API, which has a side effect of making it difficult to back up.
* '''[[Facebook]]''' seems stable at the moment.
+
* '''[[Facebook]]''' ({{url|1=http://www.facebook.com/}}) seems stable at the moment.
* '''[[Friendfeed]]''' is a happy clam who recently shacked up with Facebook.
+
* '''[[FanFiction]]''' ({{url|1=http://www.fanfiction.net/}}) represents many thousands of user-generated stories, essays and huge amounts of work.
* '''[[Google]]''' wants you to think they will be here forever.
+
* '''[[Formspring]]''' is a popular website centered around answering questions. It is ''extremely'' user-unfriendly with regard to seeing old questions, and no known backup tools exist.
* '''[[Twitter]]''' is tweaking away.
+
* '''[[Friendfeed]]''' ({{url|1=http://www.friendfeed.com/}}) is a happy clam who recently shacked up with Facebook.
* '''[[Wikipedia]]''' will surely be here forever and ever! Fortunately, we don't have to take their word for it as they offer dumps of the data minus the photos. However, no one has verified that Wikipedia can actually be restored from these dumps. If disaster strikes, we could discover a serious problem.
+
* '''[[Google]]''' ({{url|1=http://www.google.com/}}) wants you to think they will be here forever.
* '''[[Delicious]]''' loves to change their API, which has a side effect of making it difficult to back up.
+
* '''[[Infoanarchy]]''' ({{url|1=http://www.infoanarchy.org/}}) The site is functioning again. Might be worth backing up, though. For months, a simple database error that could be fixed with one command KO'd this site unexpectedly with a wealth of P2P information lost. [http://eng.anarchopedia.org/infoAnarchy] The site was down for four days in June 2010. There is now an archive of the '''content''' at: [http://mirrors.sdboyd56.com/infoanarchy infoAnarchy wiki archive]
* '''[[whitehouse.gov]]''' is up and running for #44, <s>but we've lost all info for #43. (See also: [http://www.kottke.org/09/01/old-whitehousegov-down-the-memory-hole kottke] and [http://www.readwriteweb.com/archives/whitehousegov_president_web_presence.php Read Write Web].)</s> and #43 is available at http://georgewbush-whitehouse.archives.gov/ thanks to the [http://kitenet.net/~joey/blog/entry/ephemera_vs_the_law/ Presidential Records Act]. We also want to watch out for site changes / disappeared pages that were embarrassing or whatnot.
+
* '''[[Internet Archive]]''' ({{url|1=http://www.archive.org/}}) seems stable at the moment but its 2 petabytes of data aren't mirrored anywhere else, the code for their system isn't open source and generally they're a single point of failure for a large amount of the web's history. Why should there be only 1 internet archive?
* '''[http://www.infoanarchy.org Infoanarchy]''' The site is functioning again. Might be worth backing up, though. For months, a simple database error that could be fixed with one command KO'd this site unexpectedly with a wealth of P2P information lost. [http://eng.anarchopedia.org/infoAnarchy]
+
* '''[[LiveJournal]]''' fired a bunch of US-based developers, but is still serving from its new (presumably cheaper) data center in Montana.
+
* '''[[Last.fm]]''' is being cloned by free software developers in the form of [http://libre.fm Libre.fm] -- they have a tool, [http://svn.savannah.gnu.org/viewvc/*checkout*/trunk/lastscrape/lastscrape.py?root=librefm Lastscrape], which can get all your listening data out into a tab-delimited text file.
+
*'''[http://wikileaks.org/ WikiLeaks]''' is a valuable site that will be making enemies.
+
* '''[http://www.archive.org Archive.org]''' seems stable at the moment but its 2 petabytes of data aren't mirrored anywhere else, the code for their system isn't open source and generally they're a single point of failure for a large amount of the web's history. Why should there be only 1 internet archive?
+
 
** There seems to be a second instance at [http://www.bibalex.org/isis/frontend/archive/archive_web.aspx Bibliotheca Alexandrina] although it's currently broken and out of date.
 
 +
*'''[[Know Your Meme]]''' ({{url|1=http://knowyourmeme.com/}}) is at this point the de facto central repository for information on internet memes and culture. It is as popular as ever at the moment, but even with this popularity, former owners Rocketboom had trouble financing it. In the spring of 2011 it was sold to Cheezburger Networks, a site which has been known to "reorganize" its properties, sometimes with a detrimental effect on content. Though it was quite a different story, I might remind people what happened to [[Encyclopedia Dramatica]].
 +
* '''[[Last.fm]]''' ({{url|1=http://www.last.fm/}}) is being cloned by free software developers in the form of [http://libre.fm Libre.fm] -- they have a tool, [http://svn.savannah.gnu.org/viewvc/*checkout*/trunk/lastscrape/lastscrape.py?root=librefm Lastscrape], which can get all your listening data out into a tab-delimited text file.
 +
* '''[[Literotica.com]]''' ({{url|1=http://literotica.com/}}) Contains over 290,000 user-written stories and poems. First pass at a backup: [http://mir.cr/12CMQUTL part1.rar], [http://mir.cr/HO79CCUO part2.rar], [http://mir.cr/TOVJWQ4E part3.rar], [http://mir.cr/1SIAB4AM part4.rar] -- contains the text of all stories as of the backup date in XML format. (One page of one story is missing because it doesn't exist on the site; embedded images and audio are not included this time; non-English stories aren't labelled with their language).
 +
* '''[[LiveJournal]]''' ({{url|1=http://www.livejournal.com/}}) fired a bunch of US-based developers, but is still serving from its new (presumably cheaper) data center in Montana.
 +
* '''[[Pixiv]]''' ({{url|1=http://www.pixiv.net/}}) and '''[[deviantArt]]''' ({{url|1=http://www.deviantart.com/}}) are the largest Japanese and American (respectively) fanart (and valuable art in general) collections on the internet.
 +
* '''[[Reddit]]''' ({{url|1=http://www.reddit.com/}}) is where many of the users have now migrated. Stable for now, but the team is small.
 +
* '''[[SourceForge]]''' ({{url|1=http://www.sourceforge.net/}}) is a critical repository of open source code, information, and webpages. It is mirrored and maintained, but there are sure to be parts that are neither.
 +
* '''[[TVTropes]]''' ({{url|1=http://www.tvtropes.org/pmwiki/pmwiki.php/Main/HomePage}}) is a popular wiki dedicated to finding recurring patterns in fiction, and discussing fiction in general. No word on whether there are backups. The administrators have a tendency to delete things indiscriminately, usually to save on disk space: article edit histories are frequently purged, and old forum threads are regularly deleted mercilessly (and, until recently, without any sort of warning).
 +
* '''[[Twitter]]''' ({{url|1=http://www.twitter.com/}}) is tweaking away.
 +
* '''[[whitehouse.gov]]''' ({{url|1=http://www.whitehouse.gov/}}) is up and running for #44, <s>but we've lost all info for #43. (See also: [http://www.kottke.org/09/01/old-whitehousegov-down-the-memory-hole kottke] and [http://www.readwriteweb.com/archives/whitehousegov_president_web_presence.php Read Write Web].)</s> and #43 is available at http://georgewbush-whitehouse.archives.gov/ thanks to the [http://kitenet.net/~joey/blog/entry/ephemera_vs_the_law/ Presidential Records Act]. We also want to watch out for site changes / disappeared pages that were embarrassing or whatnot.
 +
* '''[[Wikia]]''' ({{url|1=http://www.wikia.com/}}), the for-pay arm of Wikipedia (just kidding, it's a different company, but shares a lot of people) is a repository of directed, unsubject-to-wikipolitics wikis, many of them intense and completist. It'd be bad for them to go away.
 +
* '''[[WikiLeaks]]''' ({{url|1=http://wikileaks.org/}}) contains several thousand leaked documents from sources such as the Iraq War and the cables famously known under the label 'Cablegate'. Given the content on the website, and given that PayPal and Amazon (very) quickly dropped their services for it during Cablegate's opening days, it should be considered a potential target for quick shutdown by any number of government committees.
 +
* '''[[Wikipedia]]''' ({{url|1=http://www.wikipedia.org/}}) will surely be here forever and ever! Fortunately, we don't have to take their word for it as they offer dumps of the data minus the photos. However, no one has verified that Wikipedia can actually be restored from these dumps. If disaster strikes, we could discover a serious problem.
 +
 +
 +
{{Navigation box}}
 +
 +
[[Category:Archive Team]]

Revision as of 18:41, 11 January 2012

Like many sites before them, these places indicate a sunny outlook, a clean bill of health and a total sense of "all systems go". But as we've found out from those many sites before them, fortunes can change overnight.

Archive Team considers these sites specifically of interest because they solicit so much content, contain so many works and projects by a wide group of people, or have the internet particularly dependent on them. Consider this a fire drill: know what you can do to get your data off these sites and back it up for later.

Sites


