Difference between revisions of "URLTeam"

From Archiveteam
(URL shorteners: adds bull.hn (bullhorn), marsdd (mars dd))
(Update table)
Line 14: Line 14:
  
 
Such services are a ticking timebomb. If they go away, get hacked, or sell out, millions of links will be lost (see [http://en.wikipedia.org/wiki/Link_rot Wikipedia: Link Rot]). [http://www.archive.org/details/301works Archive.org]/301Works is acting as an escrow for URL shortener databases, but they rely on URL shorteners to actually hand over their databases. Even 301Works founding member ''bit.ly'' does not actually share its database, and most other big shorteners don't share theirs either.
 
== Who did this? ==
 
You can join us in our IRC channel: [irc://irc.efnet.org/urlteam #urlteam] on [http://www.efnet.org/ EFNet]
 
* [[User:Scumola]] started this wiki page
 
* [[User:Chronomex]] started the Urlteam scraping effort
 
* [[User:Soult]] Helps with scraping
 
* [[User:Jeroenz0r]] Helps with scraping (and stalking Soult)
 
* ... many ArchiveTeam people who run the scrapers
 
  
 
== 301Works cooperation ==
Line 50: Line 42:
 
|-
 
| [http://tinyurl.com/ Tinyurl.com]
 
| 1,000,000,000
+
| 10,000,000,000
 
| [[Warrior]]
 
| scraping: sequential, <= 6 characters
+
| scraping: sequential, done up to azzzzz
 
| new shorturls: non-sequential, 7 characters
 
|-
 
| [http://bit.ly/ Bit.ly]
 
| 4,000,000,000
+
| 50,000,000,000
 
| [[Warrior]]
 
| scraping: non-sequential, 6 characters
 
Line 68: Line 60:
 
|-
 
| [http://is.gd is.gd]
 
| 810,264,745 (2013-01-30)
+
| 934,134,706 (2013-05-20)
 
| [[Warrior]]
 
| scraping: sequential, <= 5 characters
+
| scraping: non-sequential, 6 characters
 
| new shorturls: non-sequential, 6 characters
 
|-
 
Line 103: Line 95:
 
| dead (2010-11-18)
 
|-
 
| [http://tr.im tr.im]
+
| Old tr.im
 
| 1990425
 
| [[User:Soult]]
 
Line 109: Line 101:
 
| dead (2011-12-31)
 
|-
 
| adjix.com
+
| [http://tr.im/ New tr.im]
 
| ?
 
| [[User:Jeroenz0r]]
+
| [[Warrior]]
| Already done: 00-zz, 000-zzz, 0000-izzz.
+
| scraping: sequential, done up to 42pzz
| case-insensitive, incremental
+
| new shorturls: sequential
|-
 
| rod.gs
 
| ?
 
| [[User:Jeroenz0r]]
 
| Done: 00-ZZ, 000-2Qc
 
| case-sensitive, incremental, server can't keep up with all the requests.
 
|-
 
| biglnk.com
 
| ?
 
| [[User:Jeroenz0r]]
 
| Done: 0-Z, 00-ZZ, 000-ZZZ
 
| case-sensitive, incremental
 
|-
 
| go.to
 
| 60000
 
| [[User:Asiekierka]]
 
| Done: ~45000 (go.to network links only: [http://64pixels.org/goto_dump.zip goto_dump.zip])
 
| no codes, only names; google-fu only gives the first 1000 results for each, but thankfully most domains have fewer
 
 
|-
 
| visibli (hex)
 
Line 151: Line 125:
  
 
* adf.ly
 
 +
* adjix.com
 
* ask.fm - ask.fm/a/40k05kgp
 
* awe.sm
 
 +
* biglnk.com
 
* budurl.com - Appears non-incremental
 
* buff.ly - Buffer App
 
Line 165: Line 141:
 
* flip.it - Flipboard
 
* fnd.us (See official shorteners)
 +
* go.to
 
* ilix.in - HTML redirect
 
* jdem.cz - Incremental with random (?) last digit: http://jdem.cz/bw388
 
Line 175: Line 152:
 
* po.st
 
* r.ebay.com
 
 +
* rod.gs
 
* redirx.com - Lowercase alpha only, appears sequential or guessable: http://redirx.com/?wyok
 
* sharedby.co - See vsb.li. Double redirects via USERNAME.sharedby.co/share/XXXXXX
 
Line 189: Line 167:
 
* tighturl.com - Appears incremental: http://tighturl.com/30xu http://tighturl.com/30xv
 
* tiny.cc - Appears non-incremental
 
* tr.im (2nd generation)
 
 
* tweetburner.com / twurl.nl - Appears incremental
 
* twitthis.com
 
* u.mavrev.com - Not accepting new URLs.
* ur1.ca - Database is downloadable from website directly.
 
 
* urlcut.com
 
* vimeo.com
 

Revision as of 00:41, 20 May 2013

Urlteam
url shortening was a fucking awful idea
URL: http://urlte.am
Project status: Online!
Archiving status: In progress...
Project source: https://github.com/ArchiveTeam/urlteam-stuff
Project tracker: http://tracker.tinyarchive.org/
IRC channel: #urlteam (on EFnet)
Project lead: Unknown

TinyURL, bit.ly and other similar services allow long URLs to be converted to smaller ones on their specific service; the small URL is visited by a consumer and their web browser is redirected to the long URL.
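The redirect itself is ordinary HTTP: the service looks the short code up in its database and answers with a redirect status (commonly 301 or 302) plus a Location header holding the long URL. The sketch below is a minimal illustration only; the in-memory table and the `handle_request` name are hypothetical stand-ins for a real service's database and web handler. That lookup table is exactly what vanishes when a shortener goes away.

```python
from __future__ import annotations

from http import HTTPStatus

# Hypothetical table; a real shortener keeps this mapping in its database.
SHORT_TO_LONG = {
    "abc123": "https://example.com/a/very/long/path?with=lots&of=parameters",
}


def handle_request(shortcode: str) -> tuple[int, dict[str, str]]:
    """Answer GET /<shortcode> the way a typical shortener would."""
    long_url = SHORT_TO_LONG.get(shortcode)
    if long_url is None:
        return HTTPStatus.NOT_FOUND, {}
    # Redirect status plus Location header; the browser then requests
    # the long URL by itself.
    return HTTPStatus.MOVED_PERMANENTLY, {"Location": long_url}


status, headers = handle_request("abc123")
print(int(status), headers["Location"])
```

Nothing in the redirect reveals the rest of the table, which is why archiving a shortener means requesting its codes one by one.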

Such services are a ticking timebomb. If they go away, get hacked, or sell out, millions of links will be lost (see Wikipedia: Link Rot). Archive.org/301Works is acting as an escrow for URL shortener databases, but they rely on URL shorteners to actually hand over their databases. Even 301Works founding member bit.ly does not actually share its database, and most other big shorteners don't share theirs either.

301Works cooperation


The fine folks at archive.org have provided us with upload permissions to the 301Works archive: http://www.archive.org/details/301utm. Unfortunately, they do not want to make the files downloadable, but the same data is in our torrents too, just in a different format (we use tab-delimited, xz-compressed files, while 301Works uses comma-delimited, uncompressed files).
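Going from our format to the 301Works one is a mechanical conversion. A minimal sketch, assuming each dump line holds a short code and its long URL separated by a single tab; that record layout and the `tsv_xz_to_csv` name are assumptions for illustration, not taken from either archive's documentation.

```python
import csv
import io
import lzma


def tsv_xz_to_csv(xz_bytes: bytes) -> str:
    """Turn an xz-compressed, tab-delimited dump into uncompressed CSV text.

    Assumes one "shortcode<TAB>long_url" record per line.
    """
    text = lzma.decompress(xz_bytes).decode("utf-8")
    out = io.StringIO()
    writer = csv.writer(out)  # quotes any field that contains a comma
    for line in text.splitlines():
        if not line:
            continue
        code, long_url = line.split("\t", 1)
        writer.writerow([code, long_url])
    return out.getvalue()


sample = "abc\thttp://example.com/a,b\nabd\thttp://example.com/c\n"
print(tsv_xz_to_csv(lzma.compress(sample.encode("utf-8"))))
```

Using the `csv` module instead of joining with commas matters because long URLs can themselves contain commas, as in the first sample record.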

Tools

TinyBack

The easiest way to help with scraping is to run the Warrior and select the URLTeam project. You can also run TinyBack outside the Warrior, though Python 2.6 or newer is required:

 git clone https://github.com/ArchiveTeam/tinyback
 cd tinyback
 # Use ./run.py --help for more information on command-line options
 ./run.py --tracker=http://tracker.tinyarchive.org/v1/ --num-threads=3 --sleep=180

URL shorteners

New table

The new table includes shorteners we have already started to scrape.

Name | Est. number of shorturls | Scraping done by | Status | Comments
Tinyurl.com | 10,000,000,000 | Warrior | scraping: sequential, done up to azzzzz | new shorturls: non-sequential, 7 characters
Bit.ly | 50,000,000,000 | Warrior | scraping: non-sequential, 6 characters | new shorturls: non-sequential, 6 characters
Goo.gl | ? | User:Scumola | started (2011-03-04) | goo.gl throttles pulls
is.gd | 934,134,706 (2013-05-20) | Warrior | scraping: non-sequential, 6 characters | new shorturls: non-sequential, 6 characters
ff.im | ? | User:Chronomex | | only used by FriendFeed, no interface to shorten new URLs
4url.cc | 1279 (2009-08-14)[1] | User:Chronomex | | dead (2011-02-15)
litturl.com | 17096 (2010-04-15)[2] | User:Chronomex | | dead (2010-11-18)
xs.md | 3084 (2009-08-15)[3] | User:Chronomex | done | dead (2010-11-18)
url.0daymeme.com | 14867 (2009-08-14)[4] | User:Chronomex | done | dead (2010-11-18)
Old tr.im | 1990425 | User:Soult | got what we could | dead (2011-12-31)
New tr.im | ? | Warrior | scraping: sequential, done up to 42pzz | new shorturls: sequential
visibli (hex) | 16777216 | User:Chfoo | Cake at 19%. Incomplete ~2.7mil 59MB | Using links.sharedby.co/links/ as URL prefix.
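The Status column separates sequential from non-sequential scraping. Sequential scraping treats a short code as a number written in the service's alphabet and counts upward, and the same view gives the size of the keyspace: 16^6 = 16,777,216 six-digit hex codes (the figure in the visibli row), or 62^6, about 56.8 billion, six-character codes if digits, lower case and upper case are all allowed. A sketch only; the alphabets and their ordering below are assumptions, since each service picks its own, and the helper names are hypothetical.

```python
# Assumed alphabets; each real service chooses its own characters and order.
LOWER36 = "0123456789abcdefghijklmnopqrstuvwxyz"
BASE62 = LOWER36 + "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
HEX = "0123456789abcdef"


def decode(code: str, alphabet: str) -> int:
    """Read a short code as a number in base len(alphabet)."""
    n = 0
    for ch in code:
        n = n * len(alphabet) + alphabet.index(ch)
    return n


def encode(n: int, alphabet: str, width: int) -> str:
    """Write n as a code of exactly `width` characters, keeping leading 'zeros'."""
    base = len(alphabet)
    digits = []
    for _ in range(width):
        n, r = divmod(n, base)
        digits.append(alphabet[r])
    if n:
        raise ValueError("number does not fit in the requested width")
    return "".join(reversed(digits))


def next_code(code: str, alphabet: str) -> str:
    """Code a sequential scraper would request after `code` (same length).

    Raises ValueError once every code of that length has been used.
    """
    return encode(decode(code, alphabet) + 1, alphabet, len(code))


def keyspace(alphabet: str, length: int) -> int:
    """Number of distinct codes of exactly `length` characters."""
    return len(alphabet) ** length


print(next_code("azzzzz", LOWER36))  # b00000
print(keyspace(HEX, 6))              # 16777216
print(keyspace(BASE62, 6))           # 56800235584
# Two codes listed further down as "appears incremental" are neighbours:
print(decode("30xv", LOWER36) - decode("30xu", LOWER36))  # 1
```

Fixed-width encoding matters for services where leading zeros are significant, so "007" and "7" are different codes.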

Alive

Last verified 2013-04-17. Original list last updated 2009-08-14 [5].

  • adf.ly
  • adjix.com
  • ask.fm - ask.fm/a/40k05kgp
  • awe.sm
  • biglnk.com
  • budurl.com - Appears non-incremental
  • buff.ly - Buffer App
  • burl.se
  • cli.gs - Appears non-incremental
  • cl.ly - CloudApp
  • decenturl.com - Not at all easy to scrape.
  • dld.bz - "private URL shortening service"
  • dlvr.it
  • doiop.com - Appears non-incremental
  • easyurl.net - Appears non-incremental: http://easyurl.net/afd2f
  • flip.it - Flipboard
  • fnd.us (See official shorteners)
  • go.to
  • ilix.in - HTML redirect
  • jdem.cz - Incremental with random (?) last digit: http://jdem.cz/bw388
  • korta.nu
  • metamark.net / xrl.us - ? http://xrl.us/bfabog
  • myurl.in - http://myurl.in/xtP5H / http://urlgator.com/xtP5H / http://ug4.me/xtP5H / http://link-ed.in/xtP5H - HTML redirect
  • notlong.com - Appears to be alpha-only: http://yeitoo.notlong.com/
  • nutshellurl.com - Appears incremental. 301s to a redirector script, which then 301s you to the destination.
  • ph.ly - Related to the pond called Philadelphia, where links are born and raised
  • po.st
  • r.ebay.com
  • rod.gs
  • redirx.com - Lowercase alpha only, appears sequential or guessable: http://redirx.com/?wyok
  • sharedby.co - See vsb.li. Double redirects via USERNAME.sharedby.co/share/XXXXXX
  • shar.es (See official shorteners)
  • shorl.com - Doesn't appear guessable: http://shorl.com/tisikestibahu
  • shorturl.com - Probably sequential/loweralpha: http://alturl.com/wqok
  • shrinkurl.us - Always reports that the URL is malformed
  • shrd.by - see sharedby.co
  • shrt.st - Appears incremental: http://shrt.st/vpz
  • simurl.com - Doesn't appear guessable: http://simurl.com/panpes
  • smarturl.eu / joturl.com / zip.sm - Doesn't appear guessable, HTML redirect.
  • snipr.com / snipurl.com / snurl.com - Appears incremental: http://snipr.com/27nvst http://snipr.com/27nvtt
  • surl.co.uk - Many shortening options.
  • tighturl.com - Appears incremental: http://tighturl.com/30xu http://tighturl.com/30xv
  • tiny.cc - Appears non-incremental
  • tweetburner.com / twurl.nl - Appears incremental
  • twitthis.com
  • u.mavrev.com - Not accepting new URLs.
  • urlcut.com
  • vimeo.com
  • xrl.us - see metamark.net
  • yatuc.com - Not accepting new URLs.
  • yep.it
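Several entries above are not a single HTTP redirect: ilix.in, myurl.in and smarturl.eu are noted as HTML redirects, while nutshellurl.com and sharedby.co bounce through two redirects. A scraper that records only the first Location header would store an intermediate URL. A minimal sketch of following such chains, assuming that "HTML redirect" means a `<meta http-equiv="refresh">` tag and that a hypothetical `fetch` function returns (status, headers, body) without following redirects itself; the example runs against canned responses rather than the network, and JavaScript redirects (like qurlyq.com's) are out of scope.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

REDIRECT_STATUSES = {301, 302, 303, 307, 308}


class MetaRefreshParser(HTMLParser):
    """Remembers the target of <meta http-equiv="refresh" content="0; url=...">."""

    def __init__(self):
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        attrs = {name: (value or "") for name, value in attrs}
        if tag != "meta" or attrs.get("http-equiv", "").lower() != "refresh":
            return
        for part in attrs.get("content", "").split(";"):
            key, sep, value = part.strip().partition("=")
            if sep and key.strip().lower() == "url":
                self.target = value.strip().strip("'\"")


def follow(url, fetch, max_hops=5):
    """Return the list of URLs visited from `url` to its final destination."""
    chain = [url]
    for _ in range(max_hops):
        status, headers, body = fetch(url)
        if status in REDIRECT_STATUSES and "Location" in headers:
            target = headers["Location"]           # HTTP redirect
        else:
            parser = MetaRefreshParser()
            parser.feed(body)
            parser.close()
            target = parser.target                 # HTML redirect, if any
        if target is None:
            break                                  # reached the destination
        url = urljoin(url, target)                 # Location may be relative
        chain.append(url)
    return chain


# Offline stand-in for the network: two chained 301s, then an HTML redirect.
FAKE_RESPONSES = {
    "http://short.example/a": (301, {"Location": "/go?id=a"}, ""),
    "http://short.example/go?id=a": (301, {"Location": "http://html.example/x"}, ""),
    "http://html.example/x": (
        200, {}, '<meta http-equiv="refresh" content="0; url=http://dest.example/">'
    ),
    "http://dest.example/": (200, {}, "<p>destination</p>"),
}

print(follow("http://short.example/a", FAKE_RESPONSES.__getitem__))
```

The `max_hops` cap keeps a redirect loop from stalling a scraper; in practice `fetch` would wrap an HTTP client with automatic redirect following turned off.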

"Official" shorteners

  • bull.hn - Bullhorn Reach (format: bull.hn/l/19JQE/)
  • CokeURL.com - Coca-Cola
  • db.tt - DropBox
  • fb.me - Facebook
  • flic.kr - Flickr
  • fnd.us - Fundrazr.com
  • goo.gl - Google
  • go.usa.gov - USA Government (and since they control the Internets, it doesn't get much more official than this)
  • gu.com - The Guardian (weird format: https://gu.com/p/3f7ca)
  • hub.me - HubPages
  • igg.me - Indiegogo
  • lnkd.in - LinkedIn
  • post.ly - Posterous
  • shar.es - ShareThis - 404 on homepage, otherwise ok
  • skfb.ly - Sketchfab
  • spoti.fi - Spotify
  • stanford.io - Stanford University
  • su.pr - StumbleUpon
  • t.co - Twitter
  • tmblr.co - Tumblr
  • wapo.st - Washington Post
  • wp.me - Wordpress.com
  • y.ahoo.it - Yahoo
  • youtu.be - YouTube

bit.ly aliases

  • 1.usa.gov - USA Government
  • 4sq.com - Foursquare
  • aje.me - Aljazeera
  • amzn.to - Amazon
  • binged.it - Bing (bonus points for being longer than bing.com)
  • chzb.gr - Cheezeburger
  • conta.cc - Constant Contact Inc.
  • dennysd.in - Denny's Restaurants
  • dtoid.it - Destructoid
  • econ.st - The Economist
  • es.pn - ESPN
  • gaw.kr - Gawker
  • grd.to - The Grid TO
  • huff.to - Huffington Post
  • j.mp - bit.ly[6]
  • jrnl.to - thejournal.ie
  • kck.st - Kickstarter
  • marsdd.it - MaRS Discovery District
  • nyti.ms - New York Times
  • onforb.es - Forbes
  • read.bi - Business Insider
  • rseo.co - realseo
  • slackers.co - slackers.com
  • s.shr.lc - shareaholic - Naive, redirects any shortcode to bit.ly
  • stjo.es - St. Joseph Media
  • squid.us - Laughing Squid
  • tcrn.ch - Techcrunch
  • theatln.tc - The Atlantic
  • usat.ly - USA Today Newspaper
  • vrge.co - The Verge

Dead or Broken

  • 1link.in - Website dead
  • 6url.com - HTML redirect, Error 500
  • ad.vu - mirror of adjix.com, application not found
  • canurl.com - Website dead
  • chod.sk - Appears non-incremental, not resolving
  • digg.com - discontinued - [1]
  • dwarfurl.com - Website dead/Numeric, appears incremental: http://dwarfurl.com/08041
  • easyuri.com - Website dead/Appears hex incremental with last digit random/checksum: http://easyuri.com/1339f , http://easyuri.com/133a3
  • go2cut.com - Website dead
  • gonext.org - not resolving
  • imfy.us - requires a reCAPTCHA to get to the linked site, and Avast goes nuts. DNS fails to resolve.
  • ix.it - Not resolving
  • jijr.com - Doesn't appear to be a shortener, now parked
  • jump.to - dead as of February 1, 2013
  • kissa.be - "Kissa.be url shortener service is shutdown"
  • kl.am - "kl.am Closes its Shell"
  • kurl.us - Parked.
  • lnkurl.com - Website dead
  • memurl.com - Pronounceable. Broken.
  • miklos.dk - Doesn't appear guessable: http://miklos.dk/!z7bA6a - "Vi arbejder på sagen..." ("We're working on it...")
  • minilien.com - Doesn't appear guessable: http://minilien.com/?9nyvwnA0gh - Website dead
  • minurl.org - Presently in ERROR 404
  • muhlink.com - Not resolving
  • myurl.us - cpanel frontend
  • nyturl.com - NY Times (bonus points for being longer than nyt.com, which they own). Taken by squatters
  • pnt.me - Doesn't appear guessable, too big a space to bruteforce: http://pnt.me/FzAblc
  • qurlyq.com - Javascript redirect. Appears sequential: http://qurlyq.com/5nf. Domain parked.
  • s3nt.com - Probably sequential. http://s3nt.com/aa goes somewhere different from /ab . Domain parked.
  • shortlinks.co.uk - Working again. Maybe not.
  • short.to - Domain is parked - Probably sequential/loweralpha: http://short.to/msmp
  • shrinklink.co.uk - Doesn't appear sequential: http://www.shrinklink.co.uk/45bmx , www.shrinklink.co.uk/npk6xp . Domain parked.
  • traceurl.com - DNS fails to resolve.
  • tr.im (1st generation) - "Be back soon!"
  • twitpwr.com - Domain parked.
  • u.nu - "The shortest URLs. period." Website dead since at least 1 October 2010 (http://web.archive.org/web/20100104023208/http://u.nu/)
  • url9.com - Sequential, alphanumeric. Leading 0s are significant. "The site is working correctly."
  • urlborg.com - 404 Not Found.
  • urlcover.com - Domain parked.
  • urlhawk.com - Domain parked.
  • url-press.com - Suspended by web host.
  • urlsmash.com - DNS not resolving.
  • urltea.com - Dreamhost's coming soon page.
  • urlvi.be - Domain parked.
  • urlx.org - Owner has agreed to share his database
  • vsb.li / links.visibli.com/links/ - The latter uses truncated md5 hex string. See sharedby.co.
  • w3t.org - 403 Forbidden.
  • wlink.us - Domain parked.
  • xaddr.com - Domain parked.
  • xil.in - Under construction.
  • x.se - Cannot resolve, but www.x.se works.
  • xym.kr - Gibberish (?) Korean text blog.
  • yweb.com - Suspicious iframe with long url and fake loading gif image.
  • zi.ma - DNS not resolving.
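The vsb.li entry notes that links.visibli.com/links/ uses a truncated MD5 hex string, and the table lists 16,777,216 shorturls for visibli (hex), which is exactly 16^6, the number of six-digit hex codes. Because a hash cannot be run backwards, the code space has to be walked rather than derived from known URLs. A sketch; what visibli actually hashed, and that it kept the first six hex digits, are assumptions for illustration, as are the helper names.

```python
import hashlib
from itertools import islice


def truncated_md5_code(text: str, length: int = 6) -> str:
    """First `length` hex digits of MD5(text); input and length are assumed."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()[:length]


def all_hex_codes(length: int = 6):
    """Yield every fixed-width lowercase hex code in order: 000000 ... ffffff."""
    for i in range(16 ** length):
        yield format(i, f"0{length}x")


print(16 ** 6)                           # 16777216
print(truncated_md5_code("http://example.com/"))
print(list(islice(all_hex_codes(), 3)))  # ['000000', '000001', '000002']
```

Truncation also means two different long URLs can share a code, so a scraper should store whatever each code actually redirects to rather than assume a one-to-one mapping.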

Discontinued

  • urlbrief.com - co-operates with 301Works.org

Hueg list

[2]

References

Weblinks


v · t · e         Archive Team
Current events

Alive... OR ARE THEY · Deathwatch · Projects

Archiveteam.jpg
Archiving projects

APKMirror · Archive.is · BetaArchive · Government Backup (#datarefuge · ftp-gov· Gmane · Internet Archive · It Died · Megalodon.jp · OldApps.com · OldVersion.com · OSBetaArchive · TEXTFILES.COM · The Dead, the Dying & The Damned · The Mail Archive · UK Web Archive · WebCite · Vaporwave.me

Blogging

Blog.pl · Blogger · Blogster · Blogter.hu · Freeblog.hu · Fuelmyblog · Jux · LiveJournal · My Opera · Nolblog.hu · Open Diary · ownlog.com · Posterous · Powerblogs · Proust · Roon · Splinder · Tumblr · Vox · Weblog.nl · Windows Live Spaces · Wordpress.com · Xanga · Yahoo! Blog · Zapd

Cloud hosting/file sharing

aDrive · AnyHub · Box · Dropbox · Docstoc · Google Drive · Google Groups Files · iCloud · Fileplanet · LayerVault · MediaCrush · MediaFire · Mega · MegaUpload · MobileMe · OneDrive · Pomf.se · RapidShare · Ubuntu One · Yahoo! Briefcase

Corporations

Apple · IBM · Google · Loblaw · Lycos Europe · Microsoft · Yahoo!

Events

Arab Spring · Great Ape-Snake War · Spanish Revolution

Font Repos

DaFont · Google Web Fonts · GNU FreeFont · Fontspace

Forums/Message boards

4chan · Captain Luffy Forums · College Confidential · DSLReports · ESPN Forums · forums.starwars.com · HeavenGames · Invisionfree · NeoGAF · The Classic Horror Film Board · Yahoo! Messages · Yahoo! Neighbors · Yuku.com

Gaming

Atomicgamer · Bazaar.tf · City of Heroes · Club Nintendo · Counter-Strike: Global Offensive · CS:GO Lounge · Desura · Dota 2 · Dota 2 Lounge · Emulation Zone · ESEA · GameBanana · GameMaker Sandbox · GameTrailers · Halo · HLTV.org · HQ Trivia · Infinite Crisis · joinDOTA · League of Legends · Liquipedia · Minecraft.net · Player.me · Playfire · Raptr · Steam · SteamDB · Team Fortress 2 · TF2 Outpost · Warhammer · Xfire

Image hosting

500px · AOL Pictures · Blipfoto · Blingee · Canv.as · Camera+ · Cameroid · DailyBooth · Degree Confluence Project · deviantART · Demotivalo.net · Flickr · Fotoalbum.hu · Fotolog.com · Fotopedia · Frontback · Geograph Britain and Ireland · GTF Képhost · ImageShack · Imgh.us · Imgur · Inkblazers · Instagram · Kepfeltoltes.hu · Kephost.com · Kephost.hu · Kepkezelo.com · Keptarad.hu · Madden GIFERATOR · MLKSHK · Microsoft Clip Art · Microsoft Photosynth · Nokia Memories · noob.hu · Odysee · Panoramio · Photobucket · Picasa · Picplz · Pixiv · Portalgraphics.net · PSharing · Ptch · puu.sh · Rawporter · Relay.im · ScreenshotsDatabase.com · Snapjoy · Streetfiles · Tabblo · Tinypic · Trovebox · TwitPic · Wallbase · Wallhaven · Webshots · Wikimedia Commons

Knowledge/Wikis

arXiv · Citizendium · Clipboard.com · Deletionpedia · EditThis · Encyclopedia Dramatica · Etherpad · Everything2 · infoAnarchy · GeoNames · GNUPedia · Google Books (Google Books Ngram· Horror Movie Database · Insurgency Wiki · Knol · Lost Media Wiki · Neoseeker.com · Notepad.cc · Nupedia · OpenCourseWare · OpenStreetMap · Orain · Pastebin · Patch.com · Project Gutenberg · Puella Magi · Referata · Resedagboken · SongMeanings · ShoutWiki · The Internet Movie Database · TropicalWikis · Uncyclopedia · Urban Dictionary · Urban Exploration Resource · Webmonkey · Wikia · Wikidot · WikiHow · Wikkii · WikiLeaks · Wikipedia (Simple English Wikipedia· Wikispaces · Wikispot · Wik.is · Wiki-Site · WikiTravel · Word Count Journal

Magazines/Blogs/News

Cyberpunkreview.com · Game Developer Magazine · Gigaom · Hardware Canucks · Helium · JPG Magazine · Make Magazine · Polygamia.pl · San Fransisco Bay Guardian · Scoop · Regretsy · Yahoo! Voices

Microblogging

Heello · Identi.ca · Jaiku · Mommo.hu · Plurk · Sina Weibo · Twitter · TwitLonger

Music/Audio

AOL Music · Audimated.com · Cinch · digCCmixter · Dogmazic.net · Earbits · exfm · Free Music Archive · Gogoyoko · Indaba Music · Instacast · Jamendo · Last.fm · Music Unlimited · MOG · PureVolume · Reverbnation · ShareTheMusic · SoundCloud · Soundpedia · This Is My Jam · TuneWiki · Twaud.io · WinAmp

People

Aaron Swartz · Michael S. Hart · Steve Jobs · Mark Pilgrim · Dennis Ritchie · Len Sassaman Project

Protocols/Infrastructure

FTP · Gopher · IRC · Usenet · World Wide Web
BitTorrent DHT

Q&A

Askville · Answerbag · Answers.com · Ask.com · Askalo · Baidu Knows · Blurtit · ChaCha · Experts Exchange · Formspring · GirlsAskGuys · Google Answers · Google Baraza · JustAnswer · MetaFilter · Quora · Retrospring · StackExchange · The AnswerBank · The Internet Oracle · Uclue · WikiAnswers · Yahoo! Answers

Recipes/Food

Allrecipes · Epicurious · Food.com · Foodily · Food Network · Punchfork · ZipList

Social bookmarking

Addinto · Backflip · Balatarin · BibSonomy · Bkmrx · Blinklist · BlogMarks · BookmarkSync · CiteULike · Connotea · Delicious · Designer News · Digg · Diigo · Dir.eccion.es · Evernote · Excite Bookmark · Faves · Favilous · folkd · Freelish · Getboo · GiveALink.org · Gnolia · Google Bookmarks · Hacker News · HeyStaks · IndianPad · Kippt · Knowledge Plaza · Licorize · Linkwad · Menéame · Microsoft Developer Network · myVIP · Mister Wong · My Web · Mylink Vault · Newsvine · Oneview · Pearltrees · Pinboard · Pocket · Propeller.com · Reddit · sabros.us · Scloog · Scuttle · Simpy · SiteBar · Slashdot · Squidoo · StumbleUpon · Twine · Vizited · Yummymarks · Xmarks · Yahoo! Buzz · Zootool · Zotero

Social networks

Bebo · BlackPlanet · Classmates.com · Cyworld · Dogster · Dopplr · douban · Ello · Facebook · Flixster · FriendFeed · Friendster · Friends Reunited · Gaia Online · Google+ · Habbo · hi5 · Hyves · iWiW · LinkedIn · Miiverse · mixi · MyHeritage · MyLife · Myspace · myVIP · Netlog · Odnoklassniki · Orkut · Plaxo · Qzone · Renren · Skyrock · Sonico.com · Storylane · Tagged · tvtag · Upcoming · Viadeo · Vine · Vkontakte · WeeWorld · Weibo · Wretch · Yahoo! Groups · Yahoo! Stars India · Yahoo! Upcoming · more sites...

Shopping/Retail

Alibaba · AliExpress · Amazon · Apple Store · Barnes & Noble · DirectCanada · eBay · Kmart · NCIX · Printfection · RadioShack · Sears · Sears Canada · Target · The Book Depository · ThinkGeek · Toys "R" Us · Walmart

Software/code hosting

Android Development · Alioth · Assembla · BerliOS · Betavine · Bitbucket · BountySource · Codecademy · CodePlex · Freepository · Free Software Foundation · GNU Savannah · GitHost  · GitHub · GitHub Downloads · Gitorious · Gna! · Google Code · ibiblio · java.net · JavaForge · KnowledgeForge · Launchpad · LuaForge · Maemo · mozdev · OSOR.eu · OW2 Consortium · Openmoko · OpenSolaris · Ourproject.org · Ovi Store · Project Kenai · RubyForge · SEUL.org · SourceForge · Stypi · TestFlight · tigris.org · Transifex · TuxFamily · Yahoo! Downloads

Television/Radio

ABC · Austin City Limits · BBC · CBC · CBS · Computer Chronicles · CTV · Fox · G4 · Global TV · Jeopardy! · NBC · NHK · PBS · Penn & Teller: Bullshit! · The Howard Stern Show · TV News Archive (Understanding 9/11)

Torrenting/Piracy

ExtraTorrent · EZTV · isoHunt · KickassTorrents · The Pirate Bay · Torrentz · Library Genesis

Video hosting

Academic Earth · Bambuser · Blip.tv · Epic · Google Video · Justin.tv · Niconico · Nokia Trailers · Oddshot.tv · Plays.tv · Qwiki · Skillfeed · Stickam · TED Talks · Ticker.tv · Twitch.tv · Ustream · Videoplayer.hu · Viddler · Viddy · Vidme · Vimeo · Vine · Vstreamers · Yahoo! Video · YouTube · Famous Internet videos (Me at the zoo)

Web hosting

Angelfire · Brace.io · BT Internet · CableAmerica Personal Web Space · Claranet Netherlands Personal Web Pages · Comcast Personal Web Pages · Extra.hu · FortuneCity · Free ProHosting · GeoCities (patch· Google Business Sitebuilder · Google Sites · Internet Centrum · MBinternet · MSN TV · Nifty · Nwnyet · Parodius Networking · Prodigy.net · Saunalahti Iso G · Swipnet · Telenor · Tripod · University of Michigan personal webpages · Verizon Mysite · Verizon Personal Web Space · Webzdarma · Virgin Media

Web applications

Mailman · MediaWiki · phpBB · Simple Machines Forum · vBulletin

Information

A Million Ways to Die on the Web · Backup Tips · Cheap storage · Collecting items randomly · Data compression algorithms and tools · Dev · Discovery Data · DOS Floppies · Fortress of Solitude · Keywords · Naughty List · Nightmare Projects · Rescuing floppy disks · Rescuing optical media · Site exploration · The WARC Ecosystem · Working with ARCHIVE.ORG

Projects

ArchiveCorps · Audit2014 · Emularity · Faceoff · FlickrFckr · Froogle · INTERNETARCHIVE.BAK (Internet Archive Census· IRC Quotes · JSMESS · JSVLC · Just Solve the Problem · NewsGrabber · Project Newsletter · Valhalla · Web Roasting (ISP Hosting · University Web Hosting· Woohoo

Tools

ArchiveBot · ArchiveTeam Warrior (Tracker· Google Takeout · HTTrack · Video downloaders · Wget (Lua · WARC)

Teams

Bibliotheca Anonoma · LibreTeam · URLTeam · Yahoo Video Warroom · WikiTeam

Other

800notes · AOL · Akoha · Ancestry.com · April Fools' Day · Amplicate · AutoAdmit · Bre.ad · Circavie · Cobook · Co.mments · Countdown · Distill · Dmoz · Easel · Eircode · Electronic Frontier Foundation · FanFiction.Net · Feedly · Ficlets · Forrst · FunnyExam.com · FurAffinity · Google Helpouts · Google Moderator · Google Reader · ICQmail · IFTTT · Jajah · JuniorNet · Lulu Poetry · Mobile Phone Applications · Mochi Media · Mozilla Firefox · MyBlogLog · NBII · Neopets · Quantcast · Quizilla · Salon Table Talk · Shutdownify · Slidecast · SOPA blackout pages · starwars.yahoo.com · TechNet · Toshiba Support · USA-Gov · Volán · Widgetbox · Windows Technical Preview · Wunderlist · YTMND · Zoocasa

About Archive Team

Introduction · Philosophy · Who We Are · Our stance on robots.txt · Why Back Up? · Software · Formats · Storage Media · Recommended Reading · Films and documentaries about archiving · Talks · In The Media · FAQ