Difference between revisions of "URLTeam"

From Archiveteam
Line 46: Line 46:
  ifood.tv
  ilix.in
- imdb.com
  is.gd
  ix.it
Line 67: Line 66:
  poprl.com
  qurlyq.com
- readthisurl.com
  redirx.com
  s3nt.com
Line 79: Line 77:
  shurl.net
  simurl.com
+ shorl.com
  smarturl.eu
  snipr.com
Line 96: Line 95:
  tr.im
  tweetburner.com
- TwitPWR.com
+ twitpwr.com
  twitthis.com
  twurl.nl
Line 109: Line 108:
  url-press.com
  urlsmash.com
- urlsn.com
  urltea.com
  urlvi.be
Line 123: Line 121:
  yweb.com
  zi.ma
+ w3t.org

Revision as of 20:04, 26 January 2009

Too many people using TinyURL and similar services

Twitter is a great example of what's wrong with trusting an online service with something of value. Check out some 'tweets':

  • Hah, I'm a Zombie! http://tinyurl.com/8gnnb7 Ahh, the fun we all have with each other. about 1 hour ago from web
  • Health privacy is dead. Here's why: http://ff.im/GMpx about 14 hours ago from FriendFeed
  • Hmm, friendfeed released a new "import Twitter" feature today. It is taking a LONG time on my account. I wonder why.... http://ff.im/GM5W about 14 hours ago from FriendFeed

If these TinyURL-style services go away, the links die with them, and there's not much content left here.

So, the project: scrape the TinyURL (and similar) services. It's actually not as hard as it sounds, because we don't need to scrape any web pages or parse any HTML. The services just send a Location: header when queried for the hash, so we ask the service for the hash and parse the headers for the redirect URL:

(18) swebb@swebb.cluster Wed 11:10am  [~] % curl -LLIs http://tinyurl.com/6dvm2t | grep Location 
Location: http://www.readwriteweb.com/archives/too_many_people_use_tinyurl.php
(19) swebb@swebb.cluster Wed 11:10am  [~] % curl -LLIs http://ff.im/GMpx | grep Location
Location: http://friendfeed.com/e/08954685-00fe-4e55-b28f-4b99f83bfb0d/Health-privacy-is-dead-Here-s-why/
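
The same lookup can be sketched in Python using only the standard library. This is a minimal illustration, not the crawler actually in use: the function names parse_location and resolve are made up for this example, and it assumes the service answers a HEAD request with a redirect status and a Location: header, as the curl output above suggests.

```python
import http.client
from urllib.parse import urlsplit


def parse_location(raw_headers):
    """Return the value of the first Location header in a raw header block, or None."""
    for line in raw_headers.splitlines():
        name, sep, value = line.partition(":")
        # The status line has no colon, so it is skipped here.
        if sep and name.strip().lower() == "location":
            return value.strip()
    return None


def resolve(short_url, timeout=10.0):
    """Send one HEAD request and return (status, Location or None).

    The redirect is NOT followed; we only want to record where it points.
    """
    parts = urlsplit(short_url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()
```

parse_location works on header text that has already been saved (for example, the output of curl -Is), while resolve makes the request itself. Not following the redirect means each hash costs a single request to the shortener and none to the target site.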

Walk through all possible hashes, check for errors, and we're good to go.
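
A rough sketch of that walk, in Python. Everything specific here is an assumption made for illustration: the alphabet (digits plus lowercase letters), the maximum length, and the way responses are bucketed. Each service uses its own character set and signals an unknown hash in its own way, so both would need checking per site.

```python
import itertools
import string

# Assumption: hashes are built from digits and lowercase letters.
ALPHABET = string.digits + string.ascii_lowercase

# HTTP status codes that indicate a redirect.
REDIRECT_STATUSES = (301, 302, 303, 307, 308)


def candidate_hashes(alphabet=ALPHABET, max_length=6):
    """Yield every string over the alphabet, shortest first, up to max_length."""
    for length in range(1, max_length + 1):
        for combo in itertools.product(alphabet, repeat=length):
            yield "".join(combo)


def classify(status, location):
    """Bucket one response as 'found', 'missing', or 'retry'.

    Assumption: an unknown hash yields a 404; anything else that is not a
    redirect with a Location value is treated as an error to retry later.
    """
    if status in REDIRECT_STATUSES and location:
        return "found"
    if status == 404:
        return "missing"
    return "retry"
```

The keyspace grows quickly: with 36 characters there are 36**6 (about 2.2 billion) hashes of length six alone, so a generator is used rather than building the full list in memory.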

STATUS: Crawling tinyurl.com and ff.im as a first test at an acceptable rate so I won't get my IP banned.
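
One simple way to hold a crawl to a fixed rate is to enforce a minimum gap between requests. The sketch below is an illustration only: the Throttle class and the interval are hypothetical, and no actual rate is stated here, so a suitable value would have to be chosen per service.

```python
import time


class Throttle:
    """Enforce a minimum interval, in seconds, between successive calls to wait()."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock   # injectable so the class can be tested without real delays
        self._sleep = sleep
        self._last = None     # time of the previous call, None before the first one

    def wait(self):
        """Block until at least min_interval has passed since the previous call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling wait() immediately before each request keeps the request rate at or below one per min_interval seconds; the first call never sleeps.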

Sites that I've collected that offer similar services:

1link.in
4url.cc
6url.com
adjix.com
ad.vu
bellypath.com
bit.ly
bkite.com
budurl.com
canurl.com
cli.gs
decenturl.com
dn.vc
doiop.com
dwarfurl.com
easyuri.com
easyurl.net
ff.im
go2cut.com
gonext.org
hulu.com
hypem.com
ifood.tv
ilix.in
is.gd
ix.it
jijr.com
kissa.be
kurl.us
litturl.com
lnkurl.com
memurl.com
metamark.net
miklos.dk
minilien.com
minurl.org
muhlink.com
myurl.in
myurl.us
notlong.com
ow.ly
plexp.com
poprl.com
qurlyq.com
redirx.com
s3nt.com
shorterlink.com
shortlinks.co.uk
short.to
shorturl.com
shrinklink.co.uk
shrinkurl.us
shrt.st
shurl.net
simurl.com
shorl.com
smarturl.eu
snipr.com
snipurl.com
snurl.com
sn.vc
starturl.com
surl.co.uk
tighturl.com
timesurl.at
tiny123.com
tiny.cc
tinylink.com
tinyurl.com
tobtr.com
traceurl.com
tr.im
tweetburner.com
twitpwr.com
twitthis.com
twurl.nl
u.mavrev.com
ur1.ca
url9.com
urlborg.com
urlbrief.com
urlcover.com
urlcut.com
urlhawk.com
url-press.com
urlsmash.com
urltea.com
urlvi.be
vimeo.com
wlink.us
xaddr.com
xil.in
xrl.us
x.se
xs.md
yatuc.com
yep.it
yweb.com
zi.ma
w3t.org