Difference between revisions of "URLTeam"

From Archiveteam

Revision as of 18:17, 21 January 2009

Too many people using TinyURL and similar services

Twitter is a great example of what's wrong with trusting an online service with something of value. Check out some 'tweets':

  • Hah, I'm a Zombie! http://tinyurl.com/8gnnb7 Ahh, the fun we all have with each other. about 1 hour ago from web
  • Health privacy is dead. Here's why: http://ff.im/GMpx about 14 hours ago from FriendFeed
  • Hmm, friendfeed released a new "import Twitter" feature today. It is taking a LONG time on my account. I wonder why.... http://ff.im/GM5W about 14 hours ago from FriendFeed

If TinyURL and similar services go away, the links stop resolving and there's not much content left in these tweets.

So, the project: scrape TinyURL and similar services. It's actually not as hard as it sounds, because we don't need to scrape any web pages or parse any HTML. The services just send a Location: header when queried for the hash, so we ask the service for the hash and parse the headers for the redirect URL:

(18) swebb@swebb.cluster Wed 11:10am  [~] % curl -LLIs http://tinyurl.com/6dvm2t | grep Location 
Location: http://www.readwriteweb.com/archives/too_many_people_use_tinyurl.php
(19) swebb@swebb.cluster Wed 11:10am  [~] % curl -LLIs http://ff.im/GMpx | grep Location
Location: http://friendfeed.com/e/08954685-00fe-4e55-b28f-4b99f83bfb0d/Health-privacy-is-dead-Here-s-why/
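
The same lookup can be sketched in Python. This is only a minimal illustration, not a finished scraper: the function names parse_location and resolve are made up for this example, and it assumes the service answers a HEAD request with a 3xx status and a Location: header, as in the curl output above.

```python
import http.client
from typing import Optional
from urllib.parse import urlsplit


def parse_location(raw_headers: str) -> Optional[str]:
    """Return the value of the first Location header in a raw header block."""
    for line in raw_headers.splitlines():
        # Split at the first colon only, so the URL's own colons stay intact.
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "location":
            return value.strip()
    return None


def resolve(short_url: str, timeout: float = 10.0) -> Optional[str]:
    """Send one HEAD request and return the redirect target, or None.

    Assumption: the shortener replies with a 3xx status and a Location header.
    """
    parts = urlsplit(short_url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        conn.request("HEAD", parts.path or "/")
        response = conn.getresponse()
        # http.client does not follow redirects, so this is the
        # shortener's own answer rather than the final page's.
        if response.status in (301, 302, 303, 307, 308):
            return response.getheader("Location")
        return None
    finally:
        conn.close()
```

Calling resolve("http://tinyurl.com/6dvm2t") would make one request and return whatever Location: value the service sends, if it is still up. parse_location does the same job as the grep above for header text that has already been saved.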

Walk through all possible hashes, check for errors, and we're good to go.
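
Walking the keyspace might look like the sketch below. The alphabet is an assumption: the TinyURL examples above use lowercase letters and digits, while ff.im codes such as GMpx are mixed-case, so each service would need its own alphabet and length limit. The names ALPHABET, iter_codes and keyspace_size are hypothetical.

```python
import itertools
import string
from typing import Iterator

# Assumed alphabet: lowercase letters and digits (36 symbols).
# Real services may use a different or larger set.
ALPHABET = string.ascii_lowercase + string.digits


def iter_codes(alphabet: str, max_length: int) -> Iterator[str]:
    """Yield every code of length 1..max_length, shortest first."""
    for length in range(1, max_length + 1):
        for chars in itertools.product(alphabet, repeat=length):
            yield "".join(chars)


def keyspace_size(alphabet: str, max_length: int) -> int:
    """Count how many codes iter_codes would yield."""
    return sum(len(alphabet) ** n for n in range(1, max_length + 1))
```

With 36 symbols and codes up to 6 characters, keyspace_size(ALPHABET, 6) is 2,238,976,116, so a real run would need rate limiting, a way to resume, and a record of which codes returned errors.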