Difference between revisions of "Starwars.yahoo.com"

From Archiveteam
(Created page with 'Problems encountered: * Yahoo issues an error 999 after about 30 minutes of fetching from a certain IP. We used two approaches to get around this. ** TOR ** multiple IPs The ta…')
 
Line 1:
  Problems encountered:
  * Yahoo issues an error 999 after about 30 minutes of fetching from a certain IP.  We used two approaches to get around this.
- ** TOR
+ ** TOR (slow as molasses, but worked) - collected using httrack
- ** multiple IPs
+ ** multiple IPs (fast, but needs large IP resources) - collected using wget
  
  The tarballs in the archive reflect both archiving methods:
   -rw-r--r--  1 root  root  228855239 Dec 15 13:35 starwars.yahoo.com-goekesmi-raw.tar.bz2
   -rw-r--r--  1 root  root    36529217 Dec 20 15:53 starwars.yahoo.com-tor.tar.bz2

Revision as of 20:10, 23 December 2009

Problems encountered:

  • Yahoo starts returning error 999 after about 30 minutes of fetching from a single IP address. We used two approaches to get around this:
    • Tor (slow as molasses, but it worked) - collected using httrack
    • multiple IPs (fast, but needs a large pool of IP addresses) - collected using wget
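The general idea behind both workarounds, switching to a different outbound path once one path starts getting error 999, can be sketched in Python. This is only an illustration, not the httrack or wget setup that was actually used: `fetch_with_rotation`, `make_proxy_route`, and the notion of a "route" are hypothetical names, and an HTTP proxy per route is an assumption.

```python
import urllib.error
import urllib.request
from typing import Callable, Sequence, Tuple

# The rate-limit status described above; any other status is passed through.
RATE_LIMITED = 999

# A "route" (hypothetical name) is any callable that fetches a URL through one
# outbound path, e.g. one source IP or one proxy, and returns (status, body).
Route = Callable[[str], Tuple[int, bytes]]


def make_proxy_route(proxy_url: str, timeout: float = 30.0) -> Route:
    """Build a route that sends requests through one HTTP proxy (an assumption)."""
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy_url})
    )

    def route(url: str) -> Tuple[int, bytes]:
        try:
            with opener.open(url, timeout=timeout) as response:
                return response.status, response.read()
        except urllib.error.HTTPError as error:
            # urllib raises for non-2xx statuses, which includes 999.
            return error.code, error.read()

    return route


def fetch_with_rotation(
    url: str, routes: Sequence[Route], start: int = 0
) -> Tuple[int, bytes, int]:
    """Try each route in turn, beginning at `start`, until one is not rate limited.

    Returns (status, body, index_of_route_used). Raises RuntimeError if every
    route answers 999.
    """
    if not routes:
        raise ValueError("at least one route is required")
    for offset in range(len(routes)):
        index = (start + offset) % len(routes)
        status, body = routes[index](url)
        if status != RATE_LIMITED:
            return status, body, index
    raise RuntimeError(f"all {len(routes)} routes returned {RATE_LIMITED} for {url}")
```

A caller would keep the returned index and pass it back as `start` on the next request, so a route that has been blocked is not retried until the others have had a turn.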

The tarballs in the archive reflect both archiving methods:

-rw-r--r--  1 root   root   228855239 Dec 15 13:35 starwars.yahoo.com-goekesmi-raw.tar.bz2
-rw-r--r--  1 root   root    36529217 Dec 20 15:53 starwars.yahoo.com-tor.tar.bz2
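To compare what the two collection methods captured, each tarball can be summarized without unpacking it. The following is a minimal sketch using Python's standard `tarfile` module; `summarize_tarball` is a hypothetical helper, and nothing is assumed here about the directory layout inside either archive.

```python
import tarfile
from collections import Counter
from pathlib import PurePosixPath


def summarize_tarball(path: str) -> dict:
    """Count regular files and uncompressed bytes, grouped by top-level entry."""
    files = 0
    total_bytes = 0
    top_level = Counter()
    # "r:bz2" matches the .tar.bz2 archives listed above.
    with tarfile.open(path, "r:bz2") as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directories, symlinks, and other special entries
            files += 1
            total_bytes += member.size
            parts = PurePosixPath(member.name).parts
            top_level[parts[0] if parts else ""] += 1
    return {"files": files, "bytes": total_bytes, "top_level": dict(top_level)}
```

Running it on both archives and comparing the `files` and `bytes` values gives a rough picture of how the two crawls differ in coverage.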