A screen shot of the home page taken on 10 April 2012.
Archiving status: Not saved yet
Wiki-Site is a wikifarm.
For a list of wikis hosted in this wikifarm see: https://code.google.com/p/wikiteam/source/browse/trunk/listsofwikis
This farm offers an API, but it applies very strict throttles and CAPTCHAs and is also quite slow; the best results so far have been obtained with a --delay of 200 seconds.
If you find yourself in a loop of HTTP errors, stop everything for 3600 seconds or more before restarting.
Do not run uploader.py at the same time! Do not visit or query the website from the same machine at the same time!
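The throttling advice above can be sketched as a small retry loop. This is only an illustration, not code from dumpgenerator.py: the function name `run_with_cool_down`, the `fetch` callback and the threshold of three consecutive errors are all assumptions made for the example; only the 200-second delay and the 3600-second pause come from the text.

```python
import time

COOL_DOWN_SECONDS = 3600    # pause recommended above after a loop of HTTP errors
MAX_CONSECUTIVE_ERRORS = 3  # hypothetical threshold for "a loop of HTTP errors"


def run_with_cool_down(fetch, items, delay=200, sleep=time.sleep):
    """Call fetch(item) for every item, waiting `delay` seconds between requests.

    `fetch` is a hypothetical callback that raises IOError on an HTTP failure.
    After MAX_CONSECUTIVE_ERRORS failures in a row, everything stops for
    COOL_DOWN_SECONDS and the same item is then retried. A wiki that never
    recovers is retried forever, like the shell `while true` loop.
    """
    results = []
    errors = 0
    index = 0
    while index < len(items):
        try:
            results.append(fetch(items[index]))
        except IOError:
            errors += 1
            if errors >= MAX_CONSECUTIVE_ERRORS:
                sleep(COOL_DOWN_SECONDS)  # long pause, then start counting again
                errors = 0
            else:
                sleep(delay)  # ordinary delay before retrying the same item
            continue
        errors = 0
        index += 1
        sleep(delay)  # throttle before the next request
    return results
```

Passing `sleep` as a parameter keeps the sketch easy to try out without actually waiting an hour; in real use the default `time.sleep` applies.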
Concretely, you can apply something like the following patch:
73,77c72,73
<
< # time.sleep(60)
< # Uncomment what above and add --delay=60 in the dumpgenerator.py calls below for broken wiki farms
< # such as editthis.info, wiki-site.com, wikkii (adjust the value as needed;
< # typically they don't provide any crawl-delay value in their robots.txt).
---
>
> time.sleep(400)
80c76
< subprocess.call('./dumpgenerator.py --api=%s --xml --images --resume --path=%s' % (wiki, wikidir), shell=True)
---
> subprocess.call('./dumpgenerator.py --api=%s --delay=200 --xml --images --resume --path=%s' % (wiki, wikidir), shell=True)
82c78
< subprocess.call('./dumpgenerator.py --api=%s --xml --images' % wiki, shell=True)
---
> subprocess.call('./dumpgenerator.py --api=%s --delay=200 --xml --images' % wiki, shell=True)
and then run the following in a screen:
while true; do python launcher.py wiki-site.com ; sleep 3600s; done
Be ready to make more local patches to reduce requests (e.g. to saveIndexPHP() and friends) and work around blocks.
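If the delay needs frequent adjusting, the two patched subprocess.call lines could be built from one helper instead of being edited by hand each time. The sketch below is an assumption, not part of launcher.py: the function `build_dump_command` and the example URL are hypothetical, and only the flags that already appear in the patch (--api, --delay, --xml, --images, --resume, --path) are used.

```python
def build_dump_command(api_url, delay=200, resume_path=None):
    """Return the dumpgenerator.py argument list for one wiki.

    Hypothetical helper: with `resume_path` set, it mirrors the patched
    --resume call; without it, it mirrors the patched fresh-download call.
    """
    command = [
        "./dumpgenerator.py",
        "--api=%s" % api_url,
        "--delay=%d" % delay,  # 200 seconds gave the best results so far
        "--xml",
        "--images",
    ]
    if resume_path is not None:
        command += ["--resume", "--path=%s" % resume_path]
    return command


# Example with a made-up wiki URL; pass the list to subprocess.call() to run it.
print(" ".join(build_dump_command("http://example.wiki-site.com/api.php")))
```

Returning an argument list rather than a formatted string means the command can be passed to subprocess.call without shell=True, so unusual characters in a wiki URL or path are not interpreted by the shell.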