Revision as of 04:12, 29 April 2015
Google Baraza | |
URL | http://www.google.com/baraza/, http://otvety.google.ru/ |
Status | Closing |
Archiving status | In progress... |
Archiving type | Unknown |
Project source | baraza-discovery, baraza-grab, baraza-items |
Project tracker | barazadisco, baraza |
IRC channel | #bonanza (on hackint) |
Google Baraza (also known as Google Questions and Answers) was a Q&A service designed to replace Google Answers. It shut down on June 23, 2014, leaving a public archive behind. This archive will be deleted on April 30, 2015.
Site structure
- http://www.google.com/baraza/en/thread?tid=009241deff6deee3
- http://www.google.com/baraza/en/fhistory?fid=009241deff6deee30004a4ecdc10793e (history for an individual post)
- http://www.google.com/baraza/en/label?lid=6bb539a0d62b5bf7
- http://www.google.com/baraza/en/user?userid=09917256578240167148
- http://www.google.com/baraza/en/labelusers?lid=2ba73dfbe84c4940
- the following pages can be (ab)used for discovery:
  - http://www.google.com/baraza/en/topics DONE
  - http://www.google.com/baraza/en/labels DONE
  - http://www.google.com/baraza/en/users DONE
  - individual user and label pages can be used to discover threads, and individual thread pages can be used to discover users: TODO (likely as a Warrior project)
- unique content is located at google.com/baraza/en/, google.com/baraza/fr/, and otvety.google.ru/otvety/
- all sites have the same structure
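The URL patterns listed above can be sketched as a small shell helper. This is only an illustration: the function name `baraza_url` and the site keys `en`, `fr` and `ru` are made up here, and the sketch assumes that the Russian site is reachable over plain `http` and follows the same `page?param=id` layout as the English examples.

```shell
#!/bin/sh
# Build a Baraza URL from a site key, a page type and an ID.
# baraza_url and the site keys are hypothetical names for illustration.
baraza_url() {
    site="$1"; page="$2"; id="$3"
    case "$site" in
        en|fr) base="http://www.google.com/baraza/$site" ;;
        ru)    base="http://otvety.google.ru/otvety" ;;  # assumption: plain http
        *)     echo "unknown site: $site" >&2; return 1 ;;
    esac
    # Map each page type to the query parameter seen in the examples above.
    case "$page" in
        thread)           param="tid" ;;
        fhistory)         param="fid" ;;
        label|labelusers) param="lid" ;;
        user)             param="userid" ;;
        *)                echo "unknown page type: $page" >&2; return 1 ;;
    esac
    printf '%s/%s?%s=%s\n' "$base" "$page" "$param" "$id"
}

# Prints http://www.google.com/baraza/en/thread?tid=009241deff6deee3
baraza_url en thread 009241deff6deee3
```

IDs are site-specific, so an ID taken from the English site should not be expected to resolve on the French or Russian one.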
Archive Status
The English site was submitted to ArchiveBot (job idents 6p1on8xhw243qqku1x1l3nsyj and 74zrpb2bapc0o4h4o7iff9r6u); no jobs exist for the Russian and French sites.
It may need to be a Warrior job, as SimpleBrain has been getting login redirect messages.
The Warrior project for discovery started on 2015-04-25.
How can I help?
Content discovery
Running a Warrior
You can start up a Warrior and select Baraza Discovery there. (If you don't really care what you are archiving, select ArchiveTeam's Choice instead, as ArchiveTeam may at some point prioritize another project.) Increasing the number of concurrent threads too much may result in a temporary(?) ban.
Running the script manually
If you use Linux and you're a bit familiar with it, you can try running the script directly.
The instructions can be found at github.com/ArchiveTeam/baraza-discovery.
Increasing the number of concurrent threads too much may result in a temporary(?) ban.
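Some additional information
Don't forget to replace YOURNICKHERE with your nickname.
The number after `--concurrent` determines how many threads run at the same time. Increasing the number of concurrent threads too much may result in a temporary(?) ban.
If you want to stop the script, please do it gracefully if possible. To do so, create an empty file named STOP in the folder of the script (terminal command: `touch STOP`). The script finishes the current item(s) and stops only after that. (If you kill the script immediately, the items get broken, and they will need to be reassigned to another user.) Before starting the script again, don't forget to remove the STOP file.
If you see "Project code is out of date", kill the script, go to its folder (`cd baraza-discovery`) and issue `git pull https://github.com/ArchiveTeam/baraza-discovery`. After the update has finished, re-launch the script.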
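The graceful stop and the update step can be wrapped in a few shell functions. This is a minimal sketch, not part of the project: the function names are made up for illustration, and `update_project` assumes it is run from the directory that contains the checkout and that the folder name matches the repository name.

```shell
#!/bin/sh
# Ask a running script to stop after its current item(s) by creating an
# empty STOP file in the script's folder. Function names are hypothetical.
request_stop() {
    touch "$1/STOP"
}

# Remove the STOP file so the script can be started again.
clear_stop() {
    rm -f "$1/STOP"
}

# Update the code after a "Project code is out of date" message.
# Assumption: $1 is both the folder name and the repository name.
update_project() {
    ( cd "$1" && git pull "https://github.com/ArchiveTeam/$1" )
}
```

A typical sequence would be `request_stop baraza-discovery`, waiting for the script to exit, and then `clear_stop baraza-discovery` before launching it again.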
Content grab
Running a Warrior
You can start up a Warrior and select Baraza there. (If you don't really care what you are archiving, select ArchiveTeam's Choice instead, as ArchiveTeam may at some point prioritize another project.) Recommended number of concurrent threads: 3 per IP. (4 is risky; 5 results in a ban.)
Running the script manually
If you use Linux and you're a bit familiar with it, you can try running the script directly.
The instructions can be found at github.com/ArchiveTeam/baraza-grab.
Recommended number of concurrent threads for this project: 3 per IP. (4 is risky; 5 results in a ban.)
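Some additional information
Don't forget to replace YOURNICKHERE with your nickname.
The number after `--concurrent` determines how many threads run at the same time.
If you want to stop the script, please do it gracefully if possible. To do so, create an empty file named STOP in the folder of the script (terminal command: `touch STOP`). The script finishes the current item(s) and stops only after that. (If you kill the script immediately, the items get broken, and they will need to be reassigned to another user.) Before starting the script again, don't forget to remove the STOP file.
If you see "Project code is out of date", kill the script, go to its folder (`cd baraza-grab`) and issue `git pull https://github.com/ArchiveTeam/baraza-grab`. After the update has finished, re-launch the script.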
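The per-IP guidance for the content grab (3 recommended, 4 risky, 5 results in a ban) can be checked before launching with a small helper. This is a hedged sketch: `check_concurrency` is a made-up name, and treating every value above 5 as a ban is an assumption that goes beyond what is stated above.

```shell
#!/bin/sh
# Classify a --concurrent value against the per-IP guidance for the grab.
# check_concurrency is a hypothetical helper, not part of the project.
check_concurrency() {
    n="$1"
    # Reject empty or non-numeric input.
    case "$n" in
        ''|*[!0-9]*) echo "invalid"; return 2 ;;
    esac
    if [ "$n" -lt 1 ]; then
        echo "invalid"; return 2
    elif [ "$n" -le 3 ]; then
        echo "ok"
    elif [ "$n" -eq 4 ]; then
        echo "risky"
    else
        echo "ban"   # assumption: anything from 5 upward is treated as a ban
    fi
}
```

For example, `check_concurrency 3` prints `ok`, while `check_concurrency 4` prints `risky`.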
Donating to the Internet Archive
Content downloaded by ArchiveTeam will be uploaded to the Internet Archive, where it will be stored and, hopefully, remain available forever. However, storing it costs thousands of dollars in the long run. So, if you can afford it, please consider donating to the Internet Archive, so that this piece of history can be kept for us all. http://archive.org/donate
Do you like our cause?
If you want to help with other projects, learn more about ArchiveTeam, or help with development in general, navigate to the Main Page of this wiki; from there you can reach a lot of information. The Team consists of volunteers working on the projects in their free time, so helping hands (and resources) are always welcome.