LeighRoberts' notes and first impressions
Individual Warrior runner advice
- Find out what the maximum item size is. Your available disk must be at least twice that size, plus 10%.
- Find out what the median item size is. Add 20%, then multiply by 2 because of the WARC file! Divide your available disk space by that to estimate how many workers you can run at once.
- Workers will get stuck on the largest jobs, so even that sizing might not be conservative enough.
- The Docker Warrior needed about 50 MB/container of memory. I did not get figures on yahoo-group-archiver, but I imagine it would be similar.
- Scale slowly - don’t fire off all your containers at once, because they all need to load the same requirements.
- Start about 1/4 of your planned containers and wait 15-30 minutes to check CPU and memory use before starting the next 1/4.
- Check your disk space regularly
- Grafana setup advice for this?
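The sizing and scaling rules in this list can be sketched in a few lines of Python. This is only a rough illustration, not part of the Warrior tooling: the function names are made up, the example sizes are invented, and reading the "divide your available disk space" step as a cap on concurrent workers is my interpretation of the notes. The 50 MB figure is the approximate Docker Warrior number from above.

```python
import shutil

GB = 10**9  # decimal gigabyte, used only to keep the example numbers readable


def min_disk_bytes(max_item_bytes):
    """Minimum disk: twice the largest item, plus 10% (2 * 1.1 = 2.2x)."""
    return max_item_bytes * 22 // 10


def max_workers(available_bytes, median_item_bytes):
    """Median item + 20%, doubled for the WARC copy (2.4x), divided into the disk."""
    per_worker = median_item_bytes * 24 // 10
    if per_worker <= 0:
        raise ValueError("median item size must be positive")
    return available_bytes // per_worker


def estimated_memory_bytes(containers, per_container_bytes=50 * 10**6):
    """Rough memory need, assuming about 50 MB per container."""
    return containers * per_container_bytes


def rollout_batches(planned, steps=4):
    """Split the planned container count into `steps` batches (quarters by default)."""
    base, extra = divmod(planned, steps)
    return [base + (1 if i < extra else 0) for i in range(steps)]


def free_bytes(path="."):
    """Free space on the filesystem holding `path`, for the regular disk check."""
    return shutil.disk_usage(path).free


if __name__ == "__main__":
    # Invented example numbers: 10 GB largest item, 2 GB median, 100 GB disk.
    disk = 100 * GB
    print("minimum disk needed:", min_disk_bytes(10 * GB) // GB, "GB")
    workers = max_workers(disk, 2 * GB)
    print("workers that fit:", workers)
    print("start in batches of:", rollout_batches(workers))
    print("memory estimate:", estimated_memory_bytes(workers) // 10**6, "MB")
    print("free space here:", free_bytes(".") // GB, "GB")
```

Since workers tend to get stuck on the largest jobs, treat the worker count as an upper bound and start lower. Wait 15-30 minutes between batches and re-check CPU, memory, and free disk before starting the next one.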
Specific advice for this project that would have been useful
- Recommend that people keep two categories of group-joiner accounts: one for public groups, and one for private groups we hope to preserve (project specific)
- Make the maximum object size visible to all participants (Grafana?)
- Queue management is super important
- We need a way to add new tracker capacity as needed
- Make reports.pl part of the Dockerfile and run it automatically
- Instructions don’t *exactly* work for Python:
  - apt install python3-pip (the Debian/Ubuntu package that provides pip3; there is no apt package named pip3)
  - location of the run-pipeline3
  - might need to pip3 install setuptools first to get requirements.txt to run all the way
(probably most important) Organizational improvements
- Make clear information available to all participants about who is in charge of what, who can perform operations (changing limits on trackers/targets, seeing and resetting jobs), what times they’re generally available, and their contact information (if not publicly distributed, then shared with at least two or three others who can act as alternate PoCs)
- Determine the maximum number of clients the tracker(s) can feasibly serve (it’s counterproductive and demoralizing for too many clients to be “competing” for jobs)