Difference between revisions of "User:Bzc6p"
(→Recommended tools: comments, plus the IA S3 API)
Revision as of 11:25, 11 April 2017
„Beauty consists of its own passing, just as we reach for it. It’s the ephemeral configuration of things in the moment, when you see both their beauty and their death. [...] Maybe that’s what being alive is all about: so we can track down those moments that are dying.”
„Az a szép, amit akkor ragad meg az ember, miközben elmúlik. A dolgok tiszavirág-életű formája abban a percben, amikor egyszerre látjuk életüket és a pusztulásukat. [...] Talán ilyen az, hogy élők vagyunk: olyan pillanatokat hajszolunk, amelyek elenyésznek.”
bzc6p is a Hungarian amateur who has joined the efforts of ArchiveTeam and has "specialized" in watching and saving Hungarian websites.
vichratimot (kukac) euromail (pont) hu
vichratimot (at) euromail (dot) hu
Websites that I have archived, am archiving, or whose archival I have helped organize, in reverse chronological order within each category. If a website has an entry on this wiki, consult that page for the archives. If not, a link to the archives should appear on the appropriate line.
- Dagály Fürdő (archive)
- WirtschaftsBlatt (archive)
- Balassi Intézet (archive)
- hi.co (archive)
- 2000 (archive)
- Mozaik TV (archive)
- Precedens Nyelvstúdió (archive)
- Álomautó Múzeum (archive)
- Kecskeméti Szimfonikus Zenekar (archive)
- Mele Café (archive)
- Café Alibi (archive)
- Kecskeméti Kulturális és Konferencia Központ (archive)
- Széplaki Erzsébet (archive)
- Freddy Fitness (archive)
- legalja.hu (archive)
- Astra Insurance: Romania (archive), Hungary (archive)
- Kajászószentpéter (archives: website, photos, videos)
- netszar.com (archive)
- Hungarian Volán websites
- A few wikis in the beginning.
My to-do list
In order of urgency.
- nol.hu – went dark on 2016-10-08, returned on 2016-12-13 as an archive
- TVN.hu websites – the site seems to be okay, but the company behind it has been deep in the red for years
- hotdog.hu – the company that bought it is performing very badly
- mindenkilapja.hu – the owner seems okay, but the site is not cared for and is full of spam
- Ingyenweb – a very old and obsolete free web hosting site that its maintainer does not care for. It is also sitting on some valuable domain names
- videok.hu – a video sharing site that is no longer very popular and not really cared for; it has the same owner as Ingyenweb
- News+C: saving (Hungarian) news websites that have a considerable amount of user content (= comments).
My experience with my few website archiving endeavours so far suggests that very few websites today can be mirrored completely in an automated way, without human control and intervention. Thus, if one wants to make a quality archive of even a small website, it needs some attention, often additional work, or several supplemental runs of the archiving tools.
These archiving tools (wget, wpull, ArchiveBot etc.) are very important and useful, but in most cases they cannot make complete archives on their own. My philosophy is that once we set off on the journey of archiving a website, we should make the archive as complete and as high in quality as possible, so we cannot rely solely on these tools. Of course, constrained by time and resources, we must compromise. Otherwise, however, the above applies. At least for me. This is how I archive.
Recommended tools
Saving to WARC
- Chfoo's Wpull: a good alternative to wget, still under development, with good archiving support
- wget: faster, but lacks some handy features that Wpull already has, and is pretty much in its final state
- Ikreymer's webrecorder.io: saves while you browse, for when things are too difficult
- Alard's warc-proxy: uses a proxy, which gives more accurate replay, but it doesn't support HTTPS and development seems to have stopped
- Ikreymer's webarchiveplayer: doesn't use a proxy and works similarly to the Wayback Machine, so some URLs in the files are not rewritten and may not play back properly
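As an illustration of saving to WARC with wget, here is a minimal Python sketch that only assembles a command line; it does not run anything. The function name build_wget_warc_command, the example URL and the WARC name are hypothetical. The options used (--mirror, --page-requisites, --wait, --warc-file, --warc-cdx) are wget options as I know them from wget 1.14 and later; check them against your installed version. Wpull accepts many wget-style options, but that should be verified too.

```python
import shlex


def build_wget_warc_command(start_url, warc_name, wait_seconds=1):
    """Return an argv list for a recursive wget crawl that also writes a WARC.

    Hypothetical helper: it only builds the command, it never executes it.
    """
    return [
        "wget",
        "--mirror",                  # recursive download with timestamping
        "--page-requisites",         # also fetch images, CSS and scripts
        f"--wait={wait_seconds}",    # pause between requests to be polite
        f"--warc-file={warc_name}",  # base name of the WARC file to write
        "--warc-cdx",                # also write a CDX index for the WARC
        start_url,
    ]


if __name__ == "__main__":
    # Example values are placeholders, not a real archiving job.
    cmd = build_wget_warc_command("http://example.hu/", "example.hu-2017")
    print(shlex.join(cmd))
```

Even with such a command, a single run rarely gives a complete archive; the output usually needs checking and supplemental runs.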
Uploading to IA
- Kngenie's ias3upload: upload only, and needs a metadata CSV file prepared beforehand, but otherwise works fine
- IA-developed internetarchive: a more versatile tool (upload, download, search etc.)
- Direct use of the Internet Archive S3 API with the curl program. The uploading tools above are built on this interface.
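To make the direct curl route more concrete, the following Python sketch assembles a curl command for a single-file upload through the S3-like interface. It only builds the argument list and sends nothing. The helper name build_ias3_curl_command, the item identifier and the keys are hypothetical. The endpoint s3.us.archive.org, the "LOW accesskey:secret" authorization header, x-amz-auto-make-bucket and the x-archive-meta-* headers are as I recall them from the Internet Archive's documentation, so verify them there before use.

```python
import os


def build_ias3_curl_command(identifier, file_path, access_key, secret_key,
                            metadata=None):
    """Return a curl argv list that PUTs one file into an IA item.

    Hypothetical helper: it builds the command only and never runs it.
    """
    remote_name = os.path.basename(file_path)
    cmd = [
        "curl",
        "--location",  # follow redirects to the storage node
        "--header", f"authorization: LOW {access_key}:{secret_key}",
        "--header", "x-amz-auto-make-bucket:1",  # create the item if missing
    ]
    # Each metadata field becomes one x-archive-meta-<field> header.
    for field, value in sorted((metadata or {}).items()):
        cmd += ["--header", f"x-archive-meta-{field}:{value}"]
    cmd += [
        "--upload-file", file_path,
        f"https://s3.us.archive.org/{identifier}/{remote_name}",
    ]
    return cmd


if __name__ == "__main__":
    # Placeholder identifier, file and keys for illustration only.
    print(build_ias3_curl_command(
        "example-item", "example.hu-2017.warc.gz",
        "ACCESSKEY", "SECRET",
        {"mediatype": "web", "title": "example.hu"},
    ))
```

Building the command as a list keeps the keys out of shell quoting problems; in practice the keys should come from the environment or a configuration file rather than from the source code.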
Further plans
I hope that one day I can re-host Hungarian websites that are now dead but have been archived, or at least create a Wayback Machine for Hungarian websites that would also serve as a mirror of the corresponding Internet Archive items.
As for the URL Team project, the discovered URLs have not (yet) been saved in WARC format but in a format that is difficult to access and read, so a short URL resolver service for URL shorteners that are already gone would be useful. It would be a kind of Wayback Machine for URL shorteners. It wouldn't even be difficult to set up, based on the URL Team databases.
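The core of such a resolver could be sketched as a simple lookup table. The sketch below assumes, purely for illustration, that a dump has been converted to text lines of the form shortcode|long URL; the actual layout of the URL Team databases may differ. The function names load_shortcode_map and resolve, and the tiny.example domain, are hypothetical.

```python
def load_shortcode_map(lines, delimiter="|"):
    """Build a shortcode -> long URL dictionary from text lines.

    Assumed input format (not confirmed): one "shortcode|long URL" per line.
    Blank or malformed lines are skipped.
    """
    mapping = {}
    for raw in lines:
        line = raw.strip()
        if not line or delimiter not in line:
            continue
        # Split on the first delimiter only, since long URLs may contain it.
        code, target = line.split(delimiter, 1)
        mapping[code] = target
    return mapping


def resolve(mapping, short_url):
    """Return the long URL for a short URL, or None if it is unknown."""
    code = short_url.rstrip("/").rsplit("/", 1)[-1]
    return mapping.get(code)


if __name__ == "__main__":
    table = load_shortcode_map(["abc|http://example.com/page"])
    print(resolve(table, "http://tiny.example/abc"))
```

A real service would wrap this lookup in a web front end that answers requests for a dead shortener's URLs with a redirect to the stored long URL.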
I would also be glad to record the programming of Hungarian radio and television channels 24/7, but that would require a vast amount of resources. Until then, or instead of that, I'll probably collect some recordings of notable Hungarian TV and radio programmes and moments from YouTube.
Hungarian articles about Archive Team
Below I've collected the online Hungarian news articles about Archive Team that I have been able to find. The list is in reverse chronological order.
- Péter Szűcs: Az internet nem felejt (The internet doesn't forget). itcafe.hu, 2015-03-05. (About ArchiveTeam's activity in general.)
- Dániel Dojcsák: Elpusztulhat a nem profitképes online tartalom (Non-profitable online content may vanish). hwsw.hu, 2013-12-03. (Mentions ArchiveTeam saving Blip videos.)
- Mit szóltok filmletöltők? Két héttel a bezárása után ismét működik a népszerű torrentoldal (What do you say, movie leechers? Two weeks after its closure popular torrent site runs again). hvg.hu, 2013-10-30. (About IsoHunt restoration.)
- Lementik a legnagyobb torrentkeresőt (They download the biggest torrent search site). index.hu, 2013-10-21. (About saving IsoHunt.)
- Ádám Szedlák: Új otthont kaptak az őshonlapok (The ancient websites got a new home). origo.hu, 2009-11-02. (About Geocities.)
- Ádám Szedlák: Megmentik az őshonlapokat (They are saving the ancient websites). origo.hu, 2009-05-13. (About Geocities.)
- Sándor Berta: Archiválják a GeoCities-tartalmakat (They archive GeoCities' contents). sg.hu, 2009-05-04.