User talk:Bzc6p/Archive1

From Archiveteam

This is the first archive of User talk:bzc6p, made on 2015-06-28.

Re: Some friendly words

Thanks for appreciating my efforts and explaining the ArchiveTeam to me. I thought "#archiveteam-bs" was for off-topic conversation, though. :/ And of course I didn't give up on archiving. Why would I? I'm getting 24 Blu-ray M-Discs next month, in fact. :) Would you be willing to explain the situation to the other users? I'm willing to forgive them if they accept it & apologize for my trolling. I'm just glad someone, at the very least, understood my situation and took the time to write to me.

And I looked at your userpage. I'll see if I can track down some Hungarian sites. You can always use the Google operator "site:.hu" to filter just Hungarian sites. There is, however, this site. I have a backup of it, but not in .warc.gz format. Even worse, Yahoo is stupid enough to be shutting down their first service: dir.yahoo.com (Yahoo! Dir), on 12/31/2014. Stupid Yahoo...

And by the way, SketchCow disliked the fact that I "asked too many questions". Archive Maniac 13:25, 19 October 2014 (EDT)

I'm waiting for Wpull to have a Windows release or a Python 2 release. I also stink at Python big time... Archive Maniac 17:47, 19 October 2014 (EDT)
Python 3 unfortunately gets mixed up with Python 2 in the Command Prompt (e.g. python3 is not recognized as a command). That's why I've stuck to Python 2, because I use the wiki dump tool with that version. Aside from that, I always get errors when attempting installation, like an "unable to find vcvarsall.bat" error or something similar, couldn't find seesaw kit, etc. Python is so user-unfriendly... Archive Maniac 17:42, 20 October 2014 (EDT)

Any Help on Chat?

What's your IRC username? I want help coming back on the ArchiveBot & archiveteam-bs channel. And please tell me what discussions are appropriate for the latter; you do have a way with words. :P Archive Maniac 20:59, 21 October 2014 (EDT)

ArchiveBot Requests

Hey, Bzc6p. Are you willing to take ArchiveBot requests from me? I also like your Hungarian site archiving. I recently archived smb.gportal.hu on my computer. Archive Maniac 18:55, 18 November 2014 (EST)

I have four more questions (the thing that made users upset at me):
  1. I like archiving stuff. What archiving tools do you know of and recommend?
  2. Is there a way that I can save whole sites to the Wayback Machine without using the ArchiveBot channel? I don't think so, but there still might be a chance.
  3. Why doesn't the ArchiveTeam make C++ ports of their Python tools?
  4. When I try to use Wget, I get this error in the command prompt: Connecting to SITENAME (SITENAME)|IP|:PORT... failed: Bad file descriptor. Do you know how to fix this problem?

I hope you're not too annoyed by these questions, like the others would probably be. Archive Maniac 12:01, 20 November 2014 (EST)

Thanks for the info. And what's been a problem is that I've tried to set ArchiveBot or wpull up a few times, but never had proper, 100% cannot-fail, step-by-step instructions on how to set both up. If you have the time, could you please write a more specific tutorial than the existing one? I'd prefer a tutorial on the latter [wpull]. Archive Maniac 11:45, 22 November 2014 (EST)

Blank CD Question

Hi Bzc6p, I am wondering how long CD-Rs and DVD-Rs last with a .iso image burned on to them. Is it just as long as the estimated shelf life? More importantly: what do you recommend for long-term backup solutions? Archive Maniac 14:51, 29 November 2014 (EST)

Blogter.hu's Unexpected Downfall

Hi Bzc6p. You know how Blogter unexpectedly shut down in December in spite of its popularity? That goes to show that anything, and I mean anything, can happen to web sites that seem okay but actually are in limbo (i.e. extinction). That's why I suggested you archive gportal.hu. I already archived the Mario and DK sites. Archive Maniac 19:46, 7 December 2014 (EST)

What I'm Currently Doing

Hi Bzc6p, it's been a little bit since I last talked to you. If you want to know what I'm currently doing, it's that I'm searching the depths of the Internet for links and saving them on to the Wayback Machine. I'm also uploading my own collections to the Internet Archive. There's some stuff in there which you'll probably enjoy. :)

And the icing on the cake is that I'm editing a few wikis, cleaning them up and trying to make them more informative.

P.S. Do you forgive me and understand why I went into a very mad rage here those few times (which I shouldn't have)? I know the experience is over, but I feel embarrassed around you, given my extremely vulgar actions and how you're aware of it.

Anyway, nice to message you again. Good luck saving Hungarian sites! :) Archive Maniac 21:15, 5 January 2015 (EST)

Thanks for replying. :) Shortly after I messaged you, somebody on a forum site taught me how to properly burn files to an M-Disc. And it was a success! A good, long offline backup for me! :D

And it's a shame extra.hu is gone... It looked like an excellent web host...


I also have issues with using wikiadownloader.py. It gives me this error:

Traceback (most recent call last):
  File "wikiadownloader.py", line 41, in <module>
    f = open('wikia.com', 'r')
IOError: [Errno 2] No such file or directory: 'wikia.com'


Do you know what that is? Archive Maniac 12:25, 6 January 2015 (EST)
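
Going only by the traceback above: line 41 of wikiadownloader.py calls open('wikia.com', 'r'), and Errno 2 means that no file named wikia.com exists in the directory the script was started from. What that file is supposed to contain is not stated in this thread, so the Python sketch below only illustrates the existence check. The helper name open_input_list is hypothetical and is not part of wikiadownloader.py.

```python
import os


def open_input_list(path="wikia.com"):
    """Open the input file, with a clearer message when it is missing.

    Hypothetical helper for illustration; not part of wikiadownloader.py.
    """
    if not os.path.isfile(path):
        # Same condition that raises "IOError: [Errno 2]" in the traceback.
        raise SystemExit(
            "Cannot find %r in %s; create it or run the script from the "
            "directory that contains it." % (path, os.getcwd())
        )
    return open(path, "r")
```

Running the script from the folder that actually holds the wikia.com file (or creating that file there) should make this particular error go away.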

View Archive.org Directories as Text Only

Hi Bzc6p, I remember someone on the ArchiveTeam taught me how to view archive.org site directories (e.g. like these: http://web.archive.org/*/media.nintendo-europe.com/* ) as text-only in the browser. I forgot how to do it, so I've come to ask you how to do it. Do you know how? Archive Maniac 18:56, 22 January 2015 (EST)

I literally meant what I said. The link I gave you lists all of the URLs on the Internet Archive. I asked how to view it as text-only. (By the way, it was taught to me on the #archivebot channel, which isn't on BadCheese). Archive Maniac 18:32, 23 January 2015 (EST)
Ah, yes. That's what they mentioned. Thanks, bzc6p. I also have a bit of a problem—see, I want to access a site (http://eecad.sogang.ac.kr/~chang/games/dkc2/) on the Wayback Machine, but it's blocked by robots.txt... Also, many of Nintendo Europe's sites (e.g. nintendo.co.uk, nintendo.es, nintendo.fr) are excluded from the Wayback Machine entirely. Is there any way for me to access them? I mean, J.Scott's obviously not going to help out here. Archive Maniac 14:46, 24 January 2015 (EST)
Wow, he is not nice. Just look how he talks about the people on the IA Forums on IRC. He's also gloating about having access to everything on the Internet Archive. (And I saved your email in case I get banned for voicing my opinion, which really is true...) Archive Maniac 17:30, 24 January 2015 (EST)
Add a period in front of the domain name, e.g. https://web.archive.org/web/20011211041409/http://.eecad.sogang.ac.kr/~chang/games/dkc2/ (note that you need to do this for all links too) PiRSquared 23:59, 25 January 2015 (EST)
Thanks PiR. (and sorry for what I said above; I was upset about something on IRC). Oh yeah, and I should probably not tell anyone else about it, so I won't. Archive Maniac 14:14, 26 January 2015 (EST)
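
Since PiRSquared notes that the period has to be added to every link, here is a minimal Python sketch of that rewrite. It assumes capture URLs of the form https://web.archive.org/web/TIMESTAMP/ORIGINAL-URL, as in the example above; the function name add_leading_period is made up for illustration, and the sketch only edits the string, it does not check whether the Wayback Machine serves the result.

```python
def add_leading_period(wayback_url):
    """Insert '.' in front of the archived host inside a Wayback capture URL."""
    # Split "https://web.archive.org" + "/web/" + "<timestamp>/<original URL>".
    head, sep, rest = wayback_url.partition("/web/")
    timestamp, slash, original = rest.partition("/")
    if not sep or not slash:
        raise ValueError("not a Wayback Machine capture URL: %r" % wayback_url)
    # Separate the scheme of the archived URL from its host and path.
    scheme, marker, host_and_path = original.partition("://")
    if not marker:
        # The archived URL was written without a scheme.
        scheme, host_and_path = "", original
    if not host_and_path.startswith("."):
        host_and_path = "." + host_and_path
    return head + sep + timestamp + slash + scheme + marker + host_and_path


if __name__ == "__main__":
    print(add_leading_period(
        "https://web.archive.org/web/20011211041409/"
        "http://eecad.sogang.ac.kr/~chang/games/dkc2/"
    ))
```

Applying the function to a URL that already has the period leaves it unchanged, so it can be run over a whole list of links without doubling the dots.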

FTP Sites

Hey, bzc6p, have you ever considered trying to crawl FTP sites (see the FTP article)? As of now, I've uploaded two to the Internet Archive. By the way, I figured out that you can save tons of URLs on the Wayback Machine if you crawl/mirror a site using Wget (the URL should be http://web.archive.org/save/urlgoeshere ). In total, I do:
wget http://web.archive.org/save/http://exampleurl.com -m -p -np -e robots=off
Hope this helps. It's sort of an ArchiveBot alternative.

Or if you don't want to save files on to your computer and delete them every time you crawl a site:

wget http://web.archive.org/save/urlgoeshere.com -r --spider -np -e robots=off
Archive Maniac 14:15, 1 February 2015 (EST)
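
For anyone who wants to script the two commands above, here is a rough Python sketch that builds the same Wget argument lists. It only assembles the arguments and runs nothing by itself; the function name wayback_save_command and the keep_files switch are invented for this example, while the options are copied from the commands above.

```python
SAVE_PREFIX = "http://web.archive.org/save/"


def wayback_save_command(url, keep_files=True):
    """Build the wget argument list for crawling a site through /save/."""
    target = SAVE_PREFIX + url
    if keep_files:
        # First command above: mirror the site, including page requisites.
        options = ["-m", "-p", "-np", "-e", "robots=off"]
    else:
        # Second command above: recursive spider run that keeps no local copy.
        options = ["-r", "--spider", "-np", "-e", "robots=off"]
    return ["wget", target] + options


if __name__ == "__main__":
    # To actually run it, pass the list to subprocess.call(...).
    print(" ".join(wayback_save_command("http://exampleurl.com")))
```

Building a list rather than one long string avoids shell-quoting trouble if a URL contains characters such as & or ?.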
Thanks for your input; allows me to learn more. And if you'd like, I'm finding some Hungarian FTP hosts to archive. :) Archive Maniac 23:45, 1 February 2015 (EST)
Like here's the first one I uploaded: https://archive.org/details/ftp.debella.aszi.sztaki.hu . Archive Maniac 12:17, 2 February 2015 (EST)
Oh, and about the fact that bulk-saving URLs on to the Wayback Machine is not as efficient as making WARC files: the newer Wget releases can create WARC files; to be honest, it's effectively an alternative to the ArchiveBot. According to Arkiver, the Internet Archive staff can inject WARC files into the Wayback Machine, even if crawled by Wget.
Oh yeah, and check the dec3199 tag daily (link). I've put more Hungarian FTP sites on the Internet Archive for your sake (because you're like my buddy) and categorized them under said tag. And guess what surprise I have for you? That's right: I'm busy uploading Microsoft's FTP site (66 GB zipped!) on to the Internet Archive! Archive Maniac 14:48, 5 February 2015 (EST)
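
As a rough illustration of the WARC point: Wget 1.14 and later accept a --warc-file option, which writes a compressed .warc.gz next to the normal mirror. The Python sketch below builds such an argument list using the long forms of the mirror options from earlier in this thread; the function name wget_warc_command and the example WARC name are assumptions made for this example only.

```python
def wget_warc_command(url, warc_name):
    """Build a wget argument list that mirrors url and also writes a WARC."""
    return [
        "wget",
        "--mirror",            # long form of -m
        "--page-requisites",   # long form of -p
        "--no-parent",         # long form of -np
        "-e", "robots=off",
        "--warc-file=" + warc_name,  # wget adds the .warc.gz extension
        url,
    ]


if __name__ == "__main__":
    # Hypothetical site and WARC name, for illustration only.
    print(" ".join(wget_warc_command("http://exampleurl.com", "exampleurl")))
```

The resulting .warc.gz file is what would then be uploaded to the Internet Archive.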