myVIP

From Archiveteam
Revision as of 18:20, 10 October 2015 by Bzc6p (talk | contribs) (→‎Archiving: a bit about size)
myVIP
Second most popular Hungarian social network
URL http://myvip.com
Status Endangered
Archiving status Upcoming...
Archiving type Unknown
IRC channel #archiveteam-bs (on hackint)

AT TLDR

myVIP is a formerly popular Hungarian social network that started in 2006 as the second of its kind. Although iWiW was very popular at the time, myVIP also gathered a lot of users in no time. However, Facebook took over the social network market in Hungary too, and myVIP was deserted as well. Considering that iWiW shut down in 2014 with far more visitors than myVIP, it's a wonder that myVIP is still up.

History

The site started on April 8, 2006, and grew extremely quickly. The number of users reached 500,000 in just 52 days, even though registration was by invitation only. The 1 millionth member registered after 7 months, the 2 millionth in less than 2 years, and by mid-2009 the site had 2.5 million profiles.

It is not easy to find up-to-date figures for the number of registered users, but, judging by the case of iWiW, we can assume that it increased until 2010, when a quick decline started. The highest profile ID is a bit above 4,600,000, but some profiles have been deleted.

In 2014, people reported that the once-popular myVIP was deserted and that hardly anyone logged in regularly.

It is interesting that Epicenter Média Kft, which bought myVIP and other properties from Generál Média, keeps running the site, although it is probably not making much profit. On the one hand, myVIP is a bit more aggressive in showing ads than iWiW was. On the other hand, it was never as popular as iWiW, and iWiW was shut down in 2014.

All in all, we should presume that myVIP won't stay up for long. And as it used to be a central point of the Hungarian internet, it must be preserved.

Status

The site seems to be generally stable. Some bugs appear, but the staff is, surprisingly, active in fixing some of them. However, bugs seem to appear more and more often (currently[1] 404 and 502 errors), and the most worrisome point is that the owner probably doesn't make much profit from keeping the site up.

Archiving

Due to myVIP's closed system and robots.txt, the Wayback Machine can't archive the site.

user:bzc6p decided to save the site, but it's too big for one person. He has written a working bash script that is able to save profiles with all related content, but this script must be "translated" into one that fits into ArchiveTeam's infrastructure. However, user:bzc6p doesn't have the time and knowledge to do this in the near future (after which it might be too late).

The algorithm is shown in the bash script, which is also well commented.

Some notes about the site, the script and the archiving process:

  • The site uses a lot of JavaScript but can still be saved perfectly.
  • Anyone scraping will need to register an account, and the cookies file (exported from the browser) must be fed to the script. Note: visiting myvip.com in a new browser session instantly invalidates the old cookie!
  • The total size of content is probably less than 5 terabytes, and definitely less than 10 terabytes.
  • The bash script is amateur work and might be bash-specific at some points (i.e. not usable with other shells).
  • However, the script has been tested and should do its job reliably.
  • The script currently supports only profiles. Club pages should also be saved later; having this algorithm makes it simpler to write the one needed for that.
  • The script accepts user IDs, which run sequentially from 1 up to roughly 4,600,000 (?), but not all profiles exist.
  • The script currently saves each user's data into a separate WARC file (this should be changed, as a single such WARC file may be too small, resulting in lots of little files).
  • The script saves the following for each user: profile page, acquaintances (or "friends") list, photo albums, photos, comments on photos. That's all that should be saved, if anything.
  • The script supports creating a "directory" of users: it extracts some identifying information and stores it in a one-line CSV file per user. (This should be adjusted just like the WARC; later the files can be concatenated to form a database.)
  • The script also creates lists of the profile picture and club avatar thumbnails that are used in lists on the site. They could be saved for every user, but then each profile picture would be requested once for every acquaintance its owner has. So creating a list of them and then downloading all those tiny pictures only once is the feasible solution.
  • The script currently has (almost) separate discovery and grab phases. This means that some (many) pages are requested twice: once during discovery and once during WARCing. This could probably be optimized.
  • A user's acquaintances list is a problematic point. When first visiting the list (clicking "Ismerősök"), an alphanumeric pager ID is generated. Requests for the other pages of the list need this pager ID. However, a new request for the initial page generates a new pager ID and invalidates the earlier one! Also, the pager ID expires in 20 minutes, so all pages of a user's acquaintances list must be saved within 20 minutes. (This is why the script currently does this strictly in one separate phase at the end, and why the initial page is grabbed separately, to find out the current pager ID.)
  • The site should be saved in Hungarian. There is an English language option, but how it works hasn't been tested out. (Is it automatically set to English when visited outside of Hungary? Does the site remember the setting? Is the setting sent in a cookie or in the URL? etc.)
  • The script uses wget for discovery. wget is much faster, but it is not immune to DNS resolution errors (it doesn't retry on them), which is why the script has a separate bash function for fetching with wget.
  • The script uses wpull for grabbing (WARCing), because it's much more intelligent than wget. (The wget-lua version could probably also be used, but that needs some coding.)
  • The script often checks whether we are still logged in. If not, then the item, depending on which phase we are in, either pauses (sleep) or fails.
  • In its current state, the bash script doesn't support running multiple instances of it.
  • A list of static files (that need to be downloaded only once) is here.
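The pager ID constraint noted above (one initial request, then all remaining pages within 20 minutes) can be sketched roughly as follows. This is not the bash script's actual code: `fetch_first_page` and `fetch_page` are hypothetical helpers standing in for the real HTTP requests, and the one-minute safety margin is an assumption.

```python
import time
from typing import Callable, List, Tuple

PAGER_LIFETIME_SECONDS = 20 * 60  # the pager ID expires after 20 minutes


def grab_acquaintance_list(
    fetch_first_page: Callable[[], Tuple[str, int]],
    fetch_page: Callable[[str, int], str],
    clock: Callable[[], float] = time.monotonic,
    safety_margin: float = 60.0,
) -> List[str]:
    """Fetch pages 2..N of one user's list within a single pager ID lifetime.

    fetch_first_page() is a hypothetical helper returning (pager_id, page_count).
    fetch_page(pager_id, n) is a hypothetical helper returning the body of page n.
    """
    # Start timing before the first request, so the estimate is conservative.
    started = clock()
    # Request the initial page exactly once: asking again would generate a new
    # pager ID and invalidate the one we are about to use.
    pager_id, page_count = fetch_first_page()
    pages = []
    for number in range(2, page_count + 1):
        if clock() - started > PAGER_LIFETIME_SECONDS - safety_margin:
            raise TimeoutError(f"pager ID probably expired before page {number}")
        pages.append(fetch_page(pager_id, number))
    return pages
```

Failing the whole list on a timeout, rather than fetching a fresh pager ID halfway, keeps every saved page tied to the same pager ID; whether that matters for playback has not been checked here.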

For more info, see the code. Further questions should be addressed to user:bzc6p, either on this page's talk page, or on his talk page.
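As an illustration of the thumbnail note in the list above, collecting thumbnail URLs across users and fetching each only once might look like the sketch below. It is not taken from the bash script; the function name, the input shape (user ID mapped to thumbnail URLs) and the example URLs are all assumptions.

```python
from typing import Dict, Iterable, List


def unique_thumbnails(per_user_thumbnails: Dict[int, Iterable[str]]) -> List[str]:
    """Return every thumbnail URL exactly once, in first-seen order."""
    seen = set()
    ordered = []
    for user_id in sorted(per_user_thumbnails):
        for url in per_user_thumbnails[user_id]:
            if url not in seen:
                seen.add(url)
                ordered.append(url)
    return ordered


# Hypothetical data: two users whose acquaintance lists share one thumbnail.
example = {
    1: ["http://example.invalid/a.jpg", "http://example.invalid/b.jpg"],
    2: ["http://example.invalid/b.jpg", "http://example.invalid/c.jpg"],
}
to_download = unique_thumbnails(example)  # 3 URLs instead of 4 requests
```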

Size

The first ~4400 profile IDs show that about half of the possible profiles exist, and those have an average size of ~850 kilobytes, with the largest ones being 40 megabytes (WARC compressed). Note that this is a rough estimate from a small sample. (That would put the profiles at roughly 2 TB in total, not counting the profile picture thumbnails and the clubs, but those are probably not too significant in size.)
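The arithmetic behind that figure can be reproduced with a few lines. The inputs are the rough numbers quoted on this page; extrapolating the small sample to all IDs, and using decimal units (1 TB = 10^9 kB), are assumptions.

```python
def estimate_total_tb(highest_id: int, existing_fraction: float, average_kb: float) -> float:
    """Extrapolate the total compressed WARC size of all profiles, in terabytes."""
    existing_profiles = highest_id * existing_fraction
    total_kb = existing_profiles * average_kb
    return total_kb / 1000**3  # decimal units: 1 TB = 10^9 kB


# ~4,600,000 IDs, about half of them existing, ~850 kB per existing profile
total = estimate_total_tb(4_600_000, 0.5, 850)
print(f"about {total:.1f} TB")  # about 2.0 TB
```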

Partisan actions

Until the Warrior project kicks off, user:bzc6p has started saving some profiles (starting from 1) with his own script, at a slow pace (1000–1500 profiles/day).
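A quick calculation shows why this pace alone cannot finish the job. The sketch below assumes the daily rate counts profile IDs processed and that the ID range ends around 4,600,000; if the rate counts only existing profiles (about half of the IDs), the durations are halved.

```python
def days_to_finish(profile_ids: int, per_day: int) -> float:
    """Days needed to walk through all profile IDs at a constant daily rate."""
    return profile_ids / per_day


for rate in (1000, 1500):
    days = days_to_finish(4_600_000, rate)
    print(f"{rate} profiles/day: {days:.0f} days, about {days / 365.25:.1f} years")
```

At these rates a single machine would need on the order of a decade, which is why a distributed Warrior project is needed.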

Notes

  1. 2015-07-30

Sources


     Hungarian websites     
Archives & Digital Libraries mek.oszk.hu  · epa.oszk.hu  · dka.oszk.hu  · webarchivum.oszk.hu  · NAVA  · Fortepan  · fentrol.hu
Blogging Blog.hu  · Blogter  · Freeblog  · Blogger.hu  · reblog.hu  · xfree.hu  · cafeblog.hu
Social networks iWiW  · myVIP  · hotdog.hu  · Baratikor.com  · network.hu  · Mommo  · privi.hu
Webhosting Extra  · tar.hu  · ATW  · Ingyenweb  · Freeweb  · Ultraweb  · x3.hu  · ini.hu  · ininet.hu  · G-Portál  · uCoz  · eOldal  · ewk  · 5mp.eu  · mindenkilapja  · Webnode
Forums, message boards* Index  · SG  · Nők Lapja Cafe  · Hoxa
Video hosting Indavideó  · Videa  · videoplayer.hu  · xfree.hu  · videok.hu
Image hosting Kepfeltoltes.hu  · Fotoalbum.hu  · Indafotó  · Kephost.com  · pics.coldline.hu  · kep.tar.hu  · noob.hu  · PSharing (a.k.a. ivPicture)  · Kephost.hu  · kepfeltoltes.eu  · kephost.net  · kepkuldes.com  · xfree.hu  · GTF Képhost  · fotozz.hu  · Kepkezelo.com  · keptarad.hu  · darkweb.hu  · fos.hu
Questions and Answers gyakorikerdesek.hu  · tudjatok.hu
File sharing data.hu  · toldacuccot.hu  · hellshare.hu  · addat.hu  · fileposta.hu
Document sharing doksi.hu  · Docplayer
Fun Demotiváló  · keptelenseg.hu  · csubakka.hu  · nemkutya.com  · legalja.hu  · szanalmas.hu  · trollfesz.cc  · gumicsizma.hu
Trash napiszar.com  · napiszar.hu  · netszar.com  · napiszar.org
Other News+C  · moly.hu  · gyertyalang.hu  · Volán websites  · Szuperinfó