Polygamia.pl is a Polish video gaming news website. It was part of the gazeta.pl media portal until December 2015, when it was bought by another company.
All old articles were retained, but old user accounts (tied to the gazeta.pl portal) were lost.
User comments under articles are missing and presumably gone for good. User blogs are missing too. It has been stated that the old blog posts will not be transferred to the new site, although they are reportedly still on the old owner's servers and might be recovered by some as yet unknown means.
Article comments (outdated)
The article URLs are of the form http://polygamia.pl/Polygamia/1,107162,19605465,zapraszamy-na-nowa-polygamie-w-early-access.html
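When generating comment URLs in bulk it helps to split an article URL into its comma-separated parts. The sketch below assumes every article path looks like /Polygamia/1,<number>,<number>,<slug>.html. The third number appears to be the article ID, since the same value shows up as obxx= in the "show all comments" URL. Treating the second number as a section ID is an assumption, and the names `parse_article_url` and `ArticleUrl` are hypothetical.

```python
import re
from typing import NamedTuple
from urllib.parse import urlsplit

# Assumed path layout: /Polygamia/1,<section_id>,<article_id>,<slug>.html
ARTICLE_PATH = re.compile(r"^/Polygamia/1,(\d+),(\d+),([^/]+)\.html$")


class ArticleUrl(NamedTuple):
    section_id: str  # assumption: identifies the site section (e.g. "107162")
    article_id: str  # matches the obxx= value on the comments page
    slug: str        # title slug, without the .html suffix


def parse_article_url(url: str) -> ArticleUrl:
    """Split a polygamia.pl article URL into its parts (query and anchor are ignored)."""
    match = ARTICLE_PATH.match(urlsplit(url).path)
    if match is None:
        raise ValueError(f"not an article URL: {url}")
    return ArticleUrl(*match.groups())
```

Because only the path is matched, the same function also accepts the "show all comments" variant with its extra parameters and #opinions anchor.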
The comments are shown under each article, but articles with many comments use a limited "infinite scroll" system: scrolling to the bottom of the page loads several more comments. After two or three rounds of this, the page instead shows a link to the "show all comments" page. The URL of that page is of the form http://polygamia.pl/Polygamia/1,107162,19605465,zapraszamy-na-nowa-polygamie-w-early-access.html?v=1&obxx=19605465&offset=19#opinions
In other words, it is the exact article URL with several query parameters (and an #opinions anchor) appended:
- v=1 means do not display the full text of the article, just the heading.
- obxx= means that this is the "show all comments" page. The value of the parameter doesn't seem to matter. Even if it is blank, the comments are shown correctly.
- offset=19 means skipping the first few comments (those already shown by the infinite scroll). Removing this parameter makes the page show every single comment.
- There is also a page= parameter (starting at 1). It is almost always unnecessary: only a small fraction of articles need more than one page to fit all their comments.
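The parameter rules above can be turned into a small URL builder. This is a sketch, not a tested tool: it keeps the article URL's scheme, host and path, drops any existing query (so offset= disappears), adds v=1 and obxx=, and appends the #opinions anchor. It assumes that leaving page= out is equivalent to page=1, which matches the note that the parameter is almost always unnecessary. The function name `all_comments_url` is hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode, urlsplit, urlunsplit


def all_comments_url(article_url: str, obxx: str = "", page: Optional[int] = None) -> str:
    """Build the "show all comments" URL for an article.

    v=1   -> show only the article heading, not the full text
    obxx  -> marks the "show all comments" page; its value reportedly
             does not matter, so it defaults to blank
    offset is deliberately left out so that no comments are skipped.
    page  -> only added for page 2 and later (assumed default is page 1)
    """
    parts = urlsplit(article_url)
    params = {"v": "1", "obxx": obxx}
    if page is not None and page > 1:
        params["page"] = str(page)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(params), "opinions"))
```

For example, passing the article URL from above with obxx="19605465" reproduces the example URL minus the offset=19 parameter.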
So it is possible to create a list of specially prepared URLs whose parameters are chosen to display all comments. Here (currently up to date as of Feb 20)
One downside is that these pages are not linked directly from the article pages, so once they are in the Internet Archive it may be somewhat hard for end users to find them. There is no real way around that.
The page http://polygamia.pl/blogi lists every single blog post in reverse chronological order. It is easy to scrape all the blog post URLs from it, either by hand or by pointing a web scraper at this address and telling it to collect everything in that folder.
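The "everything in that folder" approach can be sketched with Python's standard HTML parser. The actual markup of the listing page is not described here, so this simply collects every link whose path lies below /blogi/ on polygamia.pl, resolves relative links against the listing URL, drops #anchors and removes duplicates while keeping page order. Fetching the page is left out, and the names `BlogLinkCollector` and `collect_blog_links` are hypothetical.

```python
from html.parser import HTMLParser
from typing import List, Optional, Set, Tuple
from urllib.parse import urldefrag, urljoin, urlsplit

SITE_HOSTS = ("polygamia.pl", "www.polygamia.pl")


class BlogLinkCollector(HTMLParser):
    """Collect every <a href> that points inside the /blogi/ folder."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []
        self._seen: Set[str] = set()

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links and strip any #fragment.
        absolute, _fragment = urldefrag(urljoin(self.base_url, href))
        parts = urlsplit(absolute)
        # Keep links below /blogi/ on polygamia.pl, but not the listing itself.
        if parts.netloc in SITE_HOSTS and parts.path.startswith("/blogi/") and parts.path != "/blogi/":
            if absolute not in self._seen:
                self._seen.add(absolute)
                self.links.append(absolute)


def collect_blog_links(html: str, base_url: str = "http://polygamia.pl/blogi") -> List[str]:
    collector = BlogLinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links
```

This also keeps any other pages in the folder (for example author pages, if the listing links to them); the blog post URL pattern can be used to filter those out afterwards.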
A blog post URL has the form http://polygamia.pl/blogi/kumasztotera/2012/05/jak_zjezdzilem_pol_warszawy_by_kupic_gre_ktora_miala_dzis_premiere/2 ("kumasztotera" is the author's username, followed by the date and the title. The meaning of the trailing number is unknown, but it is required.)
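A pattern for this URL form can be used to tell real blog posts apart from other links under /blogi/. This sketch assumes, from the single example above, that the date is always a four-digit year and a two-digit month, and it keeps the trailing number as an opaque value since its meaning is unknown. The names `parse_blog_post_url` and `BlogPost` are hypothetical.

```python
import re
from typing import NamedTuple, Optional
from urllib.parse import urlsplit

# Assumed layout: /blogi/<username>/<yyyy>/<mm>/<title_slug>/<number>
BLOG_POST_PATH = re.compile(
    r"^/blogi/(?P<user>[^/]+)/(?P<year>\d{4})/(?P<month>\d{2})/(?P<slug>[^/]+)/(?P<n>\d+)/?$"
)


class BlogPost(NamedTuple):
    user: str
    year: int
    month: int
    slug: str
    trailing_number: int  # meaning unknown, but the URL does not work without it


def parse_blog_post_url(url: str) -> Optional[BlogPost]:
    """Return the parts of a blog post URL, or None if it does not match the pattern."""
    m = BLOG_POST_PATH.match(urlsplit(url).path)
    if m is None:
        return None
    return BlogPost(m["user"], int(m["year"]), int(m["month"]), m["slug"], int(m["n"]))
```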
Comments are displayed directly beneath a blog post. Only a handful of blog posts have so many comments that they have a "show all comments" link (no infinite scrolling here), whose URL follows the same rules as explained in the "Article comments" section.
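Putting this together, a capture list for one blog post is usually just the post URL itself, plus the "show all comments" page for the few posts that have that link. The sketch below assumes the article rules carry over literally (v=1, a blank obxx=, no offset, #opinions anchor); that is an assumption, not something confirmed for blogs. Whether a post has the link is passed in as a flag, for example after checking the page by hand; the function name `blog_capture_urls` is hypothetical.

```python
from typing import List
from urllib.parse import urlencode, urlsplit, urlunsplit


def blog_capture_urls(post_url: str, has_show_all_link: bool) -> List[str]:
    """Return the URLs to archive for one blog post.

    Comments normally appear directly under the post, so the post URL alone
    is enough. When the post shows a "show all comments" link, the extra page
    is added using the same parameters as for articles (assumed to apply).
    """
    urls = [post_url]
    if has_show_all_link:
        parts = urlsplit(post_url)
        query = urlencode({"v": "1", "obxx": ""})  # blank obxx reportedly works
        urls.append(urlunsplit((parts.scheme, parts.netloc, parts.path, query, "opinions")))
    return urls
```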