|Archiving status||In progress...|
|Project source||zetaboards-discovery zetaboards-grab|
ZetaBoards (formerly InvisionFree) is a forum host that offers both paid and free-with-ads forums to anyone. It claims to have been "used by millions of people looking for a place to gather, discuss and share." Initial checks show that over 75,000 boards are listed on the "Featured Board Index" page on their site, which gives a sense of the scale of this project. ZetaBoards is owned by Zathyus Networks, Inc., which also owns zIFBoards.
Aside from a few hiccups, there were no major concerns about the host going away until Zathyus' official announcement, which threw the entirety of ZetaBoards into jeopardy. Every single board will eventually be ported to Tapatalk, a mobile-centric forum platform, which is known not only to severely cripple the boards it hosts, but also to kill them off after three months of inactivity (the staff make it sound ambiguous, saying the forums will be locked, not removed, but Tapatalk's Terms of Service say otherwise). Apparently, the Tapatalk staff are also known to monitor every single forum they can and delete posts on sight whenever they violate the ToS, even metaphorically.
Needless to say, a lot of users are annoyed that, among other things, they can't modify their boards' design beyond slapping a logo on the header.
Currently a list of board URLs is being discovered, using the numerical URL format described below.
Once this is completed, these boards will be sorted into private and public boards, before being archived.
ZetaBoards has experienced multiple periods of downtime, the most recent recorded one lasting a few days, from 2016-11-21 to 2016-11-23, for many of their boards.
With the announcement of the Tapatalk merger in May 2018, many people that still use forums decided it'd be better to pack their things and go someplace else, to a superior host. Thus, many boards are being deleted prior to full migration - make haste to keep the remaining ones alive.
ZetaBoards URLs have two formats:
The standard one is SERVER.zetaboards.com/BOARD where SERVER is s1 - s15 (and maybe also w1 - w15?) and BOARD is the text name of the board.
The other format is zetaboards.com/directory/?p=jump&v=BOARDNUM&s=SERVERNUM where BOARDNUM is the numerical ID of the board and SERVERNUM is the number of the server. At this point in time, it is unclear how the distinction is made between s servers and w servers; however, by making SERVERNUM 0 you will usually be redirected to the appropriate server. This URL will redirect you to the standard URL that is detailed above.
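As a minimal sketch of the directory jump format described above, the helper below only builds the URL and does not fetch it. The `http://` scheme, the function name `jump_url` and the sample board number are assumptions for illustration, not something confirmed by the site.

```python
from urllib.parse import urlencode

# Assumption: plain http and no "www." prefix, matching the format quoted above.
DIRECTORY_BASE = "http://zetaboards.com/directory/"


def jump_url(board_num: int, server_num: int = 0) -> str:
    """Build a directory jump URL for a numerical board ID.

    server_num defaults to 0, which (per the text above) usually makes
    the site redirect to whichever server actually hosts the board.
    """
    query = urlencode({"p": "jump", "v": board_num, "s": server_num})
    return f"{DIRECTORY_BASE}?{query}"


# 12345 is a made-up board number used only as an example.
print(jump_url(12345))
```

Following the redirect from such a URL should land on the standard SERVER.zetaboards.com/BOARD address, but only for boards that are listed in the directory.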
This information is only applicable to boards that are listed on the ZetaBoards Forum Directory, which is a directory that board owners can apply to be part of. Boards whose owners never applied do not appear there, so it is unsuitable for scraping board URLs.
Instead, board URLs should be discovered using the following format:
http://BOARD.SERVER.zetaboards.com/ where SERVER is 1 - 15 and BOARD is the numerical ID of the board. It is likely (but not confirmed) that the board IDs are unique across all the servers, rather than being unique to an individual server.
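A rough sketch of how candidate discovery URLs could be enumerated from that format is shown below. It takes the format literally (SERVER as a bare number from 1 to 15) and tries every server for every ID, since it is not confirmed which server hosts which board. The function name `candidate_urls` and the ID range are hypothetical.

```python
from typing import Iterator

SERVERS = range(1, 16)  # servers 1 through 15, as stated above


def candidate_urls(first_id: int, last_id: int) -> Iterator[str]:
    """Yield one candidate URL per (board ID, server) pair, inclusive range."""
    for board_id in range(first_id, last_id + 1):
        for server in SERVERS:
            yield f"http://{board_id}.{server}.zetaboards.com/"


# Example: the candidates for a single, made-up board ID.
for url in candidate_urls(1000, 1000):
    print(url)
```

If board IDs do turn out to be unique across servers, a discovery script could stop probing the remaining servers for an ID as soon as one of them responds with a board.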
ZetaBoards provides two types of boards: public and private. Public boards can be accessed by anyone, whereas private boards can only be read by a logged-in member, making them harder to archive. ZetaBoards has an optional captcha on the signup page and uses Cloudflare anti-spam protection, which increases the difficulty further.
The URL formats are as below:
All boards have an /index/ page
They then have many /forum/FORUMID/ pages, which are the sub-boards of the board
Within each /forum/FORUMID/ you have /topic/TOPICID/ pages, one for each thread
The /topic/ pages can have multiple pages, which are selected by a number after the topic ID, e.g. /topic/10193784/1 is page 1 of topic 10193784
Posts are shown under the /topic/ pages, however each post also has an alternative view, /single/?p=POSTID&t=TOPICID. It's almost certainly unnecessary to archive these, as they are simply duplicates of the info on /topic/
There is a page /members/ which lists all members. Like /topic/ and /forum/, this can have multiple pages which are done in the usual format.
User profiles are /profile/PROFILEID/ pages.
PMs are handled by the /comm/ and /msg/ pages. These do not need to be archived.
Searching is handled by /search/. A search URL could look like http://BOARDURL/search/?c=1&q=SEARCHQUERY&type=post&sort=desc&forum%5B%5D=-1&s_m=3&s_d=8&s_y=2014&e_m=5&e_d=20&e_y=2018 which looks for posts, in all forums, within all dates, sorted descending. Probably not necessary to archive these.
However, searching is also used for:
- listing active topics: http://BOARDURL/search/?c=5
- showing posts by certain member: http://BOARDURL/search/?c=2&mid=USERID
- showing topics by certain member: http://BOARDURL/search/?c=4&mid=USERID
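The three listing URLs above differ only in the `c` value and the optional `mid` parameter, so they can be built by one small helper. This is a sketch: the constant names, the function name `listing_url` and the example board URL are made up for illustration.

```python
from typing import Optional
from urllib.parse import urlencode

# Values of the "c" parameter, taken from the list above.
POSTS_BY_MEMBER = 2
TOPICS_BY_MEMBER = 4
ACTIVE_TOPICS = 5


def listing_url(board_url: str, mode: int, member_id: Optional[int] = None) -> str:
    """Build a /search/ listing URL for a board root such as http://BOARDURL."""
    params = {"c": mode}
    if mode in (POSTS_BY_MEMBER, TOPICS_BY_MEMBER):
        if member_id is None:
            raise ValueError("member_id is required for per-member listings")
        params["mid"] = member_id
    return f"{board_url.rstrip('/')}/search/?{urlencode(params)}"


# "http://s1.zetaboards.com/example" is a hypothetical board URL.
print(listing_url("http://s1.zetaboards.com/example", ACTIVE_TOPICS))
print(listing_url("http://s1.zetaboards.com/example", POSTS_BY_MEMBER, 42))
```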
There are probably more types of pages, but these are all I could see upon quick inspection.
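To make the page types listed above concrete, here is a small sketch that classifies a path into one of them. It assumes paths are given relative to the board root and that forum pages are paginated the same way as topics; the function name `classify` and the returned tuple shape are choices made for this example only.

```python
import re
from typing import Optional, Tuple

# /forum/ID/ and /topic/ID/ with an optional page number after the ID.
_NUMBERED = re.compile(r"^/(forum|topic)/(\d+)(?:/(\d+))?/?$")
_PROFILE = re.compile(r"^/profile/(\d+)/?$")
_OTHER_PREFIXES = ("index", "members", "search", "single", "comm", "msg")


def classify(path: str) -> Tuple[str, Optional[int], Optional[int]]:
    """Return (kind, numeric_id, page) for a path relative to the board root."""
    m = _NUMBERED.match(path)
    if m:
        page = int(m.group(3)) if m.group(3) else None
        return m.group(1), int(m.group(2)), page
    m = _PROFILE.match(path)
    if m:
        return "profile", int(m.group(1)), None
    for prefix in _OTHER_PREFIXES:
        if path.startswith(f"/{prefix}/"):
            return prefix, None, None
    return "unknown", None, None


print(classify("/topic/10193784/1"))  # the example topic page from above
```

Anything that comes back as "unknown" is a hint that a page type is missing from the list.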
Summary: a simple grab of the entire board should be sufficient, ignoring the following: /single/* /comm/* /msg/* /search/?c=1* /stats/marktopics/ /stats/mark/ /online/* /home/?c=4 /home/?c=2 /home/?c=16 /home/?c=20 /home/?c=8 /home/?c=37 /home/?c=6 /home/?c=32 /home/?c=10 /home/?c=14 /home/?c=22 /login/logout/ /login/multi_login/
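The ignore list can be expressed as a small filter, sketched below. It makes a few assumptions: paths are relative to the board root, every listed path is treated as a prefix, and `/search/?c=1*` is read as "the `c` parameter equals 1" rather than as a literal glob. The name `should_skip` is hypothetical and this is not the filter used by any existing grab script.

```python
from urllib.parse import parse_qs, urlsplit

IGNORED_PREFIXES = (
    "/single/", "/comm/", "/msg/", "/online/",
    "/stats/marktopics/", "/stats/mark/",
    "/login/logout/", "/login/multi_login/",
)
# Values of "c" on /home/ that appear in the ignore list above.
IGNORED_HOME_C = {"2", "4", "6", "8", "10", "14", "16", "20", "22", "32", "37"}


def should_skip(path_and_query: str) -> bool:
    """Return True if a board-relative URL falls under the ignore list."""
    parts = urlsplit(path_and_query)
    c = parse_qs(parts.query).get("c", [None])[0]
    if parts.path.startswith(IGNORED_PREFIXES):
        return True
    if parts.path == "/search/" and c == "1":
        return True
    if parts.path == "/home/" and c in IGNORED_HOME_C:
        return True
    return False


for candidate in ("/topic/10193784/1", "/single/?p=1&t=2", "/search/?c=5"):
    print(candidate, should_skip(candidate))
```

Note that `/search/?c=5` and the per-member listings are deliberately not skipped, since only the free-text search (`c=1`) is on the ignore list.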