Difference between revisions of "GeoCities Japan"

From Archiveteam
(Marking the status as “in progress”.)
(Updating discovery info)
Line 16:
== Discovery Info ==
* Geocities.jp first level list: https://archive.org/details/geocities_jp_first_level
* Blogs.yahoo.co.jp first level list: https://archive.org/details/blogs_yahoo_co_jp_first_level
* DNS CNAMEs for geocities (JSON format) ('''dead link'''): https://transfer.sh/QYWEG/geocities-dns-data

Revision as of 00:35, 5 November 2018

GeoCities Japan
URL: http://www.geocities.jp/, http://www.geocities.co.jp/
Project status: Closing
Archiving status: In progress...
Project source: Unknown
Project tracker: Unknown
IRC channel: #notagain (on EFnet)
Project lead: Unknown

GeoCities Japan is the Japanese version of GeoCities. It survived the 2009 shutdown of the global platform.

Shutdown

On 2018-10-01, Yahoo! Japan announced that it would close GeoCities Japan at the end of March 2019. New accounts can still be created until 2019-01-10.

Discovery Info

  • Several records available at: https://anonfile.com/z1z62ak8ba/records_zip
    • geocities_jp_first.txt: First level subdirectory list under geocities.jp, compiled from IA CDX data. 566,690 records in total.
    • geocities_co_jp_first.txt: Same as above, for geocities.co.jp. 12,470 records in total.
      • NOTE: Most sites under geocities.co.jp are not first-level sites but second-level "field" sites. In theory there could be 1.79M of them; how many actually exist is unknown. See the URL format under geocities_co_jp_fields.txt below.
    • blogs_yahoo_co_jp_first.txt: Same as above, for blogs.yahoo.co.jp. 646,901 records in total.
    • geocities_co_jp_fields.txt: List of field names under geocities.co.jp.
      • Individual websites are listed in the following format: "http://www.geocities.co.jp/[FieldName]/[AAAA]" where AAAA ranges from 0000 to 9999.
    • include-surts.txt: List of subdomains that should be allowed by your crawler.
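
The field-site URL format described above can be expanded into a candidate list for a crawler. The following is a minimal sketch, not a tool used by the project: it assumes geocities_co_jp_fields.txt holds one field name per line (the file layout is not documented here), and the function names and the placeholder field name "ExampleField" are hypothetical. Each field name yields 10,000 candidate URLs, so the 1.79M theoretical total would correspond to roughly 179 field names, which is an inference from the figures above rather than a stated fact.

```python
from typing import Iterable, Iterator

# Base URL taken from the format quoted in the list above.
BASE = "http://www.geocities.co.jp"


def field_site_urls(
    field_names: Iterable[str], start: int = 0, stop: int = 10000
) -> Iterator[str]:
    """Yield candidate URLs of the form BASE/[FieldName]/[AAAA].

    AAAA is a zero-padded four-digit number from `start` to `stop - 1`.
    These are candidates only; many of them may not exist.
    """
    for name in field_names:
        name = name.strip()
        if not name:
            continue  # skip blank lines
        for number in range(start, stop):
            yield f"{BASE}/{name}/{number:04d}"


def read_field_names(path: str) -> Iterator[str]:
    """Read field names from a text file.

    Assumption: one field name per line, UTF-8 encoded.
    """
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield line.strip()


if __name__ == "__main__":
    # "ExampleField" is a placeholder, not a real field name from the list.
    for url in field_site_urls(["ExampleField"], 0, 3):
        print(url)
```

Because the generator is lazy, the full candidate list never has to be held in memory; the output of field_site_urls(read_field_names("geocities_co_jp_fields.txt")) can be streamed straight into a seed file or a queue.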