GeoCities Project

From Archiveteam

Upon the news that Yahoo was closing GeoCities, Archive Team initiated the GeoCities Project, a coordinated effort to rescue as much of GeoCities' data as possible from the soon-to-be-decommissioned GeoCities servers. The project began in April 2009 and continued through the summer of 2009 up to Yahoo's closing date of October 26, 2009. A list of Frequently Asked Questions about the project is available here.

Parallel to our efforts (and in conjunction with them), archive.org began a major "deep crawl" of GeoCities to add to its Wayback Machine. The page for their project is here. Please note that Archive Team and archive.org are 100% separate entities, with different approaches to saving data and history.

It cannot be stressed enough how many people were involved with this project. Some preferred to stay behind the scenes, while Jason Scott continued his habit of being a complete media hog, taking many of the interviews and much of the face time with people asking what was going on. But dozens of people were involved, and they put in weeks of time and effort finding efficient ways to download all of this data before it was removed.

Technical Details About GeoCities

The following facts about the now-defunct GeoCities, culled from various sources, are intended to provide technical context for how GeoCities was arranged, as discovered during the data harvesting phase.

GeoCities Neighborhoods

Before its acquisition by Yahoo, GeoCities used an unusual method to organize its userbase: Neighborhoods. Pages were grouped by subject into neighborhoods with names like Area51 (Science Fiction and Fantasy), Nashville (Country Music) and Augusta (Golf), making it easier for visitors to find the subject matter they were looking for. For context, search engines as the modern world knows them did not yet exist in such force.

Each neighborhood could hold up to 9,999 accounts, with each account's number representing the user's "block". Over time, GeoCities added "Suburbs", which allowed expansion past 9,999 users; under the Area51 neighborhood, for example, these had names like "Vault" and "Cavern". A URL would then take the form www.geocities.com/NEIGHBORHOOD/SUBURB/XXXX.
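When sorting harvested URLs, it helps to pull a neighborhood address apart into its pieces. The sketch below is a hypothetical helper, not anything GeoCities or Archive Team published: it assumes an address is a neighborhood name, an optional suburb name, and a four-digit number (following the XXXX placeholder above), and that names are plain letters and digits. The names `HomesteadAddress`, `parse_url` and `build_url` are made up for illustration.

```python
import re
from typing import NamedTuple, Optional
from urllib.parse import urlsplit

# Hypothetical helper. Assumes addresses look like
# /NEIGHBORHOOD/NNNN or /NEIGHBORHOOD/SUBURB/NNNN with a four-digit number.
_ADDRESS = re.compile(
    r"^/(?P<neighborhood>[A-Za-z0-9]+)"
    r"(?:/(?P<suburb>[A-Za-z0-9]+))?"
    r"/(?P<number>\d{4})(?:/|$)"
)


class HomesteadAddress(NamedTuple):
    neighborhood: str
    suburb: Optional[str]  # None for addresses directly under a neighborhood
    number: int


def parse_url(url: str) -> Optional[HomesteadAddress]:
    """Return the neighborhood address in a GeoCities URL, or None."""
    m = _ADDRESS.match(urlsplit(url).path)
    if m is None:
        return None
    return HomesteadAddress(m["neighborhood"], m["suburb"], int(m["number"]))


def build_url(addr: HomesteadAddress) -> str:
    """Rebuild the directory URL for a parsed address."""
    parts = [addr.neighborhood]
    if addr.suburb:
        parts.append(addr.suburb)
    parts.append(f"{addr.number:04d}")  # keep the assumed four-digit form
    return "http://www.geocities.com/" + "/".join(parts) + "/"


print(parse_url("http://www.geocities.com/Area51/Vault/1234/index.html"))
```

URLs that name an account directly, such as the /nenehs_world1/ example in the tips below, do not match the pattern, so `parse_url` returns None for them rather than guessing.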

Although it has not been updated since 2007, Geocities Homestead Neighborhoods and Suburbs gives an excellent overview of the history of GeoCities' Neighborhood organization.

The Various Names and Incarnations of GeoCities

Originally called Beverly Hills Internet, the company launched free web hosting in 1995 after a beta period. [1] It later renamed itself Geopages, and then GeoCities. After its acquisition by Yahoo, its name was changed to Yahoo GeoCities, which it remained until its demise.

The Size and Amount of GeoCities Accounts

GeoCities provided a limited amount of space for its users to build websites, although this amount grew over time. While the best-known figure is about fifteen megabytes per site, the actual limit varied considerably over its lifetime. The list below is an attempt to collect citations of the limit from various sources; it is clear from these points of reference that different people got different deals from GeoCities over the years, especially with regard to paid versus free hosting.

These small limits explain the typical look and feel of GeoCities pages: users were naturally restricted in what they could host, and leaned towards simple graphics or hotlinking to files hosted elsewhere to build their look.

  • 1997: 2 MB limit for GeoCities. [2]
  • April 29, 1997: GeoCities welcomes its 500,000th "Homesteader" and increases the limit to 11 MB. [3]
  • 1998: 15 MB limit for the small business service. [4]
  • 1999: GeoCities has 12 terabytes of storage. [5]
  • 2001: 15 MB for GeoCities, 25 MB for $8.95 a month. [6]
  • 2002: 15 MB limit for GeoCities.
  • 2002: 25 MB for the newly introduced "GeoCities Plus".
  • 2003: 25 MB for GeoCities Plus (as of June).
  • 2005: 75 MB for GeoCities Plus (as of January).
  • 2005: 25 MB for GeoCities Plus (as of April).

Yahoo's Site Explorer showed 23 million HTML pages in Yahoo's index as of April 29, 2009.

Tips n' Tricks

  • Although directory listings aren't available for the top level of a user's account, you may be able to obtain Apache-style directory listings for its subdirectories. For example, by stripping the page filename from http://www.geocities.com/nenehs_world1/discography/homebrew.html, we can obtain an index of the subdirectory http://www.geocities.com/nenehs_world1/discography/. The benefit is that this can reveal files that are not linked from anywhere, internally or externally, and so would never be found by a crawler. Unfortunately, many users do not seem to organize their content into subdirectories, preferring to dump all files directly into the user directory. Also, conscientious webmasters may have provided an index page that overrides the directory listing.
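The "strip the filename" trick generalizes to every subdirectory on a page's path. Here is a minimal sketch that only builds the candidate URLs (fetching them and checking for a listing is left out). It assumes the member-name URL form from the example above, where the first path segment is the user's directory; for neighborhood addresses the user directory spans several segments, so the results would need filtering. `candidate_listing_urls` is a hypothetical name.

```python
from typing import List
from urllib.parse import urlsplit, urlunsplit


def candidate_listing_urls(page_url: str) -> List[str]:
    """List the subdirectory URLs of a page that may expose a listing.

    Assumes the first path segment is the user's directory (as in
    /nenehs_world1/...), whose own listing is not accessible, so only
    directories below it are returned, deepest first.
    """
    parts = urlsplit(page_url)
    # Drop the last segment (the page filename, or "" after a trailing slash)
    # and any empty segments produced by leading or doubled slashes.
    dirs = [s for s in parts.path.split("/")[:-1] if s]
    urls = []
    # Stop at depth 2: depth 1 would be the user directory itself.
    for depth in range(len(dirs), 1, -1):
        path = "/" + "/".join(dirs[:depth]) + "/"
        urls.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return urls
```

For the homebrew.html page in the tip, this returns a single URL, http://www.geocities.com/nenehs_world1/discography/; a page sitting directly in the user directory yields an empty list.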

Lists

Users involved

  • User:Jscott, Joey paulprote and many others are downloading the main www.geocities.com stuff.
  • User:Soult downloaded parts of de.geocities.com, which are available as a tar archive here (the download takes 1-2 minutes to start before the first packets arrive; be patient).
  • User:Bbot is mirroring downloaded content.
  • User:Scumola is crawling GeoCities using the archive.org crawler; the crawl was on hold in June due to Comcast's 250 GB bandwidth cap and was to resume in July.
  • Asheesh Laroia (User:Paulproteus) helped test User-Agent tricks to download from GeoCities, and purchased geociti.es.
  • User:Gouki is downloading br.geocities.com.
  • User:Jourdy288 is going to try to save Sega Master System Land.