Difference between revisions of "Google Code"
Revision as of 21:46, 12 March 2015
Google Code
URL | Google Code
Status | Closing
Archiving status | Upcoming...
Archiving type | Unknown
IRC channel | #googlecodeblue (on hackint)
Google Code (AKA Project Hosting) is a software repository owned by Google. It hosts only open source software, and each project must carry an open source license.[1]
Google Code lets people commit their code to a Subversion (SVN), Git or Mercurial repository. It has a downloads section where projects can upload their software packages (with a 4 GB quota, which can be increased upon request), a wiki where projects can document their work, and an issue tracker for bugs in the project's software.
Vital signs
Closing on 25th January, 2016[2].
Archiving
Archiving source code repositories is fairly easy, and can be done incrementally. Just clone the Git or Mercurial repository, or check out the SVN repository. For SVN, make sure that you check out all branches, not just trunk.
Archiving the issue trackers, wikis and downloads will be a bit harder.
A tool to export a repository to GitHub is available[3].
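The clone and checkout advice above can be sketched as a small helper that builds the command for each version control system. This is only an illustration: the function name and the repository URL patterns (`https://code.google.com/p/PROJECT/` for Git and Mercurial, `http://PROJECT.googlecode.com/svn/` for Subversion) are assumptions and should be checked against a real project before use.

```python
from typing import List


def archive_command(project: str, vcs: str, dest: str) -> List[str]:
    """Return the command line that would fetch one project's repository.

    The URL patterns below are assumptions, not confirmed by this page.
    """
    if vcs == "git":
        # --mirror copies every ref (all branches and tags), not just the default branch.
        return ["git", "clone", "--mirror",
                f"https://code.google.com/p/{project}/", dest]
    if vcs == "hg":
        # A Mercurial clone pulls the changesets of all branches by default.
        return ["hg", "clone", f"https://code.google.com/p/{project}/", dest]
    if vcs == "svn":
        # Check out the repository root so branches/ and tags/ come along with trunk/.
        return ["svn", "checkout", f"http://{project}.googlecode.com/svn/", dest]
    raise ValueError(f"unknown VCS: {vcs!r}")
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. Note that an SVN checkout only captures the latest revision of each path; keeping the full history would need a mirroring tool such as `svnsync` instead.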
URL lists
Some seeds for site discovery:
- Underway: Scrape Google Code Search
  - Fetch results for each label, for example: label:javascript
  - Search results can be fetched 100 at a time by appending "&num=100" to the URL.
  - Phase 1 (http://paste.archivingyoursh.it/raw/fajesufise.vhdl): a quick grep finds 114,262 projects, plus 71,972 labels for further searching.
- URLs from ArchiveTeam IRC logs: http://paste.archivingyoursh.it/raw/himupisime
- List scraped from MediaWiki wikis: http://paste.archivingyoursh.it/raw/pehobejoxi
- List from FlossMole's data: http://paste.archivingyoursh.it/raw/yulugedasa (sorted from a possibly incomplete survey in November 2012: http://flossdata.syr.edu/data/gc/)
- Links from Open Directory Project: http://paste.archivingyoursh.it/jepivocine.avrasm
- TODO: Scrape Google Search
- TODO: Scrape Bing
- TODO: Scrape Twitter
- TODO: Scrape the Common Crawl Index
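The label search and the merging of seed lists could look roughly like the sketch below. The search endpoint (`SEARCH_BASE`), the two project URL patterns, and the function names are assumptions made for illustration; only the `label:` query and the `num=100` parameter come from the notes above.

```python
import re
from typing import Set
from urllib.parse import urlencode

# Assumed search endpoint; verify against the live site before scraping.
SEARCH_BASE = "https://code.google.com/hosting/search"

# Assumed project URL shapes: code.google.com/p/NAME and NAME.googlecode.com
PATH_RE = re.compile(r"code\.google\.com/p/([a-z0-9][a-z0-9-]*)", re.IGNORECASE)
HOST_RE = re.compile(r"\b([a-z0-9][a-z0-9-]*)\.googlecode\.com", re.IGNORECASE)


def label_search_url(label: str, num: int = 100) -> str:
    """Build a search URL for one label, asking for `num` results per page."""
    return SEARCH_BASE + "?" + urlencode({"q": f"label:{label}", "num": num})


def project_names(text: str) -> Set[str]:
    """Extract a deduplicated set of project names from arbitrary text."""
    names: Set[str] = set()
    for pattern in (PATH_RE, HOST_RE):
        # Lowercase so the same project seen in different lists collapses to one entry.
        names.update(name.lower() for name in pattern.findall(text))
    return names
```

Running `project_names` over each seed list and taking the union of the results gives a single list of candidate projects to feed into the archiving step.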
Tools
- FlossMole provides a set of tools to spider projects from Google Code.