Wikimedia Commons

[Image: Wikimedia Commons mainpage on 2010-12-13]
URL: http://commons.wikimedia.org
Project status: Online
Archiving status: In progress
Project source: Unknown
Project tracker: Unknown
IRC channel: #archiveteam

Wikimedia Commons is a database of freely usable media files. It holds more than 10 million files; when it held 6.8 million files, their total size was 6.6 TB.

Current size (estimate of January 18, 2012): 13.3 TB; old versions: 881 GB.

Archiving process

Tools

How-to

Download the script and the feed list into the same directory, and unpack the feed list (it is a .csv file). Then run:

  • python commonsdownloader.py 2005-01-01 2005-01-10 [downloads that 10-day range; it generates one .zip file and one .csv file per day]

Don't forget that some months have a 30th or 31st day, and that February has a 29th day in leap years.
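
One way to avoid missing month-end days is to compute the last day of each month instead of typing it by hand. The following is a minimal sketch, assuming the downloader accepts any start and end date in the YYYY-MM-DD form used in the example command; month_range is a hypothetical helper and not part of the downloader script.

```python
import calendar
from datetime import date


def month_range(year, month):
    """Return (first_day, last_day) of a month as YYYY-MM-DD strings."""
    # calendar.monthrange returns (weekday of day 1, number of days in month),
    # so leap years and 30/31-day months are handled for us.
    last_day = calendar.monthrange(year, month)[1]
    return (date(year, month, 1).isoformat(),
            date(year, month, last_day).isoformat())


# Print one downloader command per month of 2008 (a leap year).
for month in range(1, 13):
    start, end = month_range(2008, month)
    print("python commonsdownloader.py", start, end)
```

The printed commands can be reviewed before running them, which makes it easy to spot a range that stops short of the end of a month.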

To verify the downloaded data, use the checker script:

  • python commonschecker.py 2005-01-01 2005-01-10 [checks that 10-day range; it works on the .zip and .csv files, not the original folders]

Tools required

If you are downloading on a freshly installed server (e.g. a default virtual machine), you need to install zip (Ubuntu: apt-get install zip).

Python should already be installed on your server; if it is not, install it.

The script also depends on curl and wget, which should be installed on your server by default.
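
Before starting a long download, it may help to confirm that these tools are actually on the PATH. This is a rough sketch using only the Python standard library (Python 3.3 or later for shutil.which); missing_tools is a hypothetical helper, and the tool list simply mirrors the requirements named above.

```python
import shutil

# External programs named in this section.
REQUIRED_TOOLS = ("zip", "curl", "wget")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools()
if missing:
    print("Missing tools:", ", ".join(missing))
else:
    print("All required tools found.")
```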

Volunteers

Please wait until we have done some tests. There is probably a bug with long filenames.
Nick	Start date	End date	Images	Size	Revision	Status	Notes
Hydriz	2004-09-07	2005-06-30	?	?	r643	Downloaded; uploaded to the Internet Archive	Check: October 2004: [1]; November 2004: [2]; December 2004: [3]; January 2005: [4]; February 2005: [5]; March 2005: [6] (2005-03-23 to 2005-03-31 was downloaded differently, so it is not available for checking); April 2005: [7]; May 2005: [8]; June 2005: [9]
Hydriz	2005-07-01	2005-12-31	?	?	r643	Downloaded; uploaded to the Internet Archive	Check: July 2005: [10]; August 2005: [11]; September 2005: [12]; October 2005: [13]; November 2005: [14]; December 2005: [15]
Hydriz	2006-01-01	2006-01-10	13198	4.8 GB	r349	Downloaded; uploaded to the Internet Archive	
Hydriz	2006-01-11	2006-06-30	?	?	r349	Downloaded; uploaded to the Internet Archive	
Hydriz	2006-07-01	2006-12-31	?	?	r643	Downloaded; uploaded to the Internet Archive	Check: July 2006: http://p.defau.lt/?IcMnwkx_j4H09FE_9iVgkQ; August 2006: http://p.defau.lt/?EmsKDtM0RXaysFNEABXJCQ; September 2006: http://p.defau.lt/?KBZVE9rJ9hdz4DiKnegnUw; October 2006: http://p.defau.lt/?f3F85TyqHtdY0LhpQk_m1w; November 2006: http://p.defau.lt/?VZwhzt_2doA_Z3c65_JkXg; December 2006: http://p.defau.lt/?Ms_TgrcyGDL_0oZQgKCNmw
Hydriz	2007-01-01	2007-12-31	?	?	r349	Downloading	Check: January 2007; February 2007; March 2007; April 2007; May 2007; June 2007; July 2007

Errors

  • oi_archive_name empty fields: http://commons.wikimedia.org/wiki/File:Nl-scheikundig.ogg
  • broken file links: http://commons.wikimedia.org/wiki/File:SMS_Bluecher.jpg#filehistory
  • Issue 45: 2005-03-23, 2005-08-08, 2005-09-12, 2005-09-18, 2005-09-25, 2005-11-18, 2006-02-05, 2006-02-11, 2006-02-25, 2006-03-10, 2006-03-23, 2006-04-21, 2006-04-25, 2006-05-01, 2006-07-13, 2006-07-30, 2006-08-02, 2006-08-05, 2006-08-13, 2006-09-12, 2006-10-22, 2006-10-26, 2006-11-23, 2006-12-06, 2006-12-13, 2006-12-17.
  • Also issue 45: 2007-01-01, 2007-01-06, 2007-01-14, 2007-01-15, 2007-02-06, 2007-02-13, 2007-02-22, 2007-02-26, 2007-03-07, 2007-03-13, 2007-03-25, 2007-03-30, 2007-04-12, 2007-04-14, 2007-04-20, 2007-05-04, 2007-05-08, 2007-05-10, 2007-05-29, 2007-06-05, 2007-06-22.

I'm going to file a bug in Bugzilla.

Uploading

Upload using the naming format wikimediacommons-<year><month>

E.g. wikimediacommons-200601 for the January 2006 grab.
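
The naming format can be generated rather than typed. The sketch below assumes a four-digit year and a zero-padded two-digit month, as in the 200601 example; item_identifier is a hypothetical helper name.

```python
def item_identifier(year, month):
    """Build an upload name of the form wikimediacommons-<year><month>."""
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    # Four-digit year followed by a zero-padded two-digit month.
    return "wikimediacommons-{:04d}{:02d}".format(year, month)


print(item_identifier(2006, 1))  # wikimediacommons-200601
```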

If you can, add it to the WikiTeam collection; otherwise, tag it with the wikiteam keyword and it will be added later.

Other dumps

There is no public dump of all images. WikiTeam is working on a scraper (see section above).

Pictures of the Year (best ones):

Featured images

Wikimedia Commons contains a lot of high-quality images.

[Image: Featured pictures on Wikimedia Commons]

Size stats

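Combined size of the images hosted on Wikimedia Commons, grouped by month. The rows are ordered by the date string rather than chronologically, so 2004-10 comes before 2004-9.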

date	sum(img_size) in bytes
2003-1	1360188
2004-10	637349207
2004-11	726517177
2004-12	1503501023
2004-9	188850959
2005-1	1952816194
2005-10	17185495206
2005-11	9950998969
2005-12	11430418722
2005-2	3118680401
2005-3	3820401370
2005-4	5476827971
2005-5	10998180401
2005-6	7160629133
2005-7	9206024659
2005-8	12591218859
2005-9	14060418086
2006-1	15433548270
2006-10	33574470896
2006-11	34231957288
2006-12	30607951770
2006-2	14952310277
2006-3	19415486302
2006-4	23041609453
2006-5	29487911752
2006-6	29856352192
2006-7	32257412994
2006-8	50940607926
2006-9	37624697336
2007-1	40654722866
2007-10	89872715966
2007-11	81975793043
2007-12	75515001911
2007-2	39452895714
2007-3	53706627561
2007-4	72917771224
2007-5	72944518827
2007-6	63504951958
2007-7	76230887667
2007-8	91290158697
2007-9	100120203171
2008-1	84582810181
2008-10	122360827827
2008-11	116290099578
2008-12	126446332364
2008-2	77416420840
2008-3	89120317630
2008-4	98180062150
2008-5	117840970706
2008-6	100352888576
2008-7	128266650486
2008-8	130452484462
2008-9	120247362867
2009-1	127226957021
2009-10	345591510325
2009-11	197991117397
2009-12	228003186895
2009-2	125819024255
2009-3	273597778760
2009-4	212175602700
2009-5	191651496603
2009-6	195998789357
2009-7	241366758346
2009-8	262927838267
2009-9	184963508476
2010-1	226919138307
2010-2	191615007774
2010-3	216425793739
2010-4	312177184245
2010-5	312240110181
2010-6	283374261868
2010-7	362175217639
2010-8	172072631498
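
Because the date column is not zero-padded, the rows above are not in chronological order. The following sketch shows one way to parse such rows and sort them by year and month. It assumes tab-separated input like the table above; SAMPLE holds only four rows copied from the table, and parse_rows is a hypothetical helper.

```python
# Four rows copied from the table above (tab-separated).
SAMPLE = """\
2004-10\t637349207
2004-11\t726517177
2004-12\t1503501023
2004-9\t188850959
"""


def parse_rows(text):
    """Parse 'year-month<TAB>bytes' lines into sorted (year, month, bytes) tuples."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        month_key, size = line.split("\t")
        year, month = (int(part) for part in month_key.split("-"))
        rows.append((year, month, int(size)))
    # Tuples sort by year first, then month, which is chronological order.
    return sorted(rows)


for year, month, size in parse_rows(SAMPLE):
    # Sizes are shown in decimal gigabytes (1 GB = 10**9 bytes).
    print("{:04d}-{:02d}  {:8.2f} GB".format(year, month, size / 1e9))
```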

See also

  • Wikipedia: some Wikipedias have enabled the local upload form. English Wikipedia contains about 800,000 images, many of them under fair use.
