Difference between revisions of "SourceForge"

From Archiveteam
(External links: cat)
(updated)
Line 5: Line 5:
 
| URL = {{url|1=http://sourceforge.net/|2=sourceforge.net}}
 
 
| project_status = {{online}}
 
| archiving_status = '''Paused, Awaiting SF Staff Reply'''
+
| archiving_status = {{notsavedyet}}
 
| source = [https://github.com/ArchiveTeam/sourceforge-grab sourceforge-grab], [https://github.com/ArchiveTeam/sourceforge-grab-rsync sourceforge-grab-rsync]
 
 
| tracker = [http://tracker.archiveteam.org/sourceforge sourceforge], [http://tracker.archiveteam.org/sourceforgersync sourceforgersync]
 
Line 51: Line 51:
 
* Some projects have subdomain sites. Ex: http://supertuxkart.sourceforge.net/  Many can be listed by using the project API as an "external_homepage".
 
  
== How can I help? ==
+
== Archiving ==
  
There are two projects: one that grabs the web content and a copy of the binaries, and another that grabs the source code repositories via rsync.
+
On June 17, 2015, ArchiveTeam started two simultaneous grabbing processes: one for web-based content and binaries, and one for rsync-able source code repositories. Shortly afterwards, someone claiming to be a SourceForge staff member told us to stop and to contact their representative first.
  
For both, you can either select the project in the [[Warrior]] appliance (only one of them at a time), or set up and run the script(s) manually.
+
<div style="width:100%">
<pre>
Jun 18 22:08:45 <burley-sf> FYI: I just blocked your archive client
Jun 18 22:09:05 <JRWR>      oh?
Jun 18 22:09:07 <burley-sf> it's not following robots.txt, and hitting recursive deep dives
Jun 18 22:09:18 <JRWR>      oh my
Jun 18 22:09:24 <arkiver>   burley-sf: We're currently trying to archive the software on your website
Jun 18 22:09:26 <burley-sf> I'll also be killing the rsync's here soon, you are going too heavy on this
Jun 18 22:09:37 <burley-sf> I understand, and I am OK with that -- but not the way you are doing it
Jun 18 22:09:58 <arkiver>   burley-sf: What is your limit?
Jun 18 22:09:59 <burley-sf> I suggest you stop, so I don't have to block the IPs for rsync
Jun 18 22:10:06 <burley-sf> and reach out to our community guy
Jun 18 22:10:14 <burley-sf> gimme min and I'll give you an email address
Jun 18 22:10:24 <achip>     rsync is paused
Jun 18 22:10:38 <arkiver>   burley-sf: thank you
Jun 18 22:11:05 <burley-sf> rgaloppini@slashdotmedia.com
[...]
Jun 18 22:36:13 <burley-sf> So reach out to Roberto at the address above and then I am sure we can sort something out that doesn't cause impact to the other users
Jun 18 22:37:35 <burley-sf> And if you need to reach me for some reason -- david@sourceforge.net
</pre>
</div>
  
=== Web grab ===
+
We attempted to contact them but got no reply.
 
 
'''Warrior:''' SourceForge
 
 
 
'''Script:''' http://github.com/ArchiveTeam/sourceforge-grab
 
 
 
=== Code rsync ===
 
 
 
'''Warrior:''' SourceForge Rsync
 
 
 
'''Script:''' http://github.com/ArchiveTeam/sourceforge-grab-rsync
 
 
 
'''IMPORTANT''': in the rsync project, a single item can be several gigabytes in size. By default, the script/Warrior only accepts items of up to 5 GB. If you have orders of magnitude more disk space, please raise this limit; the script README explains how. '''Note:''' budget for twice the size of an item, because the downloaded copy and the tar to be uploaded both sit on your disk until the item finishes. Multiply that by the concurrency level (Warrior default: 2).
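The disk-space rule above (twice the item size, multiplied by the concurrency level) can be sketched as a small calculation. The function name <code>required_disk_gb</code> is hypothetical and not part of the grab scripts; it only restates the arithmetic from the note.

```python
def required_disk_gb(item_size_gb: float, concurrency: int = 2) -> float:
    """Worst-case free space: each running item holds its download plus its tar."""
    if item_size_gb < 0 or concurrency < 1:
        raise ValueError("item size must be >= 0 and concurrency >= 1")
    # Factor 2: downloaded copy + tar awaiting upload (per the note above).
    return 2.0 * item_size_gb * concurrency


# Default 5 GB item limit with the Warrior default concurrency of 2.
print(required_disk_gb(5))  # 20.0
```

With the default limit and default concurrency, roughly 20 GB of free space is the worst case; raising either value scales the requirement linearly.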
 
 
 
'''Note:''' rsync downloads are limited: only one rsync download process can run at a time, to avoid being banned by SourceForge.
 
 
 
=== General info for script runners ===
 
Read the instructions (README) of the corresponding repository.
 
 
 
{| class="mw-collapsible mw-collapsed" style="text-align:left;"
 
! Some additional information
 
|-
 
| Don't forget to replace YOURNICKHERE with your nickname.
 
 
 
The number after <code>--concurrent</code> determines how many threads run at the same time. You can increase this number if your resources (RAM, CPU, HDD, bandwidth) are sufficient. However, if you constantly see messages about rate limiting, there is no need to increase the concurrency.
 
 
 
If you want to stop the script, please do it gracefully if possible. To do so, create an empty file named '''STOP''' in the folder of the script (terminal command: <code>touch STOP</code>). The script finishes the current item(s) and only then stops. (If you kill the script immediately, the items in progress are broken and need to be reassigned to another user.) Before starting the script again, don't forget to remove the STOP file.
 
 
 
If you see "Project code is out of date", kill the script, go to its folder and issue <code>git pull REPOSITORY</code>, where REPOSITORY stands for the URL of either the <code>sourceforge-grab</code> or the <code>sourceforge-grab-rsync</code> repository (see above). After the update has finished, re-launch the script.
 
|}
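The graceful-stop procedure described in the table above (create an empty STOP file, remove it before the next start) can be sketched as follows. The helper names <code>request_stop</code> and <code>clear_stop</code> are hypothetical and not part of the grab scripts; the first is simply the equivalent of <code>touch STOP</code> in the script folder.

```python
from pathlib import Path


def request_stop(script_dir: str) -> Path:
    """Create an empty STOP file so the script exits after its current item(s)."""
    stop_file = Path(script_dir) / "STOP"
    stop_file.touch()  # same effect as `touch STOP` in that folder
    return stop_file


def clear_stop(script_dir: str) -> None:
    """Remove the STOP file before re-launching; do nothing if it is absent."""
    (Path(script_dir) / "STOP").unlink(missing_ok=True)  # needs Python 3.8+


# Example (path is a placeholder):
#   request_stop("/path/to/sourceforge-grab")
#   ... wait for the script to finish its items, then before restarting:
#   clear_stop("/path/to/sourceforge-grab")
```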
 
 
 
=== Donating to the Internet Archive ===
 
 
 
Content downloaded by ArchiveTeam will be uploaded to the [[Internet Archive]], where it will be stored and, hopefully, remain available forever. However, storing it costs thousands of dollars in the long run. If you can afford it, please consider donating to the Internet Archive so that this piece of history can be kept for us all: http://archive.org/donate
 
 
 
=== Do you like our cause? ===
 
 
 
If you want to help with other projects, learn more about ArchiveTeam, or help with development in general, go to the [[Main Page]] of this wiki, which links to a lot of information. The Team consists of volunteers working on the projects in their free time, so helping hands (and resources) are always welcome.
 
  
 
== References ==
 

Revision as of 11:16, 10 July 2016

SourceForge
URL sourceforge.net
Project status Online!
Archiving status Not saved yet
Project source sourceforge-grab, sourceforge-grab-rsync
Project tracker sourceforge, sourceforgersync
IRC channel #coldstorage (on EFnet)
Project lead Unknown

SourceForge is a free software repository.

It is really old, ad-supported, and adware-supported, and yet it is still alive.

It hosts code migrated from BerliOS[1], which shut down.

Shutdown?

2015: Removal of FRS Area

Hello,
You have been identified as having saved files in you user FRS profile area (/home/pfs/<username>. We are planning on removing this
area for user accounts on March 17th 2015. We wanted to give you the opportunity to move your data to a new location before we
remove the data. Here is a link that should help you with moving your data:
https://sourceforge.net/p/forge/documentation/SFTP/
If you need any help please contact us.
Thanks
SourceForge.net Support
sfnet_ops@slashdotmedia.com
https://sourceforge.net/support

[2]

2015: Admins hijacking projects to add more adware

http://lwn.net/SubscriberLink/646118/f8f6483b64fdafb9/

Site Structure

Download files can be found on public FTP mirrors, e.g. http://www.mirrorservice.org/sites/ftp.sourceforge.net/ Should the rest of the site take priority, with download files grabbed last?

CVS/svn/git/hg/bzr repositories should be a priority; many projects do not have their source code on the ftp mirrors.

The main API is documented here: http://sourceforge.net/p/forge/documentation/Allura%20API/ and allows unauthenticated access to most services. It also can indicate what revision control system is used.
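As a minimal sketch of how a project API response might be inspected for the homepage and the revision control system, the snippet below parses a JSON document offline. The <code>external_homepage</code> field is the one mentioned on this page; the <code>tools</code> list with a <code>name</code> per tool is an assumption about the response layout, and the sample data is made up for illustration. Check the API documentation linked above before relying on any of these field names.

```python
import json

# Tool names treated as revision control systems here (an assumption).
VCS_TOOLS = {"cvs", "svn", "git", "hg", "bzr"}


def summarize_project(api_json: str) -> dict:
    """Extract the external homepage and VCS tool names from a project JSON document."""
    data = json.loads(api_json)
    tools = data.get("tools", [])  # assumed layout: list of {"name": ...} objects
    vcs = sorted({t.get("name") for t in tools if t.get("name") in VCS_TOOLS})
    return {"external_homepage": data.get("external_homepage"), "vcs": vcs}


# Hypothetical sample response, not fetched from SourceForge.
SAMPLE = json.dumps({
    "shortname": "exampleproject",
    "external_homepage": "http://exampleproject.sourceforge.net/",
    "tools": [{"name": "svn"}, {"name": "git"}, {"name": "wiki"}],
})

print(summarize_project(SAMPLE))
```

Keeping the parsing separate from any network fetch makes it easy to test against saved responses and avoids extra load on the site.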

Appropriate tools (such as git clone --mirror and svnrdump) can be used to back up repositories, but SF suggests using rsync regardless of the actual revision control system used.

Archiving

On June 17, 2015, ArchiveTeam started two simultaneous grabbing processes: one for web-based content and binaries, and one for rsync-able source code repositories. Shortly afterwards, someone claiming to be a SourceForge staff member told us to stop and to contact their representative first.

Jun 18 22:08:45 <burley-sf> FYI: I just blocked your archive client
Jun 18 22:09:05 <JRWR>      oh?
Jun 18 22:09:07 <burley-sf> it's not following robots.txt, and hitting recursive deep dives
Jun 18 22:09:18 <JRWR>      oh my
Jun 18 22:09:24 <arkiver>   burley-sf: We're currently trying to archive the software on your website
Jun 18 22:09:26 <burley-sf> I'll also be killing the rsync's here soon, you are going too heavy on this
Jun 18 22:09:37 <burley-sf> I understand, and I am OK with that -- but not the way you are doing it
Jun 18 22:09:58 <arkiver>   burley-sf: What is your limit?
Jun 18 22:09:59 <burley-sf> I suggest you stop, so I don't have to block the IPs for rsync
Jun 18 22:10:06 <burley-sf> and reach out to our community guy
Jun 18 22:10:14 <burley-sf> gimme min and I'll give you an email address
Jun 18 22:10:24 <achip>     rsync is paused
Jun 18 22:10:38 <arkiver>   burley-sf: thank you
Jun 18 22:11:05 <burley-sf> rgaloppini@slashdotmedia.com
[...]
Jun 18 22:36:13 <burley-sf> So reach out to Roberto at the address above and then I am sure we can sort something out that doesn't cause impact to the other users
Jun 18 22:37:35 <burley-sf> And if you need to reach me for some reason -- david@sourceforge.net

We attempted to contact them but got no reply.

References

External links

