Difference between revisions of "FTP"

From Archiveteam
(Undo revision 19109 by Dashcloud1 (talk))
m (Reverted edits by Megalanya1 (talk) to last revision by Jscott)
(14 intermediate revisions by 9 users not shown)
| description =  
| project_status = {{online}}
| archiving_status = {{notsaved}}
| archiving_status = {{inprogress}}
| source = https://github.com/ArchiveTeam/ftp-nab
| irc = effteepee
}}


Archiving a whole public '''FTP''' host/mirror is easy:
The '''File Transfer Protocol''' ('''FTP''') is a protocol for transferring files, published as RFC 114 on 16 April 1971. In the early days of the internet it was widely used to upload and share files; today it is used much less. This led Archive Team to decide to grab all the FTP servers.
SketchCow> I use wget -r -l 0 -np -nc ftp://ftp.underscorporn.com
tar cvf 2014.01.ftp.underscorporn.com.tar ftp.underscorporn.com
tar tvf 2014.01.ftp.underscorporn.com.tar > 2014.01.ftp.underscorporn.com.tar.txt


Or use this handy function, which you can put in your .bashrc file; you can also remove the first and last lines to turn it into a standalone bash script. Made by SN4T14.
The FTP grab started 30 November 2015. The dashboard for the grab can be viewed at [http://tracker.archiveteam.org/ftp/ http://tracker.archiveteam.org/ftp/].
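ftp-grab(){
    target="$1"
    # Mirror recursively: -l 0 = no depth limit, -np = never ascend to the
    # parent directory, -nc = skip files that were already downloaded.
    wget -r -l 0 -np -nc "$target"
    # wget saves ftp://host/path under ./host/, so keep only the host part.
    if [[ "$target" =~ ^ftp:// ]]; then
        target="$(echo "$target" | cut -d '/' -f 3)"
    fi
    # Compute the year.month prefix once so the tar and its listing match.
    stamp="$(date +%Y.%m)"
    tar cvf "$stamp.$target.tar" "$target"
    # Save a verbose listing of the archive next to it.
    tar tvf "$stamp.$target.tar" > "$stamp.$target.tar.txt"
}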
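To see which file names the function will produce before starting a long download, the naming step can be tried on its own. This is a minimal sketch using a hypothetical helper, <code>archive_name</code>, which is not part of SN4T14's function. It strips the <code>ftp://</code> scheme and any path with bash parameter expansion, which gives the same host part as <code>cut -d '/' -f 3</code>, and it takes the year.month stamp as an optional argument so the result is reproducible. The host name below is made up.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the tar name that ftp-grab would create.
# Usage: archive_name TARGET [STAMP]   (STAMP defaults to the current year.month)
archive_name() {
    local target="$1"
    local stamp="${2:-$(date +%Y.%m)}"
    if [[ "$target" =~ ^ftp:// ]]; then
        target="${target#ftp://}"   # drop the scheme
        target="${target%%/*}"      # keep only the host, like cut -d '/' -f 3
    fi
    echo "$stamp.$target.tar"
}

# Example with a made-up host and a fixed stamp.
archive_name ftp://ftp.example.com/pub/linux/ 2014.01
```

With a fixed stamp the call prints <code>2014.01.ftp.example.com.tar</code>; leaving the stamp out uses the current month, as the function above does.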


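Check the size of the site before you start, to make sure you have space for both the downloaded site and the tar afterwards. When using <code>tar --remove-files</code>, also account for large files on the site: each file stays on disk until it has been added to the tar, so you need extra room of at least the size of the largest file.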


lftp -c 'set ftp:use-feat no; du -sh ftp://site'
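Once you have a size estimate, a quick free-space check can catch problems before a long download. This is a minimal sketch with a hypothetical function, <code>has_room_for</code>. It assumes you convert lftp's human-readable figure (for example 20G) into bytes yourself and pass it as the first argument, and it assumes you need roughly twice that amount, once for the mirror and once for the tar.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if DIR has room for the site plus its tar.
# Usage: has_room_for SITE_BYTES [DIR]
has_room_for() {
    local site_bytes="$1"
    local dir="${2:-.}"
    local avail_kb needed
    # -P: portable one-line-per-filesystem output; -k: sizes in 1024-byte blocks.
    avail_kb="$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')"
    # Assumption: the mirror and the finished tar are on disk at the same time.
    needed=$(( site_bytes * 2 ))
    if (( avail_kb * 1024 >= needed )); then
        echo "OK: $(( avail_kb / 1024 )) MiB free, need about $(( needed / 1048576 )) MiB"
        return 0
    fi
    echo "Not enough space in $dir: need about $(( needed / 1048576 )) MiB" >&2
    return 1
}

# Example: a site that lftp reported as roughly 20G.
has_room_for $(( 20 * 1024 * 1024 * 1024 )) . || echo "Free up some space before starting the grab."
```

If you plan to use <code>tar --remove-files</code>, you could pass the site size plus the size of its largest file instead of relying on the factor of two.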
== How can I help? ==


Now zip/tar it up and [[Internet_Archive#Uploading_to_archive.org|send to the spacious Internet Archive]]![https://archive.org/details/ftpsites] (If you're short on space: <code>tar --remove-files</code> deletes the files shortly after adding them to the tar, not waiting for it to be complete, unlike <code>zip -rm</code>.)
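The behaviour of <code>--remove-files</code> is easy to check on a throwaway directory before trusting it with a real mirror. This sketch assumes GNU tar, which provides that option; the directory and file names are made up for the demonstration.

```shell
#!/usr/bin/env bash
# Demonstrate GNU tar's --remove-files on a throwaway directory.
work="$(mktemp -d)"
cd "$work" || exit 1

# A tiny stand-in for a mirrored FTP host (hypothetical names).
mkdir -p ftp.example.com/pub
echo "hello" > ftp.example.com/pub/readme.txt
echo "world" > ftp.example.com/pub/notes.txt

# Each file is deleted from disk once it has been added to the archive,
# without waiting for the whole archive to be finished.
tar --remove-files -cvf demo.tar ftp.example.com
# Keep the usual listing next to the tar.
tar tvf demo.tar > demo.tar.txt

if [ -e ftp.example.com/pub/readme.txt ]; then
    echo "readme.txt is still on disk"
else
    echo "readme.txt was removed from disk, but is still in the archive:"
fi
grep readme.txt demo.tar.txt
```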
=== Running the script manually ===


== The Project ==
If you use Linux and you're a bit familiar with it, you can try running the script directly.


* We're currently [https://github.com/ArchiveTeam/ftp-nab listing all FTP sites on the internet] to download them all.
The instructions can be found at [https://github.com/ArchiveTeam/ftp-grab github.com/ArchiveTeam/ftp-grab].
* We're auditing a list of select FTP sites manually: https://www.piratepad.ca/p/old-ftp-list


{| class="wikitable"
{| class="mw-collapsible mw-collapsed" style="text-align:left;"
|+Who is grabbing what?
! Some additional information
|-
|Midas
|ftp.tu-chemnitz.de
|-
|Midas
|ftp.uni-muenster.de
|-
|Midas
|gatekeeper.dec.com
|-
|Midas
|ftp.uni-erlangen.de
|-
|Midas
|ftp.warwick.ac.uk
|-
|-
| Don't forget to replace YOURNICKHERE with your nickname.
The number after <code>--concurrent</code> determines how many threads run at the same time. You can increase this number if your resources (RAM, CPU, bandwidth) are sufficient. However, if you constantly see messages about rate limiting, there is no need to increase the concurrency.
If you want to stop the script, please do it gracefully if possible. To do so, create an empty file named '''STOP''' in the folder of the script (terminal command: <code>touch STOP</code>). The script finishes the current item(s) and stops only after that. (If you kill the script immediately, the items get broken, and they will need to be reassigned to another user.) – Before starting the script again, don't forget to remove the STOP file.
If you see "Project code is out of date", kill the script, go to its folder (<code>cd ftp-grab</code>) and issue <code><nowiki>git pull https://github.com/ArchiveTeam/</nowiki>ftp-grab</code>. After the updating has finished, re-launch the script.
|}
|}
University FTP servers are massive; currently only DEC and Sweex are being grabbed.
 
=== Discovery items ===
The project needs items to be able to run. You can help discover these items.
 
Scripts for creating items for the grab can be found at [https://github.com/ArchiveTeam/ftp-queue https://github.com/ArchiveTeam/ftp-queue]. Instructions on how to run the grab can be found in the README. A list of FTPs that need to be scanned can be found at [[FTP/List]].
 
=== Donating to the Internet Archive ===
Content downloaded by ArchiveTeam will be uploaded to the [[Internet Archive]], where it will be stored and be available, hopefully forever. However, storing it costs thousands of dollars in the long run. So, if you can afford it, please consider donating to the Internet Archive, so that this piece of history can be kept for us all. http://archive.org/donate


== External Links ==

Revision as of 16:29, 17 January 2017

FTP
Status Online!
Archiving status In progress...
Archiving type Unknown
Project source https://github.com/ArchiveTeam/ftp-nab
IRC channel #effteepee (on hackint)

The File Transfer Protocol (FTP) is a protocol for transferring files, published as RFC 114 on 16 April 1971. In the early days of the internet it was widely used to upload and share files; today it is used much less. This led Archive Team to decide to grab all the FTP servers.

The FTP grab started 30 November 2015. The dashboard for the grab can be viewed at http://tracker.archiveteam.org/ftp/.

How can I help?

Running the script manually

If you use Linux and you're a bit familiar with it, you can try running the script directly.

The instructions can be found at github.com/ArchiveTeam/ftp-grab.

Some additional information
Don't forget to replace YOURNICKHERE with your nickname.

The number after --concurrent determines how many threads run at the same time. You can increase this number if your resources (RAM, CPU, bandwidth) are sufficient. However, if you constantly see messages about rate limiting, there is no need to increase the concurrency.

If you want to stop the script, please do it gracefully if possible. To do so, create an empty file named STOP in the folder of the script (terminal command: touch STOP). The script finishes the current item(s) and stops only after that. (If you kill the script immediately, the items get broken, and they will need to be reassigned to another user.) – Before starting the script again, don't forget to remove the STOP file.
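The STOP file works because the script only checks for it between items, never in the middle of one. The loop below is a minimal, hypothetical sketch of that convention, not the actual ftp-grab code: the function <code>run_items</code> and the item names are made up, and the real work on each item is replaced by an <code>echo</code>.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a worker loop that honours a STOP file.
# It checks for ./STOP only between items, so an item that has
# started is always finished rather than left half-done.
run_items() {
    local item finished=0
    for item in "$@"; do
        if [[ -e STOP ]]; then
            echo "STOP file found after $finished item(s); exiting cleanly"
            return 0
        fi
        echo "processing $item"   # stand-in for downloading and uploading the item
        finished=$((finished + 1))
    done
    echo "all $finished item(s) done"
}

cd "$(mktemp -d)" || exit 1
run_items item-1 item-2      # no STOP file: both items are processed
touch STOP                   # what you type to request a graceful stop
run_items item-3 item-4      # stops before starting item-3
rm STOP                      # remove it again before restarting
```

This is also why the STOP file has to be removed before restarting: a loop like this would see it immediately and exit again.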

If you see "Project code is out of date", kill the script, go to its folder (cd ftp-grab) and issue git pull https://github.com/ArchiveTeam/ftp-grab. After the updating has finished, re-launch the script.

Discovery items

The project needs items to be able to run. You can help discover these items.

Scripts for creating items for the grab can be found at https://github.com/ArchiveTeam/ftp-queue. Instructions on how to run the grab can be found in the README. A list of FTPs that need to be scanned can be found at FTP/List.

Donating to the Internet Archive

Content downloaded by ArchiveTeam will be uploaded to the Internet Archive, where it will be stored and be available, hopefully forever. However, storing it costs thousands of dollars in the long run. So, if you can afford it, please consider donating to the Internet Archive, so that this piece of history can be kept for us all. http://archive.org/donate

External Links