Difference between revisions of "FTP"

From Archiveteam
(→‎The Project: old pad is back and added links to mirrors)
(14 intermediate revisions by 9 users not shown)
Line 3: Line 3:
| image = Threeplaces.jpg
| image = Threeplaces.jpg
| description =  
| description =  
| project_status = {{online}}
| project_status = {{specialcase}}
| archiving_status = {{notsaved}}
| archiving_status = {{inprogress}}
| source = https://github.com/ArchiveTeam/ftp-nab
| source = https://github.com/ArchiveTeam/ftp-nab
| tracker = https://tracker.archiveteam.org/ftp/
| irc = effteepee
| irc = effteepee
}}
}}


Archiving a whole public '''FTP''' host/mirror is easy:
The '''File Transfer Protocol''' ('''FTP''') is a protocol for transferring files, first published as RFC 114 on 16 April 1971. In the early days of the internet it was widely used to upload and share files, but today it sees far less use. This led Archive Team to decide to grab all the FTP servers.
SketchCow> I use wget -r -l 0 -np -nc ftp://ftp.underscorporn.com
tar cvf 2014.01.ftp.underscorporn.com.tar ftp.underscorporn.com
tar tvf 2014.01.ftp.underscorporn.com.tar > 2014.01.ftp.underscorporn.com.tar.txt


OR, put this handy dandy function in your .bashrc file. You can also remove the first and last lines to turn it into a fancy bash script. Made by SN4T14.
The FTP grab started 30 November 2015.
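ftp-grab(){
    target="$1"
    # Mirror the whole site: unlimited depth, never ascend to the parent, skip files already fetched
    wget -r -l 0 -np -nc "$target"
    # wget saves into a directory named after the host, so reduce an ftp:// URL to its host part
    if [[ "$target" =~ ^ftp://.*$ ]]
        then
        target="$(echo "$target" | cut -d '/' -f 3)"
        echo "ftp"
        echo "$target"
    fi
    # Compute the YYYY.MM prefix once so the tar and its listing always share the same name
    stamp="$(date +%Y.%m)"
    tar cvf "$stamp.$target.tar" "$target"
    tar tvf "$stamp.$target.tar" > "$stamp.$target.tar.txt"
}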


Check the size of the site before you start, to make sure you have enough space to hold both the site and the tar afterwards. Also account for large files on the site when using <code>tar --remove-files</code>.
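The space check can be scripted. The following is a minimal sketch, not part of the original instructions: <code>have_space_for</code> is a hypothetical helper name, and it assumes you already know the site's size in KiB and pass it in by hand. It requires twice that much free space, which is the case where the mirrored files and the tar exist side by side.

```shell
# Hypothetical helper (not from the original page): succeed only if DIR has
# at least twice NEEDED_KB free, i.e. room for the mirrored files plus the tar.
have_space_for() {
    needed_kb="$1"
    dir="${2:-.}"
    # df -Pk prints POSIX-format output in 1024-byte blocks;
    # on the second line, column 4 is the available space.
    avail_kb="$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')"
    [ "$avail_kb" -ge $((needed_kb * 2)) ]
}
```

For example, <code>have_space_for 52428800 . && echo "enough room"</code> checks that the current directory could hold a 50 GiB site and its tar. With <code>tar --remove-files</code> the real requirement is lower, so treat the factor of two as a conservative assumption.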
== How can I help? ==
=== Running the script manually ===
If you use Linux and you're a bit familiar with it, you can try running the script directly.


lftp ftp://site.com -e 'du -h'
The instructions can be found at https://github.com/ArchiveTeam/ftp-grab.


An alternative to try if the above does not work correctly (this happens more often on old servers):
{| class="mw-collapsible mw-collapsed" style="text-align:left;"
lftp -c 'set ftp:use-feat no; du -h ftp://site'
! Some additional information
|-
| Don't forget to replace YOURNICKHERE with your nickname.
 
The number after <code>--concurrent</code> determines how many threads run at the same time. You can increase this number if your resources (RAM, CPU, bandwidth) are sufficient. However, if you constantly see messages about rate limiting, there is no need to increase the concurrency.
 
If you want to stop the script, please do it gracefully if possible. To do so, create an empty file named '''STOP''' in the folder of the script (terminal command: <code>touch STOP</code>). The script finishes the current item(s) and stops only after that. (If you kill the script immediately, the items get broken, and they will need to be reassigned to another user.) – Before starting the script again, don't forget to remove the STOP file.


Now zip/tar it up and [[Internet_Archive#Uploading_to_archive.org|send to the spacious Internet Archive]]![https://archive.org/details/ftpsites] (If you're short on space: <code>tar --remove-files</code> deletes the files shortly after adding them to the tar, not waiting for it to be complete, unlike <code>zip -rm</code>.)
If you see "Project code is out of date", kill the script, go to its folder (<code>cd ftp-grab</code>) and issue <code><nowiki>git pull https://github.com/ArchiveTeam/</nowiki>ftp-grab</code>. After the update has finished, relaunch the script.
|}


== The Project ==
=== Discovery items ===
The project needs items in order to run. You can help discover them.


* We're currently [https://github.com/ArchiveTeam/ftp-nab listing all FTP sites on the internet] to download them all.
Scripts for creating items for the grab can be found at https://github.com/ArchiveTeam/ftp-queue. Instructions on how to run the grab can be found in the README. A list of FTPs that need to be scanned can be found at [[FTP/List]].
* We're auditing a list of select FTP sites manually:
** http://dat.serveert.me.uk/p/ftp (mirror: https://github.com/midasf/ftplist)
** https://www.piratepad.ca/p/old-ftp-list (mirror: http://dat.serveert.me.uk/p/old-ftp-list)


{| class="wikitable"
{| class="mw-collapsible mw-collapsed" style="text-align:left;"
|+Who is grabbing what?
! Some additional information
|-
|Midas
|ftp.tu-chemnitz.de
|-
|Midas
|ftp.uni-muenster.de
|-
|Midas
|gatekeeper.dec.com
|-
|Midas
|ftp.uni-erlangen.de
|-
|Midas
|ftp.warwick.ac.uk
|-
|-
| [[User:Squidboy]] says:
It's worth noting that as of June 2019 <code>ftp-queue</code> has several [https://github.com/archiveteam/ftp-queue/issues issues] that may make it hard to use.
|}
|}
University FTPs are massive; currently only DEC and Sweex are being grabbed.
 
=== Donating to the Internet Archive ===
Content downloaded by the ArchiveTeam will be uploaded to the [[Internet Archive]], where it will be stored and, hopefully, remain available forever. However, storing it costs thousands of dollars in the long run. So, if you can afford it, please consider donating to the Internet Archive so that this piece of history can be kept for us all: https://archive.org/donate/


== External Links ==
== External Links ==
 
* [https://www.ftp-sites.org Anonymous FTP Sites List]
* [http://www.ftp-sites.org Anonymous FTP Sites List]
* [https://twitter.com/textfiles/status/423243512256028672 @textfiles Talked it over with a few people. We decided to download all the FTP sites. All. of. Them. Smile for your photograph, FTP.]
* [https://twitter.com/textfiles/status/423243512256028672 @textfiles Talked it over with a few people. We decided to download all the FTP sites. All. of. Them. Smile for your photograph, FTP.]
* [https://www.ghacks.net/2019/08/16/google-chrome-82-wont-support-ftp-anymore/ Google uses its browser market dominance to speed up the demise of FTP]


{{navigation_box}}
{{navigation_box}}
[[Category:Web applications]]
[[Category:Web applications]]

Revision as of 05:51, 29 August 2019

FTP
Threeplaces.jpg
Status Special case
Archiving status In progress...
Archiving type Unknown
Project source https://github.com/ArchiveTeam/ftp-nab
Project tracker https://tracker.archiveteam.org/ftp/
IRC channel #effteepee (on hackint)

The File Transfer Protocol (FTP) is a protocol for transferring files, first published as RFC 114 on 16 April 1971. In the early days of the internet it was widely used to upload and share files, but today it sees far less use. This led Archive Team to decide to grab all the FTP servers.

The FTP grab started 30 November 2015.

How can I help?

Running the script manually

If you use Linux and you're a bit familiar with it, you can try running the script directly.

The instructions can be found at https://github.com/ArchiveTeam/ftp-grab.

Some additional information
Don't forget to replace YOURNICKHERE with your nickname.

The number after --concurrent determines how many threads run at the same time. You can increase this number if your resources (RAM, CPU, bandwidth) are sufficient. However, if you constantly see messages about rate limiting, there is no need to increase the concurrency.

If you want to stop the script, please do it gracefully if possible. To do so, create an empty file named STOP in the folder of the script (terminal command: touch STOP). The script finishes the current item(s) and stops only after that. (If you kill the script immediately, the items get broken, and they will need to be reassigned to another user.) – Before starting the script again, don't forget to remove the STOP file.

If you see "Project code is out of date", kill the script, go to its folder (cd ftp-grab) and issue git pull https://github.com/ArchiveTeam/ftp-grab. After the update has finished, relaunch the script.
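The graceful-stop routine described above can be wrapped in two small shell functions. This is only a sketch: <code>request_stop</code> and <code>clear_stop</code> are hypothetical names, and the default folder <code>ftp-grab</code> is assumed to be where the script was checked out.

```shell
# Hypothetical helpers (not part of the ftp-grab repository).
# Pass the script's folder as the first argument; it defaults to ./ftp-grab.
request_stop() {
    # An empty STOP file asks the script to finish its current item(s) and then exit
    touch "${1:-ftp-grab}/STOP"
}

clear_stop() {
    # Remove the STOP file so the script can be started again
    rm -f "${1:-ftp-grab}/STOP"
}
```

Run <code>request_stop</code>, wait for the script to exit on its own, then run <code>clear_stop</code> before launching it again.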

Discovery items

The project needs items in order to run. You can help discover them.

Scripts for creating items for the grab can be found at https://github.com/ArchiveTeam/ftp-queue. Instructions on how to run the grab can be found in the README. A list of FTPs that need to be scanned can be found at FTP/List.

Some additional information
User:Squidboy says:

It's worth noting that as of June 2019 ftp-queue has several issues that may make it hard to use.

Donating to the Internet Archive

Content downloaded by the ArchiveTeam will be uploaded to the Internet Archive, where it will be stored and, hopefully, remain available forever. However, storing it costs thousands of dollars in the long run. So, if you can afford it, please consider donating to the Internet Archive so that this piece of history can be kept for us all: https://archive.org/donate/

External Links