Software

From Archiveteam
Revision as of 19:08, 5 December 2017

WARC Tools

The WARC Ecosystem page covers wget, Heritrix, and many small but handy tools for creating, reading, and processing WARC files.
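To show roughly what those tools deal with, here is a minimal Python sketch that writes and reads WARC/1.0 records using only the standard library. Each record has a version line, named header fields (the mandatory ones are WARC-Type, WARC-Record-ID, WARC-Date and Content-Length), a blank line, the content block, and two closing CRLFs. The sketch is simplified and does not replace the tools above. It assumes uncompressed input, although real crawls are usually per-record gzipped `.warc.gz` files. It also matches header names case-sensitively and skips digests and `warcinfo` records. The function names `make_warc_record` and `read_warc_records` are hypothetical.

```python
import uuid
from datetime import datetime, timezone


def make_warc_record(warc_type, target_uri, payload, content_type="application/octet-stream"):
    """Serialize one WARC/1.0 record: version line, headers, blank line, block, two CRLFs."""
    headers = [
        ("WARC-Type", warc_type),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        ("WARC-Date", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", target_uri),
        ("Content-Type", content_type),
        # Content-Length counts the bytes of the block only, not the trailing CRLFs.
        ("Content-Length", str(len(payload))),
    ]
    head = "WARC/1.0\r\n" + "".join(f"{name}: {value}\r\n" for name, value in headers) + "\r\n"
    return head.encode("utf-8") + payload + b"\r\n\r\n"


def read_warc_records(data):
    """Yield (headers, block) pairs from concatenated, uncompressed WARC records."""
    pos = 0
    while pos < len(data):
        # The header section ends at the first blank line after the record start.
        header_end = data.index(b"\r\n\r\n", pos)
        lines = data[pos:header_end].decode("utf-8").split("\r\n")
        if not lines[0].startswith("WARC/"):
            raise ValueError(f"no WARC version line at offset {pos}")
        headers = {}
        for line in lines[1:]:
            name, _, value = line.partition(":")  # split on the first colon only
            headers[name.strip()] = value.strip()
        start = header_end + 4
        end = start + int(headers["Content-Length"])
        yield headers, data[start:end]
        pos = end + 4  # skip the two CRLFs that close every record


if __name__ == "__main__":
    record = make_warc_record("resource", "http://example.com/", b"hello", "text/plain")
    for headers, block in read_warc_records(record + record):
        print(headers["WARC-Type"], headers["WARC-Target-URI"], block)
```

The reader relies on Content-Length rather than scanning the payload. This is what lets it step from one record to the next even when a block itself contains blank lines.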

General Tools

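  • GNU Wget
    • Backing up a WordPress site: "wget --no-parent --no-clobber --adjust-extension --recursive --convert-links --page-requisites --user=<username> --password=<password> <path>" (--adjust-extension is the current name of the older --html-extension option; --user and --password supply HTTP authentication and will not log in through the WordPress login form)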
  • cURL
  • HTTrack (see also HTTrack options)
  • Pavuk -- a bit flaky, but very flexible
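  • Warrick (http://warrick.cs.odu.edu/warrick.html) - reconstructs a lost website from copies held by the Internet Archive and other web archives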
  • Beautiful Soup - Python library for web scraping
  • Scrapy - Fast Python framework for web crawling and scraping
  • Splinter - Web app acceptance testing library for Python -- could be used along with a scraping library to extract data from hard-to-reach places
  • WiLiSe WikiLink Search - Python script that finds links to specific pages of a site by searching a MediaWiki-based wiki, provided the wiki has api.php accessible or the LinkSearch extension enabled. The project is still very immature, and at the moment the code is only available in this SVN repository.
  • Mobile Phone Applications -- some notes on preserving old versions of mobile apps
  • freeyourstuff.cc -- Extensible open-source (source) Chrome extension that lets users export their own content (reviews, posts, etc.) to JSON and optionally publish it to freeyourstuff.cc and its mirrors under the Creative Commons CC0 license. Supports Yelp, IMDb, TripAdvisor, Amazon, Goodreads, and Quora as of 22:52, 11 June 2016 (EDT)
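WiLiSe's own code lives in its SVN repository and is not reproduced here. The Python below is a hedged sketch of the general approach. It asks a MediaWiki `api.php` endpoint for pages that link to a given site using the core `list=exturlusage` query module, which is the API counterpart of Special:LinkSearch. It then follows the API's `continue` block to page through the results. The function names, the `USER_AGENT` string, the `eulimit` value and the choice of `title|url` properties are assumptions made for illustration. Older MediaWiki versions that return `query-continue` instead of `continue` are not handled.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical identifier; many wikis reject requests without a descriptive User-Agent.
USER_AGENT = "linksearch-sketch/0.1 (hypothetical example)"


def build_linksearch_url(api_url, domain, extra_params=None):
    """Build an api.php query listing wiki pages that link to `domain`."""
    params = {
        "action": "query",
        "list": "exturlusage",  # API counterpart of Special:LinkSearch
        "euquery": domain,      # search string without protocol, e.g. "example.com"
        "euprop": "title|url",
        "eulimit": "500",
        "format": "json",
    }
    if extra_params:
        params.update(extra_params)  # continuation values from the previous response
    return api_url + "?" + urlencode(params)


def extract_links(response):
    """Return (page title, external URL) pairs from one decoded API response."""
    return [(item["title"], item["url"])
            for item in response.get("query", {}).get("exturlusage", [])]


def fetch_all_links(api_url, domain):
    """Follow the API's `continue` block until every result page has been read."""
    links, cont = [], None
    while True:
        req = Request(build_linksearch_url(api_url, domain, cont),
                      headers={"User-Agent": USER_AGENT})
        with urlopen(req) as resp:
            data = json.load(resp)
        links.extend(extract_links(data))
        if "continue" not in data:
            return links
        cont = data["continue"]
```

A call such as `fetch_all_links("https://wiki.example.org/w/api.php", "example.com")` would return every (page, URL) pair. The wiki address here is a placeholder. URL building and response parsing are kept separate from the network loop, so they can be checked without contacting a wiki.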

Hosted tools

  • Pinboard is a convenient social bookmarking service that will archive copies of all your bookmarks for online viewing. The catch is that it costs $9.25 just to join, plus $25/year for the archival feature, and you can only download archives of your 25 most recent bookmarks in a particular category. This could be a problem if you ever need to get your data out in a hurry.
  • Webrecorder is both a tool to create high-fidelity, interactive web archives of any web site you browse and a platform to make those recordings accessible.

Site-Specific

Format-Specific

Web scraping
