ArchiveBot/Bot documentation


This page documents how the User:HadeanEon ArchiveBot wiki bot works.


The bot takes a list of URLs and generates a table out of it, listing for each URL the relevant ArchiveBot jobs. This allows collecting resources related to a particular topic and keeping an overview of which have been archived.


The bot needs two pages: the page where the table is shown (the "tracking page") and the page that contains the URL list (the "list page"). The list page has the same name as the tracking page with /list appended.

Create the page "ArchiveBot/Example" with this text:

Optional: an introduction to what this page is about, if it's not obvious from the title.

<!-- bot --><!-- /bot -->


And the page "ArchiveBot/Example/list" with a plain list of URLs:

Once the bot processes the page, "ArchiveBot/Example" will look something like this:

  • Statistics: Saved! (0) · Not saved yet (2) · Total size (0 KiB)

Do not edit this table, it is automatically updated by bot. There is a raw list of URLs that you can edit.


The bot replaces the contents between <!-- bot --> and <!-- /bot --> with its output. Editing any of it is useless since it will be overwritten the next time the bot runs, but you can do anything before or after those bot marks.
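The replacement described above can be pictured with a minimal Python sketch. This is not the bot's actual code; the function name `replace_bot_output` and the regular expression are assumptions made for illustration, and only the plain `<!-- bot -->` mark is handled here.

```python
import re

# Matches the opening bot mark, whatever currently sits between the marks,
# and the closing bot mark. DOTALL lets the old output span several lines.
BOT_BLOCK = re.compile(r"(<!-- bot -->).*?(<!-- /bot -->)", re.DOTALL)


def replace_bot_output(page_text, new_output):
    """Swap the text between the bot marks for new_output.

    Everything before the opening mark and after the closing mark is kept
    unchanged (hypothetical helper, not taken from the bot's source).
    """
    # A function is used as the replacement so that backslashes in
    # new_output are not interpreted as regex escapes.
    return BOT_BLOCK.sub(
        lambda m: f"{m.group(1)}\n{new_output}\n{m.group(2)}",
        page_text,
    )


page = "Intro text\n\n<!-- bot -->old table<!-- /bot -->\n\nFooter text"
updated = replace_bot_output(page, "new table")
```

In this sketch, the intro and footer survive the update while "old table" is discarded, which is why manual edits between the marks do not persist.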

The bot also sorts the list page before generating the table. The sort ignores the protocol and a leading www. There is special treatment for some file hosters (for one of them, only when it is tricked into using a filename in the URL): the file ID is removed from the URL before sorting, so that the URLs are sorted by filename instead. If two entries have the same post-processed URL, they are sorted next by label, then by full URL, then by note, and finally by the full line as entered on the list page. Duplicate lines are removed.
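The sort order can be sketched in Python as follows. This is an illustration under assumptions, not the bot's implementation: the names `normalise_url` and `sort_entries` are hypothetical, entries are modelled as `(url, label, note, raw_line)` tuples, and the file-hoster special case is left out entirely.

```python
import re


def normalise_url(url):
    """Strip the protocol and a leading "www." for sorting purposes."""
    stripped = re.sub(r"^[a-z][a-z0-9+.-]*://", "", url, flags=re.IGNORECASE)
    return re.sub(r"^www\.", "", stripped, flags=re.IGNORECASE)


def sort_entries(entries):
    """Deduplicate entries and sort them with the tie-breakers from the text.

    Each entry is a (url, label, note, raw_line) tuple. Ties on the
    normalised URL are broken by label, full URL, note, then the raw line.
    """
    unique = set(entries)  # identical lines yield identical tuples
    return sorted(
        unique,
        key=lambda e: (normalise_url(e[0]), e[1], e[0], e[2], e[3]),
    )


entries = [
    ("https://www.example.org/b", "", "", "https://www.example.org/b"),
    ("http://example.com/a", "", "", "http://example.com/a"),
    ("ftp://example.net/c", "", "", "ftp://example.net/c"),
    ("http://example.com/a", "", "", "http://example.com/a"),  # duplicate
]
ordered_urls = [e[0] for e in sort_entries(entries)]
```

Here the three distinct URLs end up ordered by host and path (example.com, example.net, example.org), regardless of protocol or the www. prefix.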


You can add a label to a URL, which will be displayed instead of the URL in the table. Note that the entries will still be sorted according to the URL.

Usage (on the list page): | label = Example page

would cause the link to be rendered as:

Example page


If you want to further divide the tracking page to avoid huge, unmanageable tables and lists, you can use sections. Simply use sections on the list page (the section level is ignored entirely by the bot), then refer to them using <!-- bot:Section name --> on the tracking page. The closing tag stays the same: <!-- /bot -->. The <!-- bot --> tag can be used to refer to anything that appears before the first section on the list page.

Note that the section names refer to the list page's sections; the sections on the tracking page can be titled, ordered, and nested differently and are irrelevant for the bot.

For example, on "ArchiveBot/Section example":

<!-- bot --><!-- /bot -->

== A section ==
<!-- bot:Part two --><!-- /bot -->

=== A subsection ===
<!-- bot:Part one --><!-- /bot -->


And on "ArchiveBot/Section example/list":

== Part one ==

== Part two ==
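How the section names on the list page relate to the bot tags on the tracking page can be sketched in Python. The helper names `split_sections` and `section_for_tag`, the regular expressions, and the example.com URLs are assumptions for illustration; the bot's real parsing may differ.

```python
import re

# A wiki heading of any level: the same run of "=" on both sides.
HEADING = re.compile(r"^(=+)\s*(.*?)\s*\1\s*$")
# An opening bot tag, optionally carrying a section name after "bot:".
TAG = re.compile(r"<!-- bot(?::(.*?))? -->")


def split_sections(list_text):
    """Map each section name to its non-empty lines.

    The key "" collects everything before the first heading. The heading
    level (number of "=") is ignored, as described in the text.
    """
    sections = {"": []}
    current = ""
    for line in list_text.splitlines():
        match = HEADING.match(line)
        if match:
            current = match.group(2)
            sections.setdefault(current, [])
        elif line.strip():
            sections[current].append(line.strip())
    return sections


def section_for_tag(tag, sections):
    """Return the list-page lines that an opening bot tag refers to."""
    match = TAG.fullmatch(tag)
    if match is None:
        raise ValueError(f"not a bot tag: {tag!r}")
    return sections.get(match.group(1) or "", [])


list_text = (
    "https://example.com/intro\n\n"
    "== Part one ==\nhttps://example.com/1\n\n"
    "== Part two ==\nhttps://example.com/2\n"
)
sections = split_sections(list_text)
```

With this input, `section_for_tag("<!-- bot:Part two -->", sections)` yields only the URL listed under "Part two", and the plain `<!-- bot -->` tag yields the URL that precedes the first heading.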


The bot also supports notes. Using a note on any entry within a list page section causes the bot to add an extra column to the table that contains the note. This is useful for example when listing social media profiles: the main URL would be the URL (so that the bot can detect it was saved, since this is what's fed into ArchiveBot), and the note field can contain the direct link to the relevant profile.

Usage (on the list page): | note = Something | note =
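As a rough illustration of how label and note fields might be read from a list-page line, here is a Python sketch. It assumes that each line starts with the URL, followed by pipe-separated `key = value` fields; this layout is inferred from the usage lines above and is not confirmed by the bot's source. The function name `parse_list_line` is hypothetical.

```python
def parse_list_line(line):
    """Split one list-page line into its URL, label, and note.

    Assumed layout: "URL | label = ... | note = ...", with both fields
    optional. URLs that themselves contain "|" are not handled here.
    """
    parts = [part.strip() for part in line.split("|")]
    entry = {"url": parts[0], "label": "", "note": ""}
    for field in parts[1:]:
        key, sep, value = field.partition("=")
        key = key.strip()
        # Unknown keys and fields without "=" are silently ignored.
        if sep and key in ("label", "note"):
            entry[key] = value.strip()
    return entry


parsed = parse_list_line(
    "https://example.com/ | label = Example page | note = Something"
)
```

A line without any fields simply produces an entry with an empty label and note, in which case the table would show the URL itself.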


  • Some URLs are never detected as saved by the bot even though they were saved. This is mostly due to bugs or missing features in the ArchiveBot viewer.
  • It can take a while until pages are (re-)processed by the bot. Usually, it should happen once per day.


The bot's code is on GitHub.