Mozilla Addons

From Archiveteam
Mozilla Addons
URL https://addons.mozilla.org/
Project status Special case
Archiving status Saved! (addon files)
In progress... (website)
Project source Unknown
Project tracker Unknown
IRC channel #outofammo
Project lead User:JustAnotherArchivist

Mozilla Addons, also known as AMO (from its domain, addons.mozilla.org), is a website run by the Mozilla Foundation which hosts extensions and themes for Firefox, Thunderbird, and other Mozilla software.

Extensions used to be based on XPI until the introduction of WebExtensions around 2016. Since Firefox 57 and Thunderbird 58, only WebExtensions are supported. XPI-based addons (called "legacy") are deprecated but still supported until the end-of-life of Firefox 52 ESR in September 2018. The legacy addons will be removed from AMO in early October 2018[1][2].

Website structure

As of September 2018, there are two different versions of AMO: the old version, called "classic desktop" on the website, and a redesigned new site. The two mostly serve the same content; the most important difference is that the new site does not serve user profile pages for non-developers while the old site does. Switching between the two sites is controlled by a cookie called mamo (modern AMO?): when it is set to off, the old site is served; when it is set to on or is unset, the new site is served.
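The cookie rule can be written down as a small decision function. This is only a sketch of the behaviour described above, not AMO's actual code; the function name is made up, and treating any value other than off as "new site" is an assumption, since only off, on, and unset are described.

```python
def served_site(cookies):
    """Return "old" or "new" for a dict of cookie names to values.

    Hypothetical helper: encodes only the rule stated in the text.
    """
    # Only an explicit "off" selects the old ("classic desktop") site.
    if cookies.get("mamo") == "off":
        return "old"
    # "on" and a missing cookie both yield the new site; other values are
    # assumed (not confirmed) to behave the same way.
    return "new"
```

A crawler targeting the old site would therefore need to send mamo=off with every request.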

AMO uses numeric IDs and slugs for addon identification. (GUIDs are also used, but only in the API and internally in Firefox.) These IDs are shared with Thunderbird and Seamonkey addons, which used to be hosted on AMO but have since been moved to addons.thunderbird.net (which only exists in the "old" form; there is a "view the new site" link in the footer, but it doesn't have any effect as of 2018-09-30).

To track addon installations, AMO uses a src parameter everywhere on the site. There are at least 59 possible values for this parameter[3].

Addon download links have the general format https://addons.mozilla.org/firefox/downloads/file/$FILEID/$FILENAME?src=$SRC. Note that file IDs are separate from addon and version IDs. The filename typically contains the slug and a version identifier. When AMO detects that you're using a version of Firefox that is incompatible with an addon, it displays a "download anyway" link, which additionally contains a type:attachment path segment between the file ID and the filename (i.e. .../file/$FILEID/type:attachment/$FILENAME...). All download URLs redirect to a CDN at addons.cdn.mozilla.net; the type:attachment segment is also reflected in that CDN URL as _attachments (which causes the CDN to add a Content-Disposition header); the src parameter is not included in the redirect target.
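As an illustration of the URL layout described above, the following sketch assembles a download link from its parts. The function name and the example values (file ID 123, filename example-1.0.xpi, src value search) are placeholders chosen for illustration, not real AMO data.

```python
from urllib.parse import quote, urlencode

BASE = "https://addons.mozilla.org/firefox/downloads/file"

def download_url(file_id, filename, src=None, attachment=False):
    """Build a download URL in the format described in the text."""
    parts = [BASE, str(file_id)]
    if attachment:
        # "download anyway" links carry this extra path segment between
        # the file ID and the filename.
        parts.append("type:attachment")
    parts.append(quote(filename))
    url = "/".join(parts)
    if src:
        # The src tracking parameter is appended as a query string.
        url += "?" + urlencode({"src": src})
    return url

print(download_url(123, "example-1.0.xpi", src="search"))
print(download_url(123, "example-1.0.xpi", attachment=True))
```

The redirect target on the CDN is not constructed here, since the text only describes it partially.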

Besides the actual addon files, AMO also hosts preview screenshots, reviews, version history (including changelogs), statistics, and in some cases additional pages (e.g. privacy policy) for each addon. The review page only displays the most recent review of any particular user, and one needs to follow an extra link to discover a user's earlier reviews for the same addon.

Note that AMO does not only host extensions but also themes. These consist simply of a JSON object which provides the URLs for the relevant images and some additional settings (e.g. text colour), i.e. there is no real download for them.

The AMO API versions 3 and 4 are documented here and here, respectively.

Utilities

  • amo-links-getter: Neither Wget nor the Warrior can download the site completely; in addition, many redundant links are not recognised as redirects, so the same content gets downloaded several times. This is a set of scripts that stores all the links in a SQLite database to be downloaded later.

Archival

  • There were two (proper) attempts to archive AMO through ArchiveBot. job:4aa66jgox1pg1gp6gxzkgthiq ran from 2017-08-29 until early December 2017, and job:xew9sjj59osltx5oyjr6n9rg was started on 2018-07-29 and vanished sometime in August 2018.
  • All addon files (both from AMO for Firefox/Firefox Android and from addons.thunderbird.net for Thunderbird/Seamonkey) were downloaded by User:JustAnotherArchivist between 2018-09-14 and 2018-09-16.
  • The amo-links-getter list linked above was downloaded through ArchiveBot as job:akifc65k7kfhpdhfbveh79v1c (started on 2018-09-30, finished on 2018-10-07).
  • The old, "classic desktop" AMO website – minus downloads and src parameter variations, but including version history, reviews, and API data – was grabbed by User:JustAnotherArchivist between 2018-09-30 and 2018-10-13 (see below for details).
  • A grab of the src parameter variations and downloads by User:JustAnotherArchivist has been in progress since 2018-10-15 (see below for details).
  • A wpull grab of the skeleton of the old website (with some special handling of the locale variations in the URLs) by User:JustAnotherArchivist has been in progress since 2018-10-15. "Skeleton" here means the categories, tags, etc.; the addons themselves as well as user profiles are excluded.
    • Specifically, case variations of /en-US/ are normalised to this capitalisation. There is some bug in AMO which leads to links using /en-us/, /eN-uS/, etc. Unfortunately, this means that some links will be broken, but that's unavoidable without retrieving the entire site 16 times...
    • Any URLs with a path starting with /en-US/(firefox|android)/(addon|user)/ or /(firefox|android)/downloads/ as well as all locales other than en-US (af, ar, bg, ...) are ignored.
    • In the search, combinations between the filters on the left or with the sorting are ignored.
  • A warrior project for the website was in the works (repository) but never active.
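The locale normalisation and ignore rules of the skeleton grab could be sketched roughly as follows. This is not the actual wpull hook used for the grab; the function names are hypothetical, and the set of other locales contains only the three named above because the full list is not given here.

```python
import re

# Matches a leading /en-US/ path segment in any capitalisation.
LOCALE_PREFIX = re.compile(r"^/en-us(?=/|$)", re.IGNORECASE)

# Path patterns quoted in the ignore rule above.
IGNORED = re.compile(
    r"^/en-US/(firefox|android)/(addon|user)/"
    r"|^/(firefox|android)/downloads/"
)

# Incomplete on purpose: only the locales named in the text.
OTHER_LOCALES = {"af", "ar", "bg"}

def normalise_path(path):
    """Rewrite /en-us/, /eN-uS/, ... to the canonical /en-US/."""
    return LOCALE_PREFIX.sub("/en-US", path, count=1)

def is_ignored(path):
    """Decide whether a URL path is skipped by the skeleton grab."""
    path = normalise_path(path)
    if IGNORED.match(path):
        return True
    first = path.split("/", 2)[1] if path.startswith("/") else ""
    return first in OTHER_LOCALES
```

The figure of 16 corresponds to the 2^4 ways of capitalising the four letters in "en-US".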

JustAnotherArchivist's website grab, part 1

General notes:

  • Any URL starting with https://addons.mozilla.org/en-US/firefox/addon/ADDONID/ redirects to a URL using the slug instead. Only the ADDONID URLs are listed below for brevity, but of course the redirect target with the slug was also grabbed in all cases.
  • For all API resources, both the v3 and the v4 versions were retrieved, but only the v3 URL is given below for brevity. Unless otherwise noted, you can simply replace v3 with v4 in those URLs to get the v4 URL.

For all addon IDs between 0 and 1009999 (largest existing ID as of 2018-10-13 is 1003947), these URLs are covered:

Furthermore, during the relevant stages above (addon page, "more", addon detail API endpoint, and reviews pages and API endpoints), usernames were extracted, and the user profiles were afterwards retrieved as well:

JustAnotherArchivist's website grab, part 2

This grab covers the variations of the src URL parameter on the addon page and the downloads themselves with that parameter. It again operates on addon IDs. It also covers collections.

src variations and downloads

  • For each addon ID, it's checked whether the addon needs to be processed in this way. This could've been integrated into part 1, but it's tricky and time-consuming to do these checks after the fact, so we simply re-retrieve the API addon detail endpoint. Nonexistent and theme addons are skipped; note that themes do not use the src tracking parameter since their installation works very differently and there are no downloadable files either, so everything below is unnecessary for them.
  • For each variation of src, https://addons.mozilla.org/en-US/firefox/addon/ADDONID/?src=SRC is retrieved. SRC is empty or one of the 58 values listed in the documentation with the exception of collection and version-history; the former is handled below, and the latter is only used on the version history page but not on links to the addon page. (version-history is implicitly handled below.)
  • The version history page(s) are retrieved as described in part 1: https://addons.mozilla.org/en-US/firefox/addon/ADDONID/versions/[?page=N]
  • From all of the above pages, download links are collected. There are a few different formats:
    • https://addons.mozilla.org/firefox/downloads/latest/SLUG/addon-ADDONID-latest.EXT?src=SRC – this is used by the install button at the top of the addon page and also on other pages (e.g. category listings).
    • https://addons.mozilla.org/firefox/downloads/file/FILEID/FILE.EXT?src=SRC – this appears in the version information at the bottom of the addon page and in the version history.
    • For both of these formats, there also exist URLs containing a type:attachment path segment. These are "download anyway" links for when a browser is incompatible with an addon version.
    • All four URLs are actually redirects to the CDN; the src parameter is fortunately not passed on to the CDN, so only two requests to the CDN (for the presence and absence of type:attachment) are necessary. The file is identical in both cases; the only difference is a Content-Disposition header to force a download.
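To make the four download link variants above concrete, here is a minimal sketch that classifies a collected link by format, attachment flag, and src value. The function name and the example URLs are hypothetical; the type:attachment segment is detected anywhere in the path because its exact position in the "latest" format is not specified above.

```python
from urllib.parse import parse_qs, urlsplit

def classify(url):
    """Return a dict describing a download link, or None for other URLs."""
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    # Both formats live under /firefox/downloads/{latest,file}/...
    if len(segments) < 3 or segments[:2] != ["firefox", "downloads"]:
        return None
    kind = segments[2]
    if kind not in ("latest", "file"):
        return None
    src = parse_qs(parts.query).get("src", [None])[0]
    return {
        "kind": kind,
        "attachment": "type:attachment" in segments,
        "src": src,
    }

print(classify(
    "https://addons.mozilla.org/firefox/downloads/file/123/"
    "type:attachment/example-1.0.xpi?src=search"
))
```

Since the CDN response depends only on the file and the attachment flag, grouping links by everything except src shows which CDN requests are actually needed.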

Collections

Collection retrieval operates on users and is based on the users discovered in part 1 (i.e. covers all addon developers and reviewers).

References