Quizlet

From Archiveteam
Revision as of 02:45, 21 June 2018 by Adinbied (talk | contribs) (First pass of info entry - the quizlet API and site structure)
URL: https://quizlet.com/
Project status: Online!
Archiving status: Not saved yet
Project source: Unknown
Project tracker: Unknown
IRC channel: #archiveteam (on EFnet)
Project lead: Unknown

Overview

Quizlet is a mobile and web-based study application that lets students study information through flashcards, games, and tests. It is currently used by one in two high school students and one in three college students in the United States. As of April 30, 2018, Quizlet has over 200 million user-generated flashcard sets and more than 30 million active users. It now ranks among the top 50 websites in the U.S.

Archival

Quizlet ‘sets’ are numbered sequentially, with the earliest public set having the ID ‘173’ and one of the more recent sets being above ‘300000000’. Quizlet has an open API (see https://quizlet.com/api/2.0/docs) that returns a JSON copy of each set. An example API result can be seen here. Back-of-the-napkin math shows that 300,000,000 public sets would take about 400 GB to store uncompressed, which works out to an average of roughly 1.3 KB per set.
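The storage estimate above can be reproduced with a few lines of arithmetic. This is only a sketch built on the two figures quoted in the text; it assumes decimal gigabytes (1 GB = 10^9 bytes), and the function names are made up for illustration.

```python
# Back-of-the-napkin storage estimate, using only the figures quoted above.
TOTAL_SETS = 300_000_000        # approximate number of public sets
TOTAL_BYTES = 400 * 10**9       # 400 GB, assumed to mean decimal gigabytes


def average_set_size(total_bytes: int, total_sets: int) -> float:
    """Return the implied average size of one set's JSON, in bytes."""
    return total_bytes / total_sets


def projected_storage_gb(num_sets: int, bytes_per_set: float) -> float:
    """Return the uncompressed storage needed for num_sets sets, in GB."""
    return num_sets * bytes_per_set / 10**9


if __name__ == "__main__":
    avg = average_set_size(TOTAL_BYTES, TOTAL_SETS)
    print(f"Implied average set size: {avg:.0f} bytes")
    print(f"Storage for 3 million sets: {projected_storage_gb(3_000_000, avg):.1f} GB")
```

The same helper can be rerun with a measured average once a sample of real sets has been downloaded, which would give a better estimate than the 400 GB guess.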

Grabbing the Data

As of now, I have not found a reliable way to download everything. The initial Python script I wrote to step through the set IDs, fetch each set via the API, and save it as a txt file works, but it is painfully slow: after a week of running it on three machines, I had only about 3 million sets downloaded. That is roughly 5 sets per second in aggregate, and at that rate covering 300 million IDs would take about 100 weeks. I have tried multithreading and multiprocessing, but neither managed to download even as much as the simple script. Maybe someone else will have more luck.
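For anyone who wants to try again, here is a minimal sketch of one way to fetch a range of set IDs concurrently with a thread pool. It is not the script described above. The endpoint shape in API_URL, the client_id parameter, and the YOUR_CLIENT_ID placeholder are assumptions that should be checked against the API docs, and the helper names and the one-file-per-set .json layout are made up for illustration. Any rate limits the API enforces are not handled here.

```python
import http.client
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Iterable, Optional

# Assumption: endpoint shape and client_id parameter; verify against the API docs.
API_URL = "https://api.quizlet.com/2.0/sets/{set_id}?client_id={client_id}"


def fetch_set(set_id: int, client_id: str = "YOUR_CLIENT_ID",
              timeout: float = 10.0) -> Optional[bytes]:
    """Fetch one set's raw JSON; return None if the request fails."""
    url = API_URL.format(set_id=set_id, client_id=client_id)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read()
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return None


def download_range(ids: Iterable[int], out_dir: str,
                   fetch: Callable[[int], Optional[bytes]] = fetch_set,
                   workers: int = 16) -> int:
    """Download every ID not already on disk; return how many were saved."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Skip IDs that already have a file so an interrupted run can resume.
    todo = [i for i in ids if not (out / f"{i}.json").exists()]
    saved = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() runs fetch() on worker threads and yields results in input order.
        for set_id, body in zip(todo, pool.map(fetch, todo)):
            if body is not None:
                (out / f"{set_id}.json").write_bytes(body)
                saved += 1
    return saved


if __name__ == "__main__":
    # Try a small range first; private or deleted IDs simply come back as None.
    print(download_range(range(173, 273), "quizlet_sets"))
```

Because pool.map() queues every ID it is given up front, it is probably best to call download_range on chunks of a few thousand IDs at a time, not on the whole 300 million ID range. The fetch parameter exists so that the function can be exercised with a stub and without touching the network.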