
Data Is Plural

Banned Books, Mass Expulsions, and Saturday Night Live

This week's roundup of notable data

Illustration: Gabriel Hongsdusit

Data Is Plural is a weekly newsletter of useful/curious datasets. This edition, dated July 6, 2022, has been republished with permission of the author.

Banned and challenged books. A recent report from PEN America identified 1,500-plus decisions, made between July 2021 and March 2022, to ban books from U.S. classrooms and school libraries. A spreadsheet accompanying the report lists each decision’s date, type, state, and school district, as well as each banned book’s title, authors, illustrators, and translators. Related: Independent researcher Tasslyn Magnusson, in partnership with EveryLibrary, maintains a spreadsheet of both book bans and book challenges, with 3,000-plus entries since the 2021–22 school year. [h/t Gary Price]

Mass expulsions. Political scientist Meghan M. Garrity’s Government-Sponsored Mass Expulsion dataset focuses on “policies in which governments systematically remove ethnic, racial, religious or national groups, en masse.” Using a combination of archival research and secondary sources, Garrity documents 139 such events, estimated to have expelled more than 30 million people between 1900 and 2020. For each expulsion the dataset provides “information on the expelling country, onset, duration, region, scale, category of persons expelled, and frequency.” To download it, visit the Journal of Peace Research’s replication data portal and search for “mass expulsion.”

European air traffic. Eurocontrol, the main organization coordinating Europe’s air traffic management, publishes an “aviation intelligence portal” with a range of industry metrics, including traffic reports that count the daily number of flights by country, by airport, and by operator. The portal also offers bulk datasets on topics such as airport traffic, flight efficiency, estimated CO2 emissions, and more. [h/t Giuseppe Sollazzo]

Shakespeare. The Folger Shakespeare “brings you the complete works of the world’s greatest playwright, edited for modern readers.” Its digital editions of the Bard’s plays and poems are available to read online and to download in various file formats. It also provides an API, with endpoints for synopses, roles, monologues, word frequency, and more. [h/t Cameron Armstrong]

Saturday Night Live. Joel Navaroli’s snlarchives.net aims to catalog and cross-reference every episode, cast member, host, character, sketch, impression, and other aspects of Saturday Night Live’s 47-and-counting seasons. An open-source project by Hendrik Hilleckes and Colin Morris scrapes much of that information into structured data files. As seen in: Morris’s 2017 analysis of gender representation in SNL sketches.


Notice: Unlike most of our content, this edition of Data Is Plural by Jeremy Singer-Vine is not available for republication under a Creative Commons license.
