
Data Is Plural

From 21st-Century Elections to Ancient Rome

This week's roundup of notable data

Illustration of an open envelope, with arrows coming out from within. The arrows are pointing to various spreadsheets. Behind the spreadsheets are data visualizations, clouds and strings of numbers.
Gabriel Hongsdusit

Data Is Plural is a weekly newsletter of useful/curious datasets. This edition, dated June 8, 2022, has been republished with permission of the author.

Six decades of House primaries. In 2014, Stephen Pettigrew, Karen Owen, and Emily Wanless published a dataset of all Democratic and Republican primary election results for the U.S. House of Representatives between 1956 and 2010. It indicates each election’s year, state, redistricting status, primary system (open, closed, semi-open, multiparty), and more. The dataset also lists each candidate’s name, gender, prior office, and votes received. In 2020, Michael G. Miller and Nicki Camberg published a follow-up dataset, adding coverage for 2012 through 2018. It uses the same variable names and structure as the earlier dataset so that the two files can be easily combined.
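Because the two files share variable names and structure, combining them can be as simple as stacking their rows. Here is a minimal sketch of that idea using only Python's standard library. The column names and rows are invented placeholders, not the datasets' actual variables, and `combine_csv` is a hypothetical helper rather than anything the authors distribute.

```python
import csv
import io


def combine_csv(earlier, later):
    """Stack rows from two CSV sources that share identical column names."""
    first = csv.DictReader(earlier)
    second = csv.DictReader(later)
    # Refuse to stack if the headers differ, since rows would no longer line up.
    if first.fieldnames != second.fieldnames:
        raise ValueError("column names differ; the files cannot be stacked as-is")
    return list(first) + list(second)


# In-memory stand-ins for the two files (assumed columns, made-up rows).
earlier_file = io.StringIO("year,state,candidate,votes\n1956,XX,Candidate A,100\n")
later_file = io.StringIO("year,state,candidate,votes\n2012,YY,Candidate B,200\n")

rows = combine_csv(earlier_file, later_file)
print(len(rows))
```

With the real downloads, the `io.StringIO` objects would be replaced by opened files. The header check is a cheap safeguard in case either file's columns differ from what is expected.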

Where college grads go. Johnathan Conzelmann et al. have created a dataset that estimates the geographic distribution of recent graduates from 2,600 U.S. colleges and universities, calculated from information on the schools’ official LinkedIn landing pages. For each institution, the dataset indicates the proportions of alumni in each of the 278 specific U.S. locations in LinkedIn’s geographic lexicon and cross-references them with government-defined metropolitan and micropolitan statistical areas. Read more: An introductory Twitter thread. [h/t Sharon Machlis]

Hong Kong political prisoners. The Hong Kong Democracy Council, a U.S.-based advocacy group, last month published the first version of its Hong Kong Political Prisoners Database, which contains information about 1,000-plus protesters, opposition leaders, and national security law defendants incarcerated since the city’s pro-democracy mass protests in mid-2019. It lists each defendant’s age, arrest date, arrest location, conviction date, convicted offenses, sentencing date, sentence length, and other details. An accompanying report describes the database’s context and methodology. [h/t Samuel Bickett]

Mercenaries. Ulrich Petersohn et al.’s Commercial Military Actor Database examines “the market for force” in 72 countries from 1980 to 2016. It contains information, primarily sourced from news reports, on thousands of contractual relationships between providers (mercenaries and private military/security companies) and their clients (governments, opposition groups, NGOs, and transnational corporations). The contracted work ranges “from combat services and support services (e.g., communication, maintenance), to logistics, security, consultancy, training, and reconstruction.”

Roman amphitheaters. Sebastian Heath, a professor of computational humanities and Roman archaeology, has constructed a dataset of 260-plus amphitheaters in the Roman Empire. It provides the structures’ known names, coordinates, orientations, and capacities, among other characteristics, and links the entries to external data sources.


Notice: Unlike most of our content, this edition of Data Is Plural by Jeremy Singer-Vine is not available for republication under a Creative Commons license.
