
Data Is Plural

Young Adult Migration and Talking to the Starship Enterprise

This week's roundup of notable data

Illustration: An open envelope with arrows coming out of it, pointing to various spreadsheets. Behind the spreadsheets are data visualizations, clouds, and strings of numbers. Credit: Gabriel Hongsdusit

Data Is Plural is a weekly newsletter of useful/curious datasets. This edition, dated Aug. 10, 2022, has been republished with permission of the author.

Young adult migration. Researchers at Harvard University and the Census Bureau have linked federal tax filings, Census records, and other government data to track the migration patterns of young U.S. residents. Specifically, for each person born in the U.S. between 1984 and 1992, the researchers compared where they lived at age 16 to where they lived at age 26. The project’s public dataset counts the approximate number who moved to/from each pair of commuting zones—overall and disaggregated by race/ethnicity and parental income level. Read more: A reporting recipe from Brent Jones and Eric Schmid, who analyzed the data for St. Louis Public Radio.
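For readers who want to work with origin/destination counts like these, here is a minimal Python sketch of one common first step: turning pairwise flows into a net gain or loss per zone. The rows, zone labels, and the `net_migration` function are all hypothetical and invented for illustration; the real dataset's file layout and column names are not described here and will differ.

```python
from collections import defaultdict

# Hypothetical toy rows, NOT real figures from the dataset:
# (zone at age 16, zone at age 26, approximate number of people).
flows = [
    ("A", "A", 800),  # stayed in zone A
    ("A", "B", 120),
    ("B", "A", 50),
    ("B", "B", 900),  # stayed in zone B
    ("B", "C", 30),
    ("C", "A", 10),
]


def net_migration(rows):
    """Return net movers per zone: arrivals minus departures.

    Rows where origin and destination match are people who stayed,
    so they do not affect the net figure and are skipped.
    """
    net = defaultdict(int)
    for origin, destination, count in rows:
        if origin == destination:
            continue
        net[origin] -= count       # people who left the origin zone
        net[destination] += count  # people who arrived in the destination zone
    return dict(net)


if __name__ == "__main__":
    for zone, change in sorted(net_migration(flows).items()):
        print(zone, change)
```

Because every mover leaves one zone and enters another, the net figures across all zones sum to zero, which is a handy sanity check when adapting this to the real files.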

Social capital. Using data on billions of Facebook connections and group memberships, Raj Chetty et al.’s Social Capital Atlas calculates three metrics for U.S. counties, zip codes, high schools, and colleges: economic connectedness (friendships between low-income and high-income users), cohesiveness (how often users’ friends are also friends with one another), and civic engagement (membership in volunteer groups). Read more: The Upshot explores and explains the project’s findings. Previously: Measurements of social connectedness (DIP 2020.09.30) and economic mobility (DIP 2019.06.12), from some of the same researchers. [h/t Johannes Stroebel]

CPUs and GPUs. Yifan Sun et al., seeking to test Moore’s law and Dennard scaling, “have collected data for all CPU and GPU products (to our best knowledge) that have been released by Intel, AMD … and NVIDIA since January 1st, 2000.” The authors’ dataset and charting tool, which describe 4,800-plus processors through early 2021, use information gathered from TechPowerUp, WikiChip, and company websites. They identify each product’s vendor, release date, transistor count, base frequency, and other details. [h/t matt_d]

Trade in post-unification Italy. The Lost Highway project, a collaboration between researchers at four Italian universities, aims “to test a number of broad historical conjectures about the long term shortcomings of the Italian development path by collecting as much quantitative evidence as possible.” Their Bankit-FTV database provides annual import and export totals for 1862 to 1939, by product and trading partner, with 6,000-plus product descriptions standardized into approximately 600 commodity groupings. [h/t Francesco Piccinelli Casagrande]

“Tea, Earl Grey, hot.” Combing the full transcripts of Star Trek: The Next Generation, Benett Axtell and Cosmin Munteanu found more than 1,000 lines of dialogue between the show’s characters and the Starship Enterprise’s computer. Their dataset of these interactions lists each line’s phrasing, character, interaction type, stage directions, and more. [h/t Christian A. Gebhard + Sara Stoudt + Tidy Tuesday]


Notice: Unlike most of our content, this edition of Data Is Plural by Jeremy Singer-Vine is not available for republication under a Creative Commons license.
