Data Is Plural

Flu, Jewish Texts, Women’s College Basketball, and Mastodon Membership

This week’s roundup of notable data

Illustration: An open envelope with arrows coming out of it, pointing to various spreadsheets. Behind the spreadsheets are data visualizations, clouds, and strings of numbers. Credit: Gabriel Hongsdusit

Data Is Plural is a weekly newsletter of useful/curious datasets. This edition, dated Dec. 14, 2022, has been republished with permission of the author.

Flu trends. The CDC’s Influenza Division collaborates with state and local health departments, hospitals, laboratories, and other partners to keep tabs on flu trends. Its Weekly U.S. Influenza Surveillance Report tracks case counts, positivity rates, strain distribution, and other metrics that you can explore and download through an interactive dashboard. The records go back to the 1997–98 flu season and are available at a national, regional, and state level. Read more: “The US has never recorded this many positive flu tests in one week” (Vox). [h/t Jay Arthur]

University enrollments. Elizabeth Buckner’s Global Longitudinal University Enrollment Dataset “compiles and estimates institution-level enrollment data on universities worldwide from 1950 to 2020” at five-year intervals—more than 17,000 institutions in all, across 180-plus countries. In addition to enrollment figures, the dataset “includes a number of other useful variables on institutional characteristics, merged from various sources, including sector (i.e., public/private), founding year, and whether the institution is PhD granting or not.” It also forms the basis of an accompanying, country-level dataset.

Jewish texts. Sefaria, a nonprofit co-founded a decade ago by author Joshua Foer and engineer Brett Lockspeiser, is “assembling a free living library of Jewish texts and their interconnections, in Hebrew and in translation.” Those texts include the Torah itself, plus rabbinic scholarship, legal works, prayer books, historical dictionaries, and more. In all, the project contains more than 300 million words and has generated three million intertextual links between them. The initiative provides its data via an API and bulk download, and its code is open source. Read more: “The quest to put the Talmud online” (Washington Post, 2018). [h/t Avi Levin]

Women’s college basketball rosters. Students in Derek Willis’s “Sports Data Analysis & Visualization” course at the University of Maryland’s journalism school have assembled data on 13,000-plus players on women’s college basketball teams, sourced from 900-plus rosters for the 2022–23 NCAA season. Their main dataset lists each player’s name, team, position, jersey number, height, year, hometown, high school, and more.

Mastodon membership. The open-source website instances.social tracks 16,000-plus servers running Mastodon, the most prominent of the decentralized social networks seen as alternatives to Twitter. It collects each server’s domain, name, description, user count, status count, and more. Since late November, Simon Willison has been creating a longitudinal record of the site’s directory and charting the overall trend.


Notice: Unlike most of our content, this edition of Data Is Plural by Jeremy Singer-Vine is not available for republication under a Creative Commons license.
