Data Is Plural

From Local Public Meetings to Milan Drinking Fountains

This week’s roundup of notable datasets

Illustration of an open envelope, with arrows coming out from within. The arrows are pointing to various spreadsheets. Behind the spreadsheets are data visualizations, clouds and strings of numbers.
Gabriel Hongsdusit

Data Is Plural is a weekly newsletter of useful/curious datasets. This edition, dated March 29, 2023, has been republished with permission of the author.  

This will be the last edition The Markup republishes, but please continue to read here.

Also, Jeremy has launched a new Data Is Plural podcast! You can listen to the whole season of crisp 15-minute episodes here.


Local public meetings. LocalView, developed by Soubhik Barari and Tyler Simko, “is the largest dataset of local government public meetings—the central policy-making process in American local government—as they are captured on video.” In a recent paper, the authors describe how they built the dataset, which is based on 130,000-plus YouTube-hosted videos of such meetings in more than 1,000 U.S. cities and counties, covering the years 2006 to 2022. The dataset lists each meeting’s date, jurisdiction, and government body (e.g., municipal council or school board), plus the video’s ID, title, channel, transcript, and more. [h/t Chris Goodman]

Health workers. The World Health Organization’s Global Health Workforce Statistics database presents annual national estimates of the number of medical doctors, nursing and midwifery personnel, community health workers, and several other types of health workers. The estimates come from the WHO’s National Health Workforce Accounts system, national censuses, labor force surveys, and other sources. For medical doctors, the estimates span nearly 200 countries, with the majority having estimates as recent as 2020 or 2021. Twenty countries’ estimates go back to the 1960s (and, for Spain, all the way back to 1952). [h/t Datasketch]

Kremlin posts. Giorgio Comai’s “Text As Data & Data in the Text” project “aims at facilitating structured analysis of on-line contents related to conflicts in the post-Soviet space by providing easier access to relevant datasets and tools.” Those datasets include the URL, title, text, date, and other metadata of all posts published on the Kremlin’s English-language website since the year 2000; on the Kremlin’s Russian-language website; and by “Zavtra” since late 1996. Previously: Foreign ministry statements (DIP 2022.03.09). [h/t EDJNet]

Carbon capture projects. The National Energy Technology Laboratory maintains a map and dataset of carbon capture and storage projects “active, proposed, and terminated” in 30-plus countries since the 1970s. The latest version of the dataset includes 400-plus entries, slightly more than are on the map. It lists each project’s name, company, location, date, type, scope, magnitude, status, technology, cost, summary, and other details. As seen in: “What is carbon capture and storage? Where is it happening in the US?” (USAFacts).

Milan drinking fountains. Milan’s government publishes a dataset of 600-plus local vedovelle, the distinctive (green, cast iron, dragon-headed) drinking fountains that dot the city. The dataset provides each fountain’s coordinates, municipal zone, and neighborhood. As seen in: “Tutto sulle fontanelle di Milano,” with a map of the locations, by Il Post’s Isaia Invernizzi.


Notice: Unlike most of our content, this edition of Data Is Plural by Jeremy Singer-Vine is not available for republication under a Creative Commons license.
