
Data Is Plural

Greenhouse Gases, Infectious Diseases, and Tinned Fish

This week’s roundup of notable data

Illustration: an open envelope with arrows coming out of it, pointing to various spreadsheets. Behind the spreadsheets are data visualizations, clouds, and strings of numbers. Credit: Gabriel Hongsdusit

Data Is Plural is a weekly newsletter of useful/curious datasets. This edition, dated Nov. 16, 2022, has been republished with permission of the author.

Big emitters. Climate TRACE, a nonprofit coalition launched in 2020, uses satellite imagery, sector-specific datasets, and other sources to estimate greenhouse gas emissions in detail. Its most recent inventory, released last week, highlights more than 70,000 individual sites that “represent the top known sources of emissions in the power sector, oil and gas production and refining, shipping, aviation, mining, waste, agriculture, road transportation, and the production of steel, cement, and aluminum.” You can download the data, explore sector- and country-level estimates, and browse a map of the sites. Read more: Coverage in the New York Times. [h/t Ian Johnson]

Disease outbreaks. Juan Armando Torres Munguía et al. have built a dataset of infectious disease outbreaks, based on information extracted from the World Health Organization’s Disease Outbreak News alerts (DIP 2022.03.30) and its coronavirus dashboard. The authors have clustered the outbreaks by disease (classified by ICD-10 and ICD-11 codes), country, and year. Excluding the COVID-19 pandemic, this yields 1,500-plus unique disease-country-year combinations between January 1996 and March 2022, spanning 60-plus diseases and 200-plus countries/territories. [h/t Konstantin M. Wacker]
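
To make the clustering concrete, here is a minimal Python sketch of grouping alerts into disease-country-year combinations. The records are hypothetical placeholders rather than rows from the dataset, and the function name `cluster_outbreaks` is an assumption for illustration; the authors' actual pipeline may differ.

```python
from collections import Counter

# Hypothetical alert records as (disease, country, year) tuples.
# These are illustrative placeholders, not rows from the real dataset.
alerts = [
    ("Cholera", "Haiti", 2010),
    ("Cholera", "Haiti", 2010),
    ("Cholera", "Haiti", 2011),
    ("Ebola virus disease", "Guinea", 2014),
    ("Ebola virus disease", "Liberia", 2014),
]


def cluster_outbreaks(records):
    """Count alerts per unique (disease, country, year) combination."""
    return Counter(records)


clusters = cluster_outbreaks(alerts)

# Each key is one combination; the value is how many alerts fell into it.
for (disease, country, year), n_alerts in sorted(clusters.items()):
    print(f"{disease}, {country}, {year}: {n_alerts} alert(s)")

print(f"{len(clusters)} unique combinations")
```

Two alerts for the same disease in the same country and year collapse into a single combination, which is why the number of combinations can be smaller than the number of alerts.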

Permissively licensed code. The Stack, a new dataset from the BigCode project, “contains over 3TB of permissively-licensed source code files covering 30 programming languages crawled from GitHub.” Those terabytes hold more than 300 million files extracted from repositories whose licenses place “minimal restrictions on how the software can be copied, modified, and redistributed.” The dataset provides the contents of each file along with its repository name, path, size, programming language, detected licenses, and several high-level metrics. Read more: An introductory Twitter thread and preprint paper. [h/t Karsten Johansson]
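
As a rough illustration of how per-file metadata like this could be used, the Python sketch below filters a few records by programming language and detected license. The `SourceFile` fields and the `select_files` helper are assumptions modeled on the description above, not the dataset's actual schema or API, and the records are made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SourceFile:
    # Field names are assumptions based on the newsletter's description,
    # not the dataset's real column names.
    repo_name: str
    path: str
    size: int
    language: str
    licenses: List[str]
    content: str


# Made-up example records.
files = [
    SourceFile("alice/tools", "cli.py", 120, "Python", ["MIT"], "print('hi')"),
    SourceFile("bob/site", "app.js", 300, "JavaScript", ["Apache-2.0"], "let x = 1;"),
    SourceFile("carol/lib", "util.py", 80, "Python", ["BSD-3-Clause"], "x = 1"),
]


def select_files(records, language, allowed_licenses):
    """Keep files in the given language whose detected licenses
    include at least one of the allowed licenses."""
    allowed = set(allowed_licenses)
    return [
        f for f in records
        if f.language == language and allowed.intersection(f.licenses)
    ]


selected = select_files(files, "Python", ["MIT", "Apache-2.0"])
for f in selected:
    print(f.repo_name, f.path, f.licenses)
```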

Impact craters. The Earth Impact Database, maintained by the University of New Brunswick’s Planetary and Space Science Centre, catalogs nearly 200 impact craters caused by meteorites that have crashed into the planet. It presents the name, location, diameter, estimated age, geology, and other features of the craters, as well as photographs and bibliographies. Related: Cody Winchester has scraped the crater characteristics into CSV and GeoJSON files.

Tinned fish. Rainbow Tomatoes Garden is a farm in East Greenville, Pa., that also happens to run an online store selling “the largest selection of tinned seafood in the world.” Curator-owner Dan Waber publishes a spreadsheet of the store’s 630-plus offerings, listing each product’s name, type of seafood, brand, country of origin, tin size, and price, as well as whether it’s organic, certified kosher, smoked, boneless, and/or skinless, among other attributes. [h/t George Ho]


Notice: Unlike most of our content, this edition of Data Is Plural by Jeremy Singer-Vine is not available for republication under a Creative Commons license.
