
Data Is Plural

Wildfires, Hospital Prices, Startups, and Sharks

This week's roundup of notable data

Illustration by Gabriel Hongsdusit: an open envelope with arrows coming out of it, pointing to various spreadsheets. Behind the spreadsheets are data visualizations, clouds, and strings of numbers.

Data Is Plural is a weekly newsletter of useful/curious datasets. This edition, dated July 27, 2022, has been republished with permission of the author.

Wildfires around the world. The Global Wildfire Information System, expanding on the work of the European Forest Fire Information System, uses satellite data to provide weekly and annual estimates of the number of fires and area burned in 200-plus countries. Its bulk data indicates monthly burned hectares by country, sub-country unit, and land type from 2002 to 2019, as well as the boundaries of individual fires from 2001 to 2020. It also publishes gridded spatial data relating to fire danger forecasts, active fires, emissions, and more. As seen in: El Diario’s analysis of forest fires in Spain. [h/t Olaya Argüeso Pérez]

Hospital price lists. Since January 2021, the U.S. government has required hospitals to publish machine-readable files listing the standard charges for all items and services they provide. But there’s no standard format for these price lists (also known as “chargemasters”), no official central repository of them, and compliance has been lacking. Seeing those problems, the versioned-data platform DoltHub earlier this year ran a paid crowdsourcing campaign that pulled nearly 300 million prices from the published lists of roughly 1,800 hospitals into a single database. Related: Thanks to an earlier price transparency rule, California posts chargemasters for hundreds of hospitals, with records going back to 2011.

Monkeypox strains. Nextstrain, “an open-source project to harness the scientific and public health potential of pathogen genome data,” has begun analyzing genetic sequences from hundreds of monkeypox virus samples, the vast majority from infections in the past few months. The project provides metadata on each sample, including the date, country, variant, and mutation metrics, as well as detailed sequencing data from NCBI Virus. Previously: Coronavirus variant data from outbreak.info (DIP 2021.03.10). [h/t Karsten Johansson]

Startup factories. Venture studios are firms that build and launch startups. Jim Moran’s Venture Studio Index tracks 260-plus of them, plus 1,200-plus of the startups they’ve launched. The dataset, “collected manually by a team of researchers familiar with venture capital and the technology startup ecosystem,” includes founding years, locations, employee counts, relevant URLs, and more.

Shark bites. Madeline Riley et al. describe the Australian Shark-Incident Database, which contains details about 1,100-plus shark bites (and attempted shark bites) between 1791 and early 2022, gathered by the Taronga Conservation Society using “questionnaires provided to shark-bite victims or witnesses, media reports,” and information from state agencies. Read more: “New dataset shows shark bites in Australia are increasing and researchers want to know why” (The Guardian).


Notice: Unlike most of our content, this edition of Data Is Plural by Jeremy Singer-Vine is not available for republication under a Creative Commons license.
