Data Is Plural is a weekly newsletter of useful/curious datasets. This edition, dated Oct. 12, 2022, has been republished with permission of the author.
Work-related injury counts. The U.S. Occupational Safety and Health Administration requires many (but not all) businesses to track employees’ work-related injuries and illnesses. Larger companies and those in high-risk industries must electronically submit annual counts to the agency. Thanks to freedom-of-information lawsuits by Reveal and Public Citizen, OSHA began to publish business-level data from those electronic submissions in 2020. The records, which go back to 2016, include each business’s name, location, industry, employee count, and employee hours worked, plus its reported counts of deaths, injuries, skin disorders, respiratory conditions, poisonings, hearing loss, and other illnesses.
U.S. hydrography. The National Hydrography Dataset, maintained by the U.S. Geological Survey, “represents the water drainage network of the United States with features such as rivers, streams, canals, lakes, ponds, coastline, dams, and streamgages.” You can download the NHD geospatial files by hydrologic unit or state, or for the entire nation. Related: A dataset of waterfalls and rapids in the contiguous U.S., linked to the NHD and sourced partly from Bryan Swan and Dean Goss’s World Waterfall Database. [h/t Malcolm Tunnell + Christopher Ingraham]
Rebel leaders. Benjamin Acosta et al.’s Rebel Organization Leaders Database “provides a wide range of biographical information on all top rebel, insurgent, and terrorist leaders who were active in civil wars between 1980 and 2011.” It includes each leader’s name, gender, education, religion, languages spoken, number of children, years in role, country fought against, cause of death, and much more. The database covers 425 individuals fighting against 80-plus countries; the project also features written profiles for a sample of them.
File formats. The U.S. National Archives’ Digital Preservation Framework describes the agency’s risk assessments and recommended preservation plans for more than 600 file formats. The framework’s documentation places each format into one of 16 categories, such as “digital audio,” “spreadsheets,” “navigational charts,” and “software and code.” In August, the agency added “linked open data” representations of the plans for each format. [h/t Elizabeth England]
Wine economics. Researchers at the University of Adelaide’s Wine Economics Research Centre have compiled several longitudinal datasets. One, for example, quantifies the total area devoted to growing each grape variety in each country, 1960–2016. Another compiles various market statistics (e.g., national wine production, imports, exports) going back to 1835. Related: The International Organisation of Vine and Wine maintains a database of global and national statistics going back to 1995. As seen in: Jack Zhao’s exploration of the Adelaide data.
Notice: Unlike most of our content, this edition of Data Is Plural by Jeremy Singer-Vine is not available for republication under a Creative Commons license.