
Data Is Plural

From Congressional Votes to Coconut Thumps

This week’s roundup of notable datasets

Illustration by Gabriel Hongsdusit: an open envelope with arrows pointing to spreadsheets, set against data visualizations, clouds, and strings of numbers.

Data Is Plural is a weekly newsletter of useful/curious datasets. This edition, dated March 1, 2023, has been republished with permission of the author.

Congressional votes and ideology. The Voteview project “allows users to view every congressional roll call vote in American history” and places those votes in the context of ideology estimates along a liberal-to-conservative spectrum. The core estimates come from DW-NOMINATE, a method developed by the project’s directors emeritus, Keith T. Poole and Howard Rosenthal. Voteview’s bulk data includes ideology estimates for every member of the House and Senate since 1789, every vote taken in either chamber, and every member’s position on those votes. [h/t Philip Bump]

EPA-regulated facilities. The U.S. Environmental Protection Agency’s Facility Registry System “provides Internet access to a single source of comprehensive information about facilities, sites or places subject to environmental regulations or of environmental interest.” It includes each entity’s name, type, location, industry, regulatory programs, and more. That information, which spans millions of facilities, is “subjected to rigorous verification and data management quality assurance procedures.” The records also provide facilities’ ID numbers from other EPA systems, such as the agency’s Risk Management Program database featured in last week’s edition. [h/t Michael Allen]

Programming languages. PLDB is a database that describes several thousand programming languages, file formats, communications protocols, and other related concepts. Its downloads, available in several formats, provide information on the languages’ year announced, technical features, creators, countries and communities of origin, relevant books and URLs, popularity metrics, and more. [h/t Derek M. Jones]

20th-century occupations. Between 1939 and 1991, the U.S. government published several iterations of the now-discontinued Dictionary of Occupational Titles, a precursor to the O*NET database (DIP 2017.09.27). The dictionaries included job descriptions, classification codes, and cross references but are mostly available only as scans. So Shahad Althobaiti et al. organized the manual transcription of five major editions into structured text files. A random sample of 1939’s titles: punch-press operator, seam dampener, base brander, box pleater, and necktie finisher.

Thump, thump, coconut. “Traditionally,” in the Philippines, “coconuts are classified into their maturity levels manually,” June Anne Caladcad and Eduardo Piedad Jr. write. “Traders often use their fingernails, knuckles, or the blunt end of the knife to tap the coconuts before assessing the sounds produced.” The authors and their colleagues have developed hardware and software to emulate that process, and used it to collect acoustic signal data from 129 premature, mature, and overmature coconuts, each mechanically knocked on each of its three ridges.


Notice: Unlike most of our content, this edition of Data Is Plural by Jeremy Singer-Vine is not available for republication under a Creative Commons license.
