Data Is Plural is a weekly newsletter of useful/curious datasets. This edition, dated July 20, 2022, has been republished with permission of the author.
New voting laws. The Voting Rights Lab has been tracking 2,000-plus laws proposed in U.S. state legislatures since 2021. The tracker focuses on “12 major issue areas relating to voter access and representation,” such as early voting, same-day registration, and ID requirements. It lists each bill’s state, number, author, date introduced, current status, and issue areas, plus a summary and the lab’s “assessment of whether the legislation is likely to improve or interfere with voter access or the administration of elections.” As seen in: “Has Your State Made It Harder To Vote?” (FiveThirtyEight). Related: States Newsroom’s Kira Lerner has compiled a spreadsheet of 120 new election-related criminal penalties, based partly on the tracker’s data.
Notable people. “A new strand of literature aims at building the most comprehensive and accurate database of notable individuals,” observe Morgane Laouenan et al., who contribute a “cross-verified database of 2.29 million individuals” mined from Wikidata and the English, French, German, Italian, Spanish, Portuguese, and Swedish editions of Wikipedia. For each person, the dataset provides their birth and death dates, gender, citizenship, occupations, and other details. Previously: The MIT-based Pantheon dataset (DIP 2016.02.03), also based on Wikipedia and since updated. [h/t Philip Jung]
Budget apportionments. Congress, through a process called appropriations, chooses how much money goes to each U.S. federal agency and program. But the Office of Management and Budget, through a process called apportionment, ultimately sets the rules for spending those funds, “typically limit[ing] the obligations [an agency] may incur for specified time periods, programs, activities, projects, objects, or any combination thereof.” Those binding decisions have generally not been available to the public—until last week, when OMB launched a database of apportionments for FY 2022, per a requirement in Congress’s 2022 spending bill. [h/t Caitlin Emma]
Digital trade provisions. Mira Burri et al.’s TAPED dataset, which “seeks to comprehensively trace developments in the area of digital trade governance,” categorizes 100-plus relevant aspects of 300-plus preferential trade agreements signed since 2000. The dataset indicates, for instance, that the Peru-Australia Free Trade Agreement contains binding agreements on personal data protection, nonbinding language on cybersecurity, and no provisions regarding net neutrality.
The World Cup. Josh Fjelstul’s World Cup Database, published this month, provides “extensively cleaned and cross-validated” information about each of the 21 FIFA World Cup tournaments played so far. Its 27 tables contain “approximately 1.1 million data points” regarding the teams that participated, their players and managers, the referees, match outcomes, goals, penalties, and more.
Notice: Unlike most of our content, this edition of Data Is Plural by Jeremy Singer-Vine is not available for republication under a Creative Commons license.