From Women’s Well-Being to Radiation-Contaminated Waste to English Football

Data Is Plural is a weekly newsletter of useful/curious datasets. This edition, dated Feb. 1, 2023, has been republished with permission of the author.

In last week’s newsletter, I noted that the Coast Guard’s list of boat recalls “seems possible to scrape.” Reader Michael Nolan took up the challenge; here’s the dataset he extracted. Thanks, Michael!

Women’s well-being. Camille Belmin et al.’s LivWell dataset presents “a set of key indicators on women’s socio-economic status, health and well-being, access to basic services and demographic outcomes” in 447 regions of 52 countries from 1990 to 2019. The indicators include, for example, rates of home ownership, educational attainment, and domestic violence; they’re based primarily on data from the Demographic and Health Surveys Program, a USAID-funded initiative that, since 1984, “has provided technical assistance to more than 400 surveys in over 90 countries, advancing global understanding of health and population trends in developing countries.” Read more: An introductory Twitter thread from Belmin.

Radiation-contaminated waste. The U.S. Nuclear Regulatory Commission regulates the disposal of “low-level” radioactive waste—“items that have become contaminated with radioactive material or have become radioactive through exposure to neutron radiation,” such as protective equipment and cleaning supplies. The NRC provides annual statistics (facility, volume, and total curies) for the country’s four active disposal sites. The Department of Energy’s Manifest Information Management System provides more detailed figures, with breakdowns by month, state of origin, waste classification, isotope, and more. Previously: Data from the Nuclear Fuel Data Survey (DIP 2022.11.23).

Novel dialogue. Krishnapriya Vishnubhotla et al.’s Project Dialogism Novel Corpus contains every quotation from 22 novels, plus who speaks each line, who they’re addressing, the characters they mention, and more. With 35,000-plus quotations, the corpus “is by an order of magnitude the largest dataset of annotated quotations for literary texts in English.” Jane Austen is the most-represented author (five novels), followed by E.M. Forster (two). The researchers have also published a document that they “hope will help standardize future annotation work in this domain.”

Browser capabilities. Alexis Deveria’s caniuse.com indicates which versions of which web browsers support various web technologies, such as CSS’s grid layout, the WebP image format, and the Image Capture API. The project’s dataset covers 530-plus technologies and 19 browsers (six desktop, 13 mobile). It also provides estimates of the percentage of all users whose browsers support a given technology. [h/t Simon Willison]

More English football. Josh Fjelstul’s English Football Database “is a comprehensive database of football matches played in the Premier League and the English Football League from the inaugural season of the Football League (1888–89) through the most recent season (2021–22).” It records each season, team, match, and the standings table at the end of each season. Previously: James P. Curley’s English soccer dataset (DIP 2016.05.04), since expanded to leagues in a dozen more countries. [h/t Derek M. Jones]

Notice: Unlike most of our content, this edition of Data Is Plural by Jeremy Singer-Vine is not available for republication under a Creative Commons license.
