How We Determined Crime Prediction Software Disproportionately Targeted Low-Income, Black, and Latino Neighborhoods

1. Introduction

The expansion of digital record keeping by police departments across the U.S. in the 1990s ushered in the era of data-driven policing. Huge metropolises like New York City crunched reams of crime and arrest data to find and target “hot spots” for extra policing. Researchers at the time found that this reduced crime without necessarily displacing it to other parts of the city—although some of the tactics used, such as stop-and-frisk, were ultimately criticized by a federal judge, among others, as civil rights abuses.

The next development in data-informed policing was ripped from the pages of science fiction: software that promised to take a jumble of local crime data and spit out accurate forecasts of where criminals are likely to strike next, stopping crime in its tracks. One of the first, and reportedly most widely used, is PredPol, its name an amalgamation of the words “predictive policing.” The software, derived from an algorithm used to predict earthquake aftershocks, was developed by professors at UCLA and released in 2011. By sending officers to patrol algorithmically predicted hot spots, such programs promise to deter illegal behavior.

But law enforcement critics had their own prediction: that the algorithms would send cops to patrol the same neighborhoods they say police have always patrolled, those populated by people of color. Because the software relies on past crime data, they said, it would reproduce police departments’ ingrained patterns and perpetuate racial injustice, covering it with a veneer of objective, data-driven science.

PredPol has repeatedly said those criticisms are off-base. The algorithm doesn’t incorporate race data, which, the company says, “eliminates the possibility for privacy or civil rights violations seen with other intelligence-led or predictive policing models.”

There have been few independent, empirical reviews of predictive policing software, because the companies that make these programs have not publicly released their raw data.

A seminal, data-driven study about PredPol published in 2016 did not involve actual predictions. Rather, the researchers, Kristian Lum and William Isaac, fed drug crime data from Oakland, Calif., into PredPol’s open-source algorithm to see what it would predict. They found that it would have disproportionately targeted Black and Latino neighborhoods, despite survey data showing that people of all races use drugs at similar rates.

PredPol’s founders conducted their own research two years later using Los Angeles data and said they found the overall rate of arrests for people of color was about the same whether PredPol software or human police analysts made the crime hot spot predictions. Their point was that their software was not worse in terms of arrests for people of color than nonalgorithmic policing.

However, a study published in 2018 by a team of researchers led by one of PredPol’s founders showed that Indianapolis’s Latino population would have endured “from 200% to 400% the amount of patrol as white populations” had the software been deployed there, and its Black population would have been subjected to “150% to 250% the amount of patrol compared to white populations.” The researchers said they found a way to tweak the algorithm to reduce that disproportion but that it would result in less accurate predictions, though they said it would still be “potentially more accurate” than human predictions.

In written responses to our questions, the company’s CEO said the company did not change its algorithm in response to that research because the alternate version would “reduce the protection provided to vulnerable neighborhoods with the highest victimization rates.” He also said the company did not provide the study to its law enforcement clients because it “was an academic study conducted independently of PredPol.”

Other predictive police programs have also come under scrutiny. In 2017, the Chicago Sun-Times obtained a database of the city’s Strategic Subject List, which used an algorithm to identify people at risk of becoming victims or perpetrators of violent, gun-related crime. The newspaper reported that 85 percent of people that the algorithm saddled with the highest risk scores were Black men—some with no violent criminal record whatsoever.

Last year, the Tampa Bay Times published an investigation analyzing the list of people that were forecast to commit future crimes by the Pasco Sheriff’s Office’s predictive tools. Deputies were dispatched to check on people on the list more than 12,500 times. The newspaper reported that at least one in 10 of the people on the list were minors, and many of those young people had only one or two prior arrests yet were subjected to thousands of checks.

For our analysis, we obtained a trove of PredPol crime prediction data that has never before been released by PredPol for unaffiliated academic or journalistic analysis. Gizmodo found it exposed on the open web (the portal is now secured) and downloaded more than seven million PredPol crime predictions for dozens of American cities and some overseas locations between 2018 and 2021.

This makes our investigation the first independent effort to examine actual PredPol crime predictions in cities around the country, bringing quantitative facts to the debate about predictive policing and whether it eliminates or perpetuates racial and ethnic bias.

We examined predictions in 38 cities and counties crisscrossing the country, from Fresno, Calif., to Niles, Ill., to Orange County, Fla., to Piscataway, N.J. We supplemented our inquiry with Census data, including racial and ethnic identities and household income of people living in each jurisdiction—both in areas that the algorithm targeted for enforcement and those it did not target.

Overall, we found that PredPol’s algorithm relentlessly targeted the Census block groups in each jurisdiction that were the most heavily populated by people of color and the poor, particularly those containing public and subsidized housing. The algorithm generated far fewer predictions for block groups with more White residents.

Analyzing entire jurisdictions, we observed that the proportion of Black and Latino residents was higher in the most-targeted block groups and lower in the least-targeted block groups (about 10 percent of which had zero predictions) compared to the overall jurisdiction. We also observed the opposite trend for the White population: The least-targeted block groups contained a higher proportion of White residents than the jurisdiction overall, and the most-targeted block groups contained a lower proportion.
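
The comparison described above can be sketched in a few lines of Python. This is a minimal illustration with made-up numbers, not the code used for the analysis; the `BlockGroup` fields, the function names, and the 5 percent cutoff for "most-targeted" and "least-targeted" are all assumptions of the sketch.

```python
from dataclasses import dataclass


@dataclass
class BlockGroup:
    predictions: int  # total predictions that landed in this block group
    population: int   # all residents
    group_pop: int    # residents in the demographic group being examined


def group_share(block_groups):
    """Share of residents who belong to the group, pooled over the block groups."""
    total = sum(bg.population for bg in block_groups)
    return sum(bg.group_pop for bg in block_groups) / total if total else 0.0


def compare_to_jurisdiction(block_groups, fraction=0.05):
    """Group share in the least-targeted, most-targeted, and all block groups.

    `fraction` (5 percent here) is an assumed cutoff for this sketch.
    """
    ranked = sorted(block_groups, key=lambda bg: bg.predictions)
    k = max(1, round(len(ranked) * fraction))
    return {
        "least_targeted": group_share(ranked[:k]),
        "most_targeted": group_share(ranked[-k:]),
        "jurisdiction": group_share(ranked),
    }


# Made-up jurisdiction: 20 block groups of 1,000 residents each.
sample = [
    BlockGroup(predictions=i * 100, population=1000, group_pop=50 + i * 20)
    for i in range(20)
]
result = compare_to_jurisdiction(sample)
print(result)
```

In this invented example the group makes up a larger share of the most-targeted block group than of the jurisdiction overall, and a smaller share of the least-targeted one, which is the pattern the paragraph describes for Black and Latino residents.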

For more than half (20) of the jurisdictions in our data, the majority of White residents lived in block groups that were targeted less than the median or not at all. The same could only be said for the Black population in four jurisdictions and for the Latino population in seven.
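
One way to picture the median split is the sketch below, which computes the fraction of a group's residents living in block groups with fewer predictions than the jurisdiction's median. The data is invented, the function name is hypothetical, and treating block groups exactly at the median as not "below" it is an assumption of the sketch.

```python
from statistics import median


def share_below_median(block_groups):
    """Fraction of a group's residents in block groups targeted less than the median.

    Each item is a (predictions, group_residents) pair for one block group.
    Block groups with zero predictions count as below the median.
    """
    cutoff = median(p for p, _ in block_groups)
    total = sum(pop for _, pop in block_groups)
    below = sum(pop for p, pop in block_groups if p < cutoff)
    return below / total if total else 0.0


# Made-up (predictions, residents) pairs for one hypothetical jurisdiction.
white = [(0, 900), (10, 800), (50, 700), (400, 300), (900, 100)]
black = [(0, 50), (10, 100), (50, 200), (400, 600), (900, 800)]

white_share = share_below_median(white)
black_share = share_below_median(black)
print(f"White: {white_share:.1%}, Black: {black_share:.1%}")
```

A jurisdiction would count toward the tally in the paragraph above when the share for a group exceeds one half.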

When we ran a statistical analysis, it showed that as the number of crime predictions for block groups increased, the Black and Latino shares of the population also increased and the White share decreased.
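
To illustrate the kind of relationship being measured, here is a small sketch that computes a Pearson correlation coefficient between prediction counts and demographic shares across block groups. The text does not say which statistic was used, so Pearson is an assumption here, and every number below is invented.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least two points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a series is constant")
    return cov / sqrt(var_x * var_y)


# Made-up block groups in one hypothetical jurisdiction.
predictions = [0, 120, 450, 900, 2300]
pct_black_latino = [0.10, 0.18, 0.25, 0.41, 0.62]
pct_white = [0.80, 0.71, 0.60, 0.45, 0.27]

r_black_latino = pearson(predictions, pct_black_latino)
r_white = pearson(predictions, pct_white)
print(round(r_black_latino, 2), round(r_white, 2))
```

A positive coefficient means the group's share tends to rise with the number of predictions; a negative one means it tends to fall.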

We also found that PredPol’s predictions often fell disproportionately in places where the poorest residents live. For the majority of jurisdictions (27) in our dataset, a higher proportion of the jurisdiction’s low-income households live in the block groups that were targeted the most. In some jurisdictions, all of the subsidized and public housing is located in block groups PredPol targeted more than the median.

We focused on census block groups, clusters of blocks that generally have a population of between 600 and 3,000 people, because these were the smallest geographic units for which recent race and income data was available at the time of our analysis (2018 American Community Survey).

Block groups are larger than the 500-by-500-foot prediction squares that PredPol’s algorithm produces. As a result, the populations of the larger block groups could differ from those of the prediction squares inside them. To measure the potential impact, we conducted a secondary analysis at the block level using 2010 Census data for blocks whose populations remained relatively stable. (See Limitations for how we define stable.)

We found that in nearly 66 percent of the 131 stable block groups, predictions clustered on the blocks with the most Black or Latino residents inside of those block groups. Zooming in on blocks showed that predictions that appeared to target majority-White block groups had in fact targeted the blocks nestled inside of them where more Black and Latino people lived. This was true for 78 percent of the 46 stable, majority-White block groups in our sample.

To try to determine the effects of PredPol predictions on crime and policing, we filed more than 100 public records requests and compiled a database of more than 600,000 arrests, police stops, and use-of-force incidents. But most agencies refused to give us any data. Only 11 provided at least some of the necessary data.

For the 11 departments that provided arrest data, we found that rates of arrest in predicted areas remained the same whether PredPol predicted a crime that day or not. In other words, we did not find a strong correlation between arrests and predictions. (See the Limitations section for more information about this analysis.)
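
As a rough illustration of that comparison, the sketch below averages arrests in one area separately for days with and without a prediction. The input format and function name are hypothetical and the numbers are invented; the actual analysis may have been structured differently.

```python
def mean_arrests_by_prediction(days):
    """Average daily arrests on days with and without a prediction.

    `days` is an iterable of (had_prediction, arrests) pairs for one area.
    Returns a dict keyed by True/False; a value is None if no such days exist.
    """
    totals = {True: [0, 0], False: [0, 0]}  # [arrest count, day count]
    for had_prediction, arrests in days:
        bucket = totals[bool(had_prediction)]
        bucket[0] += arrests
        bucket[1] += 1
    return {
        flag: (arrests / n_days if n_days else None)
        for flag, (arrests, n_days) in totals.items()
    }


# Made-up daily records for a single hypothetical area.
days = [(True, 2), (False, 2), (True, 1), (False, 1), (True, 3), (False, 3)]
rates = mean_arrests_by_prediction(days)
print(rates)
```

When the two averages are close, as in this invented example, the presence of a prediction says little about how many arrests were made that day.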

We do not definitively know how police acted on any individual crime prediction because we were refused that data by nearly every police department. Only one department provided more than a few days’ worth of concurrent data extracted from PredPol that reports when police responded to the predictions, and that data was so sparse as to raise questions about its accuracy.

To determine whether the algorithm’s targeting mirrored existing arrest patterns for each department, we analyzed arrest statistics by race for 29 of the agencies in our data using data from the FBI’s Uniform Crime Reporting (UCR) project. We found that the socioeconomic characteristics of the neighborhoods that the algorithm targeted mirrored existing patterns of disproportionate arrests of people of color.

In 90 percent of the jurisdictions, per capita arrests were higher for Black people than White people—or any other racial group included in the dataset. This is in line with national trends. (See Limitations for more information about UCR data.)
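
Per capita rates matter because raw arrest counts can point the other way. The following sketch, with invented figures and a hypothetical function name, shows a jurisdiction where more White people than Black people are arrested in absolute terms, yet the rate per 1,000 residents is three times higher for Black people.

```python
def arrests_per_capita(arrests, population, per=1000):
    """Arrests per `per` residents for each group that has a nonzero population."""
    return {
        group: per * count / population[group]
        for group, count in arrests.items()
        if population.get(group)
    }


# Made-up annual figures for one hypothetical jurisdiction.
arrests = {"Black": 300, "White": 400}
population = {"Black": 10_000, "White": 40_000}

rates = arrests_per_capita(arrests, population)
print(rates)
```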

Overall, our analysis suggests that the algorithm, at best, reproduced how officers have been policing, and at worst, would reinforce those patterns if its policing recommendations were followed.

Department	Analysis Start Date	Analysis End Date	Number of Days
Alexandria, La.	4/30/19	1/30/21	641
Birmingham, Ala.	9/1/19	1/30/21	517
Boone County, Ind.	2/24/18	1/30/21	1,071
Calcasieu Parish, La.	4/9/19	1/30/21	662
Clovis, Calif.	5/1/18	4/30/19	364
Cocoa, Fla.	2/15/18	8/31/18	197
Decatur, Ga.	2/21/18	1/30/21	1,074
El Monte, Calif.	2/21/18	1/30/21	1,074
Elgin, Ill.	2/16/18	7/14/20	879
Farmers Branch, Texas	2/21/18	9/29/18	220
Forsyth County, Ga.	12/12/18	1/30/21	780
Fort Myers, Fla.	12/14/19	1/30/21	413
Frederick, Md.	4/1/18	4/8/19	372
Fresno, Calif.	2/15/18	6/1/20	837
Gloucester Township, N.J.	3/22/19	1/30/21	680
Grass Valley, Calif.	8/30/18	9/3/19	369
Haverhill, Mass.	2/22/18	1/30/21	1,073
Homewood, Ala.	2/22/18	1/30/21	1,073
Jacksonville, Texas	2/24/18	10/30/19	613
Jefferson County, Ala.	2/23/18	1/26/21	1,068
Livermore, Calif.	2/16/18	2/1/20	715
Los Angeles, Calif.	2/15/18	4/15/20	790
Merced, Calif.	2/22/18	7/1/20	860
Modesto, Calif.	2/22/18	9/23/20	944
Niles, Ill.	8/24/18	9/23/20	761
Ocala, Fla.	2/24/18	3/1/19	370
Ocoee, Fla.	6/4/19	1/30/21	606
Orange County, Fla.	2/23/18	9/30/20	950
Piscataway, N.J.	10/25/18	1/30/21	828
Plainfield, N.J.	2/24/18	1/1/19	311
Portage, Mich.	6/20/19	1/30/21	590
Salisbury, Md.	2/27/18	1/30/21	1,068
South Jordan, Utah	2/23/18	5/1/20	798
Tacoma, Wash.	2/23/18	3/27/20	763
Temple Terrace, Fla.	2/1/19	1/30/21	729
Tracy, Calif.	2/23/18	1/30/21	1,072
Turlock, Calif.	2/22/18	5/12/20	810
West Springfield Town, Mass.	2/23/18	11/30/19	645
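
The Number of Days column appears to be the end date minus the start date, so the end date itself is not counted. That reading is an inference from the table rather than something stated in the text. A short check, with a hypothetical helper name, reproduces three of the rows:

```python
from datetime import datetime


def days_in_window(start, end):
    """Days from `start` to `end`, both given as M/D/YY strings as in the table."""
    fmt = "%m/%d/%y"
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).days


# Rows copied from the table: (department, start, end, listed number of days).
rows = [
    ("Alexandria, La.", "4/30/19", "1/30/21", 641),
    ("Clovis, Calif.", "5/1/18", "4/30/19", 364),
    ("Boone County, Ind.", "2/24/18", "1/30/21", 1071),
]
for department, start, end, listed in rows:
    print(department, days_in_window(start, end), listed)
```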

Data Gathering and Preparation

Dates of Analysis, by Department

Prediction Analysis and Findings

Methods

Disparate Impact Analysis

Race and Ethnicity Analysis

Most- and Least-Targeted Block Groups

[Figure: In a majority of 38 jurisdictions, more Blacks and Latinos lived in block groups that were most targeted, while more Whites lived in those that were least targeted. Measure: number of jurisdictions where the proportion of each group living in the type of blocks is higher than the city overall.]

[Figure: As predictions increased, the proportion of Blacks and Latinos in block groups increased. The opposite was true for Whites. Measure: population of blocks compared with population of overall jurisdiction, average of 38 jurisdictions. Panels: blocks with most predictions (top 5%), blocks with median predictions (middle 5%), blocks with fewest predictions (bottom 5%).]

Block Groups Above and Below the Median

Block-Level Race Analysis

Correlation Between Predictions and Race

[Figure: As the number of predictions in a block group increased, the Black and Latino proportion of the population increased. Measure: distribution of correlation coefficients for all 38 jurisdictions by race.]

Race/Ethnicity Composition of Deciles

[Figure: Neighborhoods with the most predictions had the lowest share of White residents. Measure: proportion of each race/ethnicity in neighborhoods, by prediction volume, averaged across 38 jurisdictions. Series: White, Black, Latino, Asian.]

Wealth and Poverty Analysis

Most- and Least-Targeted Block Groups

[Figure: In 30 jurisdictions, the most-targeted block groups had poorer households. Measure: number of jurisdictions where the proportion of each group living in the type of blocks is higher than the city overall.]

[Figure: As predictions increased, poorer households increased and wealthy ones decreased. Measure: number of jurisdictions where the proportion of each group living in the type of blocks is higher than the city overall. Panels: blocks with most predictions (top 5%), blocks with median predictions (middle 5%), blocks with fewest predictions (bottom 5%).]

Block Groups Above and Below the Median

Correlation Between Predictions and Income

[Figure: The proportion of households earning less than $45,000 a year positively correlated with predictions. Measure: distribution of correlation coefficients for all 38 jurisdictions by income.]

Income Composition of Deciles

[Figure: As predictions increased, average household income decreased. Measure: proportion of annual household income in neighborhoods grouped by decile from fewest to most predictions, averaged across 38 jurisdictions. Series: less than $45,000; $75,000 to $100,000; $125,000 to $150,000; greater than $200,000.]

Public Housing Analysis

Stops, Arrests, and Use of Force

Stop, Arrest, and Use of Force Analysis

[Figure: Neighborhoods with the most crime predictions had higher arrest rates. Measure: arrests per capita relative to jurisdiction average.]

Overall Policing Patterns

[Figure: Arrest rates tended to be higher for Black people than White people. Measure: per capita arrest rates by race.]

Limitations

Prediction Data

Classifying Least-Targeted Block Groups

Jurisdictions That Didn’t Follow the Trend

2018 American Community Survey Data

Agency Jurisdictions

Block-Group-Level Data

Measuring Arrest Rates

[Figure: Correlations between predictions and arrests were weak. Measure: correlation between average number of arrests and average number of predictions.]

UCR, Arrest, and Use of Force Data

PredPol Response

Law Enforcement Agency Responses

Conclusion

Acknowledgements
