What words would you use to describe yourself? You might say you’re a dog owner, a parent, that you like Taylor Swift, or that you’re into knitting. If you feel like sharing, you might say you have a sunny personality or that you follow a certain religion.
If you spend any time online, you probably have some idea that the digital ad industry is constantly collecting data about you, including a lot of personal information, and sorting you into specialized categories so you’re more likely to buy the things they advertise to you. But in a rare look at just how deep—and weird—the rabbit hole of targeted advertising gets, The Markup has analyzed a database of 650,000 of these audience segments, newly unearthed on the website of Microsoft’s ad platform Xandr. The trove of data indicates that advertisers could also target people based on sensitive information like being “heavy purchasers” of pregnancy test kits, having an interest in brain tumors, being prone to depression, visiting places of worship, or feeling “easily deflated” or that they “get a raw deal out of life.”
Many of the Xandr ad categories are more prosaic, classifying people as “Affluent Millennials,” for example, or as “Dunkin Donuts Visitors.” Industry critics have raised questions about the accuracy of this type of targeting. And the practice of slicing and dicing audiences for advertisers is an old one.
But the exposure of a collection of audience segments this size offers consumers an unusual look at how they and their families are packaged, described, and categorized by ad companies.
Because the segments also include the names of the companies involved in creating them, they also shed light on how disparate pools of personal data—collected by tracking people’s online activity and real-world movements—are combined into bespoke, branded groups of potential ad viewers that can be marketed to publishers and advertisers.
How do advertisers think of you?
In a newly discovered database on the Xandr ad platform’s website, The Markup found thousands of rows of data that indicate sensitive consumer groupings known as “audience segments.”
Note: Some segments from data suppliers Grapeshot, Peer39, and others found in this dataset are “negative keyword” or “brand protection” segments that are used to control when an ad should not appear, rather than “audience segments,” which are lists of IDs associated with consumers.
“I think it’s the largest piece of evidence I’ve ever seen that provides information about what I call today’s ‘distributed surveillance economy,’” said Wolfie Christl, a privacy researcher at Cracked Labs, who discovered the file and shared it with The Markup.
Christl noted that the Xandr segments touched on highly sensitive topics. One civil liberties advocate called this sort of targeting “one of the greatest threats to data privacy” and said that he was concerned with some of the categories in the Xandr material, especially around reproductive health. A consumer who was placed in one of the audience segments available through Xandr said the segment did not accurately reflect his income.
Christl also shared his findings with German digital rights news site netzpolitik.org, which reported on the audience segments in cooperation with The Markup. The publication revealed the participation of European firms like location data broker Adsquare on the Xandr platform and examined whether they are complying with European Union data protection laws.
Microsoft removed the file from its website after we emailed the company and did not respond to multiple requests for comment.
What Is in This Data?
The file, which was linked to from a public page on Xandr’s website, contains 650,000 rows of data, each containing the name of an audience segment, the name of the supplier of the data behind that segment, a supplier ID number, and a segment ID number.
On Xandr’s platform, advertisers can pay for the ability to target people through the segments.
The segment names sometimes contain the names of data firms other than the supplier that in some cases may be the original source of the data. They also sometimes contain a hierarchical taxonomy, such as “Lifestyle > Visitation > Recent Retail Visit by Shopper > Lululemon.”
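The row layout and the taxonomy convention described above can be sketched in a few lines of Python. This is only an illustration: the field names, the supplier name, and both ID numbers are assumptions, and the only value taken from the file is the segment name quoted above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SegmentRow:
    """One row of the file, using the four fields the article describes."""
    segment_name: str
    supplier_name: str
    supplier_id: int
    segment_id: int


def taxonomy_path(segment_name: str) -> List[str]:
    """Split a hierarchical segment name on '>' and trim each level."""
    return [level.strip() for level in segment_name.split(">") if level.strip()]


# Hypothetical row: the segment name is quoted in the article,
# but the supplier name and both IDs are invented for illustration.
row = SegmentRow(
    segment_name="Lifestyle > Visitation > Recent Retail Visit by Shopper > Lululemon",
    supplier_name="Example Supplier",
    supplier_id=1,
    segment_id=1001,
)

path = taxonomy_path(row.segment_name)
print(path)
```

A name with no “>” separator, such as “Tight Money,” comes back as a single-level path, so flat and hierarchical names can be handled the same way.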
It appears this file was meant to showcase the wide range of data sources available to license from Xandr’s marketplace (no individual consumers are listed in the dataset). Some of the segments included instructions to be used only until a certain date (“RETIRED – Use Thru 3/2020”) or not to be used in states with privacy laws (“Vyvanse ADHD Adult Target List NO COLORADO”). Other segments appeared to be custom-built for specific ad campaigns or small local businesses. It is not clear that all of the segment names were intended to be publicly available.
The file metadata says that it was created in May 2021, meaning that the ad segments it contains may not be in use today.
There are 93 data suppliers listed in the file, including some well-known tech companies like data juggernaut Oracle (which was listed as data supplier to more than a third of the segments), location data broker Foursquare (Factual), and consumer data giant Acxiom, as well as dozens of lesser-known ad tech companies.
Christl said he thinks the large number of companies named in the file shows that Xandr was (at least in 2021) reselling large amounts of sensitive data from a wide range of data brokers from around the world. Regarding the large amounts of segments related to sensitive topics, Christl said, “I think the file suggests that Xandr did not take even the slightest measures to exclude at least the most sensitive data from its marketplace.”
Many of the audience segments fall into broad consumer categories and also show a surprising amount of specialization:
- Automobiles (Example: “Past Purchases > Autos > Makes > Subaru”)
- Demographics (Ex: “Life Events > Newly Engaged”)
- Business / B2B (Ex: “B2B > Manufacturing > Candlemaking Equipment & Supplies”)
- Retail stores (Ex: “Brand Affinities > Retail > Prada”)
- Interests (Ex: “Psychographic Interests > Geek Culture”)
- Brands (Ex: “Wants to buy - Brands > The North Face”)
- Grocery (Ex: “Intent > Heavy Purchaser - Meat Pies - Refrigeration”)
- Travel (Ex: “Vacation Travel Attitudes > Not a Sightseer”)
- Financial (Ex: “Highest Risk > Poorer Unemployed Neighbourhoods”)
- Political (Ex: “US Politics > Issues & Advocacy > Allow Transgender Bathroom - Oppose”)
- Health (Ex: “Healthcare > Medications > Depression Medications”)
The Markup found thousands of rows in the file that indicate sensitive audience groupings.
Medical and Health Related
Many medical- and health-related segments mentioned specific conditions consumers may be diagnosed with, medicine they may be taking, or conditions they may develop. This category included several segments relating to reproductive health, including some involving pregnancy tests, contraceptives, and infertility.
- Atrial fibrillation
- Congestive heart failure
- Coronary artery disease
- Hearing loss
- Non-hodgkins lymphoma
- Liver disease
- Heart disease
- Erectile dysfunction
- Exocrine pancreatic malfunction
- Smoking cessation
- Sleep apnea
- Urinary tract infection
Likely symptoms of
- Menstrual cramps
- Sleep disorders
Health relevance for
- Polycystic kidney disease
- Chronic idiopathic constipation
- Chronic migraine
- ADHD recent adult diagnosis
- Family planning
- Pregnancy / maternity
- Infertility / IVF
- Pregnancy and ovulation apps
- Heavy Purchaser - Pregnancy Test Kits
- Heavy Purchaser - Male Contraceptives
- K-Y Brand High > Heavy Buyer
- Trojan Brand High > Heavy Buyer
- Clearblue Brand High > Heavy Buyer (Pregnancy test)
- FirstResponse Brand High > Heavy Buyer (Pregnancy test)
- Nature Made: Fertility and Ovulation - Cross Device
- Nature Made: Pregnancy Location and Action - Cross Device
- Viagra - Unhealthy Place Visits (adsquare)
Race and ethnicity showed up frequently among the demographic data targeted by the segments.
Race / Ethnicity
- Middle Eastern
- Native American
- Pacific Islander
- Alaska Native
- Affluent Ethnic Couples
- Elite Jewish Urbanites
- Elite Urban Ethnic Mix
- Hispanic American Suburbs
- Middle Class African Americans
- Modest Ethnic Mix
- Modest Jewish Enclaves
- Pure Hispanic
- Suburban Hispanics
- Urban African Americans
Many segments were related to political beliefs, political activity, and contentious issues such as gun control, immigration, and LGBTQ rights.
- Marijuana Reform Supporters
- Gun Control Advocates
- Gun Rights Advocates
- Womens Equality Advocates
- Immigration Control Advocates
- Immigration Rights Advocates
- Environmental Conservation
- Organized Labor Supporters
- Pro Choice Supporters
- Pro Life Supporters
- LGBTQ Advocacy
- Marriage Equality Opposition
- Animal Rights Supporters
- Attended or willing to volunteer for a political protest
- Voting history by candidate, election, and state propositions
- Political donations for candidates and causes
- Voter registration status
- Ultra-Conservative Streaming TV-Viewer
- Deep Root Analytics > Defund Police Persuadables
- Deep Root Analytics > Anti Defund Police
- Lifestyle > Political > Flags & Trump
- Lifestyle > Political > Doves
- Social Profiles by Type > Black Lives Matter Supporters
Location-based political targeting
- Extensive geofenced segments from Foursquare (Factual) and from political ad firms Rising Tide Interactive, DSPolitical. Voters by congressional district.
Profiles involving people’s feelings and psychology were numerous, offering advertisers a menu of consumers grouped by sentiment and mental health.
- Tattoo Addicts
- Indulgent Dog Owners
- Households with Trendy Moms (BlueKai)
- Pensions & Ports
- Khakis & Credit
- Leveraged Life
- McMansions & Merriment
- Aspirations and Dreams - Happiness Seekers
- Aspirations and Dreams - Love Aspirers
- Aspirations and Dreams - Money Driven
- Lone Wolves
- Receptive to emotional messaging
- Rebellious spirits
- Concerned with self-image
- Confident and Social
- Status Shopper
- Extraversion - Jokers
- Extraversion - Party Animals
- Extraversion - Frustrated Extraverts
- Neuroticism - Trapped
- Neuroticism - Stress Reactors
- Neuroticism - Self Lovers
- Neuroticism - Easily Deflated
- Neuroticism - Internal Escapists
- General Attitudes - I generally get a raw deal out of life
- Dealing with Stress - Hot and Cold
- Dealing with Stress - Emotional
- Dealing with Stress - Bottled Up
- Agreeableness: Disinterested
- Agreeableness - Rat Racers
- Agreeableness - Low
- Love - Passionate Lovers
- Love - Rollercoaster Romantics
Some of the most colorfully described audience segments came from consumer credit agencies Equifax and Experian. Segments are branded with alliterative names like “Silver Sophisticates” and “Progressive Potpourri” that reflect the political and socioeconomic makeup of the household. Some of these brand-name segments promise a package of economically stressed individuals to target with names like “Struggling Elders” and “Tight Money.”
- Strugglers and Strivers - Meager Means
- Strugglers and Strivers - Credit Reliant
- Small Town Shallow Pockets
- Urban Survivors
- Tight Money
- Tough Times
- Mid - Life Strugglers - Small-Town Families
- Credit Crunched - City Families
- Rough Retirement - Small-Town and Rural Se
- Struggling Elders - Small-Town and Rural S
- Retiring on Empty - Singles
- Birkenstocks and Beemers
- Progressive Potpourri
- Silver Sophisticates
- Picture Perfect Families
- American Royalty
- Diamonds and Pearls - Wealthiest Retirees
- Champagne Tastes - Executive Empty Nesters
Veterans are the subject of several audience segments, as are active and retired members of the military.
- Occupation > Active Military
- Military Personnel - Retired Military
- Military Families
- Veteran Associations
- Military Veteran Business Owners
- Cricket Wireless > Q2 Competitor Subscribers & Military Visitors
- Cricket Wireless > Q2 Local Competitor Store Visitors & Military Base Visitors
- US_USArmy_RecruitmentCenter_500m (Factual)
- 1569667_NY_Buffalo_US_Army_HLM_40m (Factual)
- Fort_Bragg (Factual)
- 424992_LA_Lafayette_US_Army_-_Baton_Rouge_TLS_Mopub (Factual)
- Location Visited - Military Places
Location and Geofencing
Consumers are packaged according to their location history and movements. Advertisers were offered segments that appeared to target people based on where they shop, work, and visit, including those who go to state capitol buildings, congressional offices, federal agency offices, and locations like defense contractor and gun manufacturer headquarters.
- Cannon House Office Building
- Ford House Office Building
- Hart Senate Office Building
- Longworth House Office Building
- Rayburn House Office Building
- Russell Senate Building
- Federal Aviation Administration
- Federal Trade Commission
- US Congressional Budget Office
- US Customs and Border Protection
- US Department of State
- US Department of Transportation
- US EPA
- US Government Accountability Office
- Consumer Financial Protection Bureau
- Social Security Administration
Defense contractor headquarters
- Palantir DC
- Palantir Denver
Gun manufacturer headquarters
- Sig Sauer Headquarters
- Olin Corporation
- Mossberg Headquarters
- Savage Arms Headquarters
- Sturm Ruger Headquarters
- Davenport Guns Headquarters
- Glock USA
- Smith and Wesson
- Beretta USA Corp
Places of worship
- Location Visited - Buddhist Temples
- Location Visited - Churches
- Location Visited - Hindu Temples
- Location Visited - Mosques
- Location Visited - Places of Worship
In addition to using audience segments to determine who will see an ad, platforms like Xandr also offer ways for advertisers to control when their ads won’t appear. “Brand protection” or “negative keyword” segments are lists of keywords that let advertisers prevent an ad from appearing in contexts that could reflect poorly on them.
“The most common example is, you don’t want your airplane ad running alongside an article about a plane crash,” said Nandini Jammi, co-founder of Check My Ads, a nonprofit ad tech watchdog group, in an interview with The Markup. Jammi said there is a group of broadly agreed-upon categories that all brands want to steer clear of, known in the industry as “the Dirty Dozen”: death and injury, military conflict, adult content, terrorism, hate speech, obscenity, drugs, tobacco, firearms, crime, online piracy and spam, and harmful sites. Oracle offers segments to avoid these categories as part of its “Contextual Intelligence” offerings.
This mechanism is a blunt instrument, however, and industry observers have seen how easily a newsworthy term like “coronavirus” triggered the mechanism, preventing ads from being placed and unintentionally choking off crucial revenue to the struggling ad-supported news business.
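A toy keyword filter shows why this kind of blocking is blunt. The sketch below is not how Grapeshot, Peer39, or Xandr implement brand protection; the blocklist, the function name, and the sample headlines are all made up for illustration.

```python
import re

# Hypothetical blocklist; real brand-protection segments are defined by suppliers.
BLOCKED_KEYWORDS = {"coronavirus", "plane crash"}


def is_blocked(page_text, blocked_keywords):
    """Return True if any blocked keyword appears in the text as a whole word or phrase."""
    text = page_text.lower()
    return any(
        re.search(r"\b" + re.escape(keyword.lower()) + r"\b", text)
        for keyword in blocked_keywords
    )


crash_story = "Investigators examine the cause of a plane crash."
health_story = "How to find a coronavirus vaccine appointment near you."
recipe_story = "Ten weeknight dinners in under 30 minutes."

print(is_blocked(crash_story, BLOCKED_KEYWORDS))   # the case advertisers want to avoid
print(is_blocked(health_story, BLOCKED_KEYWORDS))  # useful public-health coverage, blocked anyway
print(is_blocked(recipe_story, BLOCKED_KEYWORDS))  # no keyword present, ad can run
```

Because the filter looks only at the presence of a word, the public-health article is treated exactly like the plane-crash story, and the publisher loses the ad in both cases.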
Negative audience segments/brand protection
- xaxisapc_neg_blm (Grapeshot)
- xaxisapc_neg_coronavirus (Grapeshot)
- xaxisapc_neg_covid19 (Grapeshot)
- xaxisapc_neg_donaldtrump (Grapeshot)
- xaxisapc_neg_drunkdriving (Grapeshot)
- vicimedia_pr_usarmy_negative1 (Grapeshot)
- walmart_occult_wmt (Grapeshot)
- walmart_natural_disasters_wlmt (Grapeshot)
- Peer39: Custom Category>Safe from Viagra Valentines
- Peer39: Custom Category > SafeFromParis Attacks
- Peer39: Custom Category > Sex Abuse Treatment Center Hawaii KWs
- Peer39: Custom Category > Safe from QAnon conspiracy theory
- Peer39: Custom Category > Safe from DM-KeyWords-Negative-LawSuit-Hyundai-171117
- Peer39: Custom Category > Safe from Chase Peer39 Orlando London Terrorism Keywords
- accordant_pfizer_gs_naughtynegatives (Grapeshot)
What Is Xandr?
Xandr is an online advertising platform that Microsoft purchased from AT&T in 2021.
Xandr connects and serves both sides of the advertising ecosystem: the “supply” side of publishers with open ad slots and the “demand” side of advertisers looking to place their ads in front of people.
Advertisers use Xandr to place their ads across various digital advertising channels, targeting audience segments as they hear ads in streaming audio and view them on the web, in video, and on connected televisions. Xandr also provides the ability to measure advertising performance and to trade in real-time ad auctions.
Publishers use Xandr to sell and manage their ad inventory, optimize for the highest ad placement prices, sell their ad space in real-time auctions, measure advertising success, and perform quality control to make sure only appropriate ads appear next to their content.
The audience segment file analyzed by The Markup was found on a documentation page on Xandr’s website under the heading “Data Marketplace – Buyer Overview.”
What Companies Are Providing These Segments?
This spreadsheet lists 93 distinct data providers, but many of the segments reference other data companies, which may indicate the origin of the data. Companies owned by data giant Oracle make up more than one-third of the segments (36 percent).
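A share like that 36 percent figure comes from counting rows per supplier and rolling suppliers up to their parent company. The sketch below shows the arithmetic on toy data: the supplier names, the ownership mapping, and the row counts are invented, and the file itself does not say which supplier belongs to which parent, so that mapping would have to be built by hand.

```python
from collections import Counter


def shares_by_owner(supplier_names, owner_of):
    """Return each owner's share of rows as a percentage of all rows.

    Suppliers missing from `owner_of` are counted under their own name.
    """
    counts = Counter(owner_of.get(name, name) for name in supplier_names)
    total = sum(counts.values())
    return {owner: 100 * count / total for owner, count in counts.items()}


# Toy data: one supplier name per row. Names and counts are invented.
rows = ["Brand X"] * 5 + ["Brand Y"] * 4 + ["Supplier B"] * 16

# Hypothetical ownership mapping, assembled by hand in a real analysis.
owner_of = {"Brand X": "Parent P", "Brand Y": "Parent P"}

shares = shares_by_owner(rows, owner_of)
for owner, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{owner}: {share:.0f}%")
```

In this toy example, “Parent P” accounts for 9 of 25 rows, or 36 percent, even though neither of its brands reaches that share on its own.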
Oracle did not respond to requests for comment.
How Are These Segments Used?
Here’s a hypothetical scenario (albeit greatly simplified) of how advertisers use audience segments like the ones analyzed by The Markup.
- You are scrolling through a news website.
- You tap on a link to read an article about a new study looking at people diagnosed with depression, as you have a close friend suffering from the condition.
- As the page starts to load, a signal goes out from an advertising platform used by the website publisher that says there is an available ad slot up for auction. This signal includes information about the website, information about the page you requested, the ad size, your device or mobile ad ID, your IP address, and often your approximate location.
- Another ad platform receives the signal and opens a bidding process for advertisers who wish to show you an ad.
- Ad platforms working on behalf of the advertisers analyze the data in the bid request to see if it aligns with the advertisers’ current campaigns.
- One of the bidders recognizes your IP address and ad ID and finds that you are in the “Health & Fitness::Depression (audience interest)” segment. This bidder is an ad agency working on behalf of its client, a pharmaceutical company that sells drugs to treat depression and is willing to pay enough in the real-time auction to win the ad placement.
- The ad agency submits its bid through the ad platform and wins the auction.
- An ad for an anti-depression drug made by the pharmaceutical company loads on your page.
- The whole process of auctioning your attention unfolded in the blink of an eye, mere milliseconds.
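The steps above can be sketched as a toy auction. Everything here is a simplification under stated assumptions: the class names, the ad ID, the IP address, the advertisers, and the bid amounts are hypothetical, and picking the single highest bid stands in for the more elaborate pricing rules real ad exchanges use.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class BidRequest:
    """Fields the article says travel with the ad-slot signal (simplified)."""
    page_url: str
    ad_size: str
    ad_id: str
    ip_address: str


@dataclass
class Campaign:
    """A hypothetical advertiser campaign that targets one audience segment."""
    advertiser: str
    target_segment: str
    bid: float  # made-up price offered for this single impression


def run_auction(
    request: BidRequest,
    campaigns: List[Campaign],
    segments_by_ad_id: Dict[str, Set[str]],
) -> Optional[Campaign]:
    """Return the highest-bidding campaign whose segment matches the viewer, or None."""
    viewer_segments = segments_by_ad_id.get(request.ad_id, set())
    eligible = [c for c in campaigns if c.target_segment in viewer_segments]
    if not eligible:
        return None
    return max(eligible, key=lambda c: c.bid)


# Hypothetical lookup from ad ID to segments; the segment name is quoted in the article.
segments_by_ad_id = {
    "ad-id-123": {"Health & Fitness::Depression (audience interest)"},
}

campaigns = [
    Campaign("Example Pharma", "Health & Fitness::Depression (audience interest)", 4.50),
    Campaign("Example Clinic", "Health & Fitness::Depression (audience interest)", 2.00),
    Campaign("Example Fashion", "Brand Affinities > Retail > Prada", 6.00),
]

request = BidRequest(
    page_url="https://news.example/depression-study",
    ad_size="300x250",
    ad_id="ad-id-123",
    ip_address="203.0.113.7",
)

winner = run_auction(request, campaigns, segments_by_ad_id)
print(winner.advertiser if winner else "no eligible bidder")
```

The fashion campaign offers the most money but never competes, because the viewer is not in its target segment. That is the point of the segments: they decide who is allowed to bid for a given person before price is considered.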
Potential Harms and Tough Times
Adam Schwartz, a senior lawyer at the Electronic Frontier Foundation (EFF), said that the effort by the online ad industry to closely target people, as in the segments file reviewed by The Markup, constitutes “one of the greatest threats to data privacy” today.
Of the companies providing the segments, Schwartz said, “It’s especially alarming to see that they are amassing information about reproductive health given that there are an increasing number of states that want to punish people for getting reproductive health care.”
At least one person targeted based on a segment in the Xandr file was also alarmed.
Markup reader Paul Bowers said “it was jarring to see” himself referred to as part of a financial audience segment listed as “Tough Times” in materials handed over to him by the grocery chain Food Lion after Bowers requested a copy of his data from the company.
In a 32-page PDF file provided by Food Lion, “Tough Times” was listed as Bowers’ “Mosaic Household Description.” In a 2014 Experian marketing document, the Tough Times segment was described as a collection of “Older, lower income and ethnically-diverse singles typically concentrated in inner-city apartments.”
“My wife and I do pretty well for ourselves in the grand scheme of things,” Bowers told The Markup, “and the income figures this company had for us were way off.” He suspects that location may have something to do with it. “It might have been because our Food Lion is in a low-income area.”
Mosaic, the originator of the “Tough Times” segment, is a brand of audience products sold by the consumer data giant Experian. The “Tough Times” segment was available on Xandr, judging from the file, which lists it as being supplied by Experian and Oracle’s BlueKai platform under the name “Branded Data > Experian > Mosaic > Group S: Economic Challenges > S71 – Tough Times (BlueKai).”
Bowers requested his data from Food Lion after reading our story about supermarket shopper data collection. In addition to the “Tough Times” classification, the PDF he received contained information about his shopping patterns and a series of numerical scores indicating how much he “engaged” with certain categories of goods.
Jordan Takeyama, a spokesperson for Experian, told The Markup in an email, “We use anonymized, aggregated and modeled data to build the segments, and information about individuals is never shared with any organization.” Takeyama said that the example segments we sent them “are outdated and no longer available to our clients.” Food Lion did not respond to a request for comment.
Bowers said that he was surprised at some of the other audience segments in Experian Mosaic. “I knew that marketers were interested in broad segments like ‘males ages 18-35,’ but I had no idea how granular these segments could be.”
Takeyama said that Experian “… make[s] it easy for consumers to see, correct, opt-out from the use or sale, and delete their personal information as defined by law from our databases.”
What Can I Do to Find Out What Segments I Am in—and How Do I Stay out of Them?
There are various ways you can prevent companies from tracking you and thus avoid ending up in ad audience segments in the first place. But let’s look at what happens after you have been profiled in the system. If you are curious about which audience segments you might appear in, there are a few things you can do.
Facebook and Instagram users can see the ad topics that Meta has generated based on their online and offline behavior, and users have the ability to remove unwanted or inaccurate topics.
Some observers have proposed more systemic solutions to help people avoid tracking. EFF advocates banning targeted advertising outright, for example. “We need the government to step in and enact real data privacy legislation,” said the EFF’s Schwartz.
The advertising industry, meanwhile, has attempted to self-regulate. Xandr is a member of the nonprofit Network Advertising Initiative, or NAI, which requires members to comply with voluntary policies on the handling of consumers’ sensitive information and says it conducts annual compliance reviews. NAI advises members to obtain opt-in consent when using sensitive health information, and many of the health-related segments The Markup found in the file appear to be sensitive based on NAI’s definition. But it’s not clear if opt-in consent was used in the data collection process.
Similarly, the file contains many segments referencing the sorts of locations that NAI standards classify as “sensitive points of interest,” even though the standards say data collection in those types of locations should be limited.
Nat Wood, a spokesperson for NAI, told The Markup in an emailed statement that NAI conducts comprehensive annual reviews of its members. Wood said that the segments can be created in several different ways. “Third-party segments come from various sources and don’t necessarily rely on sensitive personal data. They could be modeled or lookalike data; based on purchase history with opt-in consent; or available in some jurisdictions but not others.”
How Accurate Are These Segments?
Tim Hwang is a researcher and author of “Subprime Attention Crisis,” a book that examines what he sees as the digital ad industry’s structural flaws. He questions whether advertising technology is really as effective at targeting as it claims to be.
“We do have examples of segments that genuinely seem to create the opportunity to sell products,” Hwang said in an interview. For example, if “we know that you have no money in your bank account,” then “this is a great time to sell you a super high interest loan.”
But when it comes to the industry profiling people in depth, he said, “[t]he reality is that it is indeed just a lot messier than all that. And it is basically kind of like a patchwork.”