Hello World is a weekly newsletter—delivered every Saturday morning—that goes deep into our original reporting and the questions we put to big thinkers in the field.
Hello again, readers! My name is Jon Keegan, and I’m an investigative data journalist here at The Markup. You may have read my reporting on the companies that collect your personal information while you do pretty much anything in your life—whether you are driving in your car, standing in line to vote, shopping at the supermarket, or just checking to see where your kids are.
When I am researching the companies that collect, analyze, and monetize your information, I pay close attention to the marketing language they use when advertising their data stores to other companies in this vast ecosystem. All of these companies are eager to boast about how many billions of users’ data they have, what categories of information they collect, and how far back it goes. But right after these claims, there’s always a reassuring line about how rigorously they respect the privacy of the people whose data they buy, sell, and trade.
Recently, one term that I have started to see alongside these promises of privacy is “data clean room,” so let’s take a closer look and see what that means.
What Is a Data Clean Room?
While the term conjures up images of engineers working in white “bunny” suits out of a ’90s Intel Pentium commercial, data clean rooms aren’t actual facilities but rather a particular scheme of data sharing between servers.
The Interactive Advertising Bureau (IAB), a leading ad industry group, defines a data clean room as “a secure collaboration environment which allows two or more participants to leverage data assets for specific, mutually agreed upon uses, while guaranteeing enforcement of strict data access limitations.”
The goal here is to ensure that personal customer data is not leaked from the first party (the company that has a direct relationship with the consumer) to the third party (the company seeking to access this customer data for insights) or any other party. By protecting the first-party data while still allowing some controlled access to it, a clean room theoretically allows third parties to safely access the data to create audience segments for advertising, identify overlapping customers in multiple datasets, perform data enrichment with other datasets, and derive insights, all while not having direct access to the personal data itself.
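One of those tasks, identifying overlapping customers across datasets, can be sketched with a toy example. This is not any vendor's actual implementation; the salt, email addresses, and matching logic below are hypothetical, and real clean rooms rely on much stronger protections than a salted hash. The core idea is simply that each party contributes obscured identifiers and only an aggregate statistic comes out:

```python
import hashlib

def hash_id(email: str, salt: str) -> str:
    # Hash an identifier with a shared salt so raw emails are not exchanged directly
    return hashlib.sha256((salt + email.lower().strip()).encode()).hexdigest()

# Hypothetical customer lists from two companies (illustrative data only)
SALT = "shared-secret-salt"
first_party = {hash_id(e, SALT) for e in
               ["alice@example.com", "bob@example.com", "carol@example.com"]}
third_party = {hash_id(e, SALT) for e in
               ["bob@example.com", "dave@example.com"]}

# The clean room reports only the size of the overlap, never the matched records
overlap_count = len(first_party & third_party)
print(overlap_count)  # prints 1 (only bob@example.com appears in both lists)
```

In practice, vendors use techniques such as private set intersection or double-blind encryption rather than simple hashing, precisely because hashed identifiers on their own can be attacked.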
But there are many implementations of these clean rooms, and there isn’t one true standard. Earlier this year, the IAB published a guide to best practices for working with clean rooms. The guide describes some of the features you might find in a data clean room implementation, including the ability to limit the types of queries and level of detail available to the third party as well as which users have access to the information and for how long. To further protect privacy, a range of different “privacy enhancing technologies” may be employed, such as double-blind encryption of data and adding statistical “noise” to the query results to prevent the reidentification of individuals.
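The "statistical noise" and query-limiting ideas can also be sketched. In this minimal, hypothetical example (the threshold and epsilon values are invented; real systems tune both carefully), an aggregate count gets Laplace noise, the mechanism behind differential privacy, and queries over very small audiences are refused outright:

```python
import random
from typing import Optional

def noisy_count(true_count: int, epsilon: float = 1.0,
                min_size: int = 50) -> Optional[int]:
    # Refuse to answer queries over tiny audiences, which could single people out
    if true_count < min_size:
        return None
    # Laplace noise with scale 1/epsilon, sampled as the difference of two
    # exponential draws (a standard identity for the Laplace distribution)
    noise = random.expovariate(epsilon) - random.expovariate(epsilon)
    return round(true_count + noise)
```

A query over an audience of 10 returns nothing, while a query over 100,000 people returns a figure within a few units of the truth: accurate enough for marketing analytics, fuzzy enough to frustrate attempts to pick out any one person.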
Why Are These Becoming More Popular?
Data clean rooms have been around for years, but recent changes in the broader industry have increased adoption of the technology.
The consumer data industry is operating in what some are calling a time of “data deprecation.” In the past few years, Apple and Google have greatly limited the ability of companies to slurp up your location data, browsing habits, and app usage from your phone. And for several years now, Google has been preparing the industry for the imminent end of the third-party cookie.
Adding to this, data firms need to comply with increasingly tight restrictions imposed by privacy laws in the EU, California, Colorado, Connecticut, and Virginia, with comprehensive privacy laws in six other states due to take effect in the next few years. And the specter of a potential U.S. federal privacy law looms large. Data clean rooms give the consumer data industry a defense against many of the privacy concerns that have resulted in regulatory scrutiny.
Are Clean Rooms a Silver Bullet?
“So there is no silver bullet,” said privacy lawyer Daniel Goldberg, chair of the Privacy & Data Security Group at law firm Frankfurt Kurnit Klein & Selz, in an interview with The Markup. “Every clean room operates somewhat differently. It’s all subject to what the vendor is actually doing. So just like anything, it’s a marketing buzzword in a lot of ways, and the devil is in the details.”
Goldberg, who is also a member of IAB, says he thinks that the use of clean rooms is a positive thing and believes a lot of good work has been done to integrate their use in the industry. But he cautions that simply using a clean room does not relieve companies of their regulatory responsibilities. While the use of clean rooms may reduce their risks, “it can’t just be like, ‘Oh, we use it, therefore we don’t have to comply with anything because we’re no longer subject to privacy law.’ ”
A recent IAB survey of brands, agencies, and publishers using data clean rooms found that while clean rooms have become essential tools for working with data—with two-thirds saying they were satisfied with the technology—they also require significant financial investment as well as infrastructure and staff. One-third of the companies surveyed that did use data clean rooms reported that privacy was one of their challenges when working with the technology.
Privacy Concerns
While data clean rooms come with many strong privacy protections, there are still areas of concern.
IAB’s best practices guide lists several common privacy attacks that should be defended against when planning a clean room solution, including “membership inference,” “outlier injection,” “dictionary,” and “manufactured data join” attacks, techniques that could potentially allow an attacker to reidentify consumers and their personal data.
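A dictionary attack, one of the techniques on that list, is easy to illustrate. In this hypothetical sketch (the salt, hashing scheme, and email addresses are all invented for illustration), “anonymized” hashed identifiers leak, and an attacker who knows or guesses the hashing scheme simply hashes candidate emails until one matches, which is why hashing alone does not anonymize data:

```python
import hashlib

def hash_id(email: str, salt: str) -> str:
    # The same salted-hash scheme the parties used, assumed known to the attacker
    return hashlib.sha256((salt + email).encode()).hexdigest()

SALT = "shared-secret-salt"  # hypothetical; assume it has leaked or been guessed
leaked_hashes = {hash_id("bob@example.com", SALT)}  # a leaked "anonymized" record

# Dictionary attack: hash every candidate identity and look for a match
candidates = ["alice@example.com", "bob@example.com", "eve@example.com"]
reidentified = [e for e in candidates if hash_id(e, SALT) in leaked_hashes]
print(reidentified)  # prints ['bob@example.com']
```

Defenses like per-query noise, rate limits, and keyed (rather than merely salted) hashing exist precisely to blunt this kind of brute-force guessing.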
Data security also becomes a huge privacy concern when data clean rooms are in control of so much data. Of particular importance is how the data is stored in the clean room. If a breach were to happen, it is critical that the method of data encryption is robust enough to prevent any personal information from being reidentified.
“I would say the number one concern is the lack of knowledge from consumers that this is actually happening,” said Goldberg, who noted that he recommends that the use of data clean rooms should be disclosed in a company’s privacy policy. “There’s an issue where these clean rooms don’t have a direct relationship for the most part with consumers.… How could a consumer ever figure this out?”
Goldberg said consumers should not only be notified about the use of a clean room but also given a choice. “If you’re going to engage in data clean room activities, you need to make sure that you’re giving users choice about this. And if they want to be able to opt out, you should allow them to opt out,” Goldberg said.
For now, there may not be an easy way to tell if the companies you interact with are using data clean rooms. It is always a good idea to be familiar with the privacy rights you may have depending upon which state you live in.
You may be able to request a copy of the data that has been collected from you, and you may have the right to opt out of such data collection in the future. Whenever you sign up for a new app, platform, or service, it is worth taking a moment to think about what information you are comfortable sharing. And for the brave, though it can be an overwhelming read, I do recommend a quick read-through of the privacy policy, as it is one of the main places companies are legally obligated to disclose some details about their data collection practices.
As always, thanks for reading.
Jon Keegan
Investigative Data Journalist
The Markup