Two years ago, when I was just starting The Markup, investigative data journalist Surya Mattu and I were chatting about how we might update the “What They Know” series about digital privacy I’d launched at The Wall Street Journal in August 2010.
The kickoff story for that series surveyed the top 50 U.S. websites and found that on average, they each installed 64 trackers onto their visitors’ computers. The piece profiled a young woman whose TV and film preferences were being bought and sold by an ad tracking company.
Every day for a week, the series unfurled revelations about the state of the surveillance economy: how Microsoft had quashed privacy protections in its web browser; web tracking that identified users’ age and income; how men were using cellphone software to stalk their partners; and internal debates within Google about how far they should go in tracking users across the internet.
At the time, many people did not realize that their every move was being tracked across the internet by an ever-growing army of companies whose sole business was to monitor and monetize information about people’s online behavior.
A decade later, the public has awakened to privacy issues. Most people know they are being watched online. Data privacy has become an issue on which companies compete: Microsoft has reversed course and doubled down on privacy protections in its web browser, and Apple has made privacy key to its marketing. Europe and California have passed comprehensive privacy protection laws.
And yet the web has only become more heavily surveilled since then. In 2010, the tracking technology of choice was a simple “cookie”: a snippet of data that tags each visitor’s browser with a unique identifier, allowing that person to be tracked across the internet. Back then, tracking companies didn’t generally know users’ names and email addresses.
Today, information gleaned through tracking is much more personally identifiable—connected to you through your email or social media accounts. And the surveillance has become creepier and more difficult to stop.
This complex economy of surveillance remains a major financial underpinning for all the services we use online, from shopping to news. And yet the size and scope of the surveillance economy is not itself well scrutinized. Every once in a while, researchers release a study to capture the extent of surveillance on the internet (the most comprehensive surveys were conducted at Princeton between 2015 and 2019). But there is no consistent global measurement of the state of privacy.
And that’s where my conversation with The Markup data journalist Surya Mattu comes in. We wondered: What if we could build a tool that would instantly reveal the trackers on any website?
Today, after 18 months of a creative and persistent engineering effort led by Surya, we are launching the fruits of his work—a one-of-a-kind, real-time privacy inspection tool called Blacklight. We named it after the colloquial name for the ultraviolet light used by police to illuminate what’s normally hidden, like fingerprints and traces of blood. Similarly, our Blacklight tool illuminates parts of the internet surveillance infrastructure that are not normally visible to the public.
I like to think of Blacklight as a meat thermometer that you can stick into any website and get an instant reading on its level of creepiness. When you type a URL into Blacklight, it instantly spins up a web browser in the cloud that visits the website you typed in and runs tests that can diagnose seven different types of privacy-invasive behaviors.
Using Blacklight, we scanned more than 80,000 popular websites, and we found that third-party tracking is just as pervasive as it was 10 years ago—but it has become creepier.
We found that cookies were still widely used—87 percent of websites we surveyed loaded some type of third-party tracking technology.
Today, one-third of websites we surveyed contained tracking code from Facebook, which allows the social media company to see where its 3.14 billion active members travel outside of Facebook’s walled garden. Facebook’s tracking pixel can allow Facebook to identify users whether or not they are logged into Facebook, depending on how the website using the tracking pixel configures it. That means Facebook can identify visitors on up to one-third of websites—a huge change from a decade ago.
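For readers who want to see what “third-party” means in practice, here is a minimal sketch of the underlying idea, not Blacklight’s actual code. It assumes a list of the URLs a page requested has already been captured, and it treats the last two labels of a hostname as the “site,” which is a simplification (a real tool would consult the Public Suffix List). The function names and example URLs are hypothetical.

```python
from urllib.parse import urlparse


def site_of(url: str) -> str:
    """Return a naive 'site' for a URL: the last two labels of its hostname.

    Assumption: fine for names like example.com, wrong for suffixes
    such as co.uk, which need the Public Suffix List.
    """
    host = (urlparse(url).hostname or "").lower()
    return ".".join(host.split(".")[-2:])


def third_party_sites(page_url: str, request_urls: list[str]) -> set[str]:
    """Collect the sites contacted by a page that differ from the page's own site."""
    page_site = site_of(page_url)
    found = set()
    for url in request_urls:
        site = site_of(url)
        if site and site != page_site:
            found.add(site)
    return found


# Hypothetical capture of requests made while loading one page.
requests_seen = [
    "https://static.example.com/app.js",    # same site as the page
    "https://tracker.example.net/pixel.gif",
    "https://ads.example.org/tag.js",
]
third_parties = third_party_sites("https://news.example.com/story", requests_seen)
print(sorted(third_parties))
```

A page counts toward a figure like the 87 percent above when a set like this is non-empty and contains a known tracking service; deciding which sites are trackers requires a separate list that this sketch does not include.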
We also found a sharp increase in the use of techniques to surveil users’ mouse and keyboard movements on webpages. One of them is known as “session recording.” I think of this as the digital equivalent of someone standing by your desk videotaping you as you browse a website. Session recorders are third-party services that monitor your mouse movements, where you scroll on the page, and anything you type on a webpage—and they save those recordings for playback.
Inspectlet, a company that provides session recording, advertises it like this: “Watch individual visitors use your website as if you’re looking over their shoulders.”
Three years ago, in 2017, Princeton researchers Steven Englehardt, Gunes Acar and Arvind Narayanan found session recording scripts on less than one percent of 50,000 popular websites. In our survey, we found session recording scripts on 15 percent of popular websites.
We also found that companies are getting creative about tracking users who block cookies. We found a technique known as “canvas fingerprinting” on 6 percent of popular websites. Canvas fingerprinting works by instructing the visitor’s web browser to draw a hidden image or text. Because each computer draws the image slightly differently, the images can be used to assign each user’s device a number that uniquely identifies it.
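The logic of canvas fingerprinting can be illustrated with a small simulation. This is a sketch under stated assumptions, not a real tracker and not Blacklight’s detection code: the `render_hidden_image` function stands in for a browser drawing a hidden image, and its `antialias_bias` parameter is a made-up knob that models the tiny rendering differences between devices. The fingerprint is simply a hash of the resulting pixels.

```python
import hashlib


def render_hidden_image(antialias_bias: int) -> bytes:
    """Stand-in for a browser drawing a hidden canvas image.

    Assumption: `antialias_bias` models device-specific rendering quirks
    (graphics hardware, fonts, drivers). It is not a real browser setting.
    """
    width, height = 16, 4
    pixels = bytearray()
    for y in range(height):
        for x in range(width):
            value = (x * 13 + y * 29) % 256
            # Nudge every fourth column, mimicking anti-aliasing differences.
            if x % 4 == 0:
                value = (value + antialias_bias) % 256
            pixels.append(value)
    return bytes(pixels)


def canvas_fingerprint(pixels: bytes) -> str:
    """Reduce the drawn pixels to a short identifier by hashing them."""
    return hashlib.sha256(pixels).hexdigest()[:16]


# Two hypothetical devices that draw the same image slightly differently.
device_a = canvas_fingerprint(render_hidden_image(antialias_bias=0))
device_b = canvas_fingerprint(render_hidden_image(antialias_bias=1))
# The same device drawing the image again on a later visit.
device_a_again = canvas_fingerprint(render_hidden_image(antialias_bias=0))

print("device A:", device_a)
print("device B:", device_b)
print("A recognized on return visit:", device_a == device_a_again)
```

Because the identifier is derived from how the device draws rather than from anything stored on it, clearing or blocking cookies does not change it.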
Rather than simply sharing these depressing statistics, we wanted to offer readers insight into how they are tracked online—a personalized report for the age of personalized tracking, if you will.
We hope that parents will use Blacklight to analyze the sites that their children are accessing, that journalists will use it in their reporting, and that policymakers will use it when evaluating the privacy claims of technology lobbyists at their doorstep.
Thanks to Blacklight, now we can all know what they know. And knowledge, as they say, is power.