In these muddled times, at least one thing is clear: Nearly everyone’s tracking you, always. But there is one area that has been largely off-limits to digital data collectors for 20 years: podcast listening.
Traditionally, the podcast ecosystem has been tracking-resistant, in part because podcasters release their shows through RSS, free technology dating back to 1999. Podcast players, also known as podcatchers, like Apple Podcasts and Castro then aggregate those episodes in easy-to-use apps. Listener behavior is strewn across a plethora of apps, many of which don’t share data with anyone, including the podcast creators.
That’s all changing. Advertisers are projected to spend more than $800 million on podcasts in 2020, and companies are devising ways to provide them with data that will persuade them to spend more. The most common tactics include using IP addresses to identify users, adding tracking URLs to ads, and abandoning RSS in favor of proprietary platforms that already track their users.
The change has provoked considerable debate, sometimes combative, within the podcasting industry.
In September 2019, Basecamp publicly dumped its podcast hosting service, Art19, after discovering that the company was building the capability for personalized ads. In December 2019, Libsyn, one of the major podcast hosting services, banned Podsights, one of the major podcast data companies, saying, “[W]e feel it is necessary to protect listener privacy.” In February, podcasting giant PRX hosted a conference on “emerging threats” to privacy in order to encourage more productive discussion. Then in August, one of the major podcast apps, Overcast, added a feature that lets users see which podcasts have enabled tracking.
“I see the storm clouds gathering,” said Michelle De Mooy, a privacy consultant who spoke at PRX’s conference. “There’s a danger that this is a moment that can change away from privacy.”
Podcast listening habits, like browsing history, can potentially reveal a lot about someone’s interests. There are podcasts about mental illness, substance abuse, sexuality, debt, and other sensitive topics. So what do your podcasts know about you?
Traditionally, Podcasts Couldn’t Track Much
The main metric of success for most of podcasting history has been the one thing the RSS infrastructure allows: how many times the file was downloaded. It’s like the early days of the web, when websites proudly displayed a “hit counter” that tracked the raw number of visits.
But the act of downloading a podcast still passes some data, and whoever hosts the file can log a download’s date and time, the podcatcher used, the IP address (which also reveals rough location), and typically, the type of device.
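As a rough illustration of how little a plain download reveals, and how much it still does, the sketch below parses one made-up line from a hosting server's access log. The log layout (the common "combined" web server format), the IP address, the file path, and the `ExamplePodcatcher` user agent are all assumptions chosen for illustration, not details from any real hosting company.

```python
import re
from datetime import datetime

# Hypothetical access-log line in the "combined" web server format.
# The IP, path, and user agent are invented for illustration.
LOG_LINE = (
    '203.0.113.7 - - [12/Sep/2020:08:15:02 +0000] '
    '"GET /shows/example/episode-12.mp3 HTTP/1.1" 200 48211034 '
    '"-" "ExamplePodcatcher/1.2 (iPhone; iOS 13.6)"'
)

COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

# Very rough device guess based on substrings in the user agent.
DEVICE_HINTS = ("iPhone", "iPad", "Android", "Macintosh", "Windows")


def parse_download(line):
    """Return the fields a file host could log for one download, or None."""
    match = COMBINED.match(line)
    if match is None:
        return None
    agent = match.group("agent")
    device = next((hint for hint in DEVICE_HINTS if hint in agent), "unknown")
    return {
        "ip": match.group("ip"),  # also hints at rough location
        "when": datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z"),
        "episode": match.group("path"),
        "podcatcher": agent.split("/", 1)[0],
        "device": device,
    }


record = parse_download(LOG_LINE)
```

Nothing in the record says whether the episode was ever played, which is why downloads were for so long the only number anyone could count.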
Advertisers, meanwhile, have tended to target listeners based on a podcast’s subject matter. If you’ve ever heard a host give out a promo code for listeners while reading an ad, that’s because promo codes were basically the only way advertisers could tell which podcasts were actually driving customers. Ads were not personalized because there was no data to personalize them with, and no way to match an ad to a specific user.
This could rapidly change if the podcasting industry were to ever shift away from RSS. And there are some signs of that happening.
Spotify has been signing exclusive deals with megastars like Joe Rogan and Kim Kardashian West. Bloomberg reported that Apple is hiring an executive to lead the charge on original and exclusive podcasts, and Amazon has announced its first platform-exclusive podcast.
Needless to say, these exclusives will likely not be available via RSS.
“They are not podcasts. They’re audio shows,” said Andrew Kuklewicz, the chief technology officer of PRX.
However, when the venture-capital-funded podcast network Luminary appeared to be copying audio and rehosting it instead of using RSS, it triggered an exodus of podcasters. (Luminary clarified that it was not copying audio and was still using RSS, just rerouting traffic first.)
Meanwhile, as podcasts have grown and become more professional, the commercial industry around them has found ways to work around RSS’s limitations.
How Are You Being Tracked Now?
In 2014, the podcast “Serial” debuted, introducing the medium to mainstream listeners. (As of September 2018, “Serial” had more than 340 million downloads across its first two seasons.)
That same year, Acast, a podcasting company based in Sweden, announced “dynamic ad insertion.” Previously, ads were “baked in” to a recording, but now podcasters could mark their files with ad breaks, enabling a hosting company to switch in a different ad based on when or where the podcast was downloaded. It also paved the way for ads to be sold programmatically, through a bidding system that automatically matches buyers and sellers without the need for salespeople. Much of the advertising on the web, including Google Ads, is bought and sold this way. Programmatic buying requires consumer data so that advertisers can rapidly experiment with different ways to target users and optimize ads for the best results.
“That’s probably when I certainly saw things start to change,” said Kuklewicz, the CTO of PRX. “Once you can actually dynamically inject ads, then that data that I’d be able to get from requests becomes actionable. And once that’s possible, then we start to look a lot like the rest of ad tech right now.”
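The idea behind dynamic ad insertion can be sketched in a few lines. This is a minimal illustration, not Acast's or any other company's implementation: the episode layout, the `AD_BREAK` marker, the ad inventory, and the region codes are all hypothetical, and a real system would stitch audio rather than file names.

```python
from datetime import date

# Hypothetical episode: content segments with ad breaks marked by the podcaster.
AD_BREAK = "AD_BREAK"
EPISODE = ["intro.mp3", AD_BREAK, "interview.mp3", "outro.mp3"]

# Hypothetical ad inventory. Each ad runs in one region (None = anywhere)
# during a campaign window. Earlier entries take priority.
ADS = [
    {"file": "ad_se_autumn.mp3", "region": "SE",
     "start": date(2020, 9, 1), "end": date(2020, 11, 30)},
    {"file": "ad_us_autumn.mp3", "region": "US",
     "start": date(2020, 9, 1), "end": date(2020, 11, 30)},
    {"file": "ad_generic.mp3", "region": None,
     "start": date.min, "end": date.max},
]


def choose_ad(region, day):
    """Pick the first ad whose region and campaign window match the download."""
    for ad in ADS:
        if ad["region"] in (region, None) and ad["start"] <= day <= ad["end"]:
            return ad["file"]
    return None


def assemble(episode, region, day):
    """Fill each marked ad break at download time; drop breaks with no ad."""
    ad = choose_ad(region, day)
    filled = []
    for segment in episode:
        if segment == AD_BREAK:
            if ad is not None:
                filled.append(ad)
        else:
            filled.append(segment)
    return filled


swedish_autumn = assemble(EPISODE, "SE", date(2020, 10, 1))
swedish_winter = assemble(EPISODE, "SE", date(2021, 1, 15))
```

The same episode yields a different file depending on who downloads it and when. In this sketch the region would have to come from the listener's IP address, which is what makes the request data "actionable" in Kuklewicz's sense.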
Suddenly, hosting and ad tech companies started trying to squeeze as much data as possible out of the limited interaction they had with listeners, using a couple of different methods. They could add an extra web address in the metadata of an ad, forcing a tiny download that allows the advertiser to register the listener’s IP address, the podcatcher they used, and usually the type of device. They could also use “prefixes,” or redirects, that take the user on a brief detour to another web address so a third party can register them, similar to the way a URL shortening service like Bit.ly works.
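The prefix method can be illustrated with a short sketch. Everything here is an assumption made for the example: the `tracker.example` and `media.example` domains, the URL layout, and the in-memory log stand in for whatever a real measurement company actually runs.

```python
from urllib.parse import urlsplit

# Hypothetical third-party prefix; not a real service.
TRACKER_PREFIX = "https://tracker.example/redirect/"


def add_prefix(enclosure_url):
    """Rewrite an episode URL so the download detours through the tracker."""
    parts = urlsplit(enclosure_url)
    rest = parts.netloc + parts.path
    if parts.query:
        rest += "?" + parts.query
    return TRACKER_PREFIX + rest


def handle_request(prefixed_url, ip, user_agent, log):
    """What the tracker does: record the request, then redirect to the file."""
    if not prefixed_url.startswith(TRACKER_PREFIX):
        raise ValueError("not a prefixed URL")
    # This sketch assumes the original file is always served over HTTPS.
    original = "https://" + prefixed_url[len(TRACKER_PREFIX):]
    log.append({"ip": ip, "user_agent": user_agent, "target": original})
    return 302, original  # HTTP status and Location header


seen = []
feed_url = add_prefix("https://media.example/show/ep1.mp3")
status, location = handle_request(
    feed_url, "203.0.113.7", "ExamplePodcatcher/1.2", seen
)
```

The listener's app follows the redirect automatically and plays the episode as usual. The only trace of the detour is the entry the third party has added to its own log.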
At this point, we’re still just talking about basic data. But advertisers want more granular data: Did someone listen to my ad on a podcast, and then visit my website or buy something? And what else can I learn about podcast listeners, so I can decide whom to target with my ads?
Some companies, like Podsights, now collect basic data, such as IP address and device type, and combine it with information from third-party data collection companies.
Podsights co-founder Andy Pellett said the company uses this data only to establish “attribution”—that the same person who listened to an ad also visited a brand’s website, for example—and does not share any data that could be used to identify listeners.
“Our goal here is to make tools that will let this medium grow in general and bring more people into the space and create better content,” he said.
Other companies at least advertise the ability to do more granular targeting, though it’s not clear how many podcasts actually use such services. Megaphone says that its Megaphone Targeted Marketplace, which allows automated buying and selling of ads, allows advertisers to target users by “demographic and purchase intent.” Megaphone did not respond to a request for comment.
Your Podcast Player Could Know More About You
Podcatchers themselves have access to far more accurate data about users, such as the full list of podcasts a listener subscribes to and how far they listen in any given episode. Some of the smaller podcatchers, however, have emerged as bulwarks against anything potentially invasive. When Russell Ivanovic, head of product for the podcatcher Pocket Casts, heard that some analytics companies had figured out a way to place cookies on users’ devices when they downloaded a podcast, he immediately disabled the capability in Pocket Casts.
“They might be doing innocent things with it, but we felt like that was like a bridge too far,” Ivanovic said. “That’s not something users are expecting.”
Companies like Spotify, Apple, Amazon, and Google, which all offer apps and devices that play podcasts, are in the best position to collect user data. These companies already have “first party” data, collected through their other services, that they can combine with someone’s podcast listening habits, as well as additional insights from the device you’ve installed them on. Spotify knows if your phone is in your hand or in your pocket, for example.
Another way to collect more data would be if different layers of the ecosystem agreed to work together. National Public Radio attempted this with a project called Remote Audio Data, or RAD, essentially instructions for how podcasters and advertisers could finally measure if someone who downloaded a podcast actually listened to it, and for how long.
NPR publishes lots of podcasts and also has its own podcast player, the NPR One app, giving it access to more data about how people listen. This data was used to reassure advertisers that their ads were actually getting heard, and also to tweak show formats. The creators of the show “Pop Culture Happy Hour,” for example, noticed that their listeners were dropping off in the middle of an episode. “The team was able to pivot and release episodes more frequently, but much shorter and more focused,” said Stacey Goers, senior product manager for podcasts at NPR. “And it really gives users what they wanted.”
The ad server NPR uses, AdsWizz, also allows ads and stories to be swapped in and out of episodes. This enables NPR to experiment with its fund-raising campaigns as well as the content of the show. The first half of “Consider This,” for instance, is the same for everyone, but the second half comes from a station in the listener’s local area.
The catch with RAD is that podcatchers also have to participate, and so far, they haven’t taken it up. “RAD has not seen a large amount of adoption,” Goers said.
Part of that is because app developers would rather spend their limited resources on features that would attract new users, she said, and part of it is because of the lack of centralization.
“Podcasting still doesn’t necessarily have a big governing body that’s putting its foot down and asking for it,” she said.
The Changes Are Controversial—and It’s Not Clear How Widespread They Are
“Ultimately if podcasts are going to grow as a media type, as they emerge, the advertisers are going to require the more granular level of tracking,” said Eric Picard, a consultant and former vice president of advertising product management at Pandora.
But not everyone agrees that podcasting’s health relies on gathering more data.
“That’s bullshit,” said Rob Walch, vice president of podcaster relations at Libsyn, a major hosting service that also sells ads for podcasters. “The podcasters that make the most money are the ones that are doing host-read ads that don’t do any of this. Joe Rogan makes more money podcasting with advertising than any of these people.”
At least for now, podcasting is still a fairly private activity, in that it doesn’t produce much data on its own.
“That’s one of the things, to be honest, that’s stunted podcast growth initially,” said Brad Smith, CEO of Simplecast, a hosting and analytics company. “Advertisers want to go where the deepest, richest data lives. And by default, you’re not going to get data out of podcasting.”
Nothing has fundamentally changed, he said. “We gather the exact same information about a listening session today that we did five years ago.”
An earlier version of this story incorrectly identified the technology NPR uses for dynamic ad and content insertion. It is AdsWizz, not RAD.