Hello, friends,
The inauguration is around the corner, and the nation is holding its breath that no more violence will occur. The tech platforms have all announced stringent measures to remove incitements to violence—and yet, who is monitoring how well tech platforms enforce their rules?
At The Markup we have just launched our Citizen Browser project, which aims to audit content being promoted on Facebook by analyzing the feeds of more than 2,200 paid panelists. This week, we published an analysis of the news stories that Trump and Biden voters in our panel saw about the Capitol Hill riots. We found that the news landscape was highly polarized, with very few stories reaching both Trump and Biden voters.
We are not the only ones trying to peek inside the black box of social media algorithms. We wanted to compare notes with others in the field, so this week we convened a panel discussion with four researchers who are also working to audit social media platforms:
- Laura Edelson, a Ph.D. candidate in computer science at NYU’s Tandon School of Engineering, is one of the leaders of the NYU Ad Observatory, which monitors political advertising on Facebook;
- Deen Freelon is an associate professor in the Hussman School of Journalism and Media at the University of North Carolina, Chapel Hill, who studies political uses of social media and other digital technologies;
- Rebecca Weiss leads data innovation projects at Mozilla, including a project that may let millions of people contribute browsing data to different research projects;
- David Lazer is a distinguished professor of political science and computer science at Northeastern University who has been studying the spread of misinformation and disinformation across networks and its influence on politics.
The panel was moderated by Ethan Zuckerman, associate professor of public policy, information and communication at the University of Massachusetts at Amherst and director of the Institute for Digital Public Infrastructure. He is the author of “Mistrust: How Losing Trust in Institutions Provides Tools to Transform Them” (2020) and “Rewire: Digital Cosmopolitans in the Age of Connection” (2013).
It was a fascinating and wide-ranging conversation about the technical and legal challenges of this work, as well as the stakes for society. Here are some edited and condensed excerpts (you can watch the whole thing here or here):
Zuckerman: There’ve been lots of people trying to figure out how Facebook, Twitter, other social networks are influencing the world that we’re living in, and people are trying to move beyond anecdote and into real data on those conversations, and the people that we have here in this Zoom room are just some of the heaviest hitters in that world….
So when researchers study social media, they’re often desperate to get data from the platforms. If only we knew what Facebook knew about its users, we’d finally understand how social media really worked, and this panel here starts with the sort of question, What if that’s not actually true? Maybe the way in which we can actually learn the most about the relationship between online networks and the real world is by using a very old methodology: panel studies.
This means following users, with their permission, and seeing what they see. It’s the method that Nielsen traditionally has used with set-top boxes to figure out what people watch on television. It’s now coming to the surface as a very powerful way to understand what’s actually going on in social media.
So, I want to start this conversation by asking the panelists, What could we get through studying social media from the user perspective, rather than studying from the platform perspective?
Edelson: That’s a question I think about a lot. I think one of the things that is incredibly difficult is that if you do want to study the platforms themselves, you just can’t take your data from the platforms. That was actually the germ of why we started our browser extension, which asks users to contribute ads that they’ve been shown on social media, because the data that was coming from the platforms had a built-in problem. We could never understand what Facebook thought was actually over the line of political content and what wasn’t, because they were defining that line. So if we want to study platforms, users are the best way to do it because those are the people who experienced the platforms.
Lazer: I mean, I think it’s fine to ask platforms for data, and I would encourage them to share data. So I wouldn’t want to send a message that they shouldn’t play. But I also think that’s a severely limited strategy, because it’s often not in their interest to share data that would put them in a critical light.
Weiss: I trained as a methodologist…. And so I think about how, classically, you’re taught that in a panel study you’re observing the same group of people over time, ideally a representative one, because you want to be sure that the trends and the effects that you observe in the sample generalize to the population….
Everyone can scrape web data, but that doesn’t give us a signal about whether certain sources of information are more likely to be systematically shown to some people versus others. And so for us to really get at the class of disinformation as a problem, if we really want to understand it, we really do need to measure what humans are experiencing on the web, not just the way the web looks to other computers. So panel studies give us that affordance.
Freelon: If you’re really interested in studying people, individually, panel studies are great. A lot of the research that I’ve done has looked at how ideas move through social media systems, and, you know, panel studies are only going to give you a particular impression of that. So if the unit of analysis is the individual, then panel studies are the way to go.
If the unit of analysis is the meme or the idea or the image, then you’ve got to take a different tack there. In other words, I think a lot of the research and methodological work has really gone into the scraping side. And there hasn’t really been enough on the panel side. And I think one of the reasons for this is that many of the platforms are actually pulling back on the amount of data that they allow people to access.
Facebook is a great example of this. They used to have an open API that folks could access. They shut it down in the wake of the Cambridge Analytica scandal. And so now all of a sudden people are sitting back and thinking, Wow, O.K., what can I really do? Oh, panel studies. Can we collect data through the browser? Can we collect data through other means that are not necessarily even sanctioned by the social media services?
And so there are lots of ethical questions that have come along with that. But I think primarily, you know, if you put those aside for a second (obviously we’ll have to bring them back in), what should drive your data collection process is the research questions you want to ask and thus the specific objects of study that you’re interested in analyzing.
Angwin: It’s interesting because, I mean, I’m on this discussion with all of you, and you are all, you know, academics and have Ph.D.’s, and I’m, you know, basically stumbling into this from journalism, which is a very different place to come from.
For me, the panel approach was born out of a frustration with all other approaches. When I was at ProPublica, we built the Facebook political ad collector, a generation of which I think is now with Laura at NYU. That was my first attempt at this type of panel, but I realized that it wasn’t representative because we just offered it to ProPublica readers, and there were basically no conservative ads.
And so when I came to The Markup and thought about what I want to do to bring accountability to the next level, I thought, I’m going to have to build a representative panel, and that’s going to mean paying people and using a survey research firm, which is what we’re doing.
And ultimately, I’m not even sure that it’s representative enough. Facebook is 2.7 billion people, and we have a thousand panelists on any given day. And so I almost feel like what we’re doing really is just raising the sample size of a normal news article from, “I interviewed three people in the diner,” to, you know, “I’ve interviewed essentially a thousand people in the diner.”
Zuckerman: So a thousand is really interesting because it is massively larger than the data sets that other people have. You know, the flip side, of course, is you’re starting to do this work in Georgia. I think you had 53 people in the set. Julia, are there ambitions to increase the size of the panel? What do you hope to do with Citizen Browser?
Angwin: I would love to increase the size of the panel, but it’s extremely expensive.
The entire effort has cost, you know, at least a quarter of a million dollars, which we got a grant for, but it’s an insane effort. So we are hoping to get funding to continue and to expand, but that’s a big ask. So anyone who’s watching and has an extra couple million, just come our way!
I think there’s an argument to be made that the tech platforms are essentially the only ones policing speech and behavior in our society at any kind of scale. And no one’s watching them, right? Just the way that reporters are tasked with making sure cops behave (although obviously we’re not succeeding on that front either), I think we have an obligation in the press to try to hold them accountable to the promises they make.
Every single day, Facebook puts out a new press release saying, We’re going to kick X off the platform to make it safer. Kick Y off the platform. Well, who checks, right? And that’s what we can do with Citizen Browser.
Zuckerman: Laura, I want to ask you to talk a little bit about the Ad Observatory and, knowing that we’re kind of straying into uncomfortable territory as well, a little bit about Facebook’s complaints about it.
Edelson: The Ad Observer collects ads that are seen by users who install our browser extension on Facebook and YouTube. We’ve devoted a ton of time and energy to making sure that we don’t collect anything that could be personally identifiable.
But even though we’ve taken these very strong measures to protect user privacy and ensure user consent, we did receive a cease-and-desist letter from Facebook a few weeks before the election. Their claim was that users couldn’t consent to a project like ours, you know, according to Facebook’s terms and conditions.
Obviously we disagree with that. But yeah, I think it does show that platforms are willing to push back really hard against projects like mine where we are really explicitly trying to study the platform itself.
Zuckerman: And you know, in defense of Facebook, which is not something that I often find myself saying, Facebook found itself in a really miserable place through cooperation with academic research in the Cambridge Analytica situation. So there is some reality to their desire to limit exposure. The flip side is that a project like yours has explicit consent from the users and is incredibly careful on privacy. And it doesn’t seem like Facebook should be able to veto things sort of across the board there.
Internet health is an aspect of public health, and this sort of toxic environment, I think many of us believe, was a contributing force in the riot on Jan. 6.
That’s an issue of civic and public health, and we probably need to fund this stuff and make space for it in the same way that we are making space for public health efforts … rather than essentially having the five of you, who are the most brilliant people on this particular issue, scraping for small amounts of money to make this possible.
As always, thanks for reading.
Best,
Julia Angwin
Editor-in-Chief
The Markup