Hello, friends,
For some reason, YouTube is constantly recommending videos to my 14-year-old son that I consider objectionable. Videos featuring men who want to kill women who won’t sleep with them, men spewing hate against immigrants, and sympathetic portrayals of mass shooters.
As a parent, I have little control over YouTube’s recommendation algorithm. I can block YouTube entirely—but that would limit my son’s consumption of amazing content: the Numberphile videos, where he learns new math concepts, and the GothamChess videos, where he discovers new chess strategies.
And so I occasionally find myself trying to retrain the YouTube algorithm using techniques the company recommends on its help pages. With my son’s permission, I visit his YouTube page and choose “Don’t recommend channel” on objectionable videos in his feed.
Hand-to-hand combat with the algorithm is, however, not an ideal situation. I would much rather be able to just choose from a menu of YouTube recommendation systems, perhaps a drop-down menu of “hate-free” YouTube recommendations, “light amount of hate,” or “full-on hate.” Or perhaps I could choose a branded recommendation system, like “Numberphile’s picks” or “GothamChess’s picks.”
There are some limited ways to shop for algorithms already. Facebook and Twitter both offer users an opportunity to choose a chronological feed instead of an algorithmically curated feed. And the European Union recently adopted a new law, the Digital Services Act, that will require large online platforms to offer users at least one algorithm that is not based on user behavior.
To understand what it might look like for users to have meaningful control of their algorithms, I spoke with Noah Giansiracusa, an assistant professor of mathematics and data science at Bentley University. He is the author of the recent book, “How Algorithms Create and Prevent Fake News” (Apress, 2021), and published an opinion article in The Boston Globe earlier this year calling for Facebook to allow users more control over their own news feed algorithms.
Our conversation, edited for clarity and brevity, is below.
Angwin: How did a mathematician get interested in the issues of algorithms, fake news, and more?
Giansiracusa: My background is in pure math, which means exploring math for the sake of math. However, at a certain point, I started to hunger for some connections to reality. I kept seeing news about how algorithms are impacting society, and it felt frustrating to be so close to these concepts yet also distant. At the same time, I’d changed jobs from a department where I was teaching math to one where they wanted me to teach data science, and I thought it would be a fun opportunity to teach a class that explores the role of data-driven algorithms. I was amazed at how often math comes up in the topic of misinformation. It’s never the central ingredient, but it’s almost always peripheral. For example, with misinformation, people are interested in how it spreads through networks, and there’s a whole field called graph theory in math where we study how things traverse across networks. It seemed like every topic I was encountering in this space of misinformation included math.
I decided, after teaching this course, to write a book that bridges these different topics that are centered on misinformation. My main goal was to embrace the technical side without getting lost in the details. Maybe it’s a coincidence, but I think math is perfect for this: Math is all about studying complicated things that are so complex that you’re forced to step back and see the big picture.
Angwin: What value do you think the mathematician’s perspective brings?
Giansiracusa: It is not being afraid of the technical details but also not getting lost in them. For example, one famous research article published in Science in 2018 quantified how fake news and misinformation spread faster, deeper, and more broadly through social networks than true news. This finding made tons of headlines and was covered by dozens of newspapers. Mathematical literacy helps us understand this article’s methodology and interpret what it really implies about the virality of fake news.
I’ve seen a lot of discussion of algorithms that is extreme in both directions: that they’re incredibly powerful and know us better than we know ourselves and, on the other side, that they’re stupid and can barely even detect hate speech. Given the loud voices on both sides, it’s really hard to know what’s true. This is where the mathematical lens can help remind us that some things are more scientifically demonstrable than others.
Angwin: You have proposed that we should be able to tune our own algorithm. Can you walk me through your proposal?
Giansiracusa: It strikes me that there’s a pretty simple thing Facebook could do to change the way we engage with its algorithm and that this would address some, though not all, of the concerns. The idea is very simple: Let users control the variety of inputs that go into the algorithms.
Facebook’s news feed algorithm is based on external data, like the posts themselves, but also on data from the user: Your history of searches, engagements, and basically all the things you have done on Facebook are inputs to these algorithms. This algorithm sorts my friends’ posts to decide the order in which I should see them. Currently, Facebook’s algorithm uses as many predictors as it can to try to decide which posts should be ranked at the top. My proposal is to add, essentially, light switches where the user can turn on and off each data stream, or signal, that the algorithm reads. For example, I should be able to switch on and off demographics. I think that the individual should, if they want, be able to turn off all personal data and have posts ranked solely based on what the algorithm thinks about the posts themselves and how other users have treated them, rather than anything specific to the individual.
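To make the “light switch” idea concrete, here is a minimal sketch in Python. The signal names and the structure are invented for illustration; this is not Facebook’s actual code or data schema, just one way per-user switches over an algorithm’s inputs could be represented:

```python
# Hypothetical data sources ("signals") a feed-ranking model might read.
ALL_SIGNALS = ["search_history", "past_engagement", "demographics", "location"]

# Each user keeps a switchboard: True means the signal may be used, False means it may not.
default_switches = {signal: True for signal in ALL_SIGNALS}

def visible_inputs(post_features, user_signals, switches):
    """Keep only the user signals that are switched on; data about the post itself is always allowed here."""
    allowed = {name: value for name, value in user_signals.items() if switches.get(name, False)}
    return {"post": post_features, "user": allowed}

# Example: a user who switches off demographics and location entirely.
my_switches = dict(default_switches, demographics=False, location=False)
```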
Then there is a second part to this proposal. Facebook’s algorithms use all the data sources that I mentioned before to try to predict the probability that I’ll do various things with the posts on my newsfeed—that I’ll comment on them, like them, share them, select the “ha ha” reaction. These are forms of engagement. Facebook uses machine learning to predict the probability that I will exhibit each of these forms of engagement, and then Facebook assigns weights that signify how much each form of engagement should count. Facebook multiplies each probability by its weight and adds them up, and the number you get at the end of the day is called the value of the post. Then, Facebook ranks the posts based on these values.
I think we should also let users adjust these weights. Maybe I don’t like getting into political arguments with strangers on the internet. If I could control these weights, I could turn down the weight on long comments and angry reactions. Facebook has made adjustments like that for everyone at once, but they’ve never let us make them individually. Facebook also tries to estimate how close you are to your friends, so I should be able to turn up the weight on closeness so that posts from my friends and family will be prioritized in my feed.
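Here is a minimal sketch of the scoring step described above, with invented engagement types and weight values (Facebook’s real predictors and numbers are not public in this form): each predicted engagement probability is multiplied by a weight, the products are summed into the post’s “value,” and posts are ranked by that value. Letting users edit the weights is the proposal.

```python
# Hypothetical default weights for each predicted form of engagement.
default_weights = {
    "like": 1.0,
    "comment": 15.0,
    "share": 30.0,
    "angry_reaction": 5.0,
}

def post_value(predicted_probabilities, weights):
    """Multiply each predicted engagement probability by its weight and sum the results."""
    return sum(weights.get(kind, 0.0) * p for kind, p in predicted_probabilities.items())

def rank_feed(posts, weights):
    """Order posts by value, highest first. Each post carries its predicted engagement probabilities."""
    return sorted(posts, key=lambda post: post_value(post["predictions"], weights), reverse=True)

# A user who dislikes heated arguments could zero out the weight on angry reactions.
my_weights = dict(default_weights, angry_reaction=0.0)
```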
Right now Facebook has all this flexibility, but they are monolithically deciding these values for us. They decide what data inputs go into their algorithm, they decide what all these weights are, and they adjust them based on their own decisions. If they’re already doing this, why not just let us do it individually?
I don’t like the idea that one company makes all these decisions. I think individuals should be able to make these decisions. It would be amazingly simple to implement because Facebook is basically already doing this. They’re just making these decisions behind the scenes instead of letting us make the decision individually.
Angwin: How does this compare to what Frances Haugen suggested? I believe she wanted the news feed to be entirely chronological, right?
Giansiracusa: It’s very far from the chronological feed that Haugen advocated for because it is still an engagement-based feed, but the user gets to decide which forms of engagement are prioritized. This is really a compromise in Facebook’s favor, because it’s basically letting them do exactly what they’re doing already. It’s still playing their game; it’s just letting us have more flexibility within their engagement-based world.
This brings up a valid question, which is, would this actually help? If we’re still using engagement-based algorithms, machine learning, and user data, then what’s the point? Are we still going to have all the same problems? I don’t know what would happen at an aggregate, global scale, but I think at the end of the day people are afraid of Facebook making too many decisions that affect us in either direction. Liberals think it’s not controlling misinformation enough, and conservatives think it’s censoring speech too much. And some populations are more vulnerable to hate speech than others. So why don’t we just let people have more flexibility with the algorithm—including being able to adjust the weight on the hate speech and misinformation filters?
Angwin: Would this approach just increase the polarization we already see on social networks, where everyone has their own view of reality?
Giansiracusa: It’s a very interesting question. My view is that we’re already moving in this direction where people are going to fracture their social media experiences. We’ve already lost trust in one large platform to do everything, and people are already starting to seek out alternatives. And I think from Facebook’s perspective, it would make sense for them to jump on that bandwagon.
And from my perspective as a user, it has some advantages. For instance, I would still be on the same platform as my neighbor who has very different political tastes, and we’d still see each other’s posts, but they’d be emphasized in slightly different ways. I wouldn’t be totally cut off; it’s more of a fine-grained tuning of our experiences.
Angwin: Critics have argued that users will be daunted by the complexity of having to tune their own algorithms. Do you think that is what is holding back this idea?
Giansiracusa: To make it simpler, Facebook could include some preset settings: maximal privacy, news junkie, friends and family, etc. Each of these would be a specific configuration for the data-source light switches and the engagement weights so you don’t have to worry about the details. We have presets for our TVs and home audio systems—why not our social media platforms too?
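Concretely, a preset could just be a stored bundle of switch settings and engagement weights, overlaid on whatever the user already has. The names and numbers below are invented for illustration, building on the hypothetical switches and weights sketched earlier:

```python
# Hypothetical presets: each bundles a switch configuration with a set of engagement weights.
PRESETS = {
    "maximal_privacy": {
        "switches": {"search_history": False, "past_engagement": False, "demographics": False, "location": False},
        "weights": {"like": 1.0, "comment": 1.0, "share": 1.0, "angry_reaction": 1.0},
    },
    "friends_and_family": {
        "switches": {"past_engagement": True, "demographics": False, "location": False},
        "weights": {"like": 1.0, "comment": 10.0, "share": 5.0, "angry_reaction": 0.0},
    },
}

def apply_preset(name, current_switches, current_weights):
    """Overlay a named preset on top of the user's current configuration."""
    preset = PRESETS[name]
    return {**current_switches, **preset["switches"]}, {**current_weights, **preset["weights"]}
```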
But I’m glad you brought up the complexity of these algorithms, because that raises an important issue in the background of many discussions of social media and the tech giants: the aura of intimidation around these algorithms that stems from their complexity. I want to really drive home that just because an industry or technology is complicated, that doesn’t mean we can’t regulate it, inspect it, and find ways of safely interacting with it.
We all fly in airplanes. I don’t know anything about aeronautics or how an airplane works, but I trust my government to regulate airplanes. I trust the pilot to have the proper training. We need to get over this intimidation of algorithms and say, “They’re not that complicated.” Our life is full of complicated things that we’ve learned how to use safely. We’re not going to get rid of airplanes because there are crashes; we’re going to minimize the number of crashes.
We’re not going to get rid of machine learning; we just have to learn how to make it safer, more tightly inspected, and lower the externality costs. The flexibility of letting users tune the algorithms won’t address all the problems, not even close, but I believe it’s at least a small step in the right direction. For instance, just by showing users the data-source switches, we’d have more transparency and public understanding of the algorithms because we’d see what data the algorithms actually rely on. And many experts agree that we need more data privacy measures. Even if the government can’t agree on how to mandate this, the platforms could give it to us by letting us switch off the data sources that we want to keep private. This is a great example of the clarity that a basic mathematical mindset can bring: Rather than getting lost in the details of complicated machine learning algorithms, we can step back and say that these algorithms have certain inputs, and maybe it’s time to tell users what those inputs are and let them switch off the ones they are not comfortable with.
As always, thanks for reading.
Best,
Julia Angwin
The Markup
Additional Hello World research by Eve Zelickson.