Building Ethical Artificial Intelligence

Hello, friends,

As computers get more powerful, we are increasingly using them to make predictions. The software that makes these predictions is often called artificial intelligence.

It’s interesting that we call it “intelligence,” because other tasks we assign to computers—computing huge numbers, running complex simulations—are also things that we label as “intelligence” when humans do them. For instance, my kids are graded on their intelligence at school based on their ability to do complex mathematical calculations.

When we let computers project into the future and make their own decisions about what step to take next—what chess move to make, what driving route to suggest—we seem to want to call it artificial intelligence.

Yet we are continually finding that AI systems are not so intelligent. And worse, they are often biased. For every aha moment where Gmail suggests the correct next word for you to type in your email, there is another story of how an AI system labeled Black people’s faces as gorillas or suggested only photos of women as flight attendants.

To understand the boundaries of what AI can and can’t do, I spoke with Timnit Gebru, a computer scientist who has been at the forefront of studying how to identify and confront the biases in AI, and the founder and executive director of the Distributed Artificial Intelligence Research Institute (DAIR).

Before founding DAIR, she co-led the Ethical AI team at Google until she was fired in December 2020, following a controversy over a paper she co-wrote about AI and after she had raised issues of discrimination in the workplace. Timnit also co-founded Black in AI, a nonprofit that works to increase the presence, inclusion, visibility, and health of Black people in the field of AI.

She is also the co-author with Joy Buolamwini—whom I interviewed in a previous newsletter—of a landmark study, Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification, on how facial recognition software frequently fails to correctly identify women of color.

Our conversation, edited for brevity and clarity, is below.

[Photo: Timnit Gebru]

Angwin: You left Google after a controversy about a paper you wrote about a type of AI called large language models. What are they, and what risks do they carry?

Gebru: Large language models (LLMs) are a type of language technology that, if given a set of words, tries to predict the next word based on the previous text. These models are trained using lots of data on the internet; they essentially crawl all of the internet. They’re used as intermediary steps for many different underlying tasks. For example, Google Search uses LLMs to rank queries and do automated machine translation.

Currently, there is a race to create larger and larger language models for no reason. This means using more data and more computing power to see additional correlations between data. This discourse is truly a “mine is bigger than yours” kind of thing. These larger and larger models require more compute power, which means more energy. The population that is paying these energy costs (the cost of the climate catastrophe) and the population that is benefiting from these large language models barely overlap; the intersection of these populations is nearly zero. That is one concern.

There’s also this underlying belief that the internet reflects society, and if we crawl the entire internet and compress it into a model, it will have diverse views. In a paper I wrote with the linguist Emily Bender and others, we make the point that size doesn’t guarantee diversity. Because of who has access to and uses the internet, you’re going to encode dominant and hegemonic views. Additionally, even when society progresses and there are social movements like Black Lives Matter, large language models will lag behind because they haven’t been trained on this new data.

We also run into an issue because these models can produce such coherent text. There was an example in the paper where a Palestinian man woke up and posted “good morning,” Facebook translated it to “attack them,” and then the man was arrested. In this case, because the translation was so coherent, there were no cues that it was translated incorrectly. The creation of coherent text at scale is alarming, and it can embed racist and sexist language it has been trained on. You can imagine how this could intensify misinformation, disinformation, and radicalization.

Angwin: The paper you mentioned, which was about the limitations of large language models, is titled “On the Dangers of Stochastic Parrots.” Can you explain this title?

Gebru: Emily Bender came up with this name, but I can explain it. Basically, if you hear a parrot saying coherent sentences, you will probably be very impressed, but the parrot is really just repeating things. This is an analogy for what these models are trained to do, which is to search text for patterns and predict words based on the likelihood of the appearances of all the prior strings of words.

This is also why I take issue with people saying that these models are intelligent. If there are certain capabilities that these models possess, let’s describe them. However, this specific language is used to anthropomorphize this technology in a way that allows these companies to relinquish their responsibility to build safe systems.
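
[Editor's illustration: the idea of predicting a word from the likelihood of what has followed earlier words can be sketched with a toy word-pair counter. This is a deliberately simplified stand-in, not how production large language models are built; those use neural networks trained on far more text and far longer context. The corpus and the function names below are made up for illustration.]

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    followers = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        followers[current_word][next_word] += 1
    return followers


def predict_next_word(followers, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    counts = followers.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Tiny made-up corpus; a real model is trained on vastly more text.
corpus = "good morning everyone . good morning friends . good night everyone ."
model = train_bigram_model(corpus)

print(predict_next_word(model, "good"))    # "morning" follows "good" most often
print(predict_next_word(model, "parrot"))  # None: the word never appeared in training
```

The sketch can only repeat patterns present in its training text, which is the sense of the parrot analogy: a word it has never seen gets no prediction at all, and text written after training has no influence on its output.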

Angwin: There was an article this past April in the New York Times Magazine on large language models that you criticized. Can you talk about what the press gets wrong here?

Gebru: Researchers do a lot of work to uncover harms, and we’re extremely under-resourced. Then, here comes this article from a popular outlet, and the journalist talks to all these high-up executives at AI startups, and the framing of the piece becomes “look at this magical thing.” This article treats large language models as if they are some magical intelligent being rather than an artifact created by humans. In my view, the press is supposed to be a check on power, but when the press is adding to the power that Silicon Valley already has, then we don’t have any checks and balances. I also want to add that my colleague Emily Bender spent her entire weekend writing a blog post with a point-by-point response to this article that everyone should read.

[The author of the New York Times article, Steven Johnson, replied to Timnit’s criticism on Twitter by stating that “taking LLMs seriously is not just ‘AI hype.’”]

Angwin: You have advocated for more oversight of AI models like LLMs and others. Can you talk about what you would like to see?

Gebru: I wrote a paper called Datasheets for Datasets that was based on my experience as an engineer designing systems. As a circuit designer, you design certain components into your system, and these components are really idealized tools that you learn about in school that are always supposed to work perfectly. Of course, that’s not how they work in real life.

To account for this, there are standards that say, “You can use this component for railroads, because of x, y, and z,” and “You cannot use this component for life support systems, because it has all these qualities we’ve tested.” Before you design something into your system, you look at what’s called a datasheet for the component to inform your decision. In the world of AI, there is no information on what testing or auditing you did. You build the model and you just send it out into the world. This paper proposed that datasheets be published alongside datasets. The sheets are intended to help people make an informed decision about whether that dataset would work for a specific use case. There was also a follow-up paper called Model Cards for Model Reporting that I wrote with Meg Mitchell, my former co-lead at Google, which proposed that when you design a model, you need to specify the different tests you’ve conducted and the characteristics it has.

What I’ve realized is that when you’re in an institution, and you’re recommending that instead of hiring one person, you need five people to create the model card and the datasheet, and instead of putting out a product in a month, you should actually do it in three years, it’s not going to happen. I can write all the papers I want, but it’s just not going to happen. I’m constantly grappling with the incentive structure of this industry. We can write all the papers we want, but if we don’t change the incentives of the tech industry, nothing is going to change. That is why we need regulation.
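
[Editor's illustration: a rough sketch of the kind of record a model card could hold, so that someone can check whether a model was tested for a given use before adopting it. The field names, the example model, and the numbers are hypothetical and are not taken from the Datasheets for Datasets or Model Cards for Model Reporting papers.]

```python
from dataclasses import dataclass, field


@dataclass
class ModelCard:
    """Hypothetical, minimal record of what a model was tested for."""
    model_name: str
    intended_uses: list = field(default_factory=list)
    out_of_scope_uses: list = field(default_factory=list)
    # Maps a tested group or condition to a measured error rate.
    error_rate_by_group: dict = field(default_factory=dict)

    def supports(self, use_case):
        """True only if the use case is explicitly listed as intended."""
        return (use_case in self.intended_uses
                and use_case not in self.out_of_scope_uses)

    def worst_group(self):
        """Return the tested group with the highest error rate, or None."""
        if not self.error_rate_by_group:
            return None
        return max(self.error_rate_by_group, key=self.error_rate_by_group.get)


card = ModelCard(
    model_name="example-classifier",  # made-up name
    intended_uses=["photo organization"],
    out_of_scope_uses=["law enforcement identification"],
    error_rate_by_group={"group_a": 0.01, "group_b": 0.20},  # made-up numbers
)

print(card.supports("photo organization"))
print(card.supports("law enforcement identification"))
print(card.worst_group())
```

A use case that is not listed counts as unsupported by default, which mirrors the component datasheet analogy: the burden is on the builder to show a use has been tested, not on the user to guess.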

Angwin: What kind of regulation do you think we need?

Gebru: Currently, there are high-level managers and VPs at tech companies who don’t encourage employees to look into issues around misinformation or bias because they know that if issues are discovered they will have to fix them and they could be punished for them, but if they don’t find the issues there is plausible deniability. Think about if a drug company said, “We aren’t going to test this drug thoroughly for side effects because if we don’t find them, we don’t have to fix them.” That would be insane.

I want a law that makes companies prove to us that they have done impact assessment tests. I want to slow them down, and I believe that the burden of proof should be on them. Additionally, I want more investment into alternative research so the communities who are harmed by these tech companies can envision an alternative future.

Angwin: Can you tell me a bit about your new research institute?

Gebru: When we look at the history of AI research funding, it is often funded to support military efforts—interest in building autonomous weapons—or it’s funded by large tech companies who want to make as much money as possible for their corporation. When we think about investment in the “AI for good” space, it’s always an attempt to retrofit an existing technology. Say you’ve created this autonomous military tank and then the question becomes how do we use the tank for good?

I wanted this to change. I created a small research institute last December called DAIR, the Distributed AI Research Institute. It’s meant to be an independent—as independent as one can be—institute for AI research. If we’re going to do AI research, we need to be thoughtful about how we do it, what we work on, and how we continue to uncover harms without being persecuted. I want DAIR to be a place where researchers have the space to imagine alternative futures.

As always, thanks for reading.

Best,
Julia Angwin
The Markup

Additional Hello World research by Eve Zelickson.