In This Issue
Summer Bridge on Critical Materials
June 15, 2024 | Volume 54, Issue 2
The summer issue of The Bridge discusses leveraging new and emerging technologies, infrastructure, innovative approaches, and a resilient supply chain to ensure a stable and reliable supply of critical materials far into the future.

An Interview with . . . Timnit Gebru, founder and executive director, the Distributed AI Research Institute

Wednesday, June 12, 2024

Author: Timnit Gebru

RONALD LATANISION (RML): We are delighted to have you with us today.

TIMNIT GEBRU: Thanks for having me.

RML: To begin, could you introduce yourself and tell us about where you grew up?

DR. GEBRU: I was born and raised in Ethiopia. I came to the States when I was 16, and I studied electrical engineering. Electrical engineering is a very broad subject. I started out studying analog circuit design. Then I began getting more interested in device physics, so I ventured more into material science or solid-state physics, all still under the electrical engineering umbrella. And then I ventured more into image processing and signal processing. That is what led me to computer vision, which I guess is AI now. That is my educational background.

In the middle of that, after undergrad, I worked at Apple. I was doing analog circuit design. And then I went back to grad school to do my master’s in the same field, but then I decided to do a PhD. At some point during my PhD, I left to work at a startup for a little bit. I did some other random stuff, and then I went back to my PhD and finished it in computer vision. I worked as a postdoc at Microsoft, and then I went to Google. Then I started my own institute called the Distributed AI Research Institute (DAIR), which is where I am now.

RML: Your history and involvement in AI are important to us, particularly given the pace at which things are changing in AI: almost daily, maybe even faster than that sometimes. My experience is that technology is at its best when it serves to help people. It does not always, and sometimes it actually does the opposite. In order to introduce a technology into the marketplace, there are usually a set of risks that people have to consider. For example, there are technical risks. Does it work? Can it be scaled up? And this applies to any new technology, whether it is AI, a new energy system, or a medical device.

There are also economic risks, which means that you want the product to be affordable for anyone. You want to make sure that there are investors interested to support the development. I think technologists have been very good at managing the technical and economic risks. But I am concerned, and I sense that you are very concerned, about the social risks.

How can new technology be introduced into the marketplace so that it is meaningful and helpful to people, but also so that the suppliers of this technology can respond responsibly and with accountability? What kind of guardrails do you think we need for new technology going forward?

DR. GEBRU: I want to go back to what you were saying about assessing risks as technologists and just fundamental engineering principles. I do not think what is happening right now in AI follows any scientific or engineering practices. That is one of the biggest problems. When you create technology, if you are an engineer, you build something and you say what it is supposed to be used for. You assess what the standard operating characteristics are. You do tests in which you specify the ideal condition and the nonidealities.

For example, when I am designing circuits, I say what the ideal case scenario is in my circuitry and what the nonidealities are, and what is supposed to happen in the ideal case scenario. You do all these things when you create technology. And we have all of these processes to assess those things, to test those things, and to communicate those tests. And when I came to the world of AI, there was none of that. It was a shock for me to make that transition.

About six years ago, I was working with Joy Buolamwini, and we were looking at automated facial recognition systems. We looked at application programming interfaces from three companies, a set of automated facial analysis systems that were essentially gender-recognition systems, and these were interfaces that anybody could use. We found that the error rates for darker-skinned women, in particular, were much higher than those for everybody else, and the error rates for lighter-skinned men were almost nonexistent.

The first thing I wanted to do when we were working on that was to see if they told us anything about the guidelines for what this was supposed to be used for versus not, and if there was any information about the training data or the test set, if they did any tests, and what kind of tests they did.
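[Editors’ note: A minimal sketch of the disaggregated evaluation Dr. Gebru describes above is shown below. Instead of reporting one aggregate error rate, the error rate is computed separately for each subgroup. The subgroup labels and records are hypothetical placeholders, not the actual benchmark data from that study.]

```python
from collections import defaultdict

def disaggregated_error_rates(records):
    """Compute the error rate separately for each subgroup.

    `records` is an iterable of (subgroup, predicted_label, true_label)
    tuples, e.g. ("darker-skinned female", "male", "female").
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for subgroup, predicted, true in records:
        totals[subgroup] += 1
        if predicted != true:
            errors[subgroup] += 1
    return {group: errors[group] / totals[group] for group in totals}

# Hypothetical records: one aggregate error rate (25%) would hide the
# gap between the two subgroups below.
records = [
    ("lighter-skinned male", "male", "male"),
    ("lighter-skinned male", "male", "male"),
    ("darker-skinned female", "male", "female"),
    ("darker-skinned female", "female", "female"),
]
print(disaggregated_error_rates(records))
# {'lighter-skinned male': 0.0, 'darker-skinned female': 0.5}
```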

When I was a circuit designer, for example, if I were to choose a particular component like, let’s say, a resistor, there is an idealized model of a resistor. There are equations and things like that that you learn in class. But then you know that in the real world, that is an idealized model. If you wanted to use a resistor for power generation versus audio circuitry versus nuclear power versus life support systems, those are very different kinds of resistors, even though the idealized scenario is the same. You do so many tests to see when the resistor breaks, what happens when it breaks, and so on. You have tons of pages of results on those tests. You communicate those in the form of data sheets. When I am designing a circuit, I will look at a data sheet. I will decide whether this component is appropriate for my system or not. That is just one of the things we do in engineering.

It was so shocking for me to see that none of this was being done in the world of AI. One of the papers that I wrote was called “Datasheets for Datasets.” This was inspired by the data sheets in electronics. I wrote about how, similar to how we do these tests in other engineering practices, we need to do the same here. We need to document. We need to test. We need to communicate what things should be used for.

There was a follow-up paper called “Model Cards for Model Reporting.” Same thing. You ask what guardrails we should have. The most fundamental guardrail is to follow normal engineering practices, which, for some reason, get a pass when someone says AI. That should not happen. That has been my approach. That is how my engineering background has grounded me.
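[Editors’ note: As a rough illustration of the documentation idea behind “Datasheets for Datasets” and “Model Cards for Model Reporting,” the sketch below shows abbreviated, hypothetical documentation records. The fields and values are simplified stand-ins, not the full templates from those papers.]

```python
# Hypothetical, abbreviated documentation records in the spirit of
# datasheets for datasets and model cards for model reporting.
dataset_datasheet = {
    "name": "example-face-attributes-v1",        # hypothetical dataset
    "motivation": "Benchmark facial analysis error rates by subgroup.",
    "collection_process": "Portrait images licensed from public sources.",
    "composition": {"images": 1200, "subgroups": 4},
    "recommended_uses": ["benchmarking facial analysis systems"],
    "out_of_scope_uses": ["surveillance", "identity verification"],
}

model_card = {
    "model_name": "example-face-attribute-classifier",  # hypothetical model
    "intended_use": "Research on demographic performance gaps.",
    "training_data": "example-face-attributes-v1",
    "evaluation": {
        # Metrics reported per subgroup rather than as one aggregate number;
        # the numbers here are placeholders.
        "error_rate_by_subgroup": {
            "lighter-skinned male": 0.01,
            "darker-skinned female": 0.30,
        }
    },
    "known_limitations": ["binary gender labels", "sensitivity to pose"],
}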

I gave the example of face recognition or automated facial analyses. Even in that case, there are all these issues that I just mentioned of not knowing what things should be used for, documentation, and things like that.

Of course, there can still be issues with a new technology even if you write out all the documentation, go through all those tests, and find no disparities in error rates across different groups. It can still be super harmful because you can use it in a harmful way. You can use it to over-police communities that are already discriminated against and marginalized. You can follow basic engineering practices perfectly well, and a new technology could still be harmful. But before we get to that, there are the fundamental testing and documentation practices that I mentioned.

Fast forward to now. I was talking about a specific task, automated facial analyses, and a specific model that is created for that. Now, we are in a situation where people are claiming to be creating something that seems like a machine god that is supposed to be everything for everyone.

Everybody is talking about AI now, mostly because of OpenAI. They have a chatbot. If you go to the README profile, it claims to be able to do literally everything. When you read how Facebook, or Meta, advertised their system called Galactica, they said that this is a system that can write scientific papers, annotate molecules and proteins, write code, and more.

Now, let’s go back to what I was talking about. How do you assess what Galactica should and shouldn’t be used for? How do you test for how it is supposed to operate, the ideal conditions, what the nonidealities are, et cetera? You can’t even do that because they are advertising it as a system that’s supposed to be able to do literally everything. We have already lost in terms of safety because the new technology hasn’t been designed to appropriately scope out what tasks it should and shouldn’t be used for and the appropriate tests haven’t been done.

RML: I hear what you are saying. But let’s take the case of the internet. The internet was launched with the idea that it would serve as a platform for information, and it would be available throughout the world. It has served that purpose very well. We use it every day, everybody, or almost everybody, even people in some of the nonlegacy nations. However, I don’t think anyone forecasted the reality that the internet could be harmful, used to distribute misinformation and disinformation consumed by a public that doesn’t seem to have the capacity to distinguish between reliable and unreliable information. What guardrails should’ve been introduced with the internet? Who should be protecting people from this? Is it a case where people should be smart enough to figure it out for themselves, or is that utopian to imagine? How should people be protected from the abuse of the internet, which is otherwise a very helpful system?

DR. GEBRU: The internet, to me, is very different from what’s going on right now. There are different protocols, Transmission Control Protocol/Internet Protocol. As someone who works in machine learning, who knows how things are trained and tested and things like that, I see how things are being advertised and how things are being created.

The very first concern right now in terms of protecting the public is the hype around AI, which is very harmful. The scientists, especially, and the academics, journalists, and politicians who are cashing in on the current hype need to be very specific about what systems are being created versus what is being advertised.

I will give an example. Sam Altman appeared in front of Congress and said that there needs to be an international agency to regulate AI akin to the one that was created for the atomic bomb. Now, what is this saying? This is saying that a company like OpenAI has created something so powerful that we don’t know how to regulate it thus far. That our existing federal agencies are unequipped. We need to create this new regulatory agency because we need to regulate something that will look like “super intelligence.”


Now, this statement needs to be critically analyzed, because when we take him at his word, there’s other stuff we are not thinking about. When you assume that a company like OpenAI has created super intelligence, for instance, you have to come back down to earth and ask what they are actually building and how. Let’s look at one of their systems, like the image generation system. How did they get their data? Whose data are they using? Actually, we don’t know, because they are not doing the documentation practices that I mentioned earlier. We don’t know whose data they are using. They are using a lot of copyrighted material from artists who are not getting compensated. They are using private data, so they are breaching people’s privacy. They are using exploited labor. There are data laborers who are labeling and filtering training and evaluation data sets all day. They are getting traumatized, developing PTSD, and being paid less than $2 per hour to do that work. That’s what, in reality, is happening. But we are talking about a digital super intelligence. We are not looking at these things that we already know about, which are data theft, labor exploitation, and huge energy costs. That’s what I mean.

There are things we know how to regulate, actually. There are agencies like the Federal Trade Commission that have reminded companies like OpenAI that they are within the jurisdiction of the Commission. There’s nothing new about the kinds of practices that we are talking about, and there are already agencies that can investigate these companies.

Now, Sam Altman then goes to Congress and talks about how there needs to be regulation. He’s very worried about super intelligence, and there needs to be something akin to an international agency, et cetera. People hail him as this Oppenheimer who is worried about his creation. His creation that he apparently can’t stop building.


At the same time as he was saying this, there was the EU AI Act, which was dealing with some of the things that I just discussed: transparency in terms of data sets. How are you getting this data that is making you lots of money? We don’t even know. When he saw this kind of very specific regulation, he threatened to pull out of the EU because he said they were going to overregulate these companies.

The first thing we need to do is stop talking about AI as if it is something that is developing on its own, like a digital god. We need to talk about it as an artifact that is built by a bunch of different organizations using certain kinds of practices, and then we need to examine what those practices are. That’s the first thing we need to do. And we already have agencies that do that, except the problem is that the hype has made our government and even scientific bodies not question the very premise of how these things are being used.


KYLE GIPSON (KG): Could you elaborate on why the general public, or nonexperts, should be worried about the hype surrounding AI and the overpromising and overapplication of AI? Why is this a problem that we as a society should be very concerned about?

DR. GEBRU: Actually, at the Distributed AI Research Institute, we have a podcast called Mystery AI Hype Theater 3000, which every two weeks discusses an instance of AI hype and what has resulted from it. When you talk about something that has been created as if it is so powerful that it has its own agency, we are asking the machine to be ethical. We are not asking the organizations that created this thing to be ethical. We are asking whether AI can be ethical. We are forgetting that there is a whole set of investors, regulators, and engineers, et cetera, who built AI, and the responsibility lies with them. You’re helping them abdicate responsibility by talking about something as if it has its own agency.

Now, when you remove that discourse, which we call anthropomorphizing AI systems, we can go back and discuss exactly what is happening. For example, artists have been very vocal about what has been happening with image generators. Artists are saying that these systems do a lot of horrible things. For instance, style mimicry, where you can write the name of a style and say, Build me this image in the style of such and such artist. This can basically mimic the style of an artist. People who are hyping these systems will say, No. It’s learning from images just like humans learn from images. The artists are like, No. That’s not how humans learn from images. What you’re actually doing is plagiarizing, and you’re competing in the same market as me and trying to replace me, saying that you’ve created something that’s creative. You’re ascribing human-like qualities to it.

RML: Your comment reminds me of a conversation I’ve had with my wife, who is a professional artist. She would agree 100 percent with your comment, and she would add the copyright infringement issue to it too.

DR. GEBRU: Exactly. That’s just one example of what happens when we don’t believe the hype and bring the conversation back down to earth. If we just say machines can be creative, we’re not having these discussions of exactly what is happening. That’s why I think the hype is one of the most important things to consider.

Then the other issue here is that the hype allows these organizations to claim that they are creating something that will save all of society. Literally just recently, Sam Altman said that if he had $7 trillion he would create artificial general intelligence that would then solve the world’s problems. Just hand him $7 trillion, and he will fix the entire world. This is what we are letting happen.

But in the meantime, what happens when you try to create an everything-machine, like what they are trying to do right now, as opposed to a task-specific model? Let’s say I’m trying to build a model, and I say that the goal of this model is to identify certain kinds of diseases in plants. The scope of what I’ll do is limited. I’m not going to try to have a huge data set. I will curate my data set. I’ll look through it. I’ll document it appropriately, and I’ll see whether this model will work for this appropriate use case.

When I say that I’m trying to build a digital god, the data set is already intractable. What data set do I use to train this machine? I just ingest everything, whether it is people’s private data or what the Nazis are saying on the internet. And then I repackage that and spit it out, and I can’t even document or curate my data for a specific task.

What really worries me is that people are building on top of these models that we don’t know anything about. Let’s say, right now, OpenAI has an app store, right? People are building on top of their models, and they don’t even know what’s in them. They can change from week to week. You can get one type of output this week, and your model can have a different kind of output next week, and you don’t even know. They don’t even have to tell you how the version has changed or anything like that. Everything we know about version control, software bugs, and reproducibility in engineering is not being followed right now. We are letting it happen because of the hype. They are building a digital god, so they don’t have to follow these engineering practices.
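[Editors’ note: One hedged illustration of the version-control discipline described above: pin the model identifier an application was validated against, and rerun a fixed set of prompts to detect silent changes. The `query_model` function below is an assumed stand-in for whatever hosted-model client is actually used, and hash comparison assumes deterministic outputs; in practice one would compare with some tolerance.]

```python
import hashlib

def output_fingerprint(model_id, prompts, query_model):
    """Record a hash of each output for a pinned model version.

    `query_model(model_id, prompt)` stands in for whatever hosted-model
    client the application actually uses; it is assumed to return text.
    """
    return {
        "model_id": model_id,
        "outputs": {
            prompt: hashlib.sha256(
                query_model(model_id, prompt).encode("utf-8")
            ).hexdigest()
            for prompt in prompts
        },
    }

def detect_drift(baseline, current):
    """Return the prompts whose outputs changed since the baseline run."""
    return [
        prompt
        for prompt, digest in baseline["outputs"].items()
        if current["outputs"].get(prompt) != digest
    ]

# Usage sketch: save a baseline when the application is validated, rerun
# the same prompts on every deploy, and alert if the upstream model has
# silently changed.
#   baseline = output_fingerprint("model-2024-06-01", prompts, query_model)
#   current  = output_fingerprint("model-2024-06-01", prompts, query_model)
#   changed  = detect_drift(baseline, current)
```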

RML: Let me take a step back for a moment. You said earlier that there are regulatory agencies that exist today that would in fact provide some degree of oversight. Are they not? And if they are not, why are they not?

DR. GEBRU: I think that some of them are, but they are heavily understaffed. For instance, I think the Federal Trade Commission is doing a fantastic job. But they are heavily understaffed. Let me give you an example. One of the biggest issues right now is that the onus is on these agencies to say, I think something is wrong. Or the onus is even on bodies like the EU, where we think that there is much more regulation. The onus is on me, as an individual, to say, I think my data has been used without my consent, and then they will investigate. You have these massively resourced corporations and then very under-resourced agencies, and the onus is on them to say there might be a problem, or the onus is even on individuals.

What I’d like to see is, before you put new technology out, you should have to pass certain standards. You should not be able to put something out into the world without first doing a certain number of things. For example, in electronics, we used to have the Restriction of Hazardous Substances in Electrical and Electronic Equipment Directive, which says that everything has to be lead-free. Or there are certain radiation levels that you can’t have. There’s a whole bunch of these kinds of regulations where the onus is on the organization to prove that they haven’t done harm before they put things out into the world.


Now, let me go back to the art situation. Imagine that I’m an artist who thinks that perhaps corporations have stolen a whole bunch of my data to train their systems without compensation or consent. Let’s say I wrote a book, and they want to adapt it into a movie. Nobody can just do that without telling you. There has to be some sort of negotiation for all this stuff. Right now, these companies can just take whatever script you wrote and train some text-to-video or text-to-whatever thing, and I don’t know about it as an author. But let’s say I ask them: I think you’ve done this. I think you’ve used something without my consent. And they’ll say, Prove it. We have five billion data points. Go through it yourself. You can’t do that. You’ll say, No. Why don’t you show me that you haven’t actually done that? They haven’t really documented it. They don’t have to document it. There’s nobody saying that you have to have a whole database and make it easier for us to search your data set or anything like that. There are really very few requirements that they have to meet in order to put these systems out there.
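[Editors’ note: One concrete reading of “make it easier for us to search your data set” is a provenance index over the training corpus, so a creator can check whether their work appears in it rather than being told to sift through billions of records. The sketch below is a simplified, hypothetical approach using overlapping word n-grams; a real system would need fuzzier matching and a scalable index.]

```python
def ngrams(text, n=8):
    """Yield overlapping word n-grams from a piece of text."""
    words = text.lower().split()
    if len(words) <= n:
        yield " ".join(words)
        return
    for i in range(len(words) - n + 1):
        yield " ".join(words[i:i + n])

def build_index(corpus_docs, n=8):
    """Map each n-gram in the training corpus to the documents containing it."""
    index = {}
    for doc_id, text in corpus_docs.items():
        for gram in ngrams(text, n):
            index.setdefault(gram, set()).add(doc_id)
    return index

def likely_sources(index, query_text, n=8):
    """Return corpus documents that share long word spans with the queried work."""
    hits = set()
    for gram in ngrams(query_text, n):
        hits |= index.get(gram, set())
    return hits

# Hypothetical usage: an author queries with a passage from their own book
# and gets back the identifiers of corpus documents containing matching spans.
```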

RML: We live in a very technologically intense world. That’s a testament to what you are saying. We need regulatory agencies that have some potential power to provide oversight and accountability. What do you think of the constitution of our legislative bodies? There aren’t many members of Congress who are technologists. Do you think that is a problem? We’re living in a world where technology is all around us, and yet you look at the population in Congress and technologists make up a very small fraction. Should that change? Why don’t you run for Congress?

DR. GEBRU: I’m too loud. There’s no way.

RML: I’m actually quite serious. If people who are concerned about this don’t get involved, who will?

DR. GEBRU: Two things. The number one thing that I think is a problem, even before we get to members of Congress who understand technology, is that they are too enamored with leaders of corporations. They are too enmeshed. Chuck Schumer had this AI insight forum series. It featured Elon Musk and Sam Altman. You never saw people who were negatively impacted by these systems in those kinds of forums. They are not hearing from people who have been negatively impacted by these systems. They are seeing press releases from leaders of these corporations. That’s the number one issue in my opinion. Even if you are not a technology expert, you can understand negative impacts when you hear them, pros and cons. You don’t have to know how a car works to see what kind of damage it could do. But if you are always just hearing from the CEOs of car companies and you are not hearing from a victim, or from the family of someone who was killed because of some issue, then obviously the way you are legislating is going to be different. That’s one.

The second thing, and I think the biggest issue, is that we really don’t have checks and balances. I would love to see an academia that’s a check to both industry and government. I’d like to see a government that’s a check to industry. We need to have checks and balances, but that’s not really what’s happening right now.

RML: What, in particular, would you like to see? This is a very important conversation to me in the sense that you have experience with the technology giants in your career, and you have insights that I think should be meaningful in terms of how this country goes forward, how the world goes forward. What would you do specifically, if not electing more members to Congress like yourself? I’m serious. I hope you will not disregard that completely. But what should we do? How do you change all this? It’s overwhelming. Most people don’t have any idea what AI is. They just know it’s affecting their lives.

DR. GEBRU: Let me give you an example. The National Science Foundation (NSF) provides research funding, and it’s from the government. I don’t think that the NSF should need money from any other organization except the US government in order to give out funding for research. But I see calls for research proposals from the NSF that are joint calls with foundations like Open Philanthropy, which was founded by an ex-Facebook person and has a specific type of ideology about technology. There was a joint call with Amazon, which means Amazon supplied a certain portion of the money. I should be able to have funding from the US government that is separate from Amazon and any other body. These are the kinds of things I’m talking about, so that my research can be funded by that one entity, because where your money comes from is going to determine what the goals of your research are. That’s one specific thing we could do.

RML: I’m not sure I agree 100 percent with you on that. My sense would be that the NSF would put some guardrails on the money they accept from a foundation.

DR. GEBRU: If I want money from Amazon or Google, there are fellowships for that, or whatever. But when I want money from the NSF, I should just be able to have money from the US government.

RML: Let me give you a concrete example. I mentioned that I taught at MIT in materials science. There is an initiative called the Materials Genome Initiative (MGI). The concept is to use computational materials science and engineering to develop new materials, advanced materials that meet certain advanced engineering needs. For example, if you are interested in splitting water to produce hydrogen as an energy source, you need a durable semiconductor photoelectrode. The MGI is federally funded. But in order to make that happen, they have to have relationships not only with universities, which produce the students and essentially the manpower going forward, but also with technology companies that can implement it, that can take it from the design stage all the way through deployment. There has to be a relationship between the researchers and the folks who deploy the technology.

DR. GEBRU: The Pentagon doesn’t take money from Amazon. The US government has a lot of money. There should be an agency that gives out money and does not have to take in money from corporations. Then I know that my funding is coming from taxpayers. There is a difference between having a relationship and undue influence. What I think right now is that there aren’t enough checks and balances between government agencies and philanthropy. I wrote an op-ed about that when I founded DAIR. One of the examples I gave was: Let’s say you have co-founders of Google, and then they found some philanthropic organization, and then that philanthropic organization influences government. Now, you have this one entity that is influencing everything. You can’t really get away from that influence, which means you don’t have any checks and balances. I think that right now what’s happening is that everything is too enmeshed, and we don’t have enough checks and balances.

RML: I kind of agree with that. We don’t have enough checks and balances to keep all of this evolving at a sensible pace.


DR. GEBRU: And then your question about running for Congress. I’m a researcher. I still want to continue to do research. I talk to all sorts of legislators, but I’m not a politician. I’m still a researcher and that’s what I do. I think with research, you are able to serve as an early warning system when there are issues, and you are also able to imagine a different technological future. Right now, we’re living in someone else’s imagination, and I have to constantly say don’t do this, don’t do that. I want to be able to institute my imagination of whatever technological future I want to live in as well. By the time we get to legislation, in some ways it’s too late.

In 2021, well before ChatGPT had come to the public, I wrote a paper on large language models and their dangers. We were able to warn people because we were in research and we were seeing the early research practices that then proliferate into the world. That’s still what my skill set is, and that’s still what I want to continue to do. If I’m in legislation, I’ve waited all this time, and now I’m acting after everything has already happened. I want to influence it before it gets to that stage.

KG: To begin to get into your vision of the technological future, I’m wondering if you could talk a little bit about what it would mean for AI to be used ethically. Are there any examples of AI being used for the good, or in ethical ways, right now?

DR. GEBRU: I think that for AI to be used ethically, it means that first we have to start with the goals of people or organizations building it and whose needs it’s supposed to serve. And if AI is supposed to serve a specific group of people’s needs, it should be created with their input and their goals in mind.

I have some examples of what we at DAIR are doing. There’s a very recent article[1] on one of our works. To me, this is a very specific use of technology. We weren’t trying to build a machine god or anything like that. We were building a specific application, and the research questions came from that specific application. We were trying to examine what has changed in South Africa since apartheid, to analyze what we call spatial apartheid. You can look at aerial images and see very clearly where the townships that were mandated during apartheid are and where the suburbs are. The question that we were asking was, have people’s lives gotten better since the legal end of apartheid? The South African government doesn’t differentiate between townships, which were created during apartheid, and other kinds of suburbs. We wanted to use computer vision techniques to segment out the townships and then ask, how much greenery is there in these townships versus elsewhere? What about different kinds of services? People know that their lives are actually worse, but they don’t have the data to back it up. We could do that kind of work and then maybe try to force some sort of policy changes. Similar work has been done in the United States, where people have used satellite imagery to analyze pollution near prisons and show that a lot of prisons are built in places that are highly polluted. We did this kind of work because the goals came from the people themselves. The person who is leading this research project is a computer vision researcher who grew up in a township. The goal of the project came from her experience.
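[Editors’ note: A hedged sketch of the kind of comparison described above: given a mask marking which pixels of an aerial scene were segmented as townships, compare a vegetation index such as NDVI inside versus outside those areas. The arrays below are tiny synthetic stand-ins, not the DAIR pipeline itself.]

```python
import numpy as np

def greenery_by_area(ndvi, township_mask):
    """Compare mean vegetation inside and outside segmented township areas.

    ndvi: per-pixel vegetation index in [-1, 1], e.g. (NIR - red) / (NIR + red)
    township_mask: boolean array, True where a pixel was segmented as township
    """
    return {
        "township_mean_ndvi": float(ndvi[township_mask].mean()),
        "elsewhere_mean_ndvi": float(ndvi[~township_mask].mean()),
    }

# Tiny synthetic scene: greener pixels outside the township area.
ndvi = np.array([[0.6, 0.5],
                 [0.1, 0.2]])
township_mask = np.array([[False, False],
                          [True, True]])
print(greenery_by_area(ndvi, township_mask))
# {'township_mean_ndvi': 0.15, 'elsewhere_mean_ndvi': 0.55}
```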

There are other ways in which I think you can use automatic speech recognition to do automatic captioning of videos and things like that. Those could be helpful. But again, it has to start from an actual need that exists for people. That’s what I think. Just because things have been done a certain way in the past, it doesn’t mean they have to continue to be done that way in the future.

RML: We had a conversation a couple of weeks ago with Kimberly Bryant, the founder of Black Girls Code. She described herself as a social innovator. How would you describe yourself?

DR. GEBRU: I would describe myself as a scientist. I’ve always just tried to be a scientist and it’s kind of hard to do it within the environments that I’m in. All I’ve been really trying to do is that. I write a paper on large language models. I get fired. I’ve just been trying to do the job of a scientist.

RML: But you’re more than that. You are unique in the following sense. You worked for the technology giants during your very early career. You spent time at Apple, at Google, at Microsoft. These are the tech giants. You’re in a position probably unlike anyone who could look at what is evolving in a technologically intense world and say, we have to step back and think about the following things in order to make sure we’re not harming society and that we’re serving a useful social purpose. That’s a calling that’s really unique and you have the capacity.

DR. GEBRU: I say that I’m literally a scientist and an engineer whose default is something different. It means that I am centering the communities that I come from. And that’s really the only thing that’s different in my view. I’m not necessarily following along with what has been the default. I have a different default. Chanda Prescod-Weinstein wrote a book called The Disordered Cosmos, and it’s a physics book, but it has a different default. It’s still a physics book. It’s just that the narrative is different from what we’ve seen before. To me, that’s sort of what I think. It doesn’t mean science is different. My default is doing it differently.

RML: Okay. I’ll buy it. But I also hope someday I can vote for you.

DR. GEBRU: It’s good to know I have your vote.

KG: I’m looking forward to the book. That’s what I’m looking forward to.

DR. GEBRU: I’m supposed to be writing a book, but I’ve been putting it off. I’m hoping to write a book on my vision for a positive technological future, because I think that we are stuck cleaning up, and that goes back to the question about Congress too. I think by the time we get there, a lot has happened already, and we are looking backwards. I’d rather try to influence things early on. How do we teach the students? How are we shaping people’s understandings of how technology should and shouldn’t be built, what they should do, et cetera? That’s where I want to be. I want to be able to shape technology before we get to the legislative process.

RML: I think that would be a genuine contribution. I really do.

KG: I’m excited for your book to be out in the world, even though it’s in the very early stages.

RML: Thank you, Timnit. This has been a wonderful and informative conversation.

DR. GEBRU: Thank you. It’s been fun.

 


[1]  Tsanni A. 2024. How satellite images and AI could help fight spatial apartheid in South Africa. MIT Technology Review, January 19.

About the Author: Timnit Gebru is founder and executive director, the Distributed AI Research Institute.