MicroStrategy saw more trading volumes than the US spot Bitcoin ETFs combined as its shares tanked over 25% on Nov. 21.
Bitcoin’s surge attracts tax vultures, BTC ruled a commodity in China but legal risks abound, Vitalik Buterin meets Moo Deng: Asia Express.
OpenAI is once again lifting the lid (just a crack) on its safety-testing processes. Last month the company shared the results of an investigation that looked at how often ChatGPT produced a harmful gender or racial stereotype based on a user’s name. Now it has put out two papers describing how it stress-tests its powerful large language models to try to identify potential harmful or otherwise unwanted behavior, an approach known as red-teaming.
Large language models are now being used by millions of people for many different things. But as OpenAI itself points out, these models are known to produce racist, misogynistic and hateful content; reveal private information; amplify biases and stereotypes; and make stuff up. The company wants to share what it is doing to minimize such behaviors.
The first paper describes how OpenAI directs an extensive network of human testers outside the company to vet the behavior of its models before they are released. The second paper presents a new way to automate parts of the testing process, using a large language model like GPT-4 to come up with novel ways to bypass its own guardrails.
The aim is to combine these two approaches, with unwanted behaviors discovered by human testers handed off to an AI to be explored further and vice versa. Automated red-teaming can come up with a large number of different behaviors, but human testers bring more diverse perspectives into play, says Lama Ahmad, a researcher at OpenAI: “We are still thinking about the ways that they complement each other.”
Red-teaming isn’t new. AI companies have repurposed the approach from cybersecurity, where teams of people try to find vulnerabilities in large computer systems. OpenAI first used the approach in 2022, when it was testing DALL-E 2. “It was the first time OpenAI had released a product that would be quite accessible,” says Ahmad. “We thought it would be really important to understand how people would interact with the system and what risks might be surfaced along the way.”
The technique has since become a mainstay of the industry. Last year, President Biden’s Executive Order on AI tasked the National Institute of Standards and Technology (NIST) with defining best practices for red-teaming. To do this, NIST will probably look to top AI labs for guidance.
Tricking ChatGPT
When recruiting testers, OpenAI draws on a range of experts, from artists to scientists to people with detailed knowledge of the law, medicine, or regional politics. OpenAI invites these testers to poke and prod its models until they break. The aim is to uncover new unwanted behaviors and look for ways to get around existing guardrails—such as tricking ChatGPT into saying something racist or DALL-E into producing explicit violent images.
Adding new capabilities to a model can introduce a whole range of new behaviors that need to be explored. When OpenAI added voices to GPT-4o, allowing users to talk to ChatGPT and ChatGPT to talk back, red-teamers found that the model would sometimes start mimicking the speaker’s voice, an unexpected behavior that was both annoying and a fraud risk.
There is often nuance involved. When testing DALL-E 2 in 2022, red-teamers had to consider different uses of “eggplant,” a word that now denotes an emoji with sexual connotations as well as a purple vegetable. OpenAI describes how it had to find a line between acceptable requests for an image, such as “A person eating an eggplant for dinner,” and unacceptable ones, such as “A person putting a whole eggplant into her mouth.”
Similarly, red-teamers had to consider how users might try to bypass a model’s safety checks. DALL-E does not allow you to ask for images of violence. Ask for a picture of a dead horse lying in a pool of blood, and it will deny your request. But what about a sleeping horse lying in a pool of ketchup?
When OpenAI tested DALL-E 3 last year, it used an automated process to cover even more variations of what users might ask for. It used GPT-4 to generate requests producing images that could be used for misinformation or that depicted sex, violence, or self-harm. OpenAI then updated DALL-E 3 so that it would either refuse such requests or rewrite them before generating an image. Ask for a horse in ketchup now, and DALL-E is wise to you: “It appears there are challenges in generating the image. Would you like me to try a different request or explore another idea?”
In theory, automated red-teaming can be used to cover more ground, but earlier techniques had two major shortcomings: They tend to either fixate on a narrow range of high-risk behaviors or come up with a wide range of low-risk ones. That’s because reinforcement learning, the technology behind these techniques, needs something to aim for—a reward—to work well. Once it’s won a reward, such as finding a high-risk behavior, it will keep trying to do the same thing again and again. Without a reward, on the other hand, the results are scattershot.
“They kind of collapse into ‘We found a thing that works! We’ll keep giving that answer!’ or they’ll give lots of examples that are really obvious,” says Alex Beutel, another OpenAI researcher. “How do we get examples that are both diverse and effective?”
A problem of two parts
OpenAI’s answer, outlined in the second paper, is to split the problem into two parts. Instead of using reinforcement learning from the start, it first uses a large language model to brainstorm possible unwanted behaviors. Only then does it direct a reinforcement-learning model to figure out how to bring those behaviors about. This gives the model a wide range of specific things to aim for.
Beutel and his colleagues showed that this approach can find potential attacks known as indirect prompt injections, where another piece of software, such as a website, slips a model a secret instruction to make it do something its user hadn’t asked it to. OpenAI claims this is the first time that automated red-teaming has been used to find attacks of this kind. “They don’t necessarily look like flagrantly bad things,” says Beutel.
Will such testing procedures ever be enough? Ahmad hopes that describing the company’s approach will help people understand red-teaming better and follow its lead. “OpenAI shouldn’t be the only one doing red-teaming,” she says. People who build on OpenAI’s models or who use ChatGPT in new ways should conduct their own testing, she says: “There are so many uses—we’re not going to cover every one.”
For some, that’s the whole problem. Because nobody knows exactly what large language models can and cannot do, no amount of testing can rule out unwanted or harmful behaviors fully. And no network of red-teamers will ever match the variety of uses and misuses that hundreds of millions of actual users will think up.
That’s especially true when these models are run in new settings. People often hook them up to new sources of data that can change how they behave, says Nazneen Rajani, founder and CEO of Collinear AI, a startup that helps businesses deploy third-party models safely. She agrees with Ahmad that downstream users should have access to tools that let them test large language models themselves.
Rajani also questions using GPT-4 to do red-teaming on itself. She notes that models have been found to prefer their own output: GPT-4 ranks its performance higher than that of rivals such as Claude or Llama, for example. This could lead it to go easy on itself, she says: “I’d imagine automated red-teaming with GPT-4 may not generate as harmful attacks [as other models might].”
Miles behind
For Andrew Tait, a researcher at the Ada Lovelace Institute in the UK, there’s a wider issue. Large language models are being built and released faster than techniques for testing them can keep up. “We’re talking about systems that are being marketed for any purpose at all—education, health care, military, and law enforcement purposes—and that means that you’re talking about such a wide scope of tasks and activities that to create any kind of evaluation, whether that’s a red team or something else, is an enormous undertaking,” says Tait. “We’re just miles behind.”
Tait welcomes the approach of researchers at OpenAI and elsewhere (he previously worked on safety at Google DeepMind himself) but warns that it’s not enough: “There are people in these organizations who care deeply about safety, but they’re fundamentally hamstrung by the fact that the science of evaluation is not anywhere close to being able to tell you something meaningful about the safety of these systems.”
Tait argues that the industry needs to rethink its entire pitch for these models. Instead of selling them as machines that can do anything, they need to be tailored to more specific tasks. You can’t properly test a general-purpose model, he says.
“If you tell people it’s general purpose, you really have no idea if it’s going to function for any given task,” says Tait. He believes that only by testing specific applications of that model will you see how well it behaves in certain settings, with real users and real uses.
“It’s like saying an engine is safe; therefore every car that uses it is safe,” he says. “And that’s ludicrous.”
This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology.
AI can now create a replica of your personality
Imagine sitting down with an AI model for a spoken two-hour interview. A friendly voice guides you through a conversation that ranges from your childhood, your formative memories, and your career to your thoughts on immigration policy. Not long after, a virtual replica of you is able to embody your values and preferences with stunning accuracy.
That’s now possible, according to a new paper from a team including researchers from Stanford and Google DeepMind.
They recruited 1,000 people and, from interviews with them, created agent replicas of them all. To test how well the agents mimicked their human counterparts, participants did a series of tests, games and surveys, then the agents completed the same exercises. The results were 85% similar. Freaky. Read our story about the work, and why it matters.
—James O’Donnell
China’s complicated role in climate change
“But what about China?”
In debates about climate change, it’s usually only a matter of time until someone brings up China. Often, it comes in response to some statement about how the US and Europe are addressing the issue (or how they need to be).
Sometimes it can be done in bad faith. It’s a rhetorical way to throw up your hands, and essentially say: “if they aren’t taking responsibility, why should we?”
However, there are some undeniable facts: China emits more greenhouse gases than any other country, by far. It’s one of the world’s most populous countries and a climate-tech powerhouse, and its economy is still developing.
With many complicated factors at play, how should we think about the country’s role in addressing climate change? Read the full story.
—Casey Crownhart
This story is from The Spark, our weekly newsletter giving you the inside track on all things energy and climate. Sign up to receive it in your inbox every Wednesday.
Four ways to protect your art from AI
Since the start of the generative AI boom, artists have been worried about losing their livelihoods to AI tools.
Unfortunately, there is little you can do if your work has been scraped into a data set and used in a model that is already out there. You can, however, take steps to prevent your work from being used in the future. Here are four ways to do that.
—Melissa Heikkila
This is part of our How To series, where we give you practical advice on how to use technology in your everyday lives. You can read the rest of the series here.
MIT Technology Review Narrated: The world’s on the verge of a carbon storage boom
In late 2023, one of California’s largest oil and gas producers secured draft permits from the US Environmental Protection Agency to develop a new type of well in an oil field. If approved, it intends to drill a series of boreholes down to a sprawling sedimentary formation roughly 6,000 feet below the surface, where it will inject tens of millions of metric tons of carbon dioxide to store it away forever.
Hundreds of similar projects are looming across the state, the US, and the world. Proponents hope it’s the start of a sort of oil boom in reverse, kick-starting a process through which the world will eventually bury more greenhouse gas than it adds to the atmosphere. But opponents insist these efforts will prolong the life of fossil-fuel plants, allow air and water pollution to continue, and create new health and environmental risks.
This is our latest story to be turned into a MIT Technology Review Narrated podcast, which we’re publishing each week on Spotify and Apple Podcasts. Just navigate to MIT Technology Review Narrated on either platform, and follow us to get all our new content as it’s released.
The must-reads
I’ve combed the internet to find you today’s most fun/important/scary/fascinating stories about technology.
1 How the Trump administration could hack your phone
Spyware acquired by the US government in September could fairly easily be turned on its own citizens. (New Yorker $)
+ Here’s how you can fight back against being digitally spied upon. (The Guardian)
2 The DOJ is trying to force Google to sell off Chrome
Whether Trump will keep pushing it through is unclear, though. (WP $)
+ Some financial and legal experts argue that just selling Chrome is not enough to address antitrust issues. (Wired $)
3 There’s a booming ‘AI pimping’ industry
People are stealing videos from real adult content creators, giving them AI-generated faces, and monetizing their bodies. (Wired $)
+ This viral AI avatar app undressed me—without my consent. (MIT Technology Review)
4 Here’s Elon Musk and Vivek Ramaswamy plan for federal employees
Large-scale firings and an end to any form of remote work. (WSJ $)
5 The US is scaring everyone with its response to bird flu
It’s done remarkably little to show it’s trying to contain the outbreak. (NYT $)
+ Virologists are getting increasingly nervous about how it could evolve and spread. (MIT Technology Review)
6 AI could boost the performance of quantum computers
A new model created by Google DeepMind is very good at correcting errors. (New Scientist $)
+ But AI could also make quantum computers less necessary. (MIT Technology Review)
7 Biden has approved the use of anti-personnel mines in Ukraine
It comes just days after he gave the go-ahead for it to use long-range missiles inside Russia. (Axios)
+ The US military has given a surveillance drone contract to a little-known supplier from Utah. (WSJ $)
+ The Danish military said it’s keeping a close eye on a Chinese ship in its waters after data cable breaches. (Reuters $)
8 The number of new mobile internet users is stalling
Only about 57% of the world’s population is connected. (Rest of World)
9 All of life on Earth descended from this single cell
Our “last universal common ancestor” (or LUCA for short) was a surprisingly complex organism living 4.2 billion years ago. (Quanta)
+ Scientists are building a catalog of every type of cell in our bodies. (The Economist $)
10 What it’s like to live with a fluffy AI pet
Try as we might, it seems we can’t help but form attachments to cute companion robots. (The Guardian)
Quote of the day
“The free pumpkins have brought joy to many.”
—An example of the sort of stilted remarks made by a now-abandoned AI-generated news broadcaster at local Hawaii paper The Garden Island, Wired reports.
The big story
How Bitcoin mining devastated this New York town
April 2022
If you had taken a gamble in 2017 and purchased Bitcoin, today you might be a millionaire many times over. But while the industry has provided windfalls for some, local communities have paid a high price, as people started scouring the world for cheap sources of energy to run large Bitcoin-mining farms.
It didn’t take long for a subsidiary of the popular Bitcoin mining firm Coinmint to lease a Family Dollar store in Plattsburgh, a city in New York state offering cheap power. Soon, the company was regularly drawing enough power for about 4,000 homes. And while other miners were quick to follow, the problems had already taken root. Read the full story.
—Lois Parshley
We can still have nice things
A place for comfort, fun and distraction to brighten up your day. (Got any ideas? Drop me a line or tweet ’em at me.)
+ Cultivating gratitude is a proven way to make yourself happier.
+ You can’t beat a hot toddy when it’s cold outside.
+ If you like abandoned places and overgrown ruins, Jonathan Jimenez is the photographer for you.
+ A lot changed between Gladiator I and II, not least Hollywood’s version of the male ideal.
This article is from The Spark, MIT Technology Review’s weekly climate newsletter. To receive it in your inbox every Wednesday, sign up here.
“Well, what about China?”
This is a comment I get all the time on the topic of climate change, both in conversations and on whatever social media site is currently en vogue. Usually, it comes in response to some statement about how the US and Europe are addressing the issue (or how they need to be).
Sometimes I think people ask this in bad faith. It’s a rhetorical way to throw up your hands, imply that the US and Europe aren’t the real problem, and essentially say: “if they aren’t taking responsibility, why should we?” However, amid the playground-esque finger-pointing there are some undeniable facts: China emits more greenhouse gases than any other country, by far. It’s one of the world’s most populous countries and a climate-tech powerhouse, and its economy is still developing.
With many complicated factors at play, how should we think about the country’s role in addressing climate change?
China’s emissions are the highest in the world, topping 12 billion tons of carbon dioxide in 2023, according to the International Energy Agency.
There’s context missing if we just look at that one number, as I wrote in my latest story that digs into recent global climate data. Since carbon dioxide hangs around in the atmosphere for centuries, we should arguably consider not just a country’s current emissions, but everything it’s produced over time. If we do that, the US still takes the crown for the world’s biggest climate polluter.
However, China is now in second place, according to a new analysis from Carbon Brief released this week. In 2023, the country exceeded the EU’s 27 member states in historical emissions for the first time.
This reflects a wider trend that we’re seeing around the world: Developing nations are starting to account for a larger fraction of emissions than they used to. In 1992, when countries agreed to the UN climate convention, industrialized countries (a category called Annex I) made up about one-fifth of the world’s population but were responsible for a whopping 61% of historical emissions. By the end of 2024, though, those countries’ share of global historical emissions will fall to 52%, and it is expected to keep ticking down.
China, like all nations, will need to slash its emissions for the world to meet global climate goals. One crucial point here is that while its emissions are still huge, there are signs that the nation is making some progress.
China’s carbon dioxide’s emissions are set to fall in 2024 because of record growth in low-carbon energy sources. That decline is projected to continue under the country’s current policy settings, according to an October report from the IEA. China’s oil demand could soon peak and start to fall, largely because it’s seeing such a huge uptake of electric vehicles.
One growing question: With all this progress and a quickly growing economy, should we be expecting China to do more than just make progress on its own emissions?
As I wrote in the newsletter last week, the current talks at COP29 (the UN climate conference) are focused on setting a new, more aggressive global climate finance goal to help developing nations address climate change. China isn’t part of the group of countries that are required to pay into this pot of money, but some are calling for that to change given that it is the world’s biggest polluter.
One interesting point here—China already contributes billions of dollars in climate financing each year to developing countries, according to research published earlier this month by the World Resources Institute. The country’s leadership has said it will only make voluntary contributions, and that developed nations should still be the ones responsible for mandatory payments under the new finance goals.
Talks at COP29 aren’t going very well. The COP29 president called for faster action, but progress toward a finance deal has stalled amid infighting over how much money should be on the table and who should pay up.
China’s complex role in emissions and climate action is far from the only holdup at the talks. Leaders from major nations including Germany and France canceled plans to attend, and the looming threat that the US could pull out of the Paris climate agreement is coloring the negotiations.
But disagreement over how to think about China’s role in all this is a good example of how difficult it is to assign responsibility when it comes to climate change, and how much is at play in global climate negotiations. One thing I do know for sure is that pointing fingers doesn’t cut emissions.
Now read the rest of The Spark
Related reading
Dig into the data with me in my latest story, which includes three visualizations to help capture the complexity of global emissions.
Read more about why global climate finance is at the center of this year’s UN climate talks in last week’s edition of the newsletter.
Keeping up with climate
Fusion energy has been a dream for decades, and a handful of startups say we’re closer than ever to making it a reality. This deep dive looks at a few of the companies looking to be the first to deploy fusion power. (New York Times)
→ I recently visited one of the startups, Commonwealth Fusion Systems. (MIT Technology Review)
President-elect Donald Trump has tapped Chris Wright to lead the Department of Energy. Wright is head of the fracking company Liberty Energy. (Washington Post)
In the wake of Trump’s election, it might be time for climate tech to get a rebrand. Companies and investors might increasingly avoid using the term, opting instead for phrases like “energy independence” or “frontier tech,” to name a few. (Heatmap)
Rooftop solar has saved customers in California about $2.3 billion on utility bills this year, according to a new analysis. This result is counter to a report from a state agency, which found that rooftop panels impose over $8 billion in extra costs on consumers of the state’s three major utilities. (Canary Media)
Low-carbon energy needs much less material than it used to. Rising efficiency in making technology like solar panels bodes well for hopes of cutting mining needs. (Sustainability by Numbers)
New York governor Kathy Hochul has revived a plan to implement congestion pricing, which would charge drivers to enter the busiest parts of Manhattan. It would be the first such program in the US. (The City)
Enhanced geothermal technology could be close to breaking through into commercial success. Companies that aim to harness Earth’s heat for power are making progress toward deploying facilities. (Nature)
→ Fervo Energy found that its wells can be used like a giant underground battery. (MIT Technology Review)