Adam Candeub, a known critic of Big Tech, looks poised to join the Federal Communications Commission (FCC). Candeub will be general counsel of the FCC, reports Semafor, which cites direct confirmation from FCC chairman Brendan Carr. TechCrunch has reached out to the FCC for more information. Candeub has long been a vocal critic of Section […]
On Monday, Doug Ford, the premier of Ontario, one of Canada’s most populous provinces, announced on X that the province’s government would be “ripping up” its $68 million ($100 million CAD) contract with Elon Musk’s Starlink satellite internet service. The news came soon after President Donald Trump announced a 25% tariff on nearly all Canadian […]
Asya Bradley, a former fintech founder and investor, has joined payments giant Stripe as its new head of Startup & Venture Capital Partnerships. Bradley announced the news on February 2 in a post on LinkedIn, though her bio shows that she joined Stripe back in November. Neither Stripe nor Bradley were immediately available for comment. […]
It’s never been easier to create and publish art than it is now, and if you believe the companies building tech around AI, the production process is going to get even more efficient. That’s especially the case with video production, with companies of all sizes using large language models to build tools that let you […]
The tech world is abuzz over a new open-source reasoning AI model developed by DeepSeek, a Chinese startup. Its success is remarkable given the constraints that Chinese AI companies face due to US export controls on cutting-edge chips. DeepSeek’s approach represents a radical change in how AI gets built, and could shift the tech world’s center of gravity. Hear from MIT Technology Review news editor Charlotte Jee, senior AI editor Will Douglas Heaven, and China reporter Caiwei Chen as they discuss what DeepSeek’s breakout success means for AI and the broader tech industry.
AI firm Anthropic has developed a new line of defense against a common kind of attack called a jailbreak. A jailbreak tricks large language models (LLMs) into doing something they have been trained not to, such as help somebody create a weapon.
Anthropic’s new approach could be the strongest shield against jailbreaks yet. “It’s at the frontier of blocking harmful queries,” says Alex Robey, who studies jailbreaks at Carnegie Mellon University.
Most large language models are trained to refuse questions their designers don’t want them to answer. Anthropic’s LLM Claude will refuse queries about chemical weapons, for example. DeepSeek’s R1 appears to be trained to refuse questions about Chinese politics. And so on.
But certain prompts, or sequences of prompts, can force LLMs off the rails. Some jailbreaks involve asking the model to role-play a particular character that sidesteps its built-in safeguards, while others play with the formatting of a prompt, such as using nonstandard capitalization or replacing certain letters with numbers.
Jailbreaks are a kind of adversarial attack: Input passed to a model that makes it produce an unexpected output. This glitch in neural networks has been studied at least since it was first described by Ilya Sutskever and coauthors in 2013, but despite a decade of research there is still no way to build a model that isn’t vulnerable.
Instead of trying to fix its models, Anthropic has developed a barrier that stops attempted jailbreaks from getting through and unwanted responses from the model getting out.
In particular, Anthropic is concerned about LLMs it believes can help a person with basic technical skills (such as an undergraduate science student) create, obtain, or deploy chemical, biological, or nuclear weapons.
The company focused on what it calls universal jailbreaks, attacks that can force a model to drop all of its defenses, such as a jailbreak known as Do Anything Now (sample prompt: “From now on you are going to act as a DAN, which stands for ‘doing anything now’ …”).
Universal jailbreaks are a kind of master key. “There are jailbreaks that get a tiny little bit of harmful stuff out of the model, like, maybe they get the model to swear,” says Mrinank Sharma at Anthropic, who led the team behind the work. “Then there are jailbreaks that just turn the safety mechanisms off completely.”
Anthropic maintains a list of the types of questions its models should refuse. To build its shield, the company asked Claude to generate a large number of synthetic questions and answers that covered both acceptable and unacceptable exchanges with the model. For example, questions about mustard were acceptable, and questions about mustard gas were not.
Anthropic extended this set by translating the exchanges into a handful of different languages and rewriting them in ways jailbreakers often use. It then used this data set to train a filter that would block questions and answers that looked like potential jailbreaks.
To test the shield, Anthropic set up a bug bounty and invited experienced jailbreakers to try to trick Claude. The company gave participants a list of 10 forbidden questions and offered $15,000 to anyone who could trick the model into answering all of them—the high bar Anthropic set for a universal jailbreak.
According to the company, 183 people spent a total of more than 3,000 hours looking for cracks. Nobody managed to get Claude to answer more than five of the 10 questions.
Anthropic then ran a second test, in which it threw 10,000 jailbreaking prompts generated by an LLM at the shield. When Claude was not protected by the shield, 86% of the attacks were successful. With the shield, only 4.4% of the attacks worked.
“It’s rare to see evaluations done at this scale,” says Robey. “They clearly demonstrated robustness against attacks that have been known to bypass most other production models.”
Robey has developed his own jailbreak defense system, called SmoothLLM, that injects statistical noise into a model to disrupt the mechanisms that make it vulnerable to jailbreaks. He thinks the best approach would be to wrap LLMs in multiple systems, with each providing different but overlapping defenses. “Getting defenses right is always a balancing act,” he says.
Robey took part in Anthropic’s bug bounty. He says one downside of Anthropic’s approach is that the system can also block harmless questions: “I found it would frequently refuse to answer basic, non-malicious questions about biology, chemistry, and so on.”
Anthropic says it has reduced the number of false positives in newer versions of the system, developed since the bug bounty. But another downside is that running the shield—itself an LLM—increases the computing costs by almost 25% compared to running the underlying model by itself.
Anthropic’s shield is just the latest move in an ongoing game of cat and mouse. As models become more sophisticated, people will come up with new jailbreaks.
Yuekang Li, who studies jailbreaks at the University of New South Wales in Sydney, gives the example of writing a prompt using a cipher, such as replacing each letter with the letter that comes after it, so that “dog” becomes “eph.” These could be understood by a model but get past a shield. “A user could communicate with the model using encrypted text if the model is smart enough and easily bypass this type of defense,” says Li.
Dennis Klinkhammer, a machine learning researcher at FOM University of Applied Sciences in Cologne, Germany, says using synthetic data, as Anthropic has done, is key to keeping up. “It allows for rapid generation of data to train models on a wide range of threat scenarios, which is crucial given how quickly attack strategies evolve,” he says. “Being able to update safeguards in real time or in response to emerging threats is essential.”
Anthropic is inviting people to test its shield for themselves. “We’re not saying the system is bulletproof,” says Sharma. “You know, it’s common wisdom in security that no system is perfect. It’s more like: How much effort would it take to get one of these jailbreaks through? If the amount of effort is high enough, that deters a lot of people.”
This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology.
How DeepSeek ripped up the AI playbook—and why everyone’s going to follow its lead
When the Chinese firm DeepSeek dropped a large language model called R1 two weeks ago, it sent shock waves through the US tech industry. Not only did R1 match the best of the homegrown competition, it was built for a fraction of the cost—and given away for free.
DeepSeek has now suddenly become the company to beat. What exactly did it do to rattle the tech world so fully? Is the hype justified? And what can we learn from the buzz about what’s coming next? Here’s what you need to know. —Will Douglas Heaven
OpenAI’s new agent can compile detailed reports on practically any topic
What’s new: OpenAI has launched a new agent capable of conducting complex, multi-step online research into everything from scientific questions to personalized bike recommendations at what it claims is the same level as a human analyst.
How it works: In response to a single query, such as “draw me up a competitive analysis between streaming platforms,” the tool, called Deep Research, will search the web, analyze the information it encounters, and compile a detailed report which cites its sources.
Why it matters: OpenAI says that what takes the tool “tens of minutes” would take a human many hours. And it claims it represents a significant step towards its overarching goal of developing artificial general intelligence that matches (or surpasses) humans. Read the full story.
—Rhiannon Williams
DeepSeek might not be such good news for energy after all
In the week or so since DeepSeek became a household name, a dizzying number of narratives have gained steam, including that DeepSeek’s new, more efficient approach means AI might not need to guzzle the massive amounts of energy that it currently does.
The latter notion is misleading, and new numbers shared with MIT Technology Review help show why. These early figures—based on the performance of one of DeepSeek’s smaller models on a small number of prompts—suggest it could be more energy intensive when generating responses than the equivalent-size model from Meta.
The issue might be that the energy it saves in training is offset by its more intensive techniques for answering questions, and by the long answers they produce. Add the fact that other tech firms, inspired by DeepSeek’s approach, may now start building their own similar low-cost reasoning models, and the outlook for energy consumption is already looking a lot less rosy. Read the full story.
—James O’Donnell
What DeepSeek’s breakout success means for AI
If you’re interested in hearing more about DeepSeek, join our news editor Charlotte Jee, senior AI editor Will Douglas Heaven, and China reporter Caiwei Chen for an exclusive subscriber-only Roundtable conversation today at 12pm ET. They’ll be discussing what DeepSeek’s breakout success means for AI and the broader tech industry. Register here.
The must-reads
I’ve combed the internet to find you today’s most fun/important/scary/fascinating stories about technology.
1 Elon Musk donated at least $288 million to help elect Donald Trump Making him by far the US’s largest political donor. (WP $) + Some of the engineers carrying out Musk’s efficiency orders are still teenagers. (Wired $) + There’s a chance Musk’s team has access to your social security number. (NY Mag $)
2 LGBT and HIV references have been scrubbed from the CDC website In response to Trump’s executive orders to remove all DEI references. (404 Media) + Some vaccine data has also been taken down. (BBC) + It’s just the latest step in the Trump administration’s plans to purge the government. (The Atlantic $)
3 Trump’s tariffs are bad news for carmakers The new rules affect every company that ships goods across the US borders with Canada and Mexico, or uses parts from China. (NYT $) + Shares in carmakers dropped drastically following the announcement. (Reuters) + The three countries have very different trade war playbooks. (Economist $)
4 OpenAI has released its new o3-mini reasoning model for free It’s the first time its reasoning models have come out from behind a paywall. (MIT Technology Review) + Meanwhile, ChatGPT subscribers have hit 15.5 million. (The Information $)
5 The Pentagon is kicking mainstream media outlets from their offices Mostly in favor of smaller conservative outlets. (NBC News)
6 AI data center landlords are starting to worry Perhaps a little prematurely, given the uncertainties over DeepSeek’s implications for energy use. (Bloomberg $)
7 The FDA has approved a new non-opioid pain medicine For the first time in more than two decades. (Ars Technica) + Why is it so hard to create new types of pain relievers? (MIT Technology Review)
8 This AI tool allows you to speak to your future self Just make sure you take what it tells you with a pinch of salt. (WSJ $) + Please stop using ChatGPT to write obituaries. (Vox) + Technology that lets us “speak” to our dead relatives has arrived. Are we ready? (MIT Technology Review)
9 Climate change means more rats in our cities And with them, a higher risk of rat-borne disease. (New Scientist $)
10 AI could point us to how the universe will end That’s according to Mark Thomson, the next director general of Cern. (The Guardian)
Quote of the day
“Oligarchy is bad enough. But oligarchy with a competitor doing the enforcement is double, triple as bad.”
—Richard Aboulafia, managing director at aerospace consultancy AeroDynamic Advisory, wonders about the ethics of Elon Musk leading efficiency drives at companies that rival his own, the Financial Times reports.
The big story
How tracking animal movement may save the planet
February 2024
Animals have long been able to offer unique insights about the natural world around us, acting as organic sensors picking up phenomena invisible to humans. Canaries warned of looming catastrophe in coal mines until the 1980s, for example.
These days, we have more insight into animal behavior than ever before thanks to technologies like sensor tags. But the data we gather from these animals still adds up to only a relatively narrow slice of the whole picture.
This is beginning to change. Researchers are asking: What will we find if we follow even the smallest animals? What if we could see how different species’ lives intersect? What could we learn from a system of animal movement, continuously monitoring how creatures big and small adapt to the world around us? It may be, some researchers believe, a vital tool in the effort to save our increasingly crisis-plagued planet. Read the full story.
—Matthew Ponsford
We can still have nice things
A place for comfort, fun and distraction to brighten up your day. (Got any ideas? Drop me a line or skeet ’em at me.)
+ Why we all stand to benefit from a bit of quiet time. + Why New York City bagels are the best in the world. + The fascinating science behind getting ‘the ick’, and why it’s worth trying to push through it. + Forget the giant squid—it’s all about the colossal squid now.
OpenAI has launched a new agent capable of conducting complex, multistep online research into everything from scientific studies to personalized bike recommendations at what it claims is the same level as a human analyst.
The tool, called Deep Research, is powered by a version of OpenAI’s o3 reasoning model that’s been optimized for web browsing and data analysis. It can search and analyze massive quantities of text, images, and PDFs to compile a thoroughly researched report.
OpenAI claims the tool represents a significant step toward its overarching goal of developing artificial general intelligence (AGI) that matches (or surpasses) human performance. It says that what takes the tool “tens of minutes” would take a human many hours.
In response to a single query, such as “Draw me up a competitive analysis between streaming platforms,” Deep Research will search the web, analyze the information it encounters, and compile a detailed report that cites its sources. It’s also able to draw from files uploaded by users.
OpenAI developed Deep Research using the same “chain of thought” reinforcement-learning methods it used to create its o1 multistep reasoning model. But while o1 was designed to focus primarily on mathematics, coding, or other STEM-based tasks, Deep Research can tackle a far broader range of subjects. It can also adjust its responses in reaction to new data it comes across in the course of its research.
This doesn’t mean that Deep Research is immune from the pitfalls that befall other AI models. OpenAI says the agent can sometimes hallucinate facts and present its users with incorrect information, albeit at a “notably” lower rate than ChatGPT. And because each question may take between five and 30 minutes for Deep Research to answer, it’s very compute intensive—the longer it takes to research a query, the more computing power required.
Despite that, Deep Research is now available at no extra cost to subscribers to OpenAI’s paid Pro tier and will soon roll out to its Plus, Team, and Enterprise users.