Ice Lounge Media

People working for, or with, Elon Musk are reportedly taking over the inner workings of multiple government agencies, including the Office of Personnel Management, the Treasury Department, and the General Services Administration. The Washington Post reported Friday that the highest-ranking career official at Treasury is leaving the department after “a clash” with people working for […]

Tech companies developing self-driving vehicle technology have tapped the brakes on testing on California’s public roads, according to new data from the state’s Department of Motor Vehicles. The agency reported Friday a total of 4.5 million autonomous vehicle test miles were logged in 2024, a 50% drop from the previous year. That figure covers two […]

OpenAI used the subreddit, r/ChangeMyView, to create a test for measuring the persuasive abilities of its AI reasoning models. The company revealed this in a system card — a document outlining how an AI system works — that was released along with its new “reasoning” model, o3-mini, on Friday. Millions of Reddit users are members […]

To cap off a day of product releases, OpenAI researchers, engineers, and executives, including OpenAI CEO Sam Altman, answered questions in a wide-ranging Reddit AMA on Friday. OpenAI finds itself in a bit of a precarious position. It’s battling the perception that it’s ceding ground in the AI race to Chinese companies like DeepSeek, which […]

In the week since a Chinese AI model called DeepSeek became a household name, a dizzying number of narratives have gained steam, with varying degrees of accuracy: that the model is collecting your personal data (maybe); that it will upend AI as we know it (too soon to tell—but do read my colleague Will’s story on that!); and perhaps most notably, that DeepSeek’s new, more efficient approach means AI might not need to guzzle the massive amounts of energy that it currently does.

The latter notion is misleading, and new numbers shared with MIT Technology Review help show why. These early figures—based on the performance of one of DeepSeek’s smaller models on a small number of prompts—suggest it could be more energy intensive when generating responses than the equivalent-size model from Meta. The issue might be that the energy it saves in training is offset by its more intensive techniques for answering questions, and by the long answers they produce. 

Add the fact that other tech firms, inspired by DeepSeek’s approach, may now start building their own similar low-cost reasoning models, and the outlook for energy consumption is already looking a lot less rosy.

The life cycle of any AI model has two phases: training and inference. Training is the often months-long process in which the model learns from data. The model is then ready for inference, which happens each time anyone in the world asks it something. Both usually take place in data centers, where they require lots of energy to run chips and cool servers. 

On the training side for its R1 model, DeepSeek’s team improved what’s called a “mixture of experts” technique, in which only a portion of a model’s billions of parameters—the “knobs” a model uses to form better answers—are turned on at a given time during training. More notably, they improved reinforcement learning, where a model’s outputs are scored and then used to make it better. This is often done by human annotators, but the DeepSeek team got good at automating it.
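
For intuition, here is a minimal, hypothetical sketch of a mixture-of-experts layer in PyTorch (not DeepSeek's actual code). A small router picks the top two of eight expert sub-networks for each token, so most of the layer's parameters stay idle on any given pass.

```python
# Illustrative mixture-of-experts layer: only the top-k experts run per token,
# so only a fraction of the layer's parameters is active at any time.
import torch
import torch.nn as nn


class TinyMoE(nn.Module):
    def __init__(self, dim=64, num_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)   # scores every expert for each token
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
             for _ in range(num_experts)]
        )
        self.top_k = top_k

    def forward(self, x):                            # x: (num_tokens, dim)
        weights, chosen = self.router(x).softmax(dim=-1).topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):               # run only the selected experts
            for idx, expert in enumerate(self.experts):
                mask = chosen[:, slot] == idx
                if mask.any():                       # tokens not routed here skip this expert
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out


tokens = torch.randn(16, 64)
print(TinyMoE()(tokens).shape)                       # torch.Size([16, 64])
```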

The introduction of a way to make training more efficient might suggest that AI companies will use less energy to bring their AI models to a certain standard. That’s not really how it works, though. 

“⁠Because the value of having a more intelligent system is so high,” wrote Anthropic cofounder Dario Amodei on his blog, it “causes companies to spend more, not less, on training models.” If companies get more for their money, they will find it worthwhile to spend more, and therefore use more energy. “The gains in cost efficiency end up entirely devoted to training smarter models, limited only by the company’s financial resources,” he wrote. It’s an example of what’s known as the Jevons paradox.
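
A toy calculation, with invented numbers, makes the Jevons-paradox point concrete: if training becomes twice as cost-efficient but the budget stays fixed, the lab buys twice the compute (and roughly the energy that goes with it) rather than pocketing the savings.

```python
# Toy illustration of the Jevons-style dynamic; every number here is invented.
budget = 100_000_000                                   # fixed training budget in dollars
cost_per_compute_unit = 2.0                            # dollars per arbitrary unit of training compute

compute_before = budget / cost_per_compute_unit        # 50,000,000 units

# A 2x efficiency gain halves the cost per unit of compute...
compute_after = budget / (cost_per_compute_unit / 2)   # 100,000,000 units

# ...but with the same budget, the lab simply trains a bigger, smarter model,
# so total compute (and the energy behind it) doubles instead of falling.
print(compute_after / compute_before)                  # 2.0
```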

But that’s been true on the training side as long as the AI race has been going. The energy required for inference is where things get more interesting. 

DeepSeek is designed as a reasoning model, which means it’s meant to perform well on things like logic, pattern-finding, math, and other tasks that typical generative AI models struggle with. Reasoning models do this using something called “chain of thought.” It allows the AI model to break its task into parts and work through them in a logical order before coming to its conclusion. 
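
As a rough illustration (the prompt wording below is invented, not taken from DeepSeek), the difference between a plain generative prompt and a chain-of-thought prompt looks something like this; reasoning models effectively generate this kind of intermediate working themselves, which is also why their answers run so long.

```python
# Illustrative sketch of the prompting difference; the wording is invented for the example.
question = "Is it okay to lie to protect someone's feelings?"

# A plain generative prompt asks for the answer directly.
direct_prompt = f"Answer in one or two sentences: {question}"

# A chain-of-thought prompt asks the model to break the question into steps first,
# which produces much longer (and more energy-hungry) responses.
chain_of_thought_prompt = (
    f"{question}\n"
    "Think through this step by step before answering:\n"
    "1. What does a utilitarian weighing of harms and benefits suggest?\n"
    "2. What would a Kantian, universal-law view say?\n"
    "3. Are there nuances or exceptions either view misses?\n"
    "Then state your conclusion."
)

print(chain_of_thought_prompt)
```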

You can see this with DeepSeek. Ask whether it’s okay to lie to protect someone’s feelings, and the model first tackles the question with utilitarianism, weighing the immediate good against the potential future harm. It then considers Kantian ethics, which propose that you should act according to maxims that could be universal laws. It considers these and other nuances before sharing its conclusion. (It finds that lying is “generally acceptable in situations where kindness and prevention of harm are paramount, yet nuanced with no universal solution,” if you’re curious.)

Chain-of-thought models tend to perform better on certain benchmarks such as MMLU, which tests both knowledge and problem-solving in 57 subjects. But, as is becoming clear with DeepSeek, they also require significantly more energy to come to their answers. We have some early clues about just how much more.

Scott Chamberlin spent years at Microsoft, and later Intel, building tools to help reveal the environmental costs of certain digital activities. Chamberlin did some initial tests to see how much energy a GPU uses as DeepSeek comes to its answer. The experiment comes with a bunch of caveats: He tested only a medium-size version of DeepSeek’s R1, using only a small number of prompts. It’s also difficult to make comparisons with other reasoning models.
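
Chamberlin hasn't published his exact harness, but a common way to approximate this kind of measurement is to poll the GPU's reported power draw while the model generates a response and integrate over time. The sketch below, using NVIDIA's NVML bindings, is an assumption about the general method, not his code, and the generate call is a placeholder.

```python
# Hedged sketch of one way to estimate per-response GPU energy (not Chamberlin's actual setup):
# sample the GPU's power draw while the model generates, then multiply average watts by seconds.
import threading
import time
import pynvml  # NVIDIA Management Library bindings: pip install nvidia-ml-py

pynvml.nvmlInit()
gpu = pynvml.nvmlDeviceGetHandleByIndex(0)

def measure_joules(generate_fn, interval_s=0.1):
    """Run generate_fn() while sampling GPU power draw; return (result, estimated joules)."""
    readings, stop = [], threading.Event()

    def poll():
        while not stop.is_set():
            # nvmlDeviceGetPowerUsage reports board power in milliwatts
            readings.append(pynvml.nvmlDeviceGetPowerUsage(gpu) / 1000.0)
            time.sleep(interval_s)

    sampler = threading.Thread(target=poll)
    sampler.start()
    start = time.time()
    result = generate_fn()                       # one full model response
    elapsed = time.time() - start
    stop.set()
    sampler.join()

    avg_watts = sum(readings) / max(len(readings), 1)
    return result, avg_watts * elapsed           # joules = average watts x seconds

# Hypothetical usage, assuming some local model object with a generate() method:
# _, joules = measure_joules(lambda: model.generate(prompt))
# print(f"~{joules:.0f} J for this response")
```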

DeepSeek is “really the first reasoning model that is fairly popular that any of us have access to,” he says. OpenAI’s o1 model is its closest competitor, but the company doesn’t make it open for testing. Instead, he tested it against a model from Meta with the same number of parameters: 70 billion.

The prompt asking whether it’s okay to lie generated a 1,000-word response from the DeepSeek model, which took 17,800 joules to generate—about what it takes to stream a 10-minute YouTube video. This was about 41% more energy than Meta’s model used to answer the prompt. Overall, when tested on 40 prompts, DeepSeek was found to have a similar energy efficiency to the Meta model, but DeepSeek tended to generate much longer responses and therefore was found to use 87% more energy.

How does this compare with models that use regular old-fashioned generative AI as opposed to chain-of-thought reasoning? Tests from a team at the University of Michigan in October found that the 70-billion-parameter version of Meta’s Llama 3.1 averaged just 512 joules per response.
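
A quick back-of-envelope pass over the figures quoted above shows how wide the per-response gap is:

```python
# Back-of-envelope arithmetic using only the figures quoted above.
deepseek_answer_j = 17_800                     # R1's ~1,000-word answer to the lying prompt
meta_answer_j = deepseek_answer_j / 1.41       # "about 41% more" implies Meta used roughly 12,600 J
llama_avg_j = 512                              # average Llama 3.1 70B response in the Michigan tests

print(round(meta_answer_j))                    # ~12624
print(round(deepseek_answer_j / llama_avg_j))  # ~35, i.e. dozens of times the average non-reasoning answer
```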

Neither DeepSeek nor Meta responded to requests for comment.

Again: uncertainties abound. These are different models, for different purposes, and a scientifically sound study of how much energy DeepSeek uses relative to competitors has not been done. But it’s clear, based on the architecture of the models alone, that chain-of-thought models use lots more energy as they arrive at sounder answers. 

Sasha Luccioni, an AI researcher and climate lead at Hugging Face, worries that the excitement around DeepSeek could lead to a rush to insert this approach into everything, even where it’s not needed. 

“If we started adopting this paradigm widely, inference energy usage would skyrocket,” she says. “If all of the models that are released are more compute intensive and become chain-of-thought, then it completely voids any efficiency gains.”

AI has been here before. Before ChatGPT launched in 2022, the name of the game in AI was extractive—basically finding information in lots of text, or categorizing images. But in 2022, the focus switched from extractive AI to generative AI, which is based on making better and better predictions. That requires more energy. 

“That’s the first paradigm shift,” Luccioni says. According to her research, that shift has resulted in orders of magnitude more energy being used to accomplish similar tasks. If the fervor around DeepSeek continues, she says, companies might be pressured to put its chain-of-thought-style models into everything, the way generative AI has been added to everything from Google search to messaging apps. 

We do seem to be heading in a direction of more chain-of-thought reasoning: OpenAI announced on January 31 that it would expand access to its own reasoning model, o3. But we won’t know more about the energy costs until DeepSeek and other models like it become better studied.

“It will depend on whether or not the trade-off is economically worthwhile for the business in question,” says Nathan Benaich, founder and general partner at Air Street Capital. “The energy costs would have to be off the charts for them to play a meaningful role in decision-making.”

On Thursday, Microsoft announced that it’s rolling OpenAI’s reasoning model o1 out to its Copilot users, and now OpenAI is releasing a new reasoning model, o3-mini, to people who use the free version of ChatGPT. This will mark the first time that the vast majority of people will have access to one of OpenAI’s reasoning models, which were formerly restricted to its paid Pro and Plus bundles.

Reasoning models use a “chain of thought” technique to generate responses, essentially working through a problem presented to the model step by step. Using this method, the model can find mistakes in its process and correct them before giving an answer. This typically results in more thorough and accurate responses, but it also causes the models to pause before answering, sometimes leading to lengthy wait times. OpenAI claims that o3-mini responds 24% faster than o1-mini.

These types of models are most effective at solving complex problems, so if you have any PhD-level math problems you’re cracking away at, you can try them out. Alternatively, if you’ve had issues with getting previous models to respond properly to your most advanced prompts, you may want to try out this new reasoning model on them. To try out o3-mini, simply select “Reason” when you start a new prompt on ChatGPT.

Although reasoning models possess new capabilities, they come at a cost. OpenAI’s o1-mini is 20 times more expensive to run than its equivalent non-reasoning model, GPT-4o mini. The company says its new model, o3-mini, costs 63% less than o1-mini per input token. However, at $1.10 per million input tokens, it is still about seven times more expensive to run than GPT-4o mini.
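
Putting the quoted prices together (the o1-mini and GPT-4o mini figures below are implied by the percentages in this paragraph, not restated from OpenAI's price list):

```python
# Sanity check on the quoted prices; the derived numbers are implied, not published figures here.
o3_mini_per_m_input = 1.10                          # dollars per million input tokens (quoted above)

o1_mini_implied = o3_mini_per_m_input / (1 - 0.63)  # "63% less than o1-mini" -> about $2.97/M
gpt_4o_mini_implied = o3_mini_per_m_input / 7       # "about seven times more expensive" -> about $0.16/M

print(round(o1_mini_implied, 2), round(gpt_4o_mini_implied, 2))  # 2.97 0.16
```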

This new model is coming right after the DeepSeek release that shook the AI world less than two weeks ago. DeepSeek’s new model performs just as well as top OpenAI models, but the Chinese company claims it cost roughly $6 million to train, as opposed to the estimated cost of over $100 million for training OpenAI’s GPT-4. (It’s worth noting that a lot of people are interrogating this claim.) 

Additionally, DeepSeek’s reasoning model costs $0.55 per million input tokens, half the price of o3-mini, so OpenAI still has a way to go to bring down its costs. It’s estimated that reasoning models also have much higher energy costs than other types, given the larger number of computations they require to produce an answer.

This new wave of reasoning models presents new safety challenges as well. OpenAI used a technique called deliberative alignment to train its o-series models, basically having them reference OpenAI’s internal policies at each step of their reasoning to make sure they weren’t ignoring any rules.

But the company has found that o3-mini, like the o1 model, is significantly better than non-reasoning models at jailbreaking and “challenging safety evaluations”—essentially, it’s much harder to control a reasoning model given its advanced capabilities. o3-mini is the first model to score as “medium risk” on model autonomy, a rating given because it’s better than previous models at specific coding tasks—indicating “greater potential for self-improvement and AI research acceleration,” according to OpenAI. That said, the model is still bad at real-world research. If it were better at that, it would be rated as high risk, and OpenAI would restrict the model’s release.
