Deploying high-performance, energy-efficient AI

Although AI is by no means a new technology, investment in it, and in large language models in particular, has been massive and rapid. However, the high-performance computing that powers these rapidly growing AI tools, and enables record automation and operational efficiency, also consumes a staggering amount of energy. With the proliferation of AI comes the responsibility to deploy it responsibly, with an eye to sustainability during hardware and software R&D as well as within data centers.

“Enterprises need to be very aware of the energy consumption of their digital technologies, how big it is, and how their decisions are affecting it,” says Zane Ball, corporate vice president and general manager of data center platform engineering and architecture at Intel.

One of the key drivers of more sustainable AI is modularity, says Ball. Modularity breaks down the subsystems of a server into standard building blocks and defines interfaces between those blocks so they can work together. This approach can reduce the embodied carbon in a server’s hardware components and allows components across the ecosystem to be reused, which in turn reduces R&D investment.

Downsizing infrastructure within data centers, hardware, and software can also help enterprises reach greater energy efficiency without compromising function or performance. While very large AI models require megawatts of supercomputing power, smaller, fine-tuned models that operate within a specific knowledge domain can maintain high performance at far lower energy consumption.

“You give up that kind of amazing general purpose use like when you’re using ChatGPT-4 and you can ask it everything from 17th century Italian poetry to quantum mechanics, if you narrow your range, these smaller models can give you equivalent or better kind of capability, but at a tiny fraction of the energy consumption,” says Ball.

The opportunities for greater energy efficiency in AI deployment will only expand over the next three to five years. Ball forecasts significant strides in hardware optimization, the rise of AI factories (facilities that train AI models on a large scale while modulating energy consumption based on its availability), and the continued growth of liquid cooling, driven by the need to cool the next generation of powerful AI innovations.

“I think making those solutions available to our customers is starting to open people’s eyes to how energy efficient you can be while not really giving up a whole lot in terms of the AI use case that you’re looking for.”

This episode of Business Lab is produced in partnership with Intel.

Full Transcript

Laurel Ruma: From MIT Technology Review, I’m Laurel Ruma and this is Business Lab, the show that helps business leaders make sense of new technologies coming out of the lab and into the marketplace.

Our topic is building a better AI architecture. Going green isn’t for the faint of heart, but it’s also a pressing need for many, if not all enterprises. AI provides many opportunities for enterprises to make better decisions, so how can it also help them be greener?

Two words for you: sustainable AI.

My guest is Zane Ball, corporate vice president and general manager of data center platform engineering and architecture at Intel.

This podcast is produced in partnership with Intel.

Welcome Zane.

Zane Ball: Good morning.

Laurel: So to set the stage for our conversation, let’s start off with the big topic. As AI transforms businesses across industries, it brings the benefits of automation and operational efficiency, but that high-performance computing also consumes more energy. Could you give an overview of the current state of AI infrastructure and sustainability at the large enterprise level?

Zane: Absolutely. I think it helps to just kind of really zoom out big picture, and if you look at the history of IT services maybe in the last 15 years or so, obviously computing has been expanding at a very fast pace. And the good news about that history of the last 15 years or so, is while computing has been expanding fast, we’ve been able to contain the growth in energy consumption overall. There was a great study a couple of years ago in Science Magazine that talked about how compute had grown by maybe 550% over a decade, but that we had just increased electricity consumption by a few percent. So those kind of efficiency gains were really profound. So I think the way to kind of think about it is computing’s been expanding rapidly, and that of course creates all kinds of benefits in society, many of which reduce carbon emissions elsewhere.

But we’ve been able to do that without growing electricity consumption all that much. And that’s kind of been possible because of things like Moore’s Law: silicon has been improving, and every couple of years devices get smaller, they consume less power, things get more efficient. That’s part of the story. Another big part of this story is the advent of these hyperscale data centers. So really, really large-scale computing facilities, finding all kinds of economies of scale and efficiencies, high utilization of hardware, not a lot of idle hardware sitting around. That also was a very meaningful energy efficiency gain. And then finally this development of virtualization, which allowed even more efficient utilization of hardware. So those three things together allowed us to kind of accomplish something really remarkable. And during that time, we also had AI starting to play a role. I think since about 2015, AI workloads started to play a pretty significant role in digital services of all kinds.

But then just about a year ago, ChatGPT happens and we have a non-linear shift in the environment, and suddenly large language models, probably not news to anyone listening to this podcast, have pivoted to the center and there’s just breakneck investment across the industry to build very, very fast. And what is also driving that is that not only is everyone rushing to take advantage of this amazing large language model kind of technology, but that technology itself is evolving very quickly. And in fact, also quite well known, these models are growing in size at a rate of about 10x per year. So the amount of compute required is really sort of staggering. And when you think of all the digital services in the world now being infused with AI use cases with very large models, and then those models themselves growing 10x per year, we’re looking at something that’s not very similar to that last decade, where our efficiency gains and our greater consumption were almost penciling out.

Now we’re looking at something I think that’s not going to pencil out. And we’re really facing a really significant growth in energy consumption in these digital services. And I think that’s concerning. And I think that means that we’ve got to take some strong actions across the industry to get on top of this. And I think just the very availability of electricity at this scale is going to be a key driver. But of course many companies have net-zero goals. And I think as we pivot into some of these AI use cases, we’ve got work to do to square all of that together.

Laurel: Yeah, as you mentioned, the challenges are trying to develop sustainable AI and making data centers more energy efficient. So could you describe what modularity is and how a modularity ecosystem can power a more sustainable AI?

Zane: Yes, I think over the last three or four years, there’ve been a number of initiatives. Intel’s played a big part in this as well, re-imagining how servers are engineered into modular components. And really, modularity for servers is just exactly as it sounds. We break different subsystems of the server down into some standard building blocks, define some interfaces between those standard building blocks so that they can work together. And that has a number of advantages. Number one, from a sustainability point of view, it lowers the embodied carbon of those hardware components. Some of these hardware components are quite complex and very energy intensive to manufacture. A 30-layer circuit board, for example, is a pretty carbon-intensive piece of hardware. I don’t want that complexity in the entire system if only a small part of it needs it. I can just pay the price of the complexity where I need it.

And by being intelligent about how we break up the design into different pieces, we bring that embodied carbon footprint down. The reuse of pieces also becomes possible. So when we upgrade a system, maybe to a new telemetry approach or a new security technology, there’s just a small circuit board that has to be replaced versus replacing the whole system. Or maybe a new microprocessor comes out and the processor module can be replaced without investing in new power supplies, new chassis, new everything. And so that circularity and reuse becomes a significant opportunity. And so that embodied carbon aspect, which is about 10% of the carbon footprint in these data centers, can be significantly improved. And another benefit of the modularity, aside from the sustainability, is it just brings R&D investment down. So if I’m going to develop a hundred different kinds of servers, if I can build those servers based on the very same building blocks just configured differently, I’m going to have to invest less money, less time. And that is a real driver of the move towards modularity as well.
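To make that reuse arithmetic concrete, here is a minimal illustrative sketch in Python. The module names, embodied-carbon figures, and upgrade scenario are hypothetical assumptions, not Intel data; the point is only to show how replacing one building block instead of a whole server changes the embodied-carbon bill.

```python
# Hypothetical embodied-carbon figures (kg CO2e) for server building blocks.
# All numbers are illustrative assumptions, not measured values.
MODULES = {
    "chassis_and_power": 150.0,
    "main_board": 300.0,        # e.g., a complex multi-layer circuit board
    "processor_module": 250.0,
    "telemetry_board": 20.0,
}

def embodied_carbon(modules: dict[str, float]) -> float:
    """Total embodied carbon of a server built from the given modules."""
    return sum(modules.values())

def upgrade_carbon(replaced: list[str]) -> float:
    """Embodied carbon incurred when only the listed modules are replaced."""
    return sum(MODULES[name] for name in replaced)

full_refresh = embodied_carbon(MODULES)                  # replace everything
modular_refresh = upgrade_carbon(["processor_module"])   # swap just the CPU module

print(f"Monolithic refresh: {full_refresh:.0f} kg CO2e")
print(f"Modular refresh:    {modular_refresh:.0f} kg CO2e "
      f"({100 * (1 - modular_refresh / full_refresh):.0f}% less)")
```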

Laurel: So what are some of those techniques and technologies, like liquid cooling and ultra-high-density compute, that large enterprises can use to compute more efficiently? And what are their effects on water consumption, energy use, and overall performance, as you were outlining earlier as well?

Zane: Yeah, those are two, I think, very important opportunities. And let’s just take them one at a time. In the emerging AI world, I think liquid cooling is probably one of the most important low-hanging-fruit opportunities. So in an air-cooled data center, a tremendous amount of energy goes into fans and chillers and evaporative cooling systems. And that is actually a significant part of the total. So if you move a data center to a fully liquid-cooled solution, this is an opportunity of around 30% of energy consumption, which is sort of a wow number. I think people are often surprised just how much energy is burned. And if you walk into a data center, you almost need ear protection because it’s so loud, and the hotter the components get, the higher the fan speeds get, and the more energy is being burned on the cooling side, and liquid cooling takes a lot of that off the table.

What offsets that is that liquid cooling is a bit complex. Not everyone is fully able to utilize it. There are more upfront costs, but it actually saves money in the long run. So the total cost of ownership with liquid cooling is very favorable, and as we’re engineering new data centers from the ground up, liquid cooling is a really exciting opportunity. I think the faster we can move to liquid cooling, the more energy we can save. But it’s a complicated world out there. There are a lot of different situations, a lot of different infrastructures to design around. So we shouldn’t trivialize how hard that is for an individual enterprise. One of the other benefits of liquid cooling is we get out of the business of evaporating water for cooling. A lot of North American data centers are in arid regions and use large quantities of water for evaporative cooling.

That is good from an energy consumption point of view, but the water consumption can be really extraordinary. I’ve seen numbers getting close to a trillion gallons of water per year in North American data centers alone. And then in humid climates like Southeast Asia or eastern China, for example, that evaporative cooling capability is not as effective, and so much more energy is burned. And so if you really want to get to really aggressive energy efficiency numbers, you just can’t do it with evaporative cooling in those humid climates. And so those geographies are kind of the tip of the spear for moving into liquid cooling.
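As a rough way to see where a figure like that “around 30%” can come from, the back-of-envelope sketch below compares an air-cooled facility with a liquid-cooled one using power usage effectiveness (PUE). The PUE values and IT load are illustrative assumptions chosen for the arithmetic, not figures Ball cites.

```python
# Back-of-envelope comparison of facility energy for air vs. liquid cooling.
# PUE = total facility power / IT power. All values are illustrative assumptions.
it_load_mw = 10.0          # hypothetical IT load
pue_air_cooled = 1.6       # fans, chillers, evaporative cooling overhead
pue_liquid_cooled = 1.1    # much smaller cooling overhead

air_total_mw = it_load_mw * pue_air_cooled        # 16.0 MW
liquid_total_mw = it_load_mw * pue_liquid_cooled  # 11.0 MW

savings_pct = 100 * (air_total_mw - liquid_total_mw) / air_total_mw
print(f"Facility power: {air_total_mw:.1f} MW (air) vs {liquid_total_mw:.1f} MW (liquid)")
print(f"Estimated facility-level saving: {savings_pct:.0f}%")   # roughly 31% with these assumptions
```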

The other opportunity you mentioned was density, and bringing higher and higher density to computing has been the trend for decades. That is effectively what Moore’s Law has been pushing us toward. And I think it’s just important to realize that’s not done yet. As much as we think about racks of GPUs and accelerators, we can still significantly improve energy consumption with higher and higher density traditional servers, which allows us to pack what might’ve been a whole row of racks into a single rack of computing in the future. And those are substantial savings. And at Intel, we’ve announced an upcoming processor with 288 CPU cores in a single package, which enables us to build racks with as many as 11,000 CPU cores. So the energy savings there are substantial, not just because those chips are very, very efficient, but because the amount of networking equipment and ancillary things around those systems is a lot less, because you’re using those resources more efficiently with those very dense components. So continuing, and perhaps even accelerating, our path to this ultra-high-density kind of computing is going to help us get to the energy savings we need, maybe to accommodate some of those larger models that are coming.
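A quick sanity check on those density figures, using only the numbers quoted above plus one hypothetical assumption about the older servers being consolidated:

```python
# Consolidation arithmetic from the figures quoted above.
cores_per_package = 288
cores_per_rack = 11_000
packages_per_rack = cores_per_rack // cores_per_package   # ~38 packages per rack

# Hypothetical legacy baseline (not from the transcript): 64-core dual-socket servers.
legacy_cores_per_server = 64
legacy_servers_consolidated = cores_per_rack // legacy_cores_per_server   # ~171 servers

print(f"{packages_per_rack} packages hold roughly {legacy_servers_consolidated} older servers' worth of cores")
```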

Laurel: Yeah, that definitely makes sense. And this is a good segue into this other part of it, which is how data centers and hardware as well as software can collaborate to create more energy-efficient technology without compromising function. So how can enterprises invest in more energy-efficient hardware and hardware-aware software, and, as you were mentioning earlier, in large language models or LLMs with smaller, downsized infrastructure, but still reap the benefits of AI?

Zane: I think there are a lot of opportunities, and maybe the most exciting one that I see right now is that even as we’re pretty wowed and blown away by what these really large models are able to do, even though they require tens of megawatts of supercomputing power to run, you can actually get a lot of those benefits with far smaller models as long as you’re content to operate them within some specific knowledge domain. So we’ve often referred to these as expert models. Take for example an open source model like the Llama 2 that Meta produced. There’s a 7 billion parameter version of that model. There are also, I think, 13 and 70 billion parameter versions of that model, compared to a GPT-4, maybe something like a trillion-element model. So it’s far, far, far smaller. But when you fine-tune that model with data for a specific use case… if you’re an enterprise, you’re probably working on something fairly narrow and specific that you’re trying to do.

Maybe it’s a customer service application or it’s a financial services application, and you as an enterprise have a lot of data from your operations, that’s data that you own and you have the right to use to train the model. And so even though that’s a much smaller model, when you train it on that domain specific data, the domain specific results can be quite good in some cases even better than the large model. So you give up that kind of amazing general purpose use like when you’re using ChatGPT-4 and you can ask it everything from 17th century Italian poetry to quantum mechanics, if you narrow your range, these smaller models can give you equivalent or better kind of capability, but at a tiny fraction of the energy consumption.
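Below is a minimal sketch of the “expert model” pattern Ball describes: take an open 7-billion-parameter model and fine-tune it on domain data with a parameter-efficient method such as LoRA. The model ID, data file, field names, and hyperparameters are illustrative assumptions, not the recipe Intel or Meta used; treat it as a starting point rather than a production pipeline.

```python
# Illustrative domain fine-tuning sketch using Hugging Face transformers + peft.
# Model ID, data file, and hyperparameters are assumptions, not a vendor recipe.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base_model = "meta-llama/Llama-2-7b-hf"   # hypothetical choice of open 7B model
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base_model)

# LoRA trains a few million adapter weights instead of all 7B parameters.
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"))

# Hypothetical domain corpus, e.g. anonymized customer-service transcripts,
# stored as JSON lines with a "text" field.
data = load_dataset("json", data_files="domain_corpus.jsonl")["train"]
data = data.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
                remove_columns=data.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="expert-model", num_train_epochs=1,
                           per_device_train_batch_size=4),
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("expert-model")   # small adapter weights, cheap to version and redeploy
```

The resulting adapter is small enough to redeploy cheaply, which is part of why the expert-model route tends to be so much less energy-hungry than serving a general-purpose frontier model.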

And we’ve demonstrated a few times, even with just a standard Intel Xeon two socket server with some of the AI acceleration technologies we have in those systems, you can actually deliver quite a good experience. And that’s without even any GPUs involved in the system. So that’s just good old-fashioned servers and I think that’s pretty exciting.

That also means the technology’s quite accessible, right? So you may be an enterprise, you have a general purpose infrastructure that you use for a lot of things, and you can use that for AI use cases as well if you’ve taken advantage of these smaller models that fit within infrastructure you already have, or infrastructure that you can easily obtain. And so those smaller models are pretty exciting opportunities. And I think that’s probably one of the first things the industry will adopt to get energy consumption under control: just right-sizing the model to the activity, to the use case that we’re targeting. I think there’s also… you mentioned the concept of hardware-aware software. I think that the collaboration between hardware and software has always been an opportunity for significant efficiency gains.

I mentioned early on in this conversation how virtualization was one of the pillars that gave us that kind of fantastic result over the last 15 years. And that was very much exactly that: bringing some deep collaboration between the operating system and the hardware to do something remarkable. And a lot of the acceleration that exists in AI today actually comes from a similar kind of thinking, but that’s not really the end of the hardware-software collaboration. We can deliver quite stunning results in encryption and in memory utilization in a lot of areas. And I think that that’s got to be an area where the industry is ready to invest. It is very easy to have plug-and-play hardware where everyone programs in a super-high-level language and nobody thinks about the impact of their software application downstream. I think that’s going to have to change. We’re going to have to really understand how our application designs are impacting energy consumption going forward. And it isn’t purely a hardware problem. It’s got to be hardware and software working together.

Laurel: And you’ve outlined so many of these different kinds of technologies. So how can enterprise adoption of things like modularity, liquid cooling, and hardware-aware software be incentivized, so that companies actually make use of all these new technologies?

Zane: A year ago, I worried a lot about that question. How do we get people who are developing new applications to just be aware of the downstream implications? One of the benefits of this revolution in the last 12 months is I think just availability of electricity is going to be a big challenge for many enterprises as they seek to adopt some of these energy intensive applications. And I think the hard reality of energy availability is going to bring some very strong incentives very quickly to attack these kinds of problems.

But I do think, beyond that, like a lot of areas in sustainability, accounting is really important. There are a lot of good intentions. There are a lot of companies with net-zero goals that they’re serious about. They’re willing to take strong actions against those goals. But if you can’t accurately measure what your impact is, either as an enterprise or as a software developer, it’s hard to act on those intentions. I think you have to kind of find where the point of action is, where the rubber meets the road, where a micro-decision is being made. And if the carbon impact of that is understood at that point, then I think you can see people take the actions to take advantage of the tools and capabilities that are there to get a better result. And so I know there are a number of initiatives in the industry to create that kind of accounting, and especially for software development, I think that’s going to be really important.
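One very simple form of that accounting, sketched below: attribute carbon at the point where a micro-decision is made by multiplying the energy a request consumes by the carbon intensity of the local grid. The per-request energy and grid-intensity figures are illustrative assumptions; real accounting tools and the industry initiatives Ball alludes to are more sophisticated, but the unit math is the same.

```python
# Minimal per-request carbon accounting sketch. All figures are illustrative assumptions.
def request_carbon_g(energy_wh_per_request: float, grid_g_co2_per_kwh: float) -> float:
    """Grams of CO2e attributable to a single service request."""
    return energy_wh_per_request / 1000.0 * grid_g_co2_per_kwh

# Hypothetical comparison: large general-purpose model vs. fine-tuned expert model.
grid_intensity = 400.0    # g CO2e per kWh, a rough illustrative grid average
workloads = {
    "large general model": 3.0,       # assumed Wh per inference request
    "fine-tuned expert model": 0.3,   # assumed Wh per inference request
}

for name, wh in workloads.items():
    per_million_kg = request_carbon_g(wh, grid_intensity) * 1_000_000 / 1000.0
    print(f"{name}: {per_million_kg:.0f} kg CO2e per million requests")
```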

Laurel: Well, it’s also clear there’s an imperative for enterprises that are trying to take advantage of AI to curb that energy consumption as well as meet their environmental, social, and governance or ESG goals. So what are the major challenges that come with making more sustainable AI and computing transformations?

Zane: It’s a complex topic, and I think we’ve already touched on a couple of them. Just as I was mentioning, definitely getting software developers to understand their impact within the enterprise. And if I’m an enterprise that’s procuring my applications and software, maybe cloud services, I need to make sure that accounting is part of my procurement process. In some cases that’s gotten easier; in some cases, there’s still work to do. If I’m operating my own infrastructure, I really have to look at liquid cooling, for example, and adoption of some of these more modern technologies that let us get to significant gains in energy efficiency. And of course, really looking at the use cases and finding the most energy-efficient architecture for that use case, for example, using those smaller models that I was talking about. Enterprises need to be very aware of the energy consumption of their digital technologies, how big it is, and how their decisions are affecting it.

Laurel: So could you offer an example or use case of one of those energy efficient AI driven architectures and how AI was subsequently deployed for it?

Zane: Yes. I think some of the best examples I’ve seen in the last year were really around these smaller models. Intel did an example that we published around financial services, and we found that something like three hours of fine-tuning training on financial services data allowed us to create a chatbot solution that performed in an outstanding manner on a standard Xeon processor. And I think making those solutions available to our customers is starting to open people’s eyes to how energy efficient you can be while not really giving up a whole lot in terms of the AI use case that you’re looking for. And so I think we need to just continue to get those examples out there. We have a number of collaborations, such as with Hugging Face on open source models, enabling those solutions on our products. Our Gaudi2 accelerator has also performed very well from a performance-per-watt point of view, as has the Xeon processor itself. So those are great opportunities.

Laurel: And then how do you envision the future of AI and sustainability in the next three to five years? There seems like so much opportunity here.

Zane: I think there’s going to be so much change in the next three to five years. I hope no one holds me to what I’m about to say, but I think there are some pretty interesting trends out there. One thing, I think, to think about is the trend of AI factories. Training a model is a little bit of an interesting activity that’s distinct from what we normally think of as real-time digital services. You have a real-time digital service like Vinnie, the app on your iPhone that’s connected somewhere in the cloud, and that’s a real-time experience. And it’s all about 99.999% uptime, short latencies to deliver that user experience that people expect. But AI training is different. It’s a little bit more like a factory. We produce models as a product and then the models are used to create the digital services. And that I think becomes an important distinction.

So I can actually build some giant gigawatt facility somewhere that does nothing but train models on a large scale. I can partner with the infrastructure of the electricity providers and utilities, much like an aluminum plant or something would do today, where I actually modulate my energy consumption with its availability. Or maybe I take advantage of solar or wind power’s availability, and I can modulate when I’m consuming power and when I’m not. And so I think we’re going to see some really large-scale kinds of efforts like that, and those AI factories could be very, very efficient. They can be liquid cooled and they can be closely coupled to the utility infrastructure. I think that’s a pretty exciting opportunity, and it’s kind of an acknowledgement that there’s going to be gigawatts and gigawatts of AI training going on.
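A toy sketch of that “modulate with availability” idea: a training loop that checks a power-availability signal each step and throttles or pauses work when supply is tight. The signal, thresholds, and throttling policy are all hypothetical stand-ins; a real AI factory would coordinate with the utility through much richer interfaces and would checkpoint properly before pausing.

```python
import random
import time

def available_power_mw() -> float:
    """Stub for a grid/renewables signal; a real system would query the provider."""
    return random.uniform(0.0, 100.0)   # hypothetical megawatts available right now

FULL_LOAD_MW = 80.0    # assumed draw of the training cluster at full tilt
MIN_LOAD_MW = 20.0     # assumed draw when throttled to a fraction of accelerators

def train_one_step(scale: float) -> None:
    """Placeholder for a training step run on the given fraction of the cluster."""
    time.sleep(0.01 * scale)

for step in range(100):
    power = available_power_mw()
    if power >= FULL_LOAD_MW:
        train_one_step(scale=1.0)                    # ample (e.g., solar/wind) supply
    elif power >= MIN_LOAD_MW:
        train_one_step(scale=power / FULL_LOAD_MW)   # partial throttle
    else:
        time.sleep(0.05)                             # pause (and checkpoint) when supply is tight
```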

The second opportunity, I think, in this three to five years: I do think liquid cooling will become far more pervasive. I think that will be driven by the need to cool the next generation of accelerators and GPUs, which will make it a requirement, but then we’ll be able to build that technology out and scale it more ubiquitously for all kinds of infrastructure. And that will let us shave huge amounts of gigawatts out of the infrastructure, save hundreds of billions of gallons of water annually. I think that’s incredibly exciting. And then there’s the innovation on the model side as well. So much has changed in just the last five years with large language models like ChatGPT; let’s not assume there’s not going to be even bigger change in the next three to five years. What are the new problems that are going to be solved, the new innovations? So I think as the costs and impact of AI are being felt more substantively, there’ll be a lot of innovation on the model side and people will come up with new ways of cracking some of these problems, and there’ll be new exciting use cases that come about.

Finally, I think on the hardware side, there will be new AI architectures. From an acceleration point of view today, a lot of AI performance is limited by memory bandwidth and networking bandwidth between the various accelerator components. And I don’t think we’re anywhere close to having an optimized AI training system or AI inferencing system. I think the discipline is moving faster than the hardware and there’s a lot of opportunity for optimization. So I think we’ll see significant differences in networking, significant differences in memory solutions over the next three to five years, and certainly over the next 10 years, that I think can open up a substantial set of improvements.

And of course, Moore’s Law itself continues to advance: advanced packaging technologies and new transistor types allow us to build ever more ambitious pieces of silicon, which will have substantially higher energy efficiency. So all of those things I think will be important. Whether we can keep up with our energy efficiency gains against the explosion in AI functionality, I think that’s the real question, and it’s just going to be a super interesting time. I think it’s going to be a very innovative time in the computing industry over the next few years.

Laurel: And we’ll have to see. Zane, thank you so much for joining us on the Business Lab.

Zane: Thank you.

Laurel: That was Zane Ball, corporate vice president and general manager of data center platform engineering and architecture at Intel, who I spoke with from Cambridge, Massachusetts, the home of MIT and MIT Technology Review.

That’s it for this episode of Business Lab. I’m your host, Laurel Ruma. I’m the director of Insights, the custom publishing division of MIT Technology Review. We were founded in 1899 at the Massachusetts Institute of Technology, and you can also find us in print, on the web, and at events each year around the world. For more information about us and the show, please check out our website at technologyreview.com.

This show is available wherever you get your podcasts. If you enjoyed this episode, we hope you’ll take a moment to rate and review us. Business Lab is a production of MIT Technology Review. This episode was produced by Giro Studios. Thanks for listening.


This content was produced by Insights, the custom content arm of MIT Technology Review. It was not written by MIT Technology Review’s editorial staff.