Inside the messy ethics of making war with machines

In a near-future war—one that might begin tomorrow, for all we know—a soldier takes up a shooting position on an empty rooftop. His unit has been fighting through the city block by block. It feels as if enemies could be lying in silent wait behind every corner, ready to rain fire upon their marks the moment they have a shot.

Through his gunsight, the soldier scans the windows of a nearby building. He notices fresh laundry hanging from the balconies. Word comes in over the radio that his team is about to move across an open patch of ground below. As they head out, a red bounding box appears in the top left corner of the gunsight. The device’s computer vision system has flagged a potential target—a silhouetted figure in a window is drawing up, it seems, to take a shot.

The soldier doesn’t have a clear view, but in his experience the system has a superhuman capacity to pick up the faintest tell of an enemy. So he sets his crosshair upon the box and prepares to squeeze the trigger.

In a different war, also possibly just over the horizon, a commander stands before a bank of monitors. An alert appears from a chatbot. It brings news that satellites have picked up a truck entering a certain city block that has been designated as a possible staging area for enemy rocket launches. The chatbot has already advised an artillery unit, which it calculates as having the highest estimated “kill probability,” to take aim at the truck and stand by.

According to the chatbot, none of the nearby buildings is a civilian structure, though it notes that the determination has yet to be corroborated manually. A drone, which had been dispatched by the system for a closer look, arrives on scene. Its video shows the truck backing into a narrow passage between two compounds. The opportunity to take the shot is rapidly coming to a close.

For the commander, everything now falls silent. The chaos, the uncertainty, the cacophony—all reduced to the sound of a ticking clock and the sight of a single glowing button:

“APPROVE FIRE ORDER.”

To pull the trigger—or, as the case may be, not to pull it. To hit the button, or to hold off. Legally—and ethically—the role of the soldier’s decision in matters of life and death is preeminent and indispensable. Fundamentally, it is these decisions that define the human act of war.

It should be of little surprise, then, that states and civil society have taken up the question of intelligent autonomous weapons—weapons that can select and fire upon targets without any human input—as a matter of serious concern. In May, after close to a decade of discussions, parties to the UN’s Convention on Certain Conventional Weapons agreed, among other recommendations, that militaries using them probably need to “limit the duration, geographical scope, and scale of the operation” to comply with the laws of war. The line was nonbinding, but it was at least an acknowledgment that a human has to play a part—somewhere, sometime—in the immediate process leading up to a killing.

But intelligent autonomous weapons that fully displace human decision-making have (likely) yet to see real-world use. Even the “autonomous” drones and ships fielded by the US and other powers are used under close human supervision. Meanwhile, intelligent systems that merely guide the hand that pulls the trigger have been gaining purchase in the warmaker’s tool kit. And they’ve quietly become sophisticated enough to raise novel questions—ones that are trickier to answer than the well-covered wrangles over killer robots and, with each passing day, more urgent: What does it mean when a decision is only part human and part machine? And when, if ever, is it ethical for that decision to be a decision to kill?

For a long time, the idea of supporting a human decision by computerized means wasn’t such a controversial prospect. Retired Air Force lieutenant general Jack Shanahan says the radar on the F-4 Phantom fighter jet he flew in the 1980s was a decision aid of sorts. It alerted him to the presence of other aircraft, he told me, so that he could figure out what to do about them. But to say that the crew and the radar were coequal accomplices would be a stretch.

That has all begun to change. “What we’re seeing now, at least in the way that I see this, is a transition to a world [in] which you need to have humans and machines … operating in some sort of team,” says Shanahan.

The rise of machine learning, in particular, has set off a paradigm shift in how militaries use computers to help shape the crucial decisions of warfare—up to, and including, the ultimate decision. Shanahan was the first director of Project Maven, a Pentagon program that developed target recognition algorithms for video footage from drones. The project, which kicked off a new era of American military AI, was launched in 2017 after a study concluded that “deep learning algorithms can perform at near-human levels.” (It also sparked controversy—in 2018, more than 3,000 Google employees signed a letter of protest against the company’s involvement in the project.)

With machine-learning-based decision tools, “you have more apparent competency, more breadth” than earlier tools afforded, says Matt Turek, deputy director of the Information Innovation Office at the Defense Advanced Research Projects Agency. “And perhaps a tendency, as a result, to turn over more decision-making to them.”

A soldier on the lookout for enemy snipers might, for example, do so through the Assault Rifle Combat Application System, a gunsight sold by the Israeli defense firm Elbit Systems. According to a company spec sheet, the “AI-powered” device is capable of “human target detection” at a range of more than 600 yards, and human target “identification” (presumably, discerning whether a person is someone who could be shot) at about the length of a football field. Anna Ahronheim-Cohen, a spokesperson for the company, told MIT Technology Review, “The system has already been tested in real-time scenarios by fighting infantry soldiers.”

Another gunsight, built by the company Smartshooter, is advertised as having similar capabilities. According to the company’s website, it can also be packaged into a remote-controlled machine gun like the one that Israeli agents used to assassinate the Iranian nuclear scientist Mohsen Fakhrizadeh in 2020.

Decision support tools that sit at a greater remove from the battlefield can be just as decisive. The Pentagon appears to have used AI in the sequence of intelligence analyses and decisions leading up to a potential strike, a process known as a kill chain—though it has been cagey on the details. In response to questions from MIT Technology Review, Laura McAndrews, an Air Force spokesperson, wrote that the service “is utilizing a human-machine teaming approach.”

Other countries are more openly experimenting with such automation. Shortly after the Israel-Palestine conflict in 2021, the Israel Defense Forces said it had used what it described as AI tools to alert troops of imminent attacks and to propose targets for operations.

The Ukrainian army uses a program, GIS Arta, that pairs each known Russian target on the battlefield with the artillery unit that is, according to the algorithm, best placed to shoot at it. A report by The Times, a British newspaper, likened it to Uber’s algorithm for pairing drivers and riders, noting that it significantly reduces the time between the detection of a target and the moment that target finds itself under a barrage of firepower. Before the Ukrainians had GIS Arta, that process took 20 minutes. Now it reportedly takes one.

Russia claims to have its own command-and-control system with what it calls artificial intelligence, but it has shared few technical details. Gregory Allen, the director of the Wadhwani Center for AI and Advanced Technologies and one of the architects of the Pentagon’s current AI policies, told me it’s important to take some of these claims with a pinch of salt. He says some of Russia’s supposed military AI is “stuff that everyone has been doing for decades,” and he calls GIS Arta “just traditional software.”

The range of judgment calls that go into military decision-making, however, is vast. And it doesn’t always take artificial super-intelligence to dispense with them by automated means. There are tools for predicting enemy troop movements, tools for figuring out how to take out a given target, and tools to estimate how much collateral harm is likely to befall any nearby civilians.

None of these contrivances could be called a killer robot. But the technology is not without its perils. Like any complex computer, an AI-based tool might glitch in unusual and unpredictable ways; it’s not clear that the human involved will always be able to know when the answers on the screen are right or wrong. In their relentless efficiency, these tools may also not leave enough time and space for humans to determine if what they’re doing is legal. In some areas, they could perform at such superhuman levels that something ineffable about the act of war could be lost entirely.

Eventually militaries plan to use machine intelligence to stitch many of these individual instruments into a single automated network that links every weapon, commander, and soldier to every other. Not a kill chain, but—as the Pentagon has begun to call it—a kill web.

In these webs, it’s not clear whether the human’s decision is, in fact, very much of a decision at all. Rafael, an Israeli defense giant, has already sold one such product, Fire Weaver, to the IDF (it has also demonstrated it to the US Department of Defense and the German military). According to company materials, Fire Weaver finds enemy positions, notifies the unit that it calculates as being best placed to fire on them, and even sets a crosshair on the target directly in that unit’s weapon sights. The human’s role, according to one video of the software, is to choose between two buttons: “Approve” and “Abort.”

Let’s say that the silhouette in the window was not a soldier, but a child. Imagine that the truck was not delivering warheads to the enemy, but water pails to a home.

Of the DoD’s five “ethical principles for artificial intelligence,” which are phrased as qualities, the one that’s always listed first is “Responsible.” In practice, this means that when things go wrong, someone—a human, not a machine—has got to hold the bag.

Of course, the principle of responsibility long predates the onset of artificially intelligent machines. All the laws and mores of war would be meaningless without the fundamental common understanding that every deliberate act in the fight is always on someone. But with the prospect of computers taking on all manner of sophisticated new roles, the age-old precept has newfound resonance.

“Now for me, and for most people I ever knew in uniform, this was core to who we were as commanders: that somebody ultimately will be held responsible,” says Shanahan, who after Maven became the inaugural director of the Pentagon’s Joint Artificial Intelligence Center and oversaw the development of the AI ethical principles.

This is why a human hand must squeeze the trigger, why a human hand must click “Approve.” If a computer sets its sights upon the wrong target, and the soldier squeezes the trigger anyway, that’s on the soldier. “If a human does something that leads to an accident with the machine—say, dropping a weapon where it shouldn’t have—that’s still a human’s decision that was made,” Shanahan says.

But accidents happen. And this is where things get tricky. Modern militaries have spent hundreds of years figuring out how to differentiate the unavoidable, blameless tragedies of warfare from acts of malign intent, misdirected fury, or gross negligence. Even now, this remains a difficult task. Outsourcing a part of human agency and judgment to algorithms built, in many cases, around the mathematical principle of optimization will challenge all this law and doctrine in a fundamentally new way, says Courtney Bowman, global director of privacy and civil liberties engineering at Palantir, a US-headquartered firm that builds data management software for militaries, governments, and large companies.

“It’s a rupture. It’s disruptive,” Bowman says. “It requires a new ethical construct to be able to make sound decisions.”

This year, in a move that was inevitable in the age of ChatGPT, Palantir announced that it is developing software called the Artificial Intelligence Platform, which allows for the integration of large language models into the company’s military products. In a demo of AIP posted to YouTube this spring, the platform alerts the user to a potentially threatening enemy movement. It then suggests that a drone be sent for a closer look, proposes three possible plans to intercept the offending force, and maps out an optimal route for the selected attack team to reach them.

And yet even with a machine capable of such apparent cleverness, militaries won’t want the user to blindly trust its every suggestion. If the human presses only one button in a kill chain, it probably should not be the “I believe” button, as a concerned but anonymous Army operative once put it in a DoD war game in 2019.

In a program called Urban Reconnaissance through Supervised Autonomy (URSA), DARPA built a system that enabled robots and drones to act as forward observers for platoons in urban operations. After input from the project’s advisory group on ethical and legal issues, it was decided that the software would only ever designate people as “persons of interest.” Even though the purpose of the technology was to help root out ambushes, it would never go so far as to label anyone as a “threat.”

This, it was hoped, would stop a soldier from jumping to the wrong conclusion. It also had a legal rationale, according to Brian Williams, an adjunct research staff member at the Institute for Defense Analyses who led the advisory group. No court had positively asserted that a machine could legally designate a person a threat, he says. (Then again, he adds, no court had specifically found that it would be illegal, either, and he acknowledges that not all military operators would necessarily share his group’s cautious reading of the law.) According to Williams, DARPA initially wanted URSA to be able to autonomously discern a person’s intent; this feature too was scrapped at the group’s urging.

Bowman says Palantir’s approach is to work “engineered inefficiencies” into “points in the decision-making process where you actually do want to slow things down.” For example, a computer’s output that points to an enemy troop movement, he says, might require a user to seek out a second corroborating source of intelligence before proceeding with an action (in the video, the Artificial Intelligence Platform does not appear to do this).

In the case of AIP, Bowman says the idea is to present the information in such a way “that the viewer understands, the analyst understands, this is only a suggestion.” In practice, protecting human judgment from the sway of a beguilingly smart machine could come down to small details of graphic design. “If people of interest are identified on a screen as red dots, that’s going to have a different subconscious implication than if people of interest are identified on a screen as little happy faces,” says Rebecca Crootof, a law professor at the University of Richmond, who has written extensively about the challenges of accountability in human-in-the-loop autonomous weapons.

In some settings, however, soldiers might only want an “I believe” button. Originally, DARPA envisioned URSA as a wrist-worn device for soldiers on the front lines. “In the very first working group meeting, we said that’s not advisable,” Williams told me. The kind of engineered inefficiency necessary for responsible use just wouldn’t be practicable for users who have bullets whizzing by their ears. Instead, they built a computer system that sits with a dedicated operator, far behind the action.

But some decision support systems are definitely designed for the kind of split-second decision-making that happens right in the thick of it. The US Army has said that it has managed, in live tests, to shorten its own 20-minute targeting cycle to 20 seconds. Nor does the market seem to have embraced the spirit of restraint. In demo videos posted online, the bounding boxes for the computerized gunsights of both Elbit and Smartshooter are blood red.

Other times, the computer will be right and the human will be wrong.

If the soldier on the rooftop had second-guessed the gunsight, and it turned out that the silhouette was in fact an enemy sniper, his teammates could have paid a heavy price for his split second of hesitation.

This is a different source of trouble, much less discussed but no less likely in real-world combat. And it puts the human in something of a pickle. Soldiers will be told to treat their digital assistants with enough mistrust to safeguard the sanctity of their judgment. But with machines that are often right, this same reluctance to defer to the computer can itself become a point of avertable failure.

Aviation history has no shortage of cases where a human pilot’s refusal to heed the machine led to catastrophe. These (usually perished) souls have not been looked upon kindly by investigators seeking to explain the tragedy. Carol J. Smith, a senior research scientist at Carnegie Mellon University’s Software Engineering Institute who helped craft responsible AI guidelines for the DoD’s Defense Innovation Unit, doesn’t see an issue: “If the person in that moment feels that the decision is wrong, they’re making it their call, and they’re going to have to face the consequences.”

For others, this is a wicked ethical conundrum. The scholar M.C. Elish has suggested that a human who is placed in this kind of impossible loop could end up serving as what she calls a “moral crumple zone.” In the event of an accident—regardless of whether the human was wrong, the computer was wrong, or they were wrong together—the person who made the “decision” will absorb the blame and protect everyone else along the chain of command from the full impact of accountability.

In an essay, Smith wrote that the “lowest-paid person” should not be “saddled with this responsibility,” and neither should “the highest-paid person.” Instead, she told me, the responsibility should be spread among everyone involved, and the introduction of AI should not change anything about that responsibility.

In practice, this is harder than it sounds. Crootof points out that even today, “there’s not a whole lot of responsibility for accidents in war.” As AI tools become larger and more complex, and as kill chains become shorter and more web-like, finding the right people to blame is going to become an even more labyrinthine task.

Those who write these tools, and the companies they work for, aren’t likely to take the fall. Building AI software is a lengthy, iterative process, often drawing from open-source code, which stands at a distant remove from the actual material facts of metal piercing flesh. And barring any significant changes to US law, defense contractors are generally protected from liability anyway, says Crootof.

Any bid for accountability at the upper rungs of command, meanwhile, would likely find itself stymied by the heavy veil of government classification that tends to cloak most AI decision support tools and the manner in which they are used. The US Air Force has not been forthcoming about whether its AI has even seen real-world use. Shanahan says Maven’s AI models were deployed for intelligence analysis soon after the project launched, and in 2021 the secretary of the Air Force said that “AI algorithms” had recently been applied “for the first time to a live operational kill chain,” with an Air Force spokesperson at the time adding that these tools were available in intelligence centers across the globe “whenever needed.” But Laura McAndrews, the Air Force spokesperson, said that in fact these algorithms “were not applied in a live, operational kill chain” and declined to detail any other algorithms that may, or may not, have been used since.

The real story might remain shrouded for years. In 2018, the Pentagon issued a determination that exempts Project Maven from Freedom of Information requests. Last year, it handed the entire program to the National Geospatial-Intelligence Agency, which is responsible for processing America’s vast intake of secret aerial surveillance. Responding to questions about whether the algorithms are used in kill chains, Robbin Brooks, an NGA spokesperson, told MIT Technology Review, “We can’t speak to specifics of how and where Maven is used.”

In one sense, what’s new here is also old. We routinely place our safety—indeed, our entire existence as a species—in the hands of other people. Those decision-makers defer, in turn, to machines that they do not entirely comprehend.

In an exquisite essay on automation published in 2018, at a time when operational AI-enabled decision support was still a rarity, former Navy secretary Richard Danzig pointed out that if a president “decides” to order a nuclear strike, it will not be because anyone has looked out the window of the Oval Office and seen enemy missiles raining down on DC but, rather, because those missiles have been detected, tracked, and identified—one hopes correctly—by algorithms in the air defense network.

As in the case of a commander who calls in an artillery strike on the advice of a chatbot, or a rifleman who pulls the trigger at the mere sight of a red bounding box, “the most that can be said is that ‘a human being is involved,’” Danzig wrote.

“This is a common situation in the modern age,” he wrote. “Human decisionmakers are riders traveling across obscured terrain with little or no ability to assess the powerful beasts that carry and guide them.”

There can be an alarming streak of defeatism among the people responsible for making sure that these beasts don’t end up eating us. During a number of conversations I had while reporting this story, my interlocutor would land on a sobering note of acquiescence to the perpetual inevitability of death and destruction that, while tragic, cannot be pinned on any single human. War is messy, technologies fail in unpredictable ways, and that’s just that.

“In warfighting,” says Bowman of Palantir, “[in] the application of any technology, let alone AI, there is some degree of harm that you’re trying to—that you have to accept, and the game is risk reduction.”

It is possible, though not yet demonstrated, that bringing artificial intelligence to battle may mean fewer civilian casualties, as advocates often claim. But there could be a hidden cost to irrevocably conjoining human judgment and mathematical reasoning in those ultimate moments of war—a cost that extends beyond a simple, utilitarian bottom line. Maybe something just cannot be right, should not be right, about choosing the time and manner in which a person dies the way you hail a ride from Uber.

To a machine, this might be suboptimal logic. But for certain humans, that’s the point. “One of the aspects of judgment, as a human capacity, is that it’s done in an open world,” says Lucy Suchman, a professor emerita of anthropology at Lancaster University, who has been writing about the quandaries of human-machine interaction for four decades.

The parameters of life-and-death decisions—knowing the meaning of the fresh laundry hanging from a window while also wanting your teammates not to die—are “irreducibly qualitative,” she says. The chaos and the noise and the uncertainty, the weight of what is right and what is wrong in the midst of all that fury—not a whit of this can be defined in algorithmic terms. In matters of life and death, there is no computationally perfect outcome. “And that’s where the moral responsibility comes from,” she says. “You’re making a judgment.”

The gunsight never pulls the trigger. The chatbot never pushes the button. But each time a machine takes on a new role that reduces the irreducible, we may be stepping a little closer to the moment when the act of killing is altogether more machine than human, when ethics becomes a formula and responsibility becomes little more than an abstraction. If we agree that we don’t want to let the machines take us all the way there, sooner or later we will have to ask ourselves: Where is the line?

Arthur Holland Michel writes about technology. He is based in Barcelona and can be found, occasionally, in New York.