This AI-powered “black box” could make surgery safer

The first time Teodor Grantcharov sat down to watch himself perform surgery, he wanted to throw the VHS tape out the window.

“My perception was that my performance was spectacular,” Grantcharov says, and then pauses—“until the moment I saw the video.” Reflecting on this operation from 25 years ago, he remembers the roughness of his dissection, the wrong instruments used, the inefficiencies that transformed a 30-minute operation into a 90-minute one. “I didn’t want anyone to see it.”

This reaction wasn’t exactly unique. The operating room has long been defined by its hush-hush nature—what happens in the OR stays in the OR—because surgeons are notoriously bad at acknowledging their own mistakes. Grantcharov jokes that when you ask “Who are the top three surgeons in the world?” a typical surgeon “always has a challenge identifying who the other two are.”

But after the initial humiliation over watching himself work, Grantcharov started to see the value in recording his operations. “There are so many small details that normally take years and years of practice to realize—that some surgeons never get to that point,” he says. “Suddenly, I could see all these insights and opportunities overnight.”

There was a big problem, though: it was the ’90s, and spending hours playing back grainy VHS recordings wasn’t a realistic quality improvement strategy. It would have been nearly impossible to determine how often his relatively mundane slipups happened at scale—not to mention more serious medical errors like those that kill some 22,000 Americans each year. Many of these errors happen on the operating table, from leaving surgical sponges inside patients’ bodies to performing the wrong procedure altogether.

While the patient safety movement has pushed for uniform checklists and other manual fail-safes to prevent such mistakes, Grantcharov believes that “as long as the only barrier between success and failure is a human, there will be errors.” Improving safety and surgical efficiency became something of a personal obsession. He wanted to make it challenging to make mistakes, and he thought developing the right system to create and analyze recordings could be the key.

It’s taken many years, but Grantcharov, now a professor of surgery at Stanford, believes he’s finally developed the technology to make this dream possible: the operating room equivalent of an airplane’s black box. It records everything in the OR via panoramic cameras, microphones, and anesthesia monitors before using artificial intelligence to help surgeons make sense of the data.

Grantcharov’s company, Surgical Safety Technologies, is not the only one deploying AI to analyze surgeries. Many medical device companies are already in the space—including Medtronic with its Touch Surgery platform, Johnson & Johnson with C-SATS, and Intuitive Surgical with Case Insights.

But most of these are focused solely on what’s happening inside patients’ bodies, capturing intraoperative video alone. Grantcharov wants to capture the OR as a whole, from the number of times the door is opened to how many non-case-related conversations occur during an operation. “People have simplified surgery to technical skills only,” he says. “You need to study the OR environment holistically.”

Teodor Grantcharov in a procedure that is being recorded by Surgical Safety Technologies’ AI-powered black-box system.

Success, however, isn’t as simple as just having the right technology. The idea of recording everything presents a slew of tricky questions around privacy and could raise the threat of disciplinary action and legal exposure. Because of these concerns, some surgeons have refused to operate when the black boxes are in place, and some of the systems have even been sabotaged. Aside from those problems, some hospitals don’t know what to do with all this new data or how to avoid drowning in a deluge of statistics.

Grantcharov nevertheless predicts that his system can do for the OR what black boxes did for aviation. In 1970, the industry was plagued by 6.5 fatal accidents for every million flights; today, that’s down to less than 0.5. “The aviation industry made the transition from reactive to proactive thanks to data,” he says—“from safe to ultra-safe.”

Grantcharov’s black boxes are now deployed at almost 40 institutions in the US, Canada, and Western Europe, from Mount Sinai to Duke to the Mayo Clinic. But are hospitals on the cusp of a new era of safety—or creating an environment of confusion and paranoia?

Shaking off the secrecy

The operating room is probably the most measured place in the hospital but also one of the most poorly captured. From team performance to instrument handling, there is “crazy big data that we’re not even recording,” says Alexander Langerman, an ethicist and head and neck surgeon at Vanderbilt University Medical Center. “Instead, we have post hoc recollection by a surgeon.”

Indeed, when things go wrong, surgeons are supposed to review the case at the hospital’s weekly morbidity and mortality conferences, but these errors are notoriously underreported. And even when surgeons enter the required notes into patients’ electronic medical records, “it’s undoubtedly—and I mean this in the least malicious way possible—dictated toward their best interests,” says Langerman. “It makes them look good.”

The operating room wasn’t always so secretive.

In the 19th century, operations often took place in large amphitheaters—they were public spectacles with a general price of admission. “Every seat even of the top gallery was occupied,” recounted the abdominal surgeon Lawson Tait about an operation in the 1860s. “There were probably seven or eight hundred spectators.”

However, around the 1900s, operating rooms became increasingly smaller and less accessible to the public—and its germs. “Immediately, there was a feeling that something was missing, that the public surveillance was missing. You couldn’t know what happened in the smaller rooms,” says Thomas Schlich, a historian of medicine at McGill University.

And it was nearly impossible to go back. In the 1910s a Boston surgeon, Ernest Codman, suggested a form of surveillance known as the end-result system, documenting every operation (including failures, problems, and errors) and tracking patient outcomes. Massachusetts General Hospital didn’t accept it, says Schlich, and Codman resigned in frustration.

Students watch a surgery performed at the former Philadelphia General Hospital around the turn of the century.

Such opacity was part of a larger shift toward medicine’s professionalization in the 20th century, characterized by technological advancements, the decline of generalists, and the bureaucratization of health-care institutions. All of this put distance between patients and their physicians. Around the same time, and particularly from the 1960s onward, the medical field began to see a rise in malpractice lawsuits—at least partially driven by patients trying to find answers when things went wrong.

This battle over transparency could theoretically be addressed by surgical recordings. But Grantcharov realized very quickly that the only way to get surgeons to use the black box was to make them feel protected. To that end, he has designed the system to record the action but hide the identity of both patients and staff, even deleting all recordings within 30 days. His idea is that no individual should be punished for making a mistake. “We want to know what happened, and how we can build a system that makes it difficult for this to happen,” Grantcharov says. Errors don’t occur because “the surgeon wakes up in the morning and thinks, ‘I’m gonna make some catastrophic event happen,’” he adds. “This is a system issue.”

AI that sees everything

Grantcharov’s OR black box is not actually a box at all, but a tablet, one or two ceiling microphones, and up to four wall-mounted dome cameras that can reportedly analyze more than half a million data points per day per OR. “In three days, we go through the entire Netflix catalogue in terms of video processing,” he says.

The black-box platform utilizes a handful of computer vision models and ultimately spits out a series of short video clips and a dashboard of statistics—like how much blood was lost, which instruments were used, and how many auditory disruptions occurred. The system also identifies and breaks out key segments of the procedure (dissection, resection, and closure) so that instead of having to watch a whole three- or four-hour recording, surgeons can jump to the part of the operation where, for instance, there was major bleeding or a surgical stapler misfired.

Critically, each person in the recording is rendered anonymous; an algorithm distorts people’s voices and blurs out their faces, transforming them into shadowy, noir-like figures. “For something like this, privacy and confidentiality are critical,” says Grantcharov, who claims the anonymization process is irreversible. “Even though you know what happened, you can’t really use it against an individual.”

Another AI model works to evaluate performance. For now, this is done primarily by measuring compliance with the surgical safety checklist—a questionnaire that is supposed to be verbally ticked off during every type of surgical operation. (This checklist has long been associated with reductions in both surgical infections and overall mortality.) Grantcharov’s team is currently working to train more complex algorithms to detect errors during laparoscopic surgery, such as using excessive instrument force, holding the instruments in the wrong way, or failing to maintain a clear view of the surgical area. However, assessing these performance metrics has proved more difficult than measuring checklist compliance. “There are some things that are quantifiable, and some things require judgment,” Grantcharov says.

Each model has taken up to six months to train, through a labor-intensive process relying on a team of 12 analysts in Toronto, where the company was started. While many general AI models can be trained by a gig worker who labels everyday items (like, say, chairs), the surgical models need data annotated by people who know what they’re seeing—either surgeons, in specialized cases, or other labelers who have been properly trained. They have reviewed hundreds, sometimes thousands, of hours of OR videos and manually noted which liquid is blood, for instance, or which tool is a scalpel. Over time, the model can “learn” to identify bleeding or particular instruments on its own, says Peter Grantcharov, Surgical Safety Technologies’ vice president of engineering, who is Teodor Grantcharov’s son.

For the upcoming laparoscopic surgery model, surgeon annotators have also started to label whether certain maneuvers were correct or mistaken, as defined by the Generic Error Rating Tool—a standardized way to measure technical errors.

While most algorithms operate near perfectly on their own, Peter Grantcharov explains that the OR black box is still not fully autonomous. For example, it’s difficult to capture audio through ceiling mikes and thus get a reliable transcript to document whether every element of the surgical safety checklist was completed; he estimates that this algorithm has a 15% error rate. So before the output from each procedure is finalized, one of the Toronto analysts manually verifies adherence to the questionnaire. “It will require a human in the loop,” Peter Grantcharov says, but he gauges that the AI model has made the process of confirming checklist compliance 80% to 90% more efficient. He also emphasizes that the models are constantly being improved.
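One way to picture that human-in-the-loop step is as a triage: the model makes a call on each checklist item, and an analyst spends the most time on the items the model is least sure about. The sketch below is purely illustrative. The `ChecklistCall` structure, the confidence scores, and the 0.85 threshold are all assumptions made for the example, not details of Surgical Safety Technologies' system.

```python
from dataclasses import dataclass


@dataclass
class ChecklistCall:
    item: str          # checklist element, e.g. "patient identity confirmed"
    heard: bool        # the model's call on whether the item was spoken aloud
    confidence: float  # hypothetical model confidence between 0.0 and 1.0


def triage(calls, threshold=0.85):
    """Split model calls into ones an analyst can skim and ones needing a close listen.

    The threshold is an illustrative assumption, not a figure from the company.
    """
    skim, close_review = [], []
    for call in calls:
        if call.confidence >= threshold:
            skim.append(call)
        else:
            close_review.append(call)
    return skim, close_review


calls = [
    ChecklistCall("patient identity confirmed", True, 0.97),
    ChecklistCall("antibiotics given", True, 0.62),  # e.g. muffled ceiling-mic audio
    ChecklistCall("instrument count complete", False, 0.91),
]
skim, close_review = triage(calls)
print([c.item for c in close_review])  # ['antibiotics given']
```

In this toy version a person still signs off on every procedure; the model only decides where that person's attention goes first.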

In all, the OR black box can cost about $100,000 to install, and analytics expenses run $25,000 annually, according to Janet Donovan, an OR nurse who shared with MIT Technology Review an estimate given to staff at Brigham and Women’s Faulkner Hospital in Massachusetts. (Peter Grantcharov declined to comment on these numbers, writing in an email: “We don’t share specific pricing; however, we can say that it’s based on the product mix and the total number of rooms, with inherent volume-based discounting built into our pricing models.”)

“Big brother is watching”

Long Island Jewish Medical Center in New York, part of the Northwell Health system, was the first hospital to pilot OR black boxes, back in February 2019. The rollout was far from seamless, though not necessarily because of the tech.

“In the colorectal room, the cameras were sabotaged,” recalls Northwell’s chair of urology, Louis Kavoussi—they were turned around and deliberately unplugged. In his own OR, the staff fell silent while working, worried they’d say the wrong thing. “Unless you’re taking a golf or tennis lesson, you don’t want someone staring there watching everything you do,” says Kavoussi, who has since joined the scientific advisory board for Surgical Safety Technologies.

Grantcharov’s promises about not using the system to punish individuals have offered little comfort to some OR staff. When two black boxes were installed at Faulkner Hospital in November 2023, they threw the department of surgery into crisis. “Everybody was pretty freaked out about it,” says one surgical tech who asked not to be identified by name since she wasn’t authorized to speak publicly. “We were being watched, and we felt like if we did something wrong, our jobs were going to be on the line.”

It wasn’t that she was doing anything illegal or spewing hate speech; she just wanted to joke with her friends, complain about the boss, and be herself without the fear of administrators peeking over her shoulder. “You’re very aware that you’re being watched; it’s not subtle at all,” she says. The early days were particularly challenging, with surgeons refusing to work in the black-box-equipped rooms and OR staff boycotting those operations: “It was definitely a fight every morning.”

At some level, the identity protections are only half measures. Before 30-day-old recordings are automatically deleted, Grantcharov acknowledges, hospital administrators can still see the OR number, the time of operation, and the patient’s medical record number, so even if OR personnel are technically de-identified, they aren’t truly anonymous. The result is a sense that “Big Brother is watching,” says Christopher Mantyh, vice chair of clinical operations at Duke University Hospital, which has black boxes in seven ORs. He will draw on aggregate data to talk generally about quality improvement at departmental meetings, but when specific issues arise, like breaks in sterility or a cluster of infections, he will look to the recordings and “go to the surgeons directly.”

In many ways, that’s what worries Donovan, the Faulkner Hospital nurse. She’s not convinced the hospital will protect staff members’ identities and is worried that these recordings will be used against them—whether through internal disciplinary actions or in a patient’s malpractice suit. In February 2023, she and almost 60 others sent a letter to the hospital’s chief of surgery objecting to the black box. She’s since filed a grievance with the state, with arbitration proceedings scheduled for October.

The legal concerns in particular loom large because, already, over 75% of surgeons report having been sued at least once, according to a 2021 survey by Medscape, an online resource hub for health-care professionals. To the layperson, any surgical video “looks like a horror show,” says Vanderbilt’s Langerman. “Some plaintiff’s attorney is going to get ahold of this, and then some jury is going to see a whole bunch of blood, and then they’re not going to know what they’re seeing.” That prospect turns every recording into a potential legal battle.

From a purely logistical perspective, however, the 30-day deletion policy will likely insulate these recordings from malpractice lawsuits, according to Teneille Brown, a law professor at the University of Utah. She notes that within that time frame, it would be nearly impossible for a patient to find legal representation, go through the requisite conflict-of-interest checks, and then file a discovery request for the black-box data. While deleting data to bypass the judicial system could provoke criticism, Brown sees the wisdom of Surgical Safety Technologies’ approach. “If I were their lawyer, I would tell them to just have a policy of deleting it because then they’re deleting the good and the bad,” she says. “What it does is orient the focus to say, ‘This is not about a public-facing audience. The audience for these videos is completely internal.’”

A data deluge

When it comes to improving quality, there are “the problem-first people, and then there are the data-first people,” says Justin Dimick, chair of the department of surgery at the University of Michigan. The latter, he says, push “massive data collection” without first identifying “a question of ‘What am I trying to fix?’” He says that’s why he currently has no plans to use the OR black boxes in his hospital.

Mount Sinai’s chief of general surgery, Celia Divino, echoes this sentiment, emphasizing that too much data can be paralyzing. “How do you interpret it? What do you do with it?” she asks. “This is always a disease.”

At Northwell, even Kavoussi admits that five years of data from OR black boxes hasn’t been used to change much, if anything. He says that hospital leadership is finally beginning to think about how to use the recordings, but a hard question remains: OR black boxes can collect boatloads of data, but what does it matter if nobody knows what to do with it?

Grantcharov acknowledges that the information can be overwhelming. “In the early days, we let the hospitals figure out how to use the data,” he says. “That led to a big variation in how the data was operationalized. Some hospitals did amazing things; others underutilized it.” Now the company has a dedicated “customer success” team to help hospitals make sense of the data, and it offers a consulting-type service to work through surgical errors. But ultimately, even the most practical insights are meaningless without buy-in from hospital leadership, Grantcharov suggests.

Getting that buy-in has proved difficult in some centers, at least partly because there haven’t yet been any large, peer-reviewed studies showing how OR black boxes actually help to reduce patient complications and save lives. “If there’s some evidence that a comprehensive data collection system—like a black box—is useful, then we’ll do it,” says Dimick. “But I haven’t seen that evidence yet.”

A screenshot of the analytics produced by the black box.

The best hard data thus far is from a 2022 study published in the Annals of Surgery, in which Grantcharov and his team used OR black boxes to show that the surgical checklist had not been followed in a fifth of operations, likely contributing to excess infections. He also says that an upcoming study, scheduled to be published this fall, will show that the OR black box led to an improvement in checklist compliance and reduced ICU stays, reoperations, hospital readmissions, and mortality.

On a smaller scale, Grantcharov insists that he has built a steady stream of evidence showing the power of his platform. For example, he says, it’s revealed that auditory disruptions—doors opening, machine alarms and personal pagers going off—happen every minute in gynecology ORs, that a median 20 intraoperative errors are made in each laparoscopic surgery case, and that surgeons are great at situational awareness and leadership while nurses excel at task management.

Meanwhile, some hospitals have reported small improvements based on black-box data. Duke’s Mantyh says he’s used the data to check how often antibiotics are given on time. Duke and other hospitals also report turning to this data to help decrease the amount of time ORs sit empty between cases. By flagging when “idle” times are unexpectedly long and having the Toronto analysts review recordings to explain why, they’ve turned up issues ranging from inefficient communication to excessive time spent bringing in new equipment.

That can make a bigger difference than one might think, explains Ra’gan Laventon, clinical director of perioperative services at Texas’s Memorial Hermann Sugar Land Hospital: “We have multiple patients who are depending on us to get to their care today. And so the more time that’s added in some of these operational efficiencies, the more impactful it is to the patient.”
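The idle-time flagging that Duke and other hospitals describe can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function name `flag_long_idle`, the sample gap lengths, and the rule of thumb (anything more than twice the median gap counts as "unexpectedly long") are all hypothetical, and the hospitals' actual criteria are not described in the reporting.

```python
from statistics import median


def flag_long_idle(gaps_min, factor=2.0):
    """Return the positions of idle gaps longer than `factor` times the median gap.

    gaps_min: minutes an OR sat empty between consecutive cases.
    factor:   illustrative cutoff; a real program would pick its own rule.
    """
    if not gaps_min:
        return []
    typical = median(gaps_min)
    return [i for i, gap in enumerate(gaps_min) if gap > factor * typical]


# Hypothetical turnover gaps (in minutes) for one OR over a day or two.
gaps = [25, 30, 28, 75, 32, 27]
flagged = flag_long_idle(gaps)
print(flagged)  # [3]
```

Here the median gap is 29 minutes, so only the 75-minute gap crosses the cutoff. In the workflow described above, a flagged gap like that is what would be handed to an analyst to review the recording and explain the delay.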

The real world

At Northwell, where some of the cameras were initially sabotaged, it took a couple of weeks for Kavoussi’s urology team to get used to the black boxes, and about six months for his colorectal colleagues. Much of the solution came down to one-on-one conversations in which Kavoussi explained how the data was automatically de-identified and deleted.

During his operations, Kavoussi would also try to defuse the tension, telling the OR black box “Good morning, Toronto,” or jokingly asking, “How’s the weather up there?” In the end, “since nothing bad has happened, it has become part of the normal flow,” he says.

The reality is that no surgeon wants to be an average operator, “but statistically, we’re mostly average surgeons, and that’s okay,” says Vanderbilt’s Langerman. “I’d hate to be a below-average surgeon, but if I was, I’d really want to know about it.” Like athletes watching game film to prepare for their next match, surgeons might one day review their recordings, assessing their mistakes and thinking about the best ways to avoid them—but only if they feel safe enough to do so.

“Until we know where the guardrails are around this, there’s such a risk—an uncertain risk—that no one’s gonna let anyone turn on the camera,” Langerman says. “We live in a real world, not a perfect world.”

Simar Bajaj is an award-winning science journalist and 2024 Marshall Scholar. He has previously written for the Washington Post, Time magazine, the Guardian, NPR, and the Atlantic, as well as the New England Journal of Medicine, Nature Medicine, and The Lancet. He won Science Story of the Year from the Foreign Press Association in 2022 and the top prize for excellence in science communications from the National Academies of Sciences, Engineering, and Medicine in 2023. Follow him on X at @SimarSBajaj.