Describing a decision-making system as an “algorithm” is often a way to deflect accountability for human decisions. For many, the term implies a set of rules based objectively on empirical evidence or data. It also suggests a system that is highly complex—perhaps so complex that a human would struggle to understand its inner workings or anticipate its behavior when deployed.
But is this characterization accurate? Not always.
For example, in late December 2020, Stanford Medical Center’s misallocation of covid-19 vaccines was blamed on a distribution “algorithm” that favored high-ranking administrators over frontline doctors. The hospital claimed to have consulted with ethicists to design its “very complex algorithm,” which a representative said “clearly didn’t work right,” as MIT Technology Review reported at the time. While many people interpreted the use of the term to mean that AI or machine learning was involved, the system was in fact a medical algorithm, which is functionally different. It was more akin to a very simple formula or decision tree designed by a human committee.
This disconnect highlights a growing issue. As predictive models proliferate, the public becomes more wary of their use in making critical decisions. But as policymakers begin to develop standards for assessing and auditing algorithms, they must first define the class of decision-making or decision support tools to which their policies will apply. Leaving the term “algorithm” open to interpretation could place some of the models with the biggest impact beyond the reach of policies designed to ensure that such systems don’t hurt people.
How to ID an algorithm
So is Stanford’s “algorithm” an algorithm? That depends on how you define the term. While there’s no universally accepted definition, a common one comes from a 1971 textbook written by computer scientist Harold Stone, who states: “An algorithm is a set of rules that precisely define a sequence of operations.” This definition encompasses everything from recipes to complex neural networks: an audit policy based on it would be laughably broad.
In statistics and machine learning, we usually think of the algorithm as the set of instructions a computer executes to learn from data. In these fields, the resulting structured information is typically called a model. The information the computer learns from the data via the algorithm may look like “weights” by which to multiply each input factor, or it may be much more complicated. The complexity of the algorithm itself may also vary. And the impacts of these algorithms ultimately depend on the data to which they are applied and the context in which the resulting model is deployed. The same algorithm could have a net positive impact when applied in one context and a very different effect when applied in another.
In other domains, what’s described above as a model is itself called an algorithm. Though that’s confusing, under the broadest definition it is also accurate: models are rules (learned by the computer’s training algorithm instead of stated directly by humans) that define a sequence of operations. For example, last year in the UK, the media described the failure of an “algorithm” to assign fair scores to students who couldn’t sit for their exams because of covid-19. Presumably, what these reports were discussing was the model: the set of instructions that translated inputs (a student’s past performance or a teacher’s evaluation) into outputs (a score).
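To make the distinction concrete, here is a toy sketch in Python. It is not the UK grading system or any deployed model; the data, inputs, and weights are invented. The fitting routine plays the role of the algorithm, and the weights it returns, applied to a student’s inputs to produce a score, are the model.

import numpy as np

def fit_scoring_model(X, y):
    # The "algorithm": ordinary least squares, i.e. the instructions the
    # computer executes to learn from data.
    X1 = np.column_stack([X, np.ones(len(X))])        # add an intercept term
    weights, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return weights                                     # the "model": learned weights

def score(weights, inputs):
    # Applying the model: a fixed sequence of operations that translates
    # inputs (say, past performance and a teacher's evaluation) into an output.
    return float(np.dot(np.append(inputs, 1.0), weights))

# Invented training data: [past performance, teacher's evaluation] -> final score.
X = np.array([[60.0, 70.0], [75.0, 80.0], [85.0, 90.0], [55.0, 65.0]])
y = np.array([65.0, 78.0, 88.0, 60.0])

model = fit_scoring_model(X, y)              # same algorithm, different data -> different model
print(score(model, np.array([70.0, 75.0])))  # the model, not the algorithm, assigns the score

The point of the sketch is only that “algorithm” and “model” name different things: the first is the procedure, the second is what the procedure produces and what actually touches the inputs.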
What seems to have happened at Stanford is that humans, including ethicists, sat down and decided what series of operations the system should use to determine, on the basis of inputs such as an employee’s age and department, whether that person should be among the first to get a vaccine. From what we know, this sequence wasn’t based on an estimation procedure that optimized for some quantitative target. It was a set of normative decisions about how vaccines should be prioritized, formalized in the language of an algorithm. This approach qualifies as an algorithm in medical terminology and under the broad definition, even though the only intelligence involved was that of humans.
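For illustration only, a committee-designed rule of this kind can be written down in a few lines of Python. The inputs, thresholds, and point values below are entirely invented; this is not Stanford’s actual formula. The sketch simply shows how a handful of human, normative choices still “precisely define a sequence of operations,” with no machine learning anywhere.

def vaccine_priority_score(age: int, patient_facing: bool, department: str) -> int:
    # Hypothetical point values chosen by a human committee, not learned from data.
    score = 0
    if age >= 65:
        score += 2                       # normative choice: prioritize older staff
    if patient_facing:
        score += 3                       # normative choice: prioritize frontline roles
    if department in {"emergency", "icu"}:
        score += 2                       # normative choice: prioritize high-exposure units
    return score

print(vaccine_priority_score(age=70, patient_facing=False, department="administration"))  # 2
print(vaccine_priority_score(age=30, patient_facing=True, department="icu"))              # 5

Whether a rule like this produces fair outcomes depends entirely on the choices baked into it, which is why the humans who made those choices remain accountable for the results.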
Focus on impact, not input
Lawmakers are also weighing in on what an algorithm is. Introduced in the US Congress in 2019, HR 2231, the Algorithmic Accountability Act, uses the term “automated decision system” and defines it as “a computational process, including one derived from machine learning, statistics, or other data processing or artificial intelligence techniques, that makes a decision or facilitates human decision making, that impacts consumers.”
Similarly, New York City is considering Int 1894, a law that would introduce mandatory audits of “automated employment decision tools,” defined as “any system whose function is governed by statistical theory, or systems whose parameters are defined by such systems.” Notably, both bills mandate audits but provide only high-level guidelines on what an audit is.
As decision-makers in both government and industry create standards for algorithmic audits, disagreements about what counts as an algorithm are likely. Rather than trying to agree on a common definition of “algorithm” or a particular universal auditing technique, we suggest evaluating automated systems primarily based on their impact. By focusing on outcome rather than input, we avoid needless debates over technical complexity. What matters is the potential for harm, regardless of whether we’re discussing an algebraic formula or a deep neural network.
Impact is a critical assessment factor in other fields. It’s built into the classic DREAD framework in cybersecurity, which was first popularized by Microsoft in the early 2000s and is still used at some corporations. The “A” in DREAD asks threat assessors to quantify “affected users” by asking how many people would suffer the impact of an identified vulnerability. Impact assessments are also common in human rights and sustainability analyses, and we’ve seen some early developers of AI impact assessments create similar rubrics. For example, Canada’s Algorithmic Impact Assessment provides a score based on qualitative questions such as “Are clients in this line of business particularly vulnerable? (yes or no).”
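As a rough sketch of how such a rating works in practice, assuming the commonly described zero-to-ten scale per category and a simple unweighted average (real assessments vary in both scale and weighting), a DREAD-style score might be computed like this, with the threat and numbers invented for illustration:

from dataclasses import dataclass

@dataclass
class DreadScore:
    damage: int           # how bad would the harm be?
    reproducibility: int  # how reliably can the issue be triggered?
    exploitability: int   # how easy is it to exploit?
    affected_users: int   # the "A": how many people would suffer the impact?
    discoverability: int  # how easy is the issue to find?

    def overall(self) -> float:
        parts = [self.damage, self.reproducibility, self.exploitability,
                 self.affected_users, self.discoverability]
        return sum(parts) / len(parts)

# Invented example: a flaw whose "affected users" score dominates the rating.
threat = DreadScore(damage=6, reproducibility=4, exploitability=3,
                    affected_users=9, discoverability=2)
print(f"DREAD rating: {threat.overall():.1f} out of 10")  # 4.8

The same shape of rubric, a short list of questions each mapped to a number, is what early AI impact assessments such as Canada’s are reaching for.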
There are certainly difficulties in introducing a loosely defined term such as “impact” into any assessment. DREAD, which was typically used alongside Microsoft’s STRIDE threat-classification framework, eventually fell out of favor in part because of challenges with reconciling different assessors’ beliefs about how the same threat should be rated; Microsoft stopped using DREAD in 2008.
In the AI field, conferences and journals have already introduced impact statements, with varying degrees of success and controversy. The approach is far from foolproof: impact assessments that are purely formulaic can easily be gamed, while an overly vague definition can lead to arbitrary or impossibly lengthy assessments.
Still, it’s an important step forward. The term “algorithm,” however defined, shouldn’t be a shield that absolves the humans who designed and deployed a system of responsibility for the consequences of its use. This is why the public is increasingly demanding algorithmic accountability, and the concept of impact offers a useful common ground for different groups working to meet that demand.
Kristian Lum is an assistant research professor in the Computer and Information Science Department at the University of Pennsylvania.
Rumman Chowdhury is the director of the Machine Ethics, Transparency, and Accountability (META) team at Twitter. She was previously the CEO and founder of Parity, an algorithmic audit platform, and global lead for responsible AI at Accenture.