We all want to be able to speak our minds online: to be heard by our friends and talk (back) to our opponents. At the same time, we don’t want to be exposed to speech that is inappropriate or crosses a line. Technology companies address this conundrum by setting standards for free speech, a practice protected under federal law. They hire in-house moderators to examine individual pieces of content and remove those that violate predefined rules set by the platforms.
The approach clearly has problems: harassment, misinformation about topics like public health, and false descriptions of legitimate elections run rampant. But even if content moderation were implemented perfectly, it would still miss a whole host of issues that are often portrayed as moderation problems but really are not. To address those non-speech issues, we need a new strategy: treat social media companies as potential polluters of the social fabric, and directly measure and mitigate the effects their choices have on human populations. That means establishing a policy framework (perhaps through something akin to an Environmental Protection Agency or Food and Drug Administration for social media) that can be used to identify and evaluate the societal harms generated by these platforms. If those harms persist, that agency could be empowered to enforce those policies. But to transcend the limitations of content moderation, such regulation would have to be motivated by clear evidence and have a demonstrable impact on the problems it purports to solve.
Moderation (whether automated or human) can potentially work for what we call “acute” harms: those caused directly by individual pieces of content. But we need this new approach because there are also a host of “structural” problems—issues such as discrimination, reductions in mental health, and declining civic trust—that manifest in broad ways across the product rather than through any individual piece of content. A famous example of this kind of structural issue is Facebook’s 2012 “emotional contagion” experiment, which showed that users’ affect (their mood as measured by their behavior on the platform) shifted measurably depending on which version of the product they were exposed to.
In the blowback that ensued after the results became public, Facebook (now Meta) ended this type of deliberate experimentation. But the fact that the company stopped measuring such effects does not mean that its product decisions stopped having them.
Structural problems are direct outcomes of product choices. Product managers at technology companies like Facebook, YouTube, and TikTok are incentivized to focus overwhelmingly on maximizing time and engagement on the platforms. And experimentation is still very much alive there: almost every product change is deployed to small test audiences via randomized controlled trials. To assess progress, companies implement rigorous management processes (known as Objectives and Key Results, or OKRs) to foster their central missions, even using these outcomes to determine bonuses and promotions. The responsibility for addressing the consequences of product decisions is often placed on other teams that are usually downstream and have less authority to address root causes. Those teams are generally capable of responding to acute harms but often cannot address problems caused by the products themselves.
With attention and focus, this same product development structure could be turned to the question of societal harms. Consider Frances Haugen’s congressional testimony in 2021, along with media revelations about Facebook’s alleged impact on the mental health of teens. Facebook responded to criticism by explaining that it had studied whether teens felt that the product had a negative effect on their mental health and whether that perception caused them to use the product less, and not whether the product actually had a detrimental effect. While the response may have addressed that particular controversy, it illustrated that a study aiming directly at the question of mental health, rather than at its impact on user engagement, would not be a big stretch.
Incorporating evaluations of systemic harm won’t be easy. We would have to sort out what we can actually measure rigorously and systematically, what we would require of companies, and what issues to prioritize in any such assessments.
Companies could implement protocols themselves, but their financial interests too often run counter to meaningful limitations on product development and growth. That reality is a standard case for regulation that operates on behalf of the public. Whether through a new legal mandate from the Federal Trade Commission or harm mitigation guidelines from a new governmental agency, the regulator’s job would be to work with technology companies’ product development teams to design protocols that can be implemented and measured during the course of product development and that capture meaningful signals of harm.
That approach may sound cumbersome, but adding these types of protocols should be straightforward for the largest companies (the only ones to which regulation should apply), because they have already built randomized controlled trials into their development process to measure the efficacy of product changes. The more time-consuming and complex part would be defining the standards; the actual execution of the testing would not require regulatory participation at all. It would only require asking diagnostic questions alongside normal growth-related questions and then making that data accessible to external reviewers. Our forthcoming paper at the 2022 ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization will explain this procedure in more detail and outline how it could effectively be established.
When products that reach tens of millions are tested for their ability to boost engagement, companies would need to ensure that those products—at least in aggregate—also abide by a “don’t make the problem worse” principle. Over time, more aggressive standards could be established to roll back existing effects of already-approved products.
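As a rough illustration of what a “don’t make the problem worse” check might look like inside an existing randomized trial, consider the minimal Python sketch below. It is only a sketch under stated assumptions, not a protocol that any company or regulator actually uses: the function name `no_worse_check`, the 1-to-5 well-being survey scale, the tolerated decline (`margin`), and the sample data are all hypothetical. It compares survey scores from a control group and a test group and asks whether the data rule out a decline larger than the tolerated margin.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import NormalDist, mean, variance


@dataclass
class HarmCheck:
    difference: float   # treatment mean minus control mean
    lower_bound: float  # one-sided lower confidence bound on the difference
    passes: bool        # True if a decline larger than the margin is ruled out


def no_worse_check(control, treatment, margin=0.1, alpha=0.05):
    """Hypothetical non-inferiority check on a well-being score (higher is better).

    `control` and `treatment` are survey scores from the two arms of a
    randomized trial. `margin` is the largest decline treated as tolerable;
    its value here is an assumption, not a recommended standard.
    """
    diff = mean(treatment) - mean(control)
    # Standard error of the difference between two independent sample means.
    se = sqrt(variance(control) / len(control)
              + variance(treatment) / len(treatment))
    # Normal approximation, reasonable for the large samples typical of A/B tests.
    z = NormalDist().inv_cdf(1 - alpha)
    lower = diff - z * se
    return HarmCheck(diff, lower, lower > -margin)


# Illustrative, made-up survey responses on a 1-to-5 scale.
control_scores = [3, 4, 4, 5, 3, 4] * 100
treatment_scores = [2, 3, 3, 4, 2, 3] * 100

result = no_worse_check(control_scores, treatment_scores)
print(result.passes)
```

The check is deliberately one-sided: it does not ask whether a product change improves well-being, only whether the evidence rules out making it meaningfully worse, which is the spirit of the principle described above.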
There are many methods that might be suitable for this type of process. These include protocols like the photographic affect meter, which has been used diagnostically to assess how exposure to products and services affects mood. Technology platforms are already using surveys to assess product changes; according to reporters Cecilia Kang and Sheera Frenkel, Mark Zuckerberg looks at survey-based growth metrics for almost every product decision, the results of which were part of his choice to roll back the “nicer” version of Facebook’s news feed algorithm after the 2020 election.
It would be reasonable to ask whether the technology industry sees this approach as feasible and whether companies would fight against it. While any potential regulation might engender such a response, we have received positive feedback from early conversations about this framework, perhaps because under our approach, most product decisions would pass muster. (The threshold for measurable harm of the sort described here is high, and most product choices would fall well short of it.) And unlike other proposals, this strategy sidesteps direct regulation of speech, at least outside the most extreme cases.
At the same time, we don’t have to wait for regulators to take action. Companies could readily implement these procedures on their own. Establishing the case for change, however, is difficult without first starting to collect the sort of high-quality data we’re describing here. That is because one cannot prove the existence of these types of harms without real-time measurement, creating a chicken-and-egg challenge. Proactively monitoring structural harms won’t resolve platforms’ content issues. But it could allow us to meaningfully and continuously verify whether the public interest is being subverted.
The US Environmental Protection Agency is an apt analogy. The original purpose of the agency was not to legislate environmental policy, but to enact standards and protocols so that policies with actionable outcomes could be made. From that point of view, the EPA’s lasting impact was not to resolve environmental policy debates (it hasn’t), but to make them possible. Likewise, the first step for fixing social media is to create the infrastructure that we’ll need in order to examine outcomes in speech, mental well-being, and civic trust in real time. Without that, we will be prevented from addressing many of the most pressing problems these platforms create.
Nathaniel Lubin is a fellow at the Digital Life Initiative at Cornell Tech and the former director of the Office of Digital Strategy at the White House under President Barack Obama. Thomas Krendl Gilbert is a postdoctoral fellow at Cornell Tech and received an interdisciplinary PhD in machine ethics and epistemology at UC Berkeley.