On November 4, just hours after Elon Musk fired half of the 7,500 employees previously working at Twitter, some people began to see small signs that something was wrong with everyone’s favorite hellsite. And they saw it through retweets.
Twitter introduced retweets in 2009, turning an organic thing people were already doing—pasting someone else’s username and tweet, preceded by the letters RT—into a software function. In the years since, the retweet and its distant cousin the quote tweet (which launched in April 2015) have become two of the most common mechanics on Twitter.
But on Friday, a few users who pressed the retweet button saw the years roll back to 2009. Manual retweets, as they were called, were back.
The return of the manual retweet wasn’t Elon Musk’s latest attempt to appease users. Instead, it was the first public crack in the edifice of Twitter’s code base—a blip on the seismometer that warns of a bigger earthquake to come.
A massive tech platform like Twitter is built upon very many interdependent parts. “The larger catastrophic failures are a little more titillating, but the biggest risk is the smaller things starting to degrade,” says Ben Krueger, a site reliability engineer who has more than two decades of experience in the tech industry. “These are very big, very complicated systems.” Krueger says one 2017 presentation from Twitter staff includes a statistic suggesting that more than half the back-end infrastructure was dedicated to storing data.
While many of Musk’s detractors may hope the platform goes through the equivalent of thermonuclear destruction, the collapse of something like Twitter happens gradually. For those in the know, gradual breakdowns are a warning sign that a larger crash could be imminent. And that’s what’s happening now.
It’s the small things
Whether it’s manual RTs appearing for a moment before retweets slowly morph into their standard form, ghostly follower counts that race ahead of the number of people actually following you, or replies that simply refuse to load, small bugs are appearing at Twitter’s periphery. Even Twitter’s rules, which Musk linked to on November 7, went offline temporarily under the load of millions of eyeballs. In short, Twitter is becoming unreliable.
“Sometimes you’ll get notifications that are a little off,” says one engineer currently working at Twitter, who’s concerned about the way the platform is reacting after vast swathes of his colleagues who were previously employed to keep the site running smoothly were fired. (The firings are why the engineer has been granted anonymity to talk for this story.) After struggling with downtime during its “Fail Whale” days, Twitter eventually became lauded for its team of site reliability engineers, or SREs. Yet this team has been decimated in the aftermath of Musk’s takeover. “It’s small things, at the moment, but they do really add up as far as the perception of stability,” says the engineer.
The small suggestions of something wrong will amplify and multiply as time goes on, he predicts—in part because the skeleton staff remaining to handle these issues will quickly burn out. “Round-the-clock is detrimental to quality, and we’re already kind of seeing this,” he says.
Twitter’s remaining engineers have largely been tasked with keeping the site stable over the last few days, since the new CEO decided to get rid of a significant chunk of the staff maintaining its code base. As the company tries to return to some semblance of normalcy, more of their time will be spent addressing Musk’s (often taxing) whims for new products and features, rather than keeping what’s already there running.
This is particularly problematic, says Krueger, for a site like Twitter, which can have unforeseen spikes in user traffic and interest. Krueger contrasts Twitter with online retail sites, where companies can prepare for big traffic events like Black Friday with some predictability. “When it comes to Twitter, they have the possibility of having a Black Friday on any given day at any time of the day,” he says. “At any given day, some news event can happen that can have significant impact on the conversation.” Responding to that is harder to do when you lay off up to 80% of your SREs—a figure Krueger says has been bandied about within the industry but which MIT Technology Review has been unable to confirm. The Twitter engineer agreed that the percentage sounded “plausible.”
That engineer doesn’t see a route out of the issue—other than reversing the layoffs (which the company has reportedly already attempted to roll back somewhat). “If we’re going to be pushing at a breakneck pace, then things will break,” he says. “There’s no way around that. We’re accumulating technical debt much faster than before—almost as fast as we’re accumulating financial debt.”
The list grows longer
He presents a dystopian future where issues pile up as the backlog of maintenance tasks and fixes grows longer and longer. “Things will be broken. Things will be broken more often. Things will be broken for longer periods of time. Things will be broken in more severe ways,” he says. “Everything will compound until, eventually, it’s not usable.”
Twitter’s collapse into an unusable wreck is some time off, the engineer says, but the telltale signs of process rot are already there. It starts with the small things: “Bugs in whatever part of whatever client they’re using; whatever service in the back end they’re trying to use. They’ll be small annoyances to start, but as the back-end fixes are being delayed, things will accumulate until people will eventually just give up.”
Krueger says that Twitter won’t blink out of life, but we’ll start to see a greater number of tweets not loading, and accounts coming into and out of existence seemingly at a whim. “I would expect anything that’s writing data on the back end to possibly have slowness, timeouts, and a lot more subtle types of failure conditions,” he says. “But they’re often more insidious. And they also generally take a lot more effort to track down and resolve. If you don’t have enough engineers, that’s going to be a significant problem.”
The juddering manual retweets and faltering follower counts are indications that this is already happening. Twitter engineers have designed fail-safes that the platform can fall back on, so that when something breaks, the functionality doesn’t go totally offline and a cut-down version is provided instead. That’s what we’re seeing, says Krueger.
Alongside the minor malfunctions, the Twitter engineer believes that there’ll be significant outages on the horizon, thanks in part to Musk’s drive to reduce Twitter’s cloud computing server load in an attempt to claw back up to $3 million a day in infrastructure costs. Reuters reports that this project, which came from Musk’s war room, is called the “Deep Cuts Plan.” One of Reuters’s sources called the idea “delusional,” while Alan Woodward, a cybersecurity professor at the University of Surrey, says that “unless they’ve massively overengineered the current system, the risk of poorer capacity and availability seems a logical conclusion.”
Brain drain
Meanwhile, when things do go kaput, there’s no longer the institutional knowledge to quickly fix issues as they arise. “A lot of the people I saw who were leaving after Friday have been there nine, 10, 11 years, which is just ridiculous for a tech company,” says the Twitter engineer. As those individuals walked out of Twitter offices, decades of knowledge about how its systems worked disappeared with them. (Those within Twitter, and those watching from the sidelines, have previously argued that Twitter’s knowledge base is overly concentrated in the minds of a handful of programmers, some of whom have been fired.)
Unfortunately, teams stripped back to their bare bones (according to those remaining at Twitter) include the tech writers’ team. “We had good documentation because of [that team],” says the engineer. No longer. When things go wrong, it’ll be harder to find out what has happened.
Getting answers will be harder externally as well. The communications team has been cut down from between 80 and 100 to just two people, according to one former team member whom MIT Technology Review spoke to. “There’s too much for them to do, and they don’t speak enough languages to deal with the press as they need to,” says the engineer.
When MIT Technology Review reached out to Twitter for this story, the email went unanswered.
Musk’s recent criticism of Mastodon, the open-source alternative to Twitter that has piled on users in the days since the entrepreneur took control of the platform, invites the suggestion that those in glass houses shouldn’t throw stones. The Twitter CEO tweeted, then quickly deleted, a post telling users, “If you don’t like Twitter anymore, there is awesome site [sic] called Masterbatedone [sic].” Accompanying the words was a physical picture of his laptop screen open on Paul Krugman’s Mastodon profile, showing the economics columnist trying multiple times to post. Despite Musk’s attempt to highlight Mastodon’s unreliability, its success has been remarkable: nearly half a million people have signed up since Musk took over Twitter.
It’s happening at the same time that the first cracks in Twitter’s edifice are starting to show. It’s just the beginning, expects Krueger. “I would expect to start seeing significant public-facing problems with the technology within six months,” he says. “And I feel like that’s a generous estimate.”