This article first appeared in The Checkup, MIT Technology Review’s weekly biotech newsletter. To receive it in your inbox every Thursday, and read articles like this first, sign up here.
Every journalist has favorite topics. Regular Checkup readers might already know some of mine, which include the quest to delay or reverse human aging, and new technologies for reproductive health and fertility. So when I saw trailers for The Substance, a film centered on one middle-aged woman’s attempt to reexperience youth, I had to watch it.
I won’t spoil the movie for anyone who hasn’t seen it yet (although I should warn that it is not for the squeamish, or anyone with an aversion to gratuitous close-ups of bums and nipples). But a key premise of the film involves harmful attitudes toward female aging.
“Hey, did you know that a woman’s fertility starts to decrease by the age of 25?” a powerful male character asks early in the film. “At 50, it just stops,” he later adds. He never explains what stops, exactly, but to the viewer the message is pretty clear: If you’re a woman, your worth is tied to your fertility. Once your fertile window is over, so are you.
The insidious idea that women’s bodies are, above all else, vessels for growing children has plenty of negative consequences for us all. But it has also set back scientific research and health policy.
Earlier this week, I chatted about this with Alana Cattapan, a political scientist at the University of Waterloo in Ontario, Canada. Cattapan has been exploring the concept of “women of reproductive age”—a descriptor that is ubiquitous in health research and policy.
The idea for the research project came to her when the Zika virus was making headlines around eight years ago. “I was planning on going to the Caribbean for a trip related to my partner’s research, and I kept getting advice that women of reproductive age shouldn’t go,” she told me. At the time, Zika was being linked to microcephaly—unusually small heads—in newborn babies. It was thought that the virus was affecting key stages of fetal development.
Cattapan wasn’t pregnant. And she wasn’t planning on becoming pregnant at the time. So why was she being advised to stay away from areas with the virus?
The experience got her thinking about the ways in which attitudes toward our bodies are governed by the idea of potential pregnancy. Take, for example, biomedical research on the causes and treatment of disease. Women’s health has lagged behind men’s as a focus of such work, for multiple reasons. Male bodies have long been considered the “default” human form, for example. And clinical trials have historically been designed in ways that make them less accessible for women.
Fears about the potential effects of drugs on fetuses have also played a significant role in keeping people who have the potential to become pregnant out of studies. “Scientific research has excluded women of ‘reproductive age,’ or women who might potentially conceive, in a blanket way,” says Cattapan. “The research that we have on many, many drugs does not include women and certainly doesn’t include women in pregnancy.”
This lack of research goes some way to explaining why women are much more likely to experience side effects from drugs—some of them fatal. Over the last couple of decades, greater effort has been made to include people with ovaries and uteruses in clinical research. But we still have a long way to go.
Women are also often subjected to medical advice designed to protect a potential fetus, whether they are pregnant or not. Official guidelines on how much mercury-containing fish it is safe to eat can be different for “women of childbearing age,” according to the US Environmental Protection Agency, for example. And in 2021, the World Health Organization used the same language to describe people who should be a focus of policies to reduce alcohol consumption.
The takeaway message is that it’s women who should be thinking about fetal health, says Cattapan. Not the industries producing these chemicals or the agencies that regulate them. Not even the men who contribute to a pregnancy. Just women who stand a chance of getting pregnant, whether they intend to or not. “It puts the onus of the health of future generations squarely on the shoulders of women,” she says.
Another problem is the language itself. The term “women of reproductive age” typically includes women between 15 and 44. Women at one end of that spectrum will have very different bodies and a very different set of health risks from those at the other. And the term doesn’t account for people who might be able to get pregnant but don’t necessarily identify as female.
In other cases it is overly broad. In the context of the Zika virus, for example, it was not all women between the ages of 15 and 44 who should have considered taking precautions. The travel advice didn’t apply to people who’d had hysterectomies or did not have sex with men, for example, says Cattapan. “Precision here matters,” she says.
More nuanced health advice would be helpful in cases like these. Guidelines often read as though they’re written for people assumed to be stupid, she adds. “I don’t think that needs to be the case.”
Another thing
On Thursday, president-elect Donald Trump said that he will nominate Robert F. Kennedy Jr. to lead the US Department of Health and Human Services. The news was not entirely a surprise, given that Trump had told an audience at a campaign rally that he would let Kennedy “go wild” on health, “the foods,” and “the medicines.”
The role would give Kennedy some control over multiple agencies, including the Food and Drug Administration, which regulates medicines in the US, and the Centers for Disease Control and Prevention, which coordinates public health advice and programs.
That’s extremely concerning to scientists, doctors, and health researchers, given Kennedy’s positions on evidence-based medicine, including his antivaccine stance. A few weeks ago, in a post on X, he referred to the FDA’s “aggressive suppression of psychedelics, peptides, stem cells, raw milk, hyperbaric therapies, chelating compounds, ivermectin, hydroxychloroquine, vitamins, clean foods, sunshine, exercise, nutraceuticals and anything else that advances human health and can’t be patented by Pharma.”
“If you work for the FDA and are part of this corrupt system, I have two messages for you,” continued the post. “1. Preserve your records, and 2. Pack your bags.”
There’s a lot to unpack here. But briefly, we don’t yet have good evidence that mind-altering psychedelic drugs are the mental-health cure-alls some claim they are. There’s not enough evidence to support the many unapproved stem-cell treatments sold by clinics throughout the US and beyond, either. These “treatments” can be dangerous.
Health agencies are currently warning against the consumption of raw unpasteurized milk, because it might carry the bird flu virus that has been circulating in US dairy farms. And it’s far too simplistic to lump all vitamins together—some might be of benefit to some people, but not everyone needs supplements, and high doses can be harmful.
Kennedy’s 2021 book The Real Anthony Fauci has already helped spread misinformation about AIDS. Here at MIT Technology Review, we’ll continue our work reporting on whatever comes next. Watch this space.
Now read the rest of The Checkup
Read more from MIT Technology Review’s archive
The tech industry has a gender problem, as the Gamergate and various #MeToo scandals made clear. A new generation of activists is hoping to remedy it.
Male and female immune systems work differently. Which is another reason why it’s vital to study both women and female animals as well as males.
Both of the above articles were published in the Gender issue of MIT Technology Review magazine. You can read more from that issue online here.
Women are more likely to receive abuse online. My colleague Charlotte Jee spoke to the technologists working on an alternative way to interact online: a feminist internet.
From around the web
The scientific community and biopharma investors are reacting to the news of Robert F. Kennedy Jr.’s nomination to lead the Department of Health and Human Services. “It’s hard to see HHS functioning,” said one biotech analyst. (STAT)
Virologist Beata Halassy successfully treated her own breast cancer with viruses she grew in the lab. She has no regrets. (Nature)
Could diet influence the growth of endometriosis lesions? Potentially, according to research in mice fed high-fat, low-fiber “Western” diets. (BMC Medicine)
Last week, 43 female rhesus macaque monkeys escaped from a lab in South Carolina. The animals may have a legal claim to freedom. (Vox)
This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology.
How this grassroots effort could make AI voices more diverse
We are on the cusp of a voice AI boom, as tech companies roll out the next generation of artificial-intelligence-powered assistants. But the default voices for these assistants are often white American—British, if you’re lucky—and most definitely speak English. And if you’re one of the billions of people who don’t speak English, bad luck: These tools don’t sound nearly as good in other languages.
This is because the data that has gone into training these models is limited. In AI research, most data used to train models is extracted from the English-language internet, which reflects Anglo-American culture. But there is a massive grassroots effort underway to change this status quo and bring more transparency and diversity to what AI sounds like. Read the full story.
—Melissa Heikkilä
Azalea: a science-fiction story
Fancy some fiction to read this weekend? If you enjoy sci-fi, check out this story written by Paolo Bacigalupi, featured in the latest edition of our print magazine. It imagines a future shaped by climate change; read it for yourself here.
The must-reads
I’ve combed the internet to find you today’s most fun/important/scary/fascinating stories about technology.
1 Cruise has admitted to falsifying a crash report
The report failed to mention that its robotaxi dragged a pedestrian after striking her. (San Francisco Chronicle)
+ The firm has been fined $500,000 to resolve the criminal charges. (WP $)
2 The US plans to investigate Microsoft’s cloud business
As the Biden administration prepares to hand over power to Donald Trump’s team. (FT $)
3 Silicon Valley hates regulation. So does Trump.
AI and energy ventures could be the first to prosper under lighter-touch governance. (WP $)
+ Peter Thiel claims the tech industry is fed up with ‘wokeness.’ (Insider $)
4 Elon Musk’s cost-cutting team will be working 80+ hours a week
And you’ll need to subscribe to X to apply. (WSJ $)
+ As if that wasn’t appealing enough, the positions are also unpaid. (NBC News)
+ The ‘lucky’ workers can expect a whole lot of meetings. (Bloomberg $)
5 The trolls are in charge now
And it’s increasingly unclear what’s a joke and what’s an actual threat. (The Atlantic $)
+ It’s possible, but not guaranteed, that Trump’s more controversial cabinet picks will be defeated in the Senate. (New Yorker $)
6 How to keep abortion plans private in the age of Trump
Reproductive rights are under threat. Here’s how to protect them. (The Markup)
7 The first mechanical qubit is here
And mechanical quantum computers could be the first to benefit. (IEEE Spectrum)
+ Quantum computing is taking on its biggest challenge: noise. (MIT Technology Review)
8 Can Bluesky recapture the old Twitter’s magic?
No algorithms, no interfering billionaires. (Vox)
+ More than one million new users joined the platform earlier this week. (TechCrunch)
9 Weight-loss drugs could help to treat chronic pain
And could present a safer alternative to opioids. (New Scientist $)
+ Weight-loss injections have taken over the internet. But what does this mean for people IRL? (MIT Technology Review)
10 These are the most expensive photographs ever taken
The first human-taken pictures from space are truly awe-inspiring. (The Guardian)
Quote of the day
“It feels like it’s a platform for and by real people.”
—US politician Alexandria Ocasio-Cortez tells the Washington Post about the appeal of Bluesky as users join the social network after abandoning X.
The big story
How environmental DNA is giving scientists a new way to understand our world
February 2024
Environmental DNA is a relatively inexpensive, widespread, potentially automated way to observe the diversity and distribution of life.
Unlike previous techniques, which could identify DNA from, say, a single organism, the method also collects the swirling cloud of other genetic material that surrounds it. It can serve as a surveillance tool, offering researchers a means of detecting the seemingly undetectable.
By sampling eDNA, or mixtures of genetic material in water, soil, ice cores, cotton swabs, or practically any environment imaginable, even thin air, it is now possible to search for a specific organism or assemble a snapshot of all the organisms in a given place.
It offers a thrilling — and potentially chilling — way to collect information about organisms, including humans, as they go about their everyday business. Read the full story.
—Peter Andrey Smith
We can still have nice things
A place for comfort, fun and distraction to brighten up your day. (Got any ideas? Drop me a line or tweet ’em at me.)
+ Smells like punk spirit.
+ If you’ve been feeling creaky lately (and who hasn’t), give these mobility exercises a go.
+ Talk about a glow up—these beautiful locations really do emanate light.
+ It’s the truly chilling collab we never knew we needed: Bon Jovi has joined forces with Mr Worldwide himself, Pitbull.
We are on the cusp of a voice AI boom, with tech companies such as Apple and OpenAI rolling out the next generation of artificial-intelligence-powered assistants. But the default voices for these assistants are often white American—British, if you’re lucky—and most definitely speak English. They represent only a tiny proportion of the many dialects and accents in the English language, which spans many regions and cultures. And if you’re one of the billions of people who don’t speak English, bad luck: These tools don’t sound nearly as good in other languages.
This is because the data that has gone into training these models is limited. In AI research, most data used to train models is extracted from the English-language internet, which reflects Anglo-American culture. But there is a massive grassroots effort underway to change this status quo and bring more transparency and diversity to what AI sounds like: Mozilla’s Common Voice initiative.
The data set Common Voice has created over the past seven years is one of the most useful resources for people wanting to build voice AI. It has seen a massive spike in downloads, partly thanks to the current AI boom; it recently hit the 5 million mark, up from 38,500 in 2020. Creating this data set has not been easy, mainly because the data collection relies on an army of volunteers. Their numbers have also jumped, from just under 500,000 in 2020 to over 900,000 in 2024. But by giving its data away, some members of this community argue, Mozilla is encouraging volunteers to effectively do free labor for Big Tech.
Since 2017, volunteers for the Common Voice project have collected a total of 31,000 hours of voice data in around 180 languages as diverse as Russian, Catalan, and Marathi. If you’ve used a service that uses audio AI, it’s likely been trained at least partly on Common Voice.
Mozilla’s cause is a noble one. As AI is integrated increasingly into our lives and the ways we communicate, it becomes more important that the tools we interact with sound like us. The technology could break down communication barriers and help convey information in a compelling way to, for example, people who can’t read. But instead, an intense focus on English risks entrenching a new colonial world order and wiping out languages entirely.
“It would be such an own goal if, rather than finally creating truly multimodal, multilingual, high-performance translation models and making a more multilingual world, we actually ended up forcing everybody to operate in, like, English or French,” says EM Lewis-Jong, a director for Common Voice.
Common Voice is open source, which means anyone can see what has gone into the data set, and users can do whatever they want with it for free. This kind of transparency is unusual in AI data governance. Most large audio data sets simply aren’t publicly available, and many consist of data that has been scraped from sites like YouTube, according to research conducted by a team from the University of Washington and Carnegie Mellon and Northwestern universities.
The vast majority of language data is collected by volunteers such as Bülent Özden, a researcher from Turkey. Since 2020, he has been not only donating his voice but also raising awareness around the project to get more people to donate. He recently spent two full-time months correcting data and checking for typos in Turkish. For him, improving AI models is not the only motivation to do this work.
“I’m doing it to preserve cultures, especially low-resource [languages],” Özden says. He tells me he has recently started collecting samples of Turkey’s smaller languages, such as Circassian and Zaza.
However, as I dug into the data set, I noticed that the coverage of languages and accents is very uneven. There are only 22 hours of Finnish voices from 231 people. In comparison, the data set contains 3,554 hours of English from 94,665 speakers. Some languages, such as Korean and Punjabi, are even less well represented. Even though they have tens of millions of speakers, they account for only a couple of hours of recorded data.
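To put those numbers in perspective, here is a quick back-of-the-envelope sketch in Python. It uses only the Finnish and English figures quoted above; the dictionary layout and function names are my own illustration and are not part of Common Voice's tooling.

```python
# Figures quoted in the article. The structure and names below are
# illustrative assumptions, not anything from the Common Voice project.
COVERAGE = {
    "Finnish": {"hours": 22, "speakers": 231},
    "English": {"hours": 3554, "speakers": 94665},
}


def minutes_per_speaker(hours, speakers):
    """Average number of recorded minutes contributed by each speaker."""
    return hours * 60 / speakers


def hours_ratio(larger, smaller):
    """How many times more recorded hours one language has than another."""
    return COVERAGE[larger]["hours"] / COVERAGE[smaller]["hours"]


if __name__ == "__main__":
    for language, stats in COVERAGE.items():
        average = minutes_per_speaker(stats["hours"], stats["speakers"])
        print(f"{language}: {average:.1f} minutes per speaker")
    ratio = hours_ratio("English", "Finnish")
    print(f"English has about {ratio:.0f} times as many hours as Finnish")
```

The point of the comparison is that the gap comes from the number of contributors rather than from how much each person records: the average Finnish speaker has actually donated more minutes than the average English speaker.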
This imbalance has emerged because data collection efforts are started from the bottom up by language communities themselves, says Lewis-Jong.
“We’re trying to give communities what they need to create their own AI training data sets. We have a particular focus on doing this for language communities where there isn’t any data, or where maybe larger tech organizations might not be that interested in creating those data sets,” Lewis-Jong says. They hope that with the help of volunteers and various bits of grant funding, the Common Voice data set will have close to 200 languages by the end of the year.
Common Voice’s permissive license means that many companies rely on it—for example, the Swedish startup Mabel AI, which builds translation tools for health-care providers. One of the first languages the company used was Ukrainian; it built a translation tool to help Ukrainian refugees interact with Swedish social services, says Karolina Sjöberg, Mabel AI’s founder and CEO. The team has since expanded to other languages, such as Arabic and Russian.
The problem with a lot of other audio data is that it consists of people reading from books or texts. The result is very different from how people really speak, especially when they are distressed or in pain, Sjöberg says. Because anyone can submit sentences to Common Voice for others to read aloud, Mozilla’s data set also includes sentences that are more colloquial and feel more natural, she says.
Not that it is perfectly representative. The Mabel AI team soon found out that most voice data in the languages it needed was donated by younger men, which is fairly typical for the data set.
“The refugees that we intended to use the app with were really anything but younger men,” Sjöberg says. “So that meant that the voice data that we needed did not quite match the voice data that we had.” The team started collecting its own voice data from Ukrainian women, as well as from elderly people.
Unlike other data sets, Common Voice asks participants to share their gender and details about their accent. Making sure different genders are represented is important to fight bias in AI models, says Rebecca Ryakitimbo, a Common Voice fellow who created the project’s gender action plan. More diversity leads not only to better representation but also to better models. Systems that are trained on narrow and homogeneous data tend to spew stereotyped and harmful results.
“We don’t want a case where we have a chatbot that is named after a woman but does not give the same response to a woman as it would a man,” she says.
Ryakitimbo has collected voice data in Kiswahili in Tanzania, Kenya, and the Democratic Republic of Congo. She tells me she wanted to collect voices from a socioeconomically diverse set of Kiswahili speakers and has reached out to women young and old living in rural areas, who might not always be literate or even have access to devices.
This kind of data collection is challenging. The importance of collecting AI voice data can feel abstract to many people, especially if they aren’t familiar with the technologies. Ryakitimbo and volunteers would approach women in settings where they felt safe to begin with, such as presentations on menstrual hygiene, and explain how the technology could, for example, help disseminate information about menstruation. For women who did not know how to read, the team read out sentences that they would repeat for the recording.
The Common Voice project is bolstered by the belief that languages form a really important part of identity. “We think it’s not just about language, but about transmitting culture and heritage and treasuring people’s particular cultural context,” says Lewis-Jong. “There are all kinds of idioms and cultural catchphrases that just don’t translate,” they add.
Common Voice is the only audio data set where English doesn’t dominate, says Willie Agnew, a researcher at Carnegie Mellon University who has studied audio data sets. “I’m very impressed with how well they’ve done that and how well they’ve made this data set that is actually pretty diverse,” Agnew says. “It feels like they’re way far ahead of almost all the other projects we looked at.”
I spent some time verifying the recordings of other Finnish speakers on the Common Voice platform. As their voices echoed in my study, I felt surprisingly touched. We had all gathered around the same cause: making AI data more inclusive, and making sure our culture and language were properly represented in the next generation of AI tools.
But I had some big questions about what would happen to my voice if I donated it. Once it was in the data set, I would have no control over how it might be used afterwards. The tech sector isn’t exactly known for giving people proper credit, and the data is available for anyone’s use.
“As much as we want it to benefit the local communities, there’s a possibility that also Big Tech could make use of the same data and build something that then comes out as the commercial product,” says Ryakitimbo. Though Mozilla does not share who has downloaded Common Voice, Lewis-Jong tells me Meta and Nvidia have said that they have used it.
Open access to this hard-won and rare language data is not something all minority groups want, says Harry H. Jiang, a researcher at Carnegie Mellon University, who was part of the team doing audit research. For example, Indigenous groups have raised concerns.
“Extractivism” is something that Mozilla has been thinking about a lot over the past 18 months, says Lewis-Jong. Later this year the project will work with communities to pilot alternative licenses including Nwulite Obodo Open Data License, which was created by researchers at the University of Pretoria for sharing African data sets more equitably. For example, people who want to download the data might be asked to write a request with details on how they plan to use it, and they might be allowed to license it only for certain products or for a limited time. Users might also be asked to contribute to community projects that support poverty reduction, says Lewis-Jong.
Lewis-Jong says the pilot is a learning exercise to explore whether people will want data with alternative licenses, and whether they are sustainable for communities managing them. The hope is that it could lead to something resembling “open source 2.0.”
In the end, I decided to donate my voice. I received a list of phrases to say, sat in front of my computer, and hit Record. One day, I hope, my effort will help a company or researcher build voice AI that sounds less generic, and more like me.
This story has been updated.