Meta is giving researchers more access to Facebook and Instagram data

Meta is releasing a new transparency product called the Meta Content Library and API, according to an announcement from the company today. The new tools will allow select researchers to access publicly available data on Facebook and Instagram in an effort to give a more overarching view of what’s happening on the platforms.

The move comes as social media companies are facing public and regulatory pressure to increase transparency about how their products, specifically recommendation algorithms, work and what impact they have. Academic researchers have long been calling for better access to data from social media companies, including Meta. This new library is a step toward increased visibility into what is happening on its platforms and the effect that Meta’s products have on online conversations, politics, and society at large.

In an interview, Meta’s president of global affairs, Nick Clegg, said the tools “are really quite important” in that they provide, in a lot of ways, “the most comprehensive access to publicly available content across Facebook and Instagram of anything that we’ve built to date.” The Content Library will also help the company meet new regulatory requirements and obligations on data sharing and transparency, as the company notes in a blog post Tuesday.

The library and associated API were first released as a beta version several months ago and allow researchers to access near-real-time data about pages, posts, groups, and events on Facebook and creator and business accounts on Instagram, as well as the associated numbers of reactions, shares, comments, and post view counts. While all this data is publicly available—as in, anyone can see public posts, reactions, and comments on Facebook—the new library makes it easier for researchers to search and analyze this content at scale.

Meta says that to protect user privacy, this data will be accessible only through a virtual “clean room” and not downloadable. And access will be limited to approved researchers, who will be required to apply via an independent third-party organization.

In addition to the new library and API, Meta announced new partnerships to expand on research from 2022 on the connections between social networks and economic mobility.

The announcements come just days after The Information reported that the company was disbanding its Responsible AI team and distributing researchers throughout other parts of the organization, sparking skepticism about its commitment to user safety. Clegg had no comment on the restructuring of the AI team.

Hopes for “meaningful” research

Researchers have had a fraught relationship with social media companies in the past, particularly when it comes to accessing data that platforms might not want public. (In 2021, for instance, Facebook sent a cease-and-desist letter to researchers at New York University’s Transparency Project, which was using web scraping to investigate political ad targeting on the platform; the company said the scraping violated user privacy.)

Clegg said he wants the product to enable research that, first and foremost, is “meaningful,” and he highlighted the current lack of consensus among researchers about the exact impacts of social media—research that has undoubtedly been made more difficult by the lack of public data from social media companies.

The new library is primarily a database that can be accessed either through a web interface similar to a search engine or through an API where researchers can code their own queries to return large amounts of data. Researchers could, for example, ask to see all public posts in English about generative AI on February 14, 2023, sorted by most viewed to least viewed.
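
As a rough illustration of the kind of query described above, the sketch below filters and sorts a small in-memory list of made-up post records. It does not call Meta’s API: the `Post` fields, the `search_posts` function, and the sample data are all hypothetical, chosen only to show the shape of a keyword, language, and date query sorted by views.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Post:
    # Hypothetical record shape; not the Content Library's actual schema.
    text: str
    language: str
    posted_on: date
    views: int


def search_posts(posts, keyword, language, day):
    """Return posts matching keyword, language, and date, most viewed first."""
    needle = keyword.lower()
    matches = [
        p for p in posts
        if needle in p.text.lower()
        and p.language == language
        and p.posted_on == day
    ]
    return sorted(matches, key=lambda p: p.views, reverse=True)


# Made-up sample data for illustration only.
SAMPLE_POSTS = [
    Post("Generative AI wrote my valentine", "en", date(2023, 2, 14), 1200),
    Post("Thoughts on generative AI in classrooms", "en", date(2023, 2, 14), 5400),
    Post("La IA generativa y el arte", "es", date(2023, 2, 14), 9000),
    Post("Generative AI roundup", "en", date(2023, 2, 15), 7000),
]

results = search_posts(SAMPLE_POSTS, "generative AI", "en", date(2023, 2, 14))
for post in results:
    print(post.views, post.text)
```

A real query would run inside Meta’s clean room against far more data, and its parameter names and response format may look quite different.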

Recent moves by regulators, particularly in the European Union, may have forced Meta’s hand with mandates for greater transparency. The EU’s Digital Services Act (DSA), which went into effect in August, requires that platforms the size of Meta provide access to real-time data for researchers investigating “the detection, identification, and understanding of systemic risks in the Union.” Other regulatory efforts in Australia, Brazil, the US, and elsewhere have attempted to mimic these requirements. In what’s known as the Brussels effect, tech companies often comply with the strictest standards, usually set by the EU, in every country where they operate to avoid fragmentation in their products.

Policy efforts have struggled to balance demands for greater transparency with concerns about privacy protections. Clegg said that Meta has attempted to strike such a balance, in part through the application process.

Researchers looking to access the Content Library and API have to submit information about their institution and research questions to the Inter-university Consortium for Political and Social Research, an independent organization at the University of Michigan. Meta says the screening is primarily intended to provide a security check about the groups using the data and their financial interests, not to scrutinize the research questions.

The application process, though, has already raised some eyebrows. Smitha Milli, a postdoctoral researcher at Cornell Tech who studies the impact of social media, says, “My main question is, Why isn’t this accessible to everyone?”—especially since the library only contains publicly available data. Milli adds that it’s important to consider the amount of time the application process will add to the research cycle, saying it could be “super limiting.”

(Meta said access to the Content Library was limited to protect user privacy: “There’s a big difference between data being publicly available on the platform versus being able to access it programmatically in a way where you can get access to a large volume of that data,” said Kiran Jagadeesh, a Meta product manager.)

Milli notes that researchers really want access to information about how recommendation algorithms work and what people are seeing on their individual feeds, as well as ways to run experiments on the platforms. It’s not clear how the latest product will make progress on those fronts, though Clegg said researchers can pair the Content Library with other projects, like recommendation system cards, which combined will give “a much, much richer picture than was ever possible.”

Lena Frischlich, a professor at the Digital Democracy Centre at the University of Southern Denmark, tested the beta version of the Content Library and said her team found the access to multimedia content like reels on Instagram and events on Facebook particularly useful, as well as the new data it provides about view counts.

Frischlich also says that while the new product is “an important next step toward more transparency,” it is just a step. “Data access is still somehow restricted,” since not every country is included in the database and only researchers at qualifying academic or non-profit research institutions are granted access.

Clegg said he hopes that the new tool ultimately leads to better research about the role of social media in society, for multiple reasons. “I think there’s a sort of societal sense of responsibility here,” he said, “but also a self-interest in seeking to dispel some of the hyperbole that surrounds social media and to have the debate more grounded in fact.”

This story has been updated to clarify that non-profit research institutions and academic institutions may be granted access to the Content Library and API.