Ice Lounge Media

Shopify took down Kanye West’s online store after the musician sold T-shirts with the swastika symbol. West, who also goes by Ye, advertised his online store in a Super Bowl commercial on Sunday, directing viewers to his website, where the only item listed was the swastika T-shirt. Though Shopify removed a policy banning sellers from […]

© 2024 TechCrunch. All rights reserved. For personal use only.


Founders Fund is on track to conclude fundraising of its third growth fund at the end of March, according to people close to the firm. The Peter Thiel-founded outfit is raising $3 billion, a source told TechCrunch and Axios also reported. The fund, which is intended primarily for additional investments in its successful late-stage portfolio companies, […]


Apple Maps will soon rename the Gulf of Mexico to the Gulf of America, following similar changes made by Google this week, in order to comply with U.S. President Donald Trump’s executive order that officially changed the name. U.S.-based Apple users may see the “Gulf of America” as soon as Tuesday, according to Bloomberg, and […]


ChatGPT, OpenAI’s chatbot platform, may not be as power-hungry as once assumed. But its appetite largely depends on how ChatGPT is being used and the AI models that are answering the queries, according to a new study. A recent analysis by Epoch AI, a nonprofit AI research institute, attempted to calculate how much energy a […]


We often take the internet for granted. It’s an ocean of information at our fingertips—and it simply works. But this system relies on swarms of “crawlers”—bots that roam the web, visit millions of websites every day, and report what they see. This is how Google powers its search engines, how Amazon sets competitive prices, and how Kayak aggregates travel listings. Beyond the world of commerce, crawlers are essential for monitoring web security, enabling accessibility tools, and preserving historical archives. Academics, journalists, and civil societies also rely on them to conduct crucial investigative research.  

Crawlers are endemic. Now representing half of all internet traffic, they will soon outpace human traffic. This unseen subway of the web ferries information from site to site, day and night. And as of late, crawlers serve one more purpose: Companies such as OpenAI use web-crawled data to train their artificial intelligence systems, like ChatGPT.

Understandably, websites are now fighting back for fear that this invasive species, AI crawlers, will help displace them. But there’s a problem: This pushback is also threatening the transparency and open borders of the web that allow non-AI applications to flourish. Unless we are thoughtful about how we fix this, the web will increasingly be fortified with logins, paywalls, and access tolls that inhibit not just AI but the biodiversity of real users and useful crawlers.

A system in turmoil 

To grasp the problem, it’s important to understand how the web worked until recently, when crawlers and websites operated together in relative symbiosis. Crawlers were largely undisruptive and could even be beneficial, bringing people to websites from search engines like Google or Bing in exchange for their data. In turn, websites imposed few restrictions on crawlers, even helping them navigate their sites. Websites, then as now, use machine-readable files, called robots.txt files, to specify what content they want crawlers to leave alone. But there were few efforts to enforce these rules or identify crawlers that ignored them. The stakes seemed low, so sites didn’t invest in obstructing those crawlers.
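To make the robots.txt mechanism concrete, here is a minimal sketch of how a well-behaved crawler might consult such a file before fetching a page. It uses Python's standard urllib.robotparser module. The robots.txt contents, the crawler names (ExampleAIBot, SearchBot), and the example.com URLs are hypothetical and chosen only for illustration; they do not come from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named crawler is barred from the whole site,
# while every other crawler is only asked to stay out of /private/.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules permit this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)

if __name__ == "__main__":
    for agent in ("ExampleAIBot", "SearchBot"):
        for url in ("https://example.com/articles/1",
                    "https://example.com/private/notes"):
            print(agent, url, is_allowed(parser, agent, url))
```

The check is purely advisory: nothing in this code, or in the file itself, stops a crawler that never calls it, which is why the rules described above were so easy to ignore.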

But now the popularity of AI has thrown the crawler ecosystem into disarray.

As with an invasive species, crawlers for AI have an insatiable and undiscerning appetite for data, hoovering up Wikipedia articles, academic papers, and posts on Reddit, review websites, and blogs. All forms of data are on the menu: text, tables, images, audio, and video. And the AI systems that result can (but will not always) be used in ways that compete directly with their sources of data. News sites fear AI chatbots will lure away their readers; artists and designers fear that AI image generators will seduce their clients; and coding forums fear that AI code generators will supplant their contributors.

In response, websites are starting to turn crawlers away at the door. The motivator is largely the same: AI systems, and the crawlers that power them, may undercut the economic interests of anyone who publishes content to the web—by using the websites’ own data. This realization has ignited a series of crawler wars rippling beneath the surface.

The fightback

Web publishers have responded to AI with a trifecta of lawsuits, legislation, and computer science. What began with a litany of copyright infringement suits, including one from the New York Times, has turned into a wave of restrictions on use of websites’ data, as well as legislation such as the EU AI Act to protect copyright holders’ ability to opt out of AI training. 

However, legal and legislative verdicts could take years, while the consequences of AI adoption are immediate. So in the meantime, data creators have focused on tightening the data faucet at the source: web crawlers. Since mid-2023, websites have erected crawler restrictions on over 25% of the highest-quality data. Yet many of these restrictions can be simply ignored, and while major AI developers like OpenAI and Anthropic do claim to respect websites’ restrictions, they’ve been accused of ignoring them or aggressively overwhelming websites (the major technical support forum iFixit is among those making such allegations).

Now websites are turning to their last alternative: anti-crawling technologies. A plethora of new startups (TollBit, ScalePost, etc.) and web infrastructure companies like Cloudflare (estimated to support 20% of global web traffic) have begun to offer tools to detect, block, and charge nonhuman traffic. These tools erect obstacles that make sites harder to navigate or require crawlers to register.

These measures still offer immediate protection. After all, AI companies can’t use what they can’t obtain, regardless of how courts rule on copyright and fair use. But the effect is that large web publishers, forums, and sites are often raising the drawbridge to all crawlers—even those that pose no threat. This is even the case once they ink lucrative deals with AI companies that want to preserve exclusivity over that data. Ultimately, the web is being subdivided into territories where fewer crawlers are welcome.

How we stand to lose out

As this cat-and-mouse game accelerates, big players tend to outlast little ones. Large websites and publishers will defend their content in court or negotiate contracts. And massive tech companies can afford to license large data sets or create powerful crawlers to circumvent restrictions. But small creators, such as visual artists, YouTube educators, or bloggers, may feel they have only two options: hide their content behind logins and paywalls, or take it offline entirely. For real users, this is making it harder to access news articles, see content from their favorite creators, and navigate the web without hitting logins, subscription demands, and captchas each step of the way.

Perhaps more concerning is the way large, exclusive contracts with AI companies are subdividing the web. Each deal raises the website’s incentive to remain exclusive and block anyone else from accessing the data—competitor or not. This will likely lead to further concentration of power in the hands of fewer AI developers and data publishers. A future where only large companies can license or crawl critical web data would suppress competition and fail to serve real users or many of the copyright holders.

Put simply, following this path will shrink the biodiversity of the web. Crawlers from academic researchers, journalists, and non-AI applications may increasingly be denied open access. Unless we can nurture an ecosystem with different rules for different data uses, we may end up with strict borders across the web, exacting a price on openness and transparency. 

While this path is not easily avoided, defenders of the open internet can insist on laws, policies, and technical infrastructure that explicitly protect noncompeting uses of web data from exclusive contracts while still protecting data creators and publishers. These rights are not at odds. We have so much to lose or gain from the fight to get data access right across the internet. As websites look for ways to adapt, we mustn’t sacrifice the open web on the altar of commercial AI.

Shayne Longpre is a PhD Candidate at MIT, where his research focuses on the intersection of AI and policy. He leads the Data Provenance Initiative.


This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology.

The dream of offshore rocket launches is finally blasting off

Want to send something to space? Get in line. The demand for rides off Earth is skyrocketing, with launches more than doubling over the past four years, from about 100 to 250 annually. That number is projected to spiral further up, fueled by an epic growth spurt in the commercial space sector.

To relieve the congestion, some mission planners are looking to the ocean as the next big gateway to space. But sea-based launches come with some unique regulatory, geopolitical, and environmental trade-offs. They also offer a glimpse of new technologies and industries, enabled by a potentially limitless launch capacity, that could profoundly reshape our lives. Read the full story. 

—Becky Ferreira

Can AI help DOGE slash government budgets? It’s complex.

No tech leader before has played the role in a new presidential administration that Elon Musk is playing now. Under his leadership, DOGE has entered offices in a half-dozen agencies and counting, accessed various payment systems, had its access to the Treasury halted by a federal judge, and sparked lawsuits questioning the legality of the group’s activities.  

The stated goal of DOGE’s actions is “slashing waste, fraud, and abuse.” So where is fraud happening, and could AI models fix it, as DOGE staffers hope? Read our story to find out.

—James O’Donnell

This story is from The Algorithm, our weekly newsletter giving you the inside track on all things AI. Sign up to receive it in your inbox every Monday.

The must-reads

I’ve combed the internet to find you today’s most fun/important/scary/fascinating stories about technology.

1 Elon Musk is leading an unsolicited bid to buy OpenAI for $97.4 billion
This is an escalation in his long-running feud with CEO Sam Altman, but it may well come to nothing. (WSJ $)
The timing is annoying for Altman, as he’s in the middle of complex restructuring negotiations. (FT $)
Still, he says he’s confident OpenAI’s board is going to reject Musk’s offer. (The Information $)

2 What we’re learning from the AI Action Summit in Paris
As tech companies ship AI products relentlessly, policymakers still haven’t got a clue how to respond. (NYT $)

3 A federal judge blocked NIH cuts to research grants
A hearing has been set for February 21. (STAT $)
Why the cuts would be so devastating, according to the scientists who’d be affected. (Scientific American $)

4 AI chatbots cannot accurately summarize news
A study of leading models found 51% of their answers to questions about the news had ‘significant issues’. (BBC)
The tendency to make things up is holding chatbots back. But that’s just what they do. (MIT Technology Review)

5 BYD is bringing advanced self-driving to its cars
Including even the cheapest models. (FT $)
Analysts expect this to solidify the company’s position as China’s top EV maker. (South China Morning Post)
Why the world’s biggest EV maker is getting into shipping. (MIT Technology Review)
+ Meanwhile in the US, the next big robotaxi push is underway. (Quartz)

6 Trump is imposing 25% tariffs on foreign steel
You may recall he did this during his last term, and ended up having to roll it back. (NYT $)

7 A Silicon Valley job isn’t as desirable as it used to be 
Multiple rounds of layoffs have really broken employees’ trust in their superiors. (WP $)

8 Google Maps now shows the ‘Gulf of America’
Unless you live in Mexico! (The Verge)

9 Can the human body endure a voyage to Mars? 🧑‍🚀
Space travel exacts an extremely high physical toll on even the fittest astronauts. (New Yorker $)
Space travel is dangerous. Could genetic testing and gene editing make it safer? (MIT Technology Review)

10 Thinking of re-playing the Sims? Maybe don’t.
25 years on, it feels a bit like a psyop to prepare millennials for the capitalist grind. (The Guardian)

Quote of the day

“No thank you but we will buy twitter for $9.74 billion if you want.”

—Sam Altman responds on X to news that Elon Musk is leading an unsolicited bid to buy OpenAI for $97.4 billion.

The big story

This sci-fi blockchain game could help create a metaverse that no one owns

November 2022

Dark Forest is a vast universe, and most of it is shrouded in darkness. Your mission, should you choose to accept it, is to venture into the unknown, avoid being destroyed by opposing players who may be lurking in the dark, and build an empire of the planets you discover and can make your own.

But while the video game seemingly looks and plays much like other online strategy games, it doesn’t rely on the servers running other popular online strategy games. And it may point to something even more profound: the possibility of a metaverse that isn’t owned by a big tech company. Read the full story.

—Mike Orcutt

We can still have nice things

A place for comfort, fun and distraction to brighten up your day. (Got any ideas? Drop me a line or skeet ’em at me.)

+ Tiles can be such a beautiful artform. Just ask the Portuguese!
+ Here are some quick ways to jumpstart your energy levels.
+ I’m obsessed with spicy smacked cucumbers. Turns out, they’re easy to make at home.
+ Aww… This little boy and his Dad managed to visit every city in England by train last year.
