A third-party scraper economy is emerging beneath the big AI companies, making it harder than ever for publishers to know who is taking their content, let alone stop them.
“What’s changed is the separation of roles. The entity extracting the data is often not the one using or monetizing it. That abstraction layer is what’s enabling third-party resale markets to scale,” Brent Maynard, senior director of security technology and strategy at content delivery network company Akamai, said. “We’re hearing this directly from publishers. One large publisher described it as: We’re not being scraped by one company anymore, we’re being harvested by an ecosystem.”
Digiday has compiled four graphs revealing how AI web scraping is evolving, and what it means for publishers:
Publishers have spent years fighting the big AI companies over scraping. The smaller operators — dozens of third-party vendors crawling the web and selling content to enterprise clients — have largely escaped scrutiny. Media analyst Matthew Scott Goldstein’s report on the “scraper economy,” presented at a recent meeting he regularly hosts between publishers, tech companies and media analysts, aims to change that.
Publishers make no money from these relationships, according to Goldstein. He said this is a $1 billion industry, citing Mordor Intelligence data.
“Organized publisher supply into a licensed content marketplace doesn’t need to create new demand. Every enterprise in this report has demonstrated willingness to pay. The infrastructure exists. The buyers are named, funded, and growing. The only missing piece is organized publisher supply at scale — and the urgency to move before the compliance window closes,” Goldstein wrote in his report, which he shared with Digiday.
The report identified 21 vendors doing this, including Firecrawl, Exa, Tavily, Brave, You.com, Perplexity Sonar and Bright Data. (TollBit also has a running index on third-party scrapers, identifying nearly 40 vendors.)
Over 70 companies were found to be paying for publisher content from these vendors, including BCG, IBM, Cohere, AWS, Salesforce, Apple, Latham & Watkins, Zoom, PwC, Shopify and Alibaba.
What’s exacerbating the problem is the rise in AI bot activity.
Akamai, which handles more than one-third of all global internet traffic, saw a 300 percent surge in AI bot activity in 2025, according to its recent “Protecting Publishing: Navigating the AI Bot Era” report.
While most of this targeted the commerce industry (48 percent), media was in second place (13 percent), the report said. Specifically, publishers represented 40 percent of all media-related AI bot activity.
OpenAI, Meta and ByteDance were the top three culprits between July and December last year, with OpenAI generating the highest volume of AI bot traffic targeting media companies.
Worse still, the methods are getting more sophisticated. Akamai found that “AI fetchers” — bots that grab specific web pages in real time to answer user requests on AI assistants like chatbots — accounted for 24 percent of AI bot types in the media industry in the second half of last year. Publishers represented 43 percent of that. Meanwhile, AI training crawlers, which scan and collect large amounts of data from websites to train LLMs, made up 63 percent of the AI bot activity Akamai tracked.
“We’re seeing growth in real-time fetchers that pull content dynamically to answer queries,” Maynard said. “The risk isn’t just that content is taken, it’s that the visit never happens. We’re already seeing publishers connect this to declining referral traffic and changing user behavior… AI bot traffic is persistent and growing, but it doesn’t create value for publishers. It consumes infrastructure and content without contributing to revenue.”
Cybersecurity company Human Security found automated traffic is now growing eight times faster than human traffic, according to its recently published “2026 State of AI Traffic & Cyberthreat Benchmarks” report. AI scraper traffic grew 597 percent from January to December 2025, and AI-driven traffic overall grew 187 percent in 2025, nearly tripling year over year, per the report.
Despite the volume of bot activity, publishers are seeing little in return. ChatGPT is driving less than 0.2 percent of traffic to Raptive’s network of 6,000 independent publishers, according to a recent Raptive guide on AI bot blocking — an issue Digiday has previously reported on. In short, the more AI companies take, the less they send back.
“As a publisher or content creator, you don’t get anything from this,” Paul Bannister, chief strategy officer at Raptive, said. “It’s minuscule. It grew fast last year and then it’s flatlined for a while. They’re not sending more traffic out. And until there’s a better quid pro quo here, whether it’s traffic or money or something else, what’s in it for you? What’s in it for any of us?”
The solution to all of this, according to Goldstein, is easy: “Block the shit out of every single one of these companies. Block, block, block, and then block some more.”
The problem is that blocking is harder than it sounds. Most publishers rely on a single cybersecurity tool or bot detection system. AI companies have dozens of scraping tools to choose from, many designed specifically to circumvent bot blocking mechanisms.
TollBit’s latest “State of the Bots” report underscored the point. It found that about 30 percent of AI bot scrapes violate explicit instructions in robots.txt files that “disallow” web crawlers from accessing publishers’ content.
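For context, the robots.txt directives these scrapers are ignoring are simple, publicly visible rules. A sketch of what a publisher’s file might contain (GPTBot and CCBot are published crawler user-agent tokens for OpenAI and Common Crawl; the rules shown are illustrative, not any specific publisher’s configuration):

```
# Ask named AI crawlers to stay off the entire site
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /
```

Compliance with these directives is entirely voluntary, which is why roughly a third of AI bot scrapes can simply bypass them.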
“Robots.txt… is as useful as a chocolate teapot right now,” one publishing exec told Digiday, under the condition of anonymity.
There is some hope. Content delivery networks like Akamai, Fastly and Cloudflare have more sophisticated ways to identify and block AI bots. That’s helped publishers like People Inc. block all bots by default and allow only those they’ve deemed “permissible” to access their content, Jon Roberts, chief innovation officer at People Inc., said onstage at the Digiday Publishing Summit in Vail, Colorado, last month. Mark Howard, Time’s chief operating officer, said at the same event that his company was planning a more aggressive bot-blocking approach.
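The default-deny logic behind that approach can be sketched in a few lines. This is a simplified illustration, not any CDN’s actual implementation: the bot tokens and allowlist below are hypothetical examples, and real deployments rely on network-level fingerprinting rather than user-agent strings alone, since many scrapers spoof or omit them.

```python
# Default-deny sketch: treat any recognized bot as blocked unless
# the publisher has explicitly allowlisted it. Token names are
# illustrative; production systems use CDN-level bot detection.

ALLOWED_BOTS = {"googlebot"}  # crawlers this publisher has approved

KNOWN_BOT_TOKENS = {
    "googlebot",
    "gptbot",
    "ccbot",
    "bytespider",
}

def is_blocked(user_agent: str) -> bool:
    """Return True if the request should be denied."""
    ua = user_agent.lower()
    for token in KNOWN_BOT_TOKENS:
        if token in ua:
            # Recognized bot: deny unless explicitly allowlisted.
            return token not in ALLOWED_BOTS
    # Not a recognized bot; let presumed-human traffic through.
    return False
```

The key design choice is the default: instead of maintaining an ever-growing blocklist of new scrapers, everything identified as a bot is denied unless the publisher has opted it in.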
“The organizations that get ahead of this will treat AI access as both a security problem and a business model decision. If your content is fueling AI, you should have a say in how it’s accessed, used and monetized,” Maynard said.
The Atlantic has taken a similar approach, allowing only bots from companies it has a commercial or strategic deal with to access its content.