MEDIANAMA
Technology and policy in India
Cloudflare, a cloud-based internet infrastructure company, recently released its 2025 internet trends report, which offers insights into how traditional search and browsing behaviour is evolving as AI chatbots gradually become the go-to resource, not only in everyday use but also in professional workflows.
To assess AI’s impact on internet search and bot crawling behaviour, MediaNama examined key data points from Cloudflare’s report. The most significant findings are explained below, along with why they matter.
The crawl-to-refer ratio measures how much automated crawling by AI and search platforms translates into human traffic sent back to the original websites. The metric is inherently volatile, since it fluctuates with daily changes in crawling and browsing behaviour, but it provides a useful lens for assessing how different platforms interact with the open web, which various AI companies invoke as a defence for their unethical practices.
Among the popular AI companies, Anthropic showed the most extreme imbalance, with crawl-to-refer ratios peaking near 500,000:1 early in the year before settling at still-elevated levels between 25,000:1 and 100,000:1, driven largely by minimal referral traffic.
At that peak, for every single visit Anthropic sent back to a crawled source website, its bots crawled that site roughly 500,000 times. In simple terms, Anthropic’s bots took far more from websites than they returned in the form of real human traffic.
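The arithmetic behind the ratio can be sketched in a few lines of Python. This is a minimal sketch that assumes the ratio is simply crawl requests divided by referral visits over the same period; the helper name `crawl_to_refer_ratio` and all counts are hypothetical, not Cloudflare’s data or methodology. It also illustrates why the metric is volatile when referrals are scarce: with crawl volume held constant, a few extra referral visits move the ratio dramatically.

```python
def crawl_to_refer_ratio(crawl_requests: int, referrals: int) -> float:
    """Crawl requests per referral visit (hypothetical helper).

    Assumes both counts cover the same platform and time window.
    Returns infinity when the platform sent no referrals at all.
    """
    if crawl_requests < 0 or referrals < 0:
        raise ValueError("counts must be non-negative")
    if referrals == 0:
        return float("inf")
    return crawl_requests / referrals


# Hypothetical: the same 1,000,000 crawls against fewer and fewer referrals.
for visits in (40, 10, 2):
    ratio = crawl_to_refer_ratio(1_000_000, visits)
    print(f"{visits:>2} referrals -> {ratio:,.0f}:1")
```

With these made-up counts, 40 referrals give 25,000:1, 10 give 100,000:1 and 2 give 500,000:1, showing how a ratio driven by minimal referral traffic can swing by an order of magnitude on a handful of visits.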
This metric matters because it can reveal the deliberate design choices behind AI chatbots and help determine whether their UI and UX are optimised to nudge users towards the original source site or simply to keep engagement within the AI platform. If AI companies genuinely want to uphold open web principles, they ought to design platforms that give users a visible choice to explore the original sources, as a search engine results page does, rather than act as answer-giving machines that may neither value nor credit the original sources, let alone pay licensing fees to the sources whose content they trained their models on.
The crawl-to-refer ratio of other AI companies is as follows:
Dual-purpose web crawling has become standard practice among major tech companies, with Googlebot and Microsoft’s Bingbot both scanning websites for search indexing while also collecting data for AI training, a model that Cloudflare itself has previously cautioned against by urging AI bots to declare a single, clear purpose.
Cloudflare also revealed that crawling for AI model training still accounted for the vast majority of AI crawler traffic, peaking at about 7-8 times the volume of search-related crawling and roughly 32 times that of user-action crawling. The report noted that OpenAI’s GPTBot drove much of this training-related traffic, so overall training crawls followed an activity pattern similar to GPTBot’s throughout the year.
One notable trend is that user-triggered bots are growing faster than AI-training bots. According to Cloudflare’s annual report, AI crawling triggered directly by user actions grew more than 15-fold this year alone. This signals a shift from collecting data for foundational models toward real-time web access on behalf of users, a shift that could challenge traditional search engines.
User-driven bots are relevant because some of them have been found to violate the robots.txt protocol, which websites use to tell web crawlers and AI bots which parts of a site they can or cannot access. In August 2025, Cloudflare blocked Perplexity’s bots for violating robots.txt directives. Cloudflare has also noted elsewhere that not all bots follow the instructions in a robots.txt file.
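How a compliant crawler is expected to honour robots.txt can be sketched with Python’s standard-library `urllib.robotparser` module. The file contents, the `example.com` URLs and the choice to single out `PerplexityBot` are hypothetical illustrations, not any real site’s policy. The check is entirely voluntary: nothing in the protocol forces a bot to run it, which is how non-compliant crawlers can still reach pages a site has asked them to avoid.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: bar one named AI crawler from the whole site
# and keep every other crawler out of /private/.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A well-behaved crawler asks the parser before fetching each URL.
for agent in ("PerplexityBot", "Googlebot"):
    for url in ("https://example.com/news/story",
                "https://example.com/private/data"):
        allowed = parser.can_fetch(agent, url)
        print(f"{agent:<14} {url:<34} {'allowed' if allowed else 'blocked'}")
```

Here PerplexityBot is blocked from both URLs, while Googlebot may fetch the news story but not the /private/ page. Stopping a bot that ignores these rules, as Cloudflare did, requires enforcement outside robots.txt itself.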
In October 2025, Reddit caught Perplexity’s bots in a deliberate trap, strengthening its claim that Perplexity may have relied on third-party tools to scrape Google’s search results pages, which it then may have used for indexing and training purposes. Reddit subsequently filed a lawsuit in a US District Court in New York.
Cloudflare’s data shows more measured growth among smaller AI companies. Anthropic’s ClaudeBot roughly doubled its crawling activity in the first half of the year before tapering off and ending 2025 only slightly above its starting level. Perplexity’s PerplexityBot, meanwhile, grew in a step-wise pattern marked by sharp jumps rather than steady scaling, finishing the year at around 3.5 times its initial traffic.
While major AI bots are seeing an upward trend, China-based ByteDance’s Bytespider, once among the most active AI crawlers, continued to lose momentum this year.
Google’s automated bots remained the single largest source of traffic to Cloudflare’s network, as well as of AI training traffic, according to Cloudflare’s 2025 report. It said that Googlebot, the tool Google uses both to scan websites so they appear in search results and to train its AI systems, generated more requests to Cloudflare’s network than any other source for the third year in a row.
The report noted that one block of internet addresses used by Google accounted for nearly four times as much traffic as the next largest source, which belonged to Rackspace, a cloud hosting company and also a competitor to Cloudflare. Because Rackspace, unlike Google, serves many different customers and services, Cloudflare said it could not attribute that traffic to a single cause.
Among verified bots, Google’s bots, such as Google AdsBot and Google Image Proxy, were the most active. OpenAI’s GPTBot ranked next, with over 7.5% of Verified Bot traffic, and Cloudflare noted that OpenAI’s bots showed “fairly volatile crawling activity during the first half of the year.” By contrast, Microsoft’s Bingbot generated only 6% of Verified Bot traffic throughout the year and reportedly showed relatively stable activity.
Cloudflare’s data shows that AI and search crawling are heavily concentrated in a narrow set of industries, led primarily by the retail industry, which alone accounted for about 25.5% of all crawling activity. Computer software followed at 15.6%, reflecting the high value AI systems place on product documentation, developer resources and technical content.
By contrast, sectors that form the backbone of the digital economy, such as telecommunications and IT services, each accounted for just over 5%, while the internet category stood at only 4.4%. Content-driven and regulated sectors, including media, financial services, gambling and adult entertainment, saw comparatively lower crawl shares, suggesting tighter access controls by site owners.
Over the past 12 months, bot-driven internet traffic has disproportionately originated from smaller or less prominent jurisdictions, including Gibraltar, Dominica and Turkmenistan, as well as from places such as the Middle East, Ireland and Singapore. By contrast, bots accounted for only about 13% of traffic originating in India, compared with roughly 42% each in the United States and Russia and about 41% in China, underscoring a widening gap as automated traffic continues to rise globally alongside the expansion of AI-driven systems.
While India’s share of bot traffic is low, its share of mobile traffic is high. The country continues to exhibit a mobile-first usage pattern, with about 64.6% of its internet traffic originating from mobile devices over the past 12 months. This is well above the mobile shares in large, developed digital markets such as the United States (just over 49%), China (slightly above 50%) and Russia (about 45%), indicating a heavier reliance on desktop or other non-mobile access in those countries.
Several developing and conflict-affected countries record even higher levels of mobile dependence, including Sudan at more than 78%, Syria at about 77% and Bangladesh at roughly 76%, while Switzerland and comparable smaller markets sit around the mid-70% range, underscoring sharp global differences in how users access the internet.
Satellite-based internet connectivity is gaining momentum globally, with traffic from SpaceX’s Starlink service rising sharply over the past year, according to Cloudflare’s annual report. Cloudflare said total request volume from Starlink more than doubled in 2025, growing about 2.3 times over the year, reflecting increasing reliance on satellite broadband to serve unserved and underserved regions, as well as users in transit, such as those on aircraft and maritime vessels.
The report also noted that Starlink traffic typically surges soon after the service launches in a new market, a pattern that persisted in 2025 across more than 20 newly added countries and regions, including Armenia, Niger, Sri Lanka and Sint Maarten, where usage spiked within days of availability, underscoring the rapid expansion of satellite-based internet communications.
MediaNama is the premier source of information and analysis on Technology Policy in India. More about MediaNama, and contact information, here.
© 2024 Mixed Bag Media Pvt. Ltd.