Perplexity accused of scraping websites despite explicit blocks

AI startups like Perplexity may bypass explicit website restrictions to scrape data, raising ethical concerns.

By Alex Carter, AI News & Big Tech Correspondent
Published: August 4, 2025

It turns out that some AI startups might be pushing the boundaries, or outright ignoring the rules, when it comes to gathering data online. I recently learned that Perplexity, an AI startup, has been accused of scraping content from websites that explicitly asked not to be crawled. According to a report from internet infrastructure giant Cloudflare, Perplexity’s bots have been circumventing restrictions set by site owners, including ignoring robots.txt files that tell crawlers where they’re allowed to go.

This discovery shines a light on an ongoing issue in the AI world: how companies collect the massive amounts of data needed to power their large language models and other AI products without clear permission.

Here’s what Cloudflare observed

Cloudflare’s researchers noticed that Perplexity didn’t just scrape content; it actively hid its crawling activity. Instead of transparently identifying themselves as bots, Perplexity’s systems reportedly masked their identity by changing their “user agent,” the string a browser or bot sends with each request to tell a website what software is visiting. They also reportedly varied the networks their requests came from, which are identified by autonomous system numbers (ASNs), to avoid detection. Essentially, they wore disguises to sneak into websites that explicitly said, “Don’t crawl here.”

Cloudflare found these tactics in use across tens of thousands of domains, with the crawler sending millions of requests every day. By combining machine learning techniques with network data, its researchers were able to fingerprint the crawler linked to Perplexity.

“We observed that Perplexity uses not only their declared user-agent, but also a generic browser intended to impersonate Google Chrome on macOS when their declared crawler was blocked.”
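
The pattern described above, one client showing up under several user agents and several networks, can be illustrated with a small Python sketch. This is not Cloudflare’s method, which the report only summarizes as machine learning plus network data. The `Request` record, the `fingerprint` field (standing in for whatever stable signature a detection system derives), the bot name `ExampleAIBot`, and the private-use ASNs are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    fingerprint: str  # hypothetical stable client signature, assumed to exist
    user_agent: str   # User-Agent header sent with the request
    asn: int          # autonomous system number of the source network


def find_rotating_clients(requests, min_user_agents=2, min_asns=2):
    """Return fingerprints seen under several user agents AND several ASNs."""
    user_agents = defaultdict(set)
    asns = defaultdict(set)
    for req in requests:
        user_agents[req.fingerprint].add(req.user_agent)
        asns[req.fingerprint].add(req.asn)
    # A client that changes both its declared identity and its network
    # is a candidate for closer inspection, not proof of wrongdoing.
    return sorted(
        fp
        for fp in user_agents
        if len(user_agents[fp]) >= min_user_agents and len(asns[fp]) >= min_asns
    )


# Illustrative log entries; ASNs 64512-64514 are in the private-use range.
sample_log = [
    Request("client-a", "ExampleAIBot/1.0", 64512),
    Request("client-a", "Mozilla/5.0 (Macintosh) Chrome", 64513),
    Request("client-b", "Mozilla/5.0 (Macintosh) Chrome", 64514),
]

print(find_rotating_clients(sample_log))
```

The thresholds are arbitrary defaults for the sketch. A real system would need far richer signals, since ordinary users also change networks and browsers.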

In response, Perplexity’s spokesperson dismissed these findings, suggesting the data didn’t prove any unauthorized access. They even claimed the bot in question wasn’t theirs. However, Cloudflare had received complaints from its customers, who had put up blocks and rules to stop Perplexity’s bots — only to still see them crawling the sites.

Why is this such a big deal?

AI models rely fundamentally on huge datasets to learn — scraping text, images, and videos from the web is a common way they build those datasets. But scraping data without permission, especially when site owners clearly block it, raises serious ethical, legal, and business model questions.

Many websites use the robots.txt standard to communicate their preferences about being indexed or scraped, and traditional search engines widely respect it. The file is advisory, though: nothing technically forces a crawler to obey it. AI crawlers are disrupting that respect for boundaries, and it’s upsetting the balance many rely on to make money, especially publishers.
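
To make the robots.txt mechanism concrete, here is a minimal sketch using Python’s standard-library `urllib.robotparser`. The crawler name `ExampleAIBot`, the rules, and the URL are hypothetical and not taken from Cloudflare’s report. It shows the check a well-behaved crawler would run before fetching a page, and why that check depends on the crawler naming itself honestly.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one named AI crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


page = "https://example.com/article"

# The declared crawler name matches the blocking rule.
print(is_allowed("ExampleAIBot", page))

# A browser-like name only matches the permissive wildcard rule.
print(is_allowed("Mozilla", page))
```

The rules are matched against whatever name the crawler declares, so a bot that presents itself as an ordinary browser falls under the permissive `*` rule. That is why user-agent masking undermines a robots.txt block.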

Cloudflare itself has recently been vocal about how AI is breaking the internet’s business model, particularly for content creators and publishers who struggle to monetize their work when AI scrapes and reuses it without compensation. In fact, Cloudflare has even launched a marketplace for website owners to start charging AI scrapers, signaling just how serious this issue has become.

Perplexity and the bigger picture

This isn’t the first time Perplexity has been under the spotlight for allegedly scraping content without authorization. Last year, some news outlets accused the startup of plagiarism — a charge that their CEO didn’t fully address when pressed at a major tech conference. Given how much AI depends on web data, and how many content creators rely on clear rules and protections, this ongoing tension will shape the debate around AI’s growth and responsibility.

What’s clear is that AI startups face a tough balancing act: they need data to innovate, but they also have to respect the wishes of those who create that content. The ways companies like Perplexity handle this challenge will probably influence how the web itself evolves in the coming years.

Key takeaways

  • Some AI crawlers increasingly ignore robots.txt and other web standards, complicating data ethics.
  • Tech giants like Cloudflare are stepping in to help protect websites and publishers from unauthorized scraping.
  • The tension between AI innovation and respecting content ownership is a defining issue for the future of the internet.

At the end of the day, no one wants an internet where AI companies freely raid content without permission — but they also can’t advance without data. The big question is: how will the ecosystem evolve to ensure everyone’s interests are balanced? I’ll be watching closely as this story unfolds.

TAGGED: AI, AI ethics, AI infrastructure, AI Models, machine learning, News, Perplexity, report, startups
SOURCES: Cloudflare
