A recent investigation has uncovered that major tech companies, including Apple, Nvidia, and Anthropic, have been using YouTube video subtitles to train their artificial intelligence (AI) models without creators' permission. This practice goes against YouTube's rules and raises serious questions about data ethics in the AI industry.
The investigation, conducted by Proof News, revealed that these companies used subtitles from over 170,000 YouTube videos, spanning more than 45,000 channels. The data came from a wide range of sources, including educational channels like Harvard and MIT, news outlets such as the BBC, and popular YouTubers like PewDiePie and MrBeast.
This data was part of a larger collection called “The Pile,” created by EleutherAI, a non-profit AI research lab. The Pile bundles together many component datasets, YouTube Subtitles among them. Companies including Apple, Nvidia, and Salesforce used this collection to train their AI models, likely without knowing exactly where each piece of it came from.
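For readers curious how a component like YouTube Subtitles sits inside such a collection: The Pile is distributed as JSON Lines, where each record pairs its text with metadata naming the component dataset it came from. The sketch below shows how one might filter out just one component. The field names (`meta`, `pile_set_name`) and the `"YoutubeSubtitles"` label reflect the released format as best understood here, but treat them as assumptions rather than verified details.

```python
import json

def filter_subset(jsonl_lines, set_name):
    """Return the text of every record belonging to one Pile component.

    Assumes each line is a JSON object of the form
    {"text": ..., "meta": {"pile_set_name": ...}} (an assumption about
    the released format, not a verified schema).
    """
    texts = []
    for line in jsonl_lines:
        record = json.loads(line)
        if record.get("meta", {}).get("pile_set_name") == set_name:
            texts.append(record["text"])
    return texts

# Tiny made-up sample standing in for real Pile records.
sample = [
    json.dumps({"text": "subtitle text one",
                "meta": {"pile_set_name": "YoutubeSubtitles"}}),
    json.dumps({"text": "a github readme",
                "meta": {"pile_set_name": "Github"}}),
    json.dumps({"text": "subtitle text two",
                "meta": {"pile_set_name": "YoutubeSubtitles"}}),
]

print(filter_subset(sample, "YoutubeSubtitles"))
# → ['subtitle text one', 'subtitle text two']
```

The point of the metadata label is exactly what makes this story possible: because every record names its source dataset, investigators could trace which channels and videos ended up in the training mix.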
The use of this data raises several concerns. First, it appears to violate YouTube's Terms of Service, which prohibit harvesting material from the platform without permission. Second, the dataset includes subtitles from videos and channels that have since been deleted, undermining creators' ability to remove their content from the internet. Finally, some of the data contains biased or inappropriate material, which could carry over into the outputs of models trained on it.
Many creators were unaware that their content had been used in this way. Some, like Dave Farina of the YouTube channel Professor Dave Explains, argue that companies profiting from creators' work should provide compensation or face regulation.
The issue extends beyond YouTube. Similar concerns have been raised about AI companies using books and other copyrighted material without permission. Several authors have filed lawsuits against AI companies for alleged copyright violations.
As AI technology continues to advance rapidly, the debate over data usage, creator rights, and ethical AI training practices is likely to intensify. This incident highlights the need for clearer regulations and more transparent practices in the AI industry to protect content creators and ensure responsible AI development.