NOW LET US – AI RAG SaaS Studio TP.HCM

The Atlantic created a searchable database of the music used to train AI


The Atlantic has uncovered four massive music datasets used to train AI models and launched a searchable database for the public to check if their songs were used.

Atlantic reporter Alex Reisner recently uncovered four datasets of music being used to train AI models and made them fully searchable for the public. Two of the sets are absolutely enormous at 12 million and 9 million tracks. The other two are much smaller, but still represent a significant amount of training data at over 100,000 songs each.

Millions of tracks are freely available in datasets, even if they’re not supposed to be.

According to Reisner, the sets have been downloaded thousands of times. While it’s impossible to know exactly who has used them, Google and Stability have both confirmed in research papers that they have. Some of the sources, like the Free Music Archive dataset, are free to stream for personal use but require licensing for commercial applications.

While the datasets are freely available on the internet in theory, using them as training data is not as simple as downloading a ZIP file and feeding it to an AI model. As Reisner explains:

Three of the datasets I found are distributed as a list of links to songs on YouTube or Spotify. AI developers download the actual audio using tools that automate the job, some of which allow developers to bypass logins, advertisements, and mechanisms that might earn money or subscribers for creators. Such tools violate the terms of service of these platforms.

Names that pop up in the dataset range from pop stars like Lady Gaga and Fred Again.., to Radiohead, Aphex Twin, Wu-Tang Clan, Bruce Springsteen, and experimental composer Hainbach. You can hop over to the Atlantic’s AI Watchdog site and search through the songs, books, and other media being used to train the world’s AI models yourself.


© 2026 Now Let Us. All rights reserved.

Source: The Verge AI


