Have you ever wondered why AI speech recognition and translation often overlook many European languages? With nearly 7,000 spoken languages worldwide, only a tiny fraction get solid AI support. But recently, I came across exciting news from NVIDIA that could seriously shake things up for speech AI and multilingual tech.
NVIDIA just released Granary – a huge open dataset with around 1 million hours of multilingual audio – alongside two new AI models designed to power high-accuracy speech transcription and translation across 25 European languages. What’s particularly cool is that this isn’t just about the popular languages: it also covers less commonly supported ones like Croatian, Estonian, and Maltese.
Breaking down barriers with the Granary dataset
One of the biggest challenges in speech AI is data scarcity, especially for languages without large annotated datasets. Granary tackles this head-on by combining and refining publicly available speech data through a clever pipeline that doesn’t rely on intensive human labeling. This pipeline, powered by NVIDIA’s NeMo Speech Data Processor toolkit, transforms unlabeled audio into clean, structured datasets primed for training.
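The actual pipeline lives in NVIDIA’s NeMo Speech Data Processor toolkit, but the core idea – pseudo-label unlabeled audio, then filter out low-quality segments heuristically instead of paying for human annotation – can be sketched in a few lines. This is a minimal illustration only; the `Segment` fields and thresholds below are hypothetical, not taken from the toolkit:

```python
from dataclasses import dataclass

@dataclass
class Segment:
    text: str        # pseudo-label produced by an ASR model
    duration: float  # clip length in seconds

def keep_segment(seg: Segment,
                 min_dur: float = 1.0,
                 max_dur: float = 30.0,
                 max_chars_per_sec: float = 25.0) -> bool:
    """Heuristic filter: drop clips that are empty, too short or too
    long, or whose transcript is implausibly dense for the audio length
    (a common sign of a bad pseudo-label)."""
    if not seg.text.strip():
        return False
    if not (min_dur <= seg.duration <= max_dur):
        return False
    return len(seg.text) / seg.duration <= max_chars_per_sec

segments = [
    Segment("hello world", 2.0),  # plausible -> keep
    Segment("", 5.0),             # empty pseudo-label -> drop
    Segment("x" * 500, 3.0),      # ~167 chars/sec, implausible -> drop
]
clean = [s for s in segments if keep_segment(s)]
```

Chaining many such automatic checks is what lets a pipeline like this turn raw public audio into training-ready data without armies of human labelers.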

The impact? Developers get a massive, ready-to-use resource that covers not just the European Union’s 24 official languages, but also Russian and Ukrainian. This breathes life into languages that traditionally lagged in AI support, enabling inclusive and expansive speech technologies. According to the researchers, Granary requires about half as much training data to reach target accuracy compared to older popular datasets – a big efficiency win.
The models powering high-quality, real-time speech AI
Along with Granary, NVIDIA rolled out two standout models showcasing what’s possible. First up, there’s Canary-1b-v2, a billion-parameter model optimized for top-notch transcription and translation across those 25 languages. It’s reported to match the quality of models three times its size but runs inference up to 10 times faster – a remarkable feat for production-scale use.
Then there’s Parakeet-tdt-0.6b-v3, a more streamlined 600-million-parameter model tailored for fast, real-time transcription. It can process long audio clips in a single pass and automatically detects the language without extra prompting – perfect for high-throughput scenarios like multilingual chatbots or customer service agents.
Both models feature refined outputs with accurate punctuation, capitalization, and word-level timestamps, ensuring that the transcriptions aren’t just fast but also polished.
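Word-level timestamps are what make outputs like these directly usable downstream – for example, turning a transcript into subtitles. As a hedged sketch (the `(word, start, end)` tuple format is an assumption for illustration, not the models’ actual output schema), here is how timestamped words could be grouped into SRT-style caption blocks:

```python
def to_srt(words, max_gap=0.5):
    """Group (word, start, end) tuples into SRT caption blocks,
    starting a new block whenever the pause between consecutive
    words exceeds max_gap seconds."""
    def fmt(t):  # seconds -> HH:MM:SS,mmm
        h, rem = divmod(t, 3600)
        m, s = divmod(rem, 60)
        return f"{int(h):02d}:{int(m):02d}:{s:06.3f}".replace(".", ",")

    blocks, current = [], []
    for word, start, end in words:
        if current and start - current[-1][2] > max_gap:
            blocks.append(current)
            current = []
        current.append((word, start, end))
    if current:
        blocks.append(current)

    out = []
    for i, block in enumerate(blocks, 1):
        text = " ".join(w for w, _, _ in block)
        out.append(f"{i}\n{fmt(block[0][1])} --> {fmt(block[-1][2])}\n{text}\n")
    return "\n".join(out)

words = [("Hello", 0.0, 0.4), ("world.", 0.5, 0.9),
         ("New", 2.0, 2.3), ("caption.", 2.4, 2.8)]
print(to_srt(words))
```

Because the 1.1-second pause between “world.” and “New” exceeds `max_gap`, the sketch emits two caption blocks with accurate start and end times – exactly the kind of polish the punctuation, capitalization, and timestamps are there to enable.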
What this means for speech AI developers and users
What I find most inspiring is NVIDIA’s open approach. By sharing the Granary dataset and the two models openly, they’re empowering the global community of speech AI developers to build and adapt tools for a wide range of languages and applications.
This kind of collaboration means faster innovation cycles, better AI quality for less-resourced languages, and more inclusive tech that extends beyond the typical handful of global languages. For everyday users, it hints at a future where multilingual voice assistants, translation services, and customer support feel natural and effective no matter what language you speak.
NVIDIA’s Granary cuts required training data by about half while expanding coverage to 25 European languages – including previously underrepresented ones.
Plus, the use of the NVIDIA NeMo suite throughout this work underscores how modular AI toolkits can accelerate complex projects, making it easier for teams to filter high-quality data and fine-tune models efficiently.
Key takeaways
- Granary is an open-source dataset with around 1 million hours of curated multilingual speech data, addressing language data scarcity, especially for lesser-supported European languages.
- NVIDIA’s Canary-1b-v2 and Parakeet-tdt-0.6b-v3 models demonstrate how to balance accuracy and speed for different speech AI needs, from transcription to translation.
- The open, accessible approach aims to democratize speech AI development and accelerate innovation across a wider language spectrum.
In the end, this initiative shines a light on the power of combining massive data, smart pipelines, and efficient models to push the boundaries of what speech AI can do — making tech more inclusive and useful for millions of people across Europe and beyond.