Microsoft open-sources advanced speech AI

Håkon Berntsen, 3 June 2026

Microsoft has launched VibeVoice, a "frontier-level" open-source project for speech synthesis and speech recognition. By making the technology freely available via GitHub, Microsoft is challenging the trend of proprietary AI models and democratising access to advanced speech technology.

What is "Frontier Speech AI"?

The term "frontier" is used for AI models that represent the best available in their field. VibeVoice is positioned as a leading solution for:

Speech synthesis (text-to-speech)
Speech recognition (speech-to-text)
Real-time translation
Synthetic media

What makes VibeVoice special is that it is fully open source – anyone can view, modify and use the code without licence fees.

Why is open source important?

When Microsoft releases "frontier" models as open source, several things happen:

Innovation accelerates: Developers around the world can build on top of the technology
Smaller companies gain access: Solutions that were previously expensive become free
Localisation becomes easier: Languages such as Norwegian, Sami and other minority languages can get better support
Transparency increases: Anyone can inspect the code for security and bias

Earlier release: VibeVoice-ASR

Microsoft had already launched VibeVoice-ASR (Automatic Speech Recognition) in January 2026. That model specialises in long-form audio, which makes it well suited to podcasts, meeting recordings and transcription.

The text-to-speech model (VibeVoice 1.5B) can:

Generate up to 90 minutes of speech in a single pass
Use up to four distinct voices in the same recording
Produce natural-sounding synthetic speech from a single text prompt
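
For developers who want to test those limits, a script can be checked before it is sent to any speech model. The sketch below is illustrative only and does not call VibeVoice: the "Speaker <n>:" line format, the function name check_script and the speaking rate of 150 words per minute are assumptions made for this example, while the limits of four voices and 90 minutes come from the figures above.

```python
import re

MAX_SPEAKERS = 4          # limit reported in the article
MAX_MINUTES = 90          # limit reported in the article
WORDS_PER_MINUTE = 150    # assumed average speaking rate, not a VibeVoice figure

# Hypothetical script format: each non-empty line starts with "Speaker <n>:"
LINE_PATTERN = re.compile(r"^Speaker (\d+):\s*(.+)$")


def check_script(script: str) -> dict:
    """Count distinct speakers and estimate spoken duration for a script."""
    speakers = set()
    word_count = 0
    for raw_line in script.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines between turns
        match = LINE_PATTERN.match(line)
        if match is None:
            raise ValueError(f"Line lacks a 'Speaker <n>:' label: {line!r}")
        speakers.add(int(match.group(1)))
        word_count += len(match.group(2).split())
    minutes = word_count / WORDS_PER_MINUTE
    return {
        "speakers": len(speakers),
        "estimated_minutes": minutes,
        "within_limits": len(speakers) <= MAX_SPEAKERS and minutes <= MAX_MINUTES,
    }


example = """
Speaker 1: Welcome to the show.
Speaker 2: Thanks for having me.
"""
result = check_script(example)
print(result["speakers"], result["within_limits"])
```

A script with five labelled speakers, or one long enough to exceed the estimated 90 minutes, would come back with within_limits set to False and could be split before synthesis.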

Implications for Norwegian technology

For Norwegian developers and companies, VibeVoice opens the door to:

Better Norwegian TTS for accessibility systems
Free speech recognition for start-ups
Sami speech synthesis (by fine-tuning the model)
Competitive AI products without licensing costs

Challenging OpenAI and Google

Microsoft's open-source strategy stands in contrast to that of OpenAI (partly owned by Microsoft itself) and Google, which keep their best speech models proprietary. By opening up frontier models, Microsoft hopes to:

Establish GitHub as the primary hub for AI innovation
Build a developer community around its tools
Remain competitive even if other companies have better proprietary models

Sources:

AIToolly (31 March 2026)
Microsoft GitHub documentation
Reddit /r/StableDiffusion

