Nettsak

Microsoft open-sources advanced speech AI

Håkon Berntsen

Microsoft has launched VibeVoice, a "frontier-level" open-source project for speech synthesis and speech recognition. By making the technology freely available via GitHub, Microsoft is challenging the trend of proprietary AI models and democratising access to advanced speech technology.

What is "Frontier Speech AI"?

The term "frontier" is used for AI models that represent the best available in their field. VibeVoice is positioned as a leading solution for:

  • Speech synthesis (text-to-speech)
  • Speech recognition (speech-to-text)
  • Real-time translation
  • Synthetic media

What makes VibeVoice special is that it is fully open source – anyone can view, modify and use the code without licence fees.

Why is open source important?

When Microsoft releases "frontier" models as open source, several things happen:

  1. Innovation accelerates: Developers around the world can build on top of the technology
  2. Smaller companies gain access: Solutions that were previously expensive become free
  3. Localisation becomes easier: Languages such as Norwegian, Sami and other minority languages can get better support
  4. Transparency increases: Anyone can inspect the code for security and bias

Earlier release: VibeVoice-ASR

Microsoft had already launched VibeVoice-ASR (Automatic Speech Recognition) in January 2026. That model specialises in long-form audio, which makes it well suited to podcasts, meeting recordings and other transcription work.

The text-to-speech model, VibeVoice 1.5B, can:

  • Generate up to 90 minutes of speech
  • Use up to four distinct voices
  • Produce natural-sounding synthetic speech from a single text prompt

Implications for Norwegian technology

For Norwegian developers and companies, VibeVoice opens the door to:

  • Better Norwegian TTS for accessibility systems
  • Free speech recognition for start-ups
  • Sami speech synthesis (by fine-tuning the model)
  • Competitive AI products without licensing costs

Challenging OpenAI and Google

Microsoft's open-source strategy stands in contrast to that of OpenAI (in which Microsoft itself holds a stake) and Google, both of which keep their best speech models proprietary. By "opening up" frontier models, Microsoft hopes to:

  • Establish GitHub as the primary hub for AI innovation
  • Build a developer community around its tools
  • Remain competitive even if other companies have better proprietary models

Sources:

  • AIToolly (31 March 2026)
  • Microsoft GitHub documentation
  • Reddit /r/StableDiffusion
