
What are voice models all about

If you have ever wanted your content to sound more natural, more personal, or more closely aligned with a specific brand voice, voice models make that possible.

In HeyGen, voices bring your scripts and avatars to life through AI-generated narration. Every voice you choose is powered by a voice engine, a model designed to produce natural, expressive, and linguistically accurate speech. Selecting the right engine helps you align the tone, pace, and emotion with your message.

What are voice models

A voice model is the underlying AI system that generates speech. It determines how a voice sounds, how expressive it is, how fast it speaks, and how well it performs across different languages.

HeyGen offers multiple voice engines, each optimised for different use cases such as training, marketing, storytelling, or localisation.

Automatic voice engine

The Auto setting allows HeyGen to automatically select the most suitable voice engine based on your video’s language and content. This is a good option if you want dependable results without having to choose a model manually.

ElevenLabs voice engine

ElevenLabs offers studio-quality narration in over 70 languages, making it ideal for most video and voice projects.

If you’re using a custom voice, you can also choose which voice model powers it, giving you greater control over tone and realism. By default, HeyGen uses the multilingual V3 model from ElevenLabs, which is known for its natural expression and strong multilingual performance.

Turbo voice models

For projects that require faster generation, you can switch to one of the Turbo models. These provide lower latency and quicker processing, but are primarily optimised for English content.

Starfish voice engine

Starfish is optimised for Asian languages, including Chinese, Japanese, and Korean. It is tuned for natural pronunciation and pacing, which makes it a strong fit for region-specific content.

Panda voice engine

Panda is HeyGen’s expressive engine, designed for emotional delivery. It supports features such as Voice Director and Voice Mirroring, which give you fine-grained control over timing, emphasis, and tone.

Fish voice engine

Fish, powered by fish.audio, focuses on expressive English voiceovers. It works well for storytelling, conversational videos, and content that benefits from subtle, nuanced delivery.
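If you are choosing an engine by hand, the guidance above can be condensed into a simple decision rule. The sketch below is hypothetical: `suggest_engine` and its parameters are illustrative names, not part of any HeyGen API, and the rule encodes only the recommendations in this article. It is not how the Auto setting actually selects an engine, and the order in which the checks run is a judgment call.

```python
# Hypothetical helper: summarises this article's guidance only.
# It is NOT HeyGen's Auto selection logic and NOT a HeyGen API.

# Language codes for the Asian languages Starfish is optimised for.
ASIAN_LANGUAGES = {"zh", "ja", "ko"}  # Chinese, Japanese, Korean


def suggest_engine(language: str, *, need_speed: bool = False,
                   need_emotion_control: bool = False,
                   storytelling: bool = False) -> str:
    """Return an engine name based on the article's recommendations."""
    lang = language.lower()
    # Starfish: natural pronunciation and pacing for Chinese, Japanese, Korean.
    if lang in ASIAN_LANGUAGES:
        return "Starfish"
    # Panda: emotional delivery with Voice Director and Voice Mirroring.
    if need_emotion_control:
        return "Panda"
    if lang == "en":
        # Turbo models trade some quality for speed and are aimed at English.
        if need_speed:
            return "ElevenLabs Turbo"
        # Fish: expressive English voiceovers for storytelling.
        if storytelling:
            return "Fish"
    # ElevenLabs: studio-quality narration in 70+ languages.
    return "ElevenLabs"


print(suggest_engine("ja"))                    # Starfish
print(suggest_engine("en", need_speed=True))   # ElevenLabs Turbo
print(suggest_engine("fr", need_speed=True))   # ElevenLabs
```

A request for speed in a non-English language falls back to the multilingual ElevenLabs default, because the article describes the Turbo models as primarily optimised for English. When none of these needs apply, leaving the engine on Auto is the simpler choice.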

Together, voice engines and models give you control over how your videos sound, from tone and emotion to speed and linguistic accuracy.