
If you’ve ever wanted your content to sound more natural, more personal, or more closely aligned with a specific brand voice, voice models make that possible.
In HeyGen, voices bring your scripts and avatars to life through AI-generated narration. Every voice you select is powered by a voice engine, a model designed to produce natural, expressive, and language-accurate speech. Choosing the right engine helps you match the tone, pace, and emotion of your message.
What voice models are
A voice model is the underlying AI system that generates speech. It determines how a voice sounds, how expressive it is, how fast it speaks, and how well it performs across languages.
HeyGen offers multiple voice engines, each optimized for different use cases such as training, marketing, storytelling, or localization.
Auto voice engine
The Auto setting lets HeyGen select the voice engine for you based on your video’s language and content. This is a good option if you want reliable results without manually choosing a model.
ElevenLabs voice engine
ElevenLabs delivers studio-quality narration across more than 70 languages, making it suitable for most video and voice projects.
If you’re using a custom voice, you can also choose which voice model powers it for greater control over tone and realism. By default, HeyGen uses the multilingual V3 model from ElevenLabs, which is known for natural expression and strong multilingual performance.
Turbo voice models
For projects that need faster generation, you can switch to one of the Turbo models. These offer lower latency and quicker processing, but are primarily optimized for English content.
Starfish voice engine
Starfish is optimized for Asian languages, including Chinese, Japanese, and Korean. It is tuned for natural pronunciation and pacing in region-specific content.
Panda voice engine
Panda is HeyGen’s expressive engine, designed for emotional delivery and advanced control. It supports features like Voice Director and Voice Mirroring, allowing precise control over timing, emphasis, and tone.
Fish voice engine
Fish, powered by fish.audio, focuses on expressive English voiceovers. It works well for storytelling, conversational videos, and content that benefits from nuanced delivery.
Together, voice engines and models give you control over how your videos sound, from tone and emotion to speed and linguistic accuracy.
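If it helps to see the guidance above as a decision rule, here is a minimal sketch in Python. It is not part of HeyGen or its API: the Project fields, the suggest_engine function, the language codes, and the order in which the checks run are all assumptions made for illustration. Only the engine names and their strengths come from the descriptions above.

```python
from dataclasses import dataclass

# Assumption: ISO 639-1 codes for the Asian languages named above.
STARFISH_LANGUAGES = {"zh", "ja", "ko"}


@dataclass
class Project:
    """Hypothetical description of a video project (not a HeyGen type)."""
    language: str                        # e.g. "en", "ja"
    needs_fast_generation: bool = False  # lower latency matters most
    needs_fine_control: bool = False     # timing, emphasis, and tone control
    expressive: bool = False             # storytelling or conversational tone


def suggest_engine(project: Project) -> str:
    """Return the name of an engine that fits the project.

    The priority order below is an assumption, not HeyGen's own logic.
    """
    lang = project.language.lower()
    if project.needs_fine_control:
        # Panda supports Voice Director and Voice Mirroring.
        return "Panda"
    if lang in STARFISH_LANGUAGES:
        # Starfish is optimized for Chinese, Japanese, and Korean.
        return "Starfish"
    if lang == "en" and project.needs_fast_generation:
        # Turbo models are faster but primarily optimized for English.
        return "ElevenLabs Turbo"
    if lang == "en" and project.expressive:
        # Fish focuses on expressive English voiceovers.
        return "Fish"
    # Otherwise, let HeyGen choose based on language and content.
    return "Auto"


if __name__ == "__main__":
    print(suggest_engine(Project(language="ja")))
    print(suggest_engine(Project(language="en", needs_fast_generation=True)))
```

Falling back to Auto mirrors the advice above: when no specific need stands out, letting HeyGen choose is the reliable option.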