What are voice models

If you’ve ever wanted your content to sound more natural, more personal, or more closely aligned with a specific brand voice, voice models make that possible.

In HeyGen, voices bring your scripts and avatars to life through AI-generated narration. Every voice you select is powered by a voice engine, a model designed to produce natural, expressive, and language-accurate speech. Choosing the right engine helps you match the tone, pace, and emotion of your message.

What voice models are

A voice model is the underlying AI system that generates speech. It determines how a voice sounds, how expressive it is, how fast it speaks, and how well it performs across different languages.

HeyGen offers multiple voice engines, each optimized for different use cases such as training, marketing, storytelling, or localization.

Auto voice engine

The Auto setting allows HeyGen to automatically select the most suitable voice engine based on your video’s language and content. This is a good option if you want consistent, reliable results without having to choose a model manually.

ElevenLabs voice engine

ElevenLabs offers studio-quality narration in over 70 languages, making it ideal for most video and voice projects.

If you’re using a custom voice, you can also choose which voice model powers it for greater control over tone and realism. By default, HeyGen uses the multilingual V3 model from ElevenLabs, which is known for natural expression and strong multilingual performance.

Turbo voice models

For projects that require faster generation, you can switch to one of the Turbo models. These offer lower latency and quicker processing, but are primarily optimized for English content.

Starfish voice engine

Starfish is optimized for Asian languages, including Chinese, Japanese, and Korean. It is designed to deliver natural pronunciation and pacing for region-specific content.

Panda voice engine

Panda is HeyGen’s expressive engine, designed for emotional delivery and advanced control. It supports features such as Voice Director and Voice Mirroring, allowing precise control over timing, emphasis, and tone.

Fish voice engine

Fish, powered by fish.audio, focuses on expressive English voiceovers. It works well for storytelling, conversational videos, and content that benefits from nuanced delivery.
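
The guidance in the engine sections above can be summarized as a small decision helper. The Python sketch below is purely illustrative: the function name, its parameters, the language codes, and the order in which the rules are checked are all assumptions made for this example. It is not part of HeyGen's product or API, and in practice you choose an engine in the HeyGen interface.

```python
from typing import Optional

# Languages the article names as Starfish's focus.
# Using ISO 639-1 codes here is an assumption of this sketch.
STARFISH_LANGUAGES = {"zh", "ja", "ko"}


def suggest_engine(language: Optional[str] = None,
                   needs_fine_control: bool = False,
                   needs_fast_generation: bool = False,
                   expressive_storytelling: bool = False) -> str:
    """Return the name of an engine that matches the article's guidance.

    Hypothetical helper: the rule order below is one reasonable reading
    of the descriptions, not an official selection algorithm.
    """
    # No stated preference: let HeyGen choose (the Auto setting).
    if language is None:
        return "Auto"

    lang = language.lower()

    # Panda is described as the engine for precise control over
    # timing, emphasis, and tone.
    if needs_fine_control:
        return "Panda"

    # Starfish is described as optimized for Chinese, Japanese, and Korean.
    if lang in STARFISH_LANGUAGES:
        return "Starfish"

    # Turbo models trade breadth for speed and are primarily for English.
    if lang == "en" and needs_fast_generation:
        return "ElevenLabs Turbo"

    # Fish focuses on expressive English voiceovers.
    if lang == "en" and expressive_storytelling:
        return "Fish"

    # ElevenLabs is described as suitable for most projects.
    return "ElevenLabs"
```

For example, `suggest_engine("ja")` returns `"Starfish"`, while `suggest_engine("fr")` falls through to `"ElevenLabs"`. Calling the function with no arguments returns `"Auto"`, which mirrors leaving the choice to HeyGen.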

Together, voice engines and models give you control over how your videos sound, from tone and emotion to speed and linguistic accuracy.