
HeyGen Bootcamp: How to scale and localize your message globally

Written by Tony Faccenda
Last Updated December 16, 2025
Promotional image for HeyGen Bootcamp featuring Andrei Gusev, Research Engineer, and Onee Yekeh, Product Manager, alongside the text 'Scale your message globally'.
Summary

Learn how to use HeyGen video and audio dubbing, precision lip sync, and multilingual avatar videos to translate content and scale your message worldwide.

If you’re already creating great videos in HeyGen, translation is where things really start to scale.

This Bootcamp session with Onee Yekeh (Product Manager) and Andrei Gusev (Research Engineer) was all about one question: How do you take the videos you already have and make them work in every market you care about without rebuilding everything from scratch?

Here’s the breakdown of what they covered and how to think about translation inside HeyGen.

Two ways to translate: Audio dubbing vs video dubbing

When you go to Create → translate a video in HeyGen, you’ll see two options:

  • Video dubbing
  • Audio dubbing

They both start the same way:

  1. Upload a video file or paste a YouTube / Google Drive link
  2. Choose your target language(s)
  3. Click translate
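
HeyGen also exposes a developer API, so if you’d rather script those three steps than click through them, the flow looks roughly like the sketch below. The endpoint path, field names, and response shape here are illustrative assumptions rather than documented contracts; check HeyGen’s API reference for the real ones.

```python
# Minimal sketch of kicking off a translation job from code.
# HeyGen does offer a developer API, but the endpoint path, field
# names, and response shape below are illustrative assumptions;
# check the official API reference for the real contract.
import requests

API_KEY = "your-heygen-api-key"       # assumption: key-based auth header
BASE_URL = "https://api.heygen.com"

def start_translation(video_url: str, language: str, mode: str = "video") -> str:
    """Start a dubbing job.

    mode="video": video dubbing (voice clone + lip resync)
    mode="audio": audio dubbing (translated audio, visuals untouched)
    """
    resp = requests.post(
        f"{BASE_URL}/v2/video_translate",  # hypothetical endpoint path
        headers={"X-Api-Key": API_KEY},
        json={
            "video_url": video_url,        # step 1: the source video
            "output_language": language,   # step 2: the target language
            "translate_mode": mode,        # hypothetical field
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["data"]["video_translate_id"]  # hypothetical shape

job_id = start_translation("https://example.com/demo.mp4", "French")
print("Translation job started:", job_id)
```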

But what you get back is different.

What video dubbing does

Video dubbing is the full experience:

  • Translates your speech into the target language
  • Clones your voice
  • Generates the new audio
  • Resyncs your lips and facial movements to match the new language

This is what most people think of as the “magic” HeyGen translation feature: your face, your voice, speaking another language with natural lip sync.

Video dubbing is ideal when:

  • Your face is clearly visible and speaking to the camera
  • Lip sync realism matters (courses, CEO messages, marketing videos)
  • You want viewers to feel like you really recorded in that language

What audio dubbing does

Audio dubbing:

  • Translates and dubs the audio
  • Keeps your original video visuals exactly as they are
  • Does not adjust the lips

This is useful when:

  • The speaker is small on screen or not the main focus
  • You’re working with footage that doesn’t need lip sync (screen recordings, B-roll-heavy edits)
  • You care more about quick multilingual audio than hyper-realistic lips

You can access both from:

  • Create → translate a video, or
  • Apps → translate video in the HeyGen interface

Speed vs precision engines: When to choose which

Inside video dubbing, HeyGen gives you two translation engines:

  • Speed
  • Precision

They both translate your video, but they optimize for different things.

Speed engine: For everyday use and fast turnaround

The speed engine is designed to optimize:

  • Cost
  • Latency (how long it takes)
  • Standard-quality lip sync

Roughly speaking, a 1-minute input video takes about 4–12 minutes to process. It’s great for:

  • Talking-head videos with moderate movement
  • Day-to-day content
  • Social clips, regular updates, internal comms

If your video doesn’t have wild motion, multiple speakers, or tricky angles, speed will usually do the job.

Precision engine: For quality, complexity, and tough footage

The precision engine is built for maximum quality. It uses HeyGen’s latest, more advanced models to handle:

  • Wide angles and side angles
  • Lots of head movement
  • Objects partially blocking the face (hands, mics, props)
  • Multiple speakers with tighter timing requirements
  • Lower-quality original audio

It’s particularly good for:

  • Multi-speaker videos (panels, interviews, group content)
  • Educational content (lectures, explainer videos)
  • Footage where the camera or subject isn’t perfectly framed straight-on

If your video is visually or structurally complex, precision is the engine you want to try.
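
If you want that decision spelled out, here is the speed-vs-precision guidance from this section as a small heuristic. The Footage fields are descriptive labels invented for this sketch, not HeyGen API parameters.

```python
# The speed-vs-precision guidance above as a small heuristic. The
# Footage fields are descriptive labels invented for this sketch,
# not HeyGen API parameters.
from dataclasses import dataclass

@dataclass
class Footage:
    multiple_speakers: bool = False
    heavy_head_movement: bool = False
    off_angle_or_wide: bool = False
    face_partially_blocked: bool = False
    low_quality_audio: bool = False

def choose_engine(f: Footage) -> str:
    """Pick "precision" for visually or structurally complex footage,
    otherwise "speed" for everyday talking-head content."""
    complex_footage = any([
        f.multiple_speakers,
        f.heavy_head_movement,
        f.off_angle_or_wide,
        f.face_partially_blocked,
        f.low_quality_audio,
    ])
    return "precision" if complex_footage else "speed"

print(choose_engine(Footage(off_angle_or_wide=True)))  # -> "precision"
```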

Main languages vs local variants: How target language actually works

When you pick your target language, you’ll often see:

  • A “general” version of the language (e.g., English general)
  • One or more locale-specific variants (e.g., English US, English UK)
  • Additional accents or regional forms in a “second tier”

Andrei explained why: there’s a trade-off between voice similarity and accent consistency.

When to use the general language option

General options (like English general) are optimized to:

  • Preserve your speaker identity as much as possible
  • Keep your voice clone sounding like you in the new language

Choose general when:

  • You care about the translated voice sounding recognizably like the original speaker
  • The exact accent (US vs UK vs neutral global) matters less than identity

When to use locale-specific variants

Locale options (like English US, English UK) focus on:

  • Accent consistency
  • Naturalness for that specific region

Choose locale variants when:

  • You’re targeting a specific market and want the accent to match audience expectations
  • You’re less concerned about the voice being a perfect match to your original and more focused on local feel

In other words:

  • General → “Make it sound like me”
  • Locale-specific → “Make it sound like a native of this market”
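
In practice, the whole decision collapses to which language identifier you select. A tiny illustration, using made-up labels rather than HeyGen’s actual picker values:

```python
# The general-vs-locale trade-off as a lookup. These labels are
# illustrative, not HeyGen's actual picker values.
def pick_target_language(goal: str) -> str:
    targets = {
        "identity": "English",        # general: preserves the voice clone
        "us_market": "English (US)",  # locale: accent consistency for the US
        "uk_market": "English (UK)",  # locale: accent consistency for the UK
    }
    return targets[goal]

print(pick_target_language("identity"))  # -> "English"
```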

Advanced translation settings: Collections, captions, and more

Out of the box, you can:

  1. Upload your video
  2. Choose a target language
  3. Click translate

But there’s also an advanced options panel that’s worth knowing about.

Key settings include:

  • Dynamic duration
    • Improves quality by flexing the video’s duration slightly to better fit the translated speech timing
  • Lip sync toggle
    • Turn lip sync on or off if you want only audio changes
  • Remove background sound
    • Optionally strip or reduce the original background audio
  • Resolution
    • Maintain resolution up to 4K for the output
  • Collections and multilingual player
    • If you select multiple target languages and keep the collection option turned on:
      • HeyGen creates a group of translations
      • You get a multilingual player where viewers can switch between languages (e.g., French, Arabic, English original)
  • Captions and enhanced voice
    • You can enable captions, enhance the voice track, and more, depending on your use case

You can always add more translations later by hitting modify on an existing translation, selecting new languages, and regenerating within the same collection.
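
If you were setting those options programmatically, the payload might look like the sketch below. Every field name is an assumption that mirrors the UI labels above, not a documented API field.

```python
# The advanced options panel as a request payload. Every field name
# below is an assumption that mirrors the UI labels above; the real
# API fields may differ.
advanced_options = {
    "output_languages": ["French", "Arabic"],  # multiple languages
    "dynamic_duration": True,      # flex duration to fit translated speech
    "lip_sync": True,              # set False for audio-only changes
    "remove_background_sound": False,
    "resolution": "4k",            # maintain resolution up to 4K
    "collection": True,            # group translations, enable the player
    "captions": True,
    "enhanced_voice": True,
}
```

Selecting more than one language with collection enabled is what produces the multilingual player; a later modify step against the same job would presumably extend that collection rather than start a new one.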

Proofread: Edit translations before you generate the final video

Sometimes “translate and go” isn’t enough. Maybe:

  • You’re working with regulated content
  • The script is heavy with domain-specific terms
  • You have in-house language experts who need to review

That’s where proofread comes in.

How proofread works

When you start a translation:

  1. Upload your video
  2. Choose your target language(s)
  3. Instead of clicking translate, click review and edit

HeyGen will:

  • Process your video and generate:
    • Input transcription (original language)
    • Output transcription (translated language)

In the proofread interface (available for team and enterprise users), you can:

  • See both original and translated text side by side
  • Edit the translated text directly
  • Use search and replace for bulk fixes (e.g., standardizing a term)
  • Adjust timing in the timeline, nudging speech speed or segment splits
  • Preview the result in context
  • Swap the voice used for the translation, if needed

Once you’re happy, click generate result and HeyGen creates the final dubbed video with your reviewed translation baked in.
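
Scripted, the proofread flow is a three-step loop: fetch the paired transcriptions, patch the translated text, then trigger generation. The endpoints and fields in this sketch are hypothetical stand-ins for those UI steps.

```python
# The proofread flow as a three-step sketch: fetch the paired
# transcriptions, apply a bulk search-and-replace, then generate.
# All endpoints and field names here are hypothetical stand-ins
# for the UI steps described above.
import requests

API_KEY = "your-heygen-api-key"
BASE_URL = "https://api.heygen.com"
HEADERS = {"X-Api-Key": API_KEY}

def proofread_and_generate(job_id: str, find: str, replace: str) -> None:
    # 1. Fetch input/output transcriptions (hypothetical endpoint).
    url = f"{BASE_URL}/v2/video_translate/{job_id}/transcript"
    segments = requests.get(url, headers=HEADERS, timeout=30).json()["segments"]

    # 2. Bulk fix: standardize a term across every translated segment,
    #    the scripted equivalent of the UI's search and replace.
    for seg in segments:
        seg["translated_text"] = seg["translated_text"].replace(find, replace)
    requests.put(url, headers=HEADERS, json={"segments": segments}, timeout=30)

    # 3. Generate the final dubbed video with the reviewed translation.
    requests.post(f"{BASE_URL}/v2/video_translate/{job_id}/generate",
                  headers=HEADERS, timeout=30)
```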

Inviting native speakers to review

For teams working with external reviewers or local experts, there’s an invite proofreader option:

  • Give someone view or edit access to a specific project.
  • They can:
    • Log in
    • Review and adjust transcriptions
    • Preview audio and timing
    • Approve and generate the final video

This is especially useful when you don’t speak the target language yourself but want a native speaker to sign off.

Handling multiple speakers and overlapping speech

What about videos with more than one person?

According to Andrei:

  • Multiple speakers are supported
    • The system can:
      • Identify different speakers
      • Translate each separately
      • Keep timing aligned
  • The hardest case is overlapping speech (people talking over each other)
    • This is challenging even for humans
    • If speech is overlapping heavily, expect some limitations

Rule of thumb:

  • If you can clearly tell who is saying what, HeyGen’s translation engine will generally handle it well

Multilingual avatar videos vs video translation: What’s the difference?

This is a big source of confusion, so Onee broke it down clearly.

There are two different translation concepts in HeyGen:

  1. Video translation (dubbing)
  2. Multilingual avatar videos in AI Studio

They both involve language, but they work in different layers.

Video translation: transform an existing video

Video translation is what we covered above:

  • You start with finished footage (camera-shot or previously generated)
  • You translate the audio track (and optionally lips)
  • You get back a dubbed version of that same video in another language

This is post-production: You’re changing a rendered video.

Multilingual avatar video: translate the script and regenerate

Multilingual avatar video lives inside AI Studio and works differently:

  1. You have an existing AI Studio draft (avatar + script + elements)
  2. You click translate in the studio
  3. You choose target languages
  4. HeyGen:
    • Translates your script
    • Translates text on the canvas (titles, labels, etc.)
    • Creates new drafts for each language

From there:

  • You can open the new draft in, say, French
  • Review or tweak the translated script and on-screen text
  • Click generate to create a video natively authored in that language
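
As a sketch, here is that same flow in code terms: translate the draft into per-language drafts, review, then generate each one. The endpoints below are hypothetical illustrations, not HeyGen’s documented API.

```python
# The multilingual avatar flow as a sketch: translate an AI Studio
# draft (script + on-canvas text) into new per-language drafts, then
# generate each one. The endpoints here are hypothetical.
import requests

API_KEY = "your-heygen-api-key"
BASE_URL = "https://api.heygen.com"
HEADERS = {"X-Api-Key": API_KEY}

def translate_draft(draft_id: str, languages: list[str]) -> list[str]:
    resp = requests.post(
        f"{BASE_URL}/v2/drafts/{draft_id}/translate",  # hypothetical
        headers=HEADERS,
        json={"languages": languages},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["draft_ids"]  # one new draft per language

for new_draft_id in translate_draft("draft_123", ["French", "German"]):
    # After a human reviews the translated script and on-screen text,
    # each draft is generated as a video natively authored in that language.
    requests.post(f"{BASE_URL}/v2/drafts/{new_draft_id}/generate",  # hypothetical
                  headers=HEADERS, timeout=30)
```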

Key difference:

  • Video translation → Takes a finished video and dubs it
  • Multilingual avatar video → Translates the project (script + visuals) and generates a fresh video in that language

They’re complementary, not competing. Use:

  • Video translation when:
    • You already have a final video file
    • You don’t want to reopen or redesign the project
  • Multilingual avatar video when:
    • You’re still in AI Studio
    • You want on-screen text, captions, and layout to change with the language
    • You’re comfortable generating separate drafts per language

Real-world improvements: What the precision engine can handle now

Onee closed the session by showing what the newest precision model can handle in practice:

  • A science demo with a cloth in front of the face → Precision kept lip sync natural even with facial occlusion
  • A clip with lots of head movement and angle changes → The lips stayed aligned as the speaker moved
  • A wide-angle shot and side-profile video → The model applied lip sync convincingly even when the face wasn’t straight-on
  • A lower-quality original video → Precision still produced realistic lip sync in the translated version

The takeaway: If your footage is even slightly “difficult” (movement, side angles, objects, multiple speakers), precision is worth trying first.

Best practices for multilingual workflows in HeyGen

To wrap it up, here’s a simple way to think about your translation options:

  1. Start with what you have.
    • If it’s a finished video file → use video dubbing
    • If it’s an AI Studio project → consider multilingual avatar drafts
  2. Pick the right engine.
    • Speed for simple talking heads, everyday content
    • Precision for complex visuals, multi-speaker, or “hero” pieces
  3. Choose the right language variant.
    • General for preserving your voice identity
    • Locale variants for specific markets and accents
  4. Use proofread when quality really matters
    • Side-by-side text
    • Native reviewers
    • Voice swaps and timing adjustments
  5. Keep languages separated by scene when mixing
    • HeyGen can technically handle multiple languages in one scene
    • But for right-to-left and left-to-right combos (like Arabic + English), it’s usually easier to keep them in separate scenes so animation markers stay sane

If you’re already comfortable creating videos with avatars in HeyGen, translation is the lever that lets you:

  • Reach new markets
  • Repurpose your best content without reshooting
  • Give every region a version that feels made for them

The simplest way to start:

  1. Take a one-minute video you already like
  2. Go to translate a video
  3. Choose video dubbing, pick precision, and select one target language
  4. Let it process, then watch the multilingual player

Once you see yourself, or your avatar, speaking another language naturally, it becomes obvious how much global leverage you can get from every single video you make.

