Question 1

What is the difference between the Video Agent API and the standard Video Generation API?

Accepted Answer

The key difference is in how you balance automation with detailed, hands-on control.

The Video Agent API takes a single text prompt and autonomously orchestrates avatar creation, script writing, visual asset creation, and layout. It favours automation and creative freedom over hands-on control, which makes it well suited to large-scale content exploration, internal video creation, and automation.

In comparison, the Standard Video Generation APIs have two main parts: 1) Avatar Video Generation and 2) Template Video Generation. Developers create Avatars and Video Templates on HeyGen’s web platform, and the API then uses them. Although they require more setup, these APIs provide the precise control needed for brand-consistent, high-production-value assets. Enterprise customers have created millions of videos through them to automate their content pipelines.

Question 2

How can I create custom avatars from photos for personalised videos?

Accepted Answer

Use the Photo Avatar API as follows:

1. Upload existing photos (via the Upload Asset API).
2. Create an Avatar Group by grouping photos of the same person.
3. (Optional) Train the Avatar Group: once the group is created, you can train the model to recognise the subject's unique features, expressions, and other elements so that it generates realistic avatars.

If you would like to use Template Video Generation to create personalised videos at scale, you can use or replace the avatar by following this guide.

If you want to use Avatar Video Generation, you can use or plug in the avatar by following this guide.
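
The order of the steps above can be sketched in Python. This is a minimal illustration, not HeyGen's client library: build_photo_avatar and the three callables it receives (upload_photo, create_group, train_group) are hypothetical names standing in for whatever code actually calls the Upload Asset and Photo Avatar endpoints.

```python
from typing import Callable, Iterable, List, Optional


def build_photo_avatar(
    photo_paths: Iterable[str],
    group_name: str,
    upload_photo: Callable[[str], str],
    create_group: Callable[[str, List[str]], str],
    train_group: Optional[Callable[[str], None]] = None,
) -> str:
    """Run the photo-avatar steps in order and return the group identifier.

    All three callables are hypothetical wrappers supplied by the caller;
    this function only fixes the order in which they run.
    """
    # Step 1: upload each photo and collect the identifiers that come back.
    asset_ids = [upload_photo(path) for path in photo_paths]
    if not asset_ids:
        raise ValueError("at least one photo is required")

    # Step 2: group the photos of the same person into one Avatar Group.
    group_id = create_group(group_name, asset_ids)

    # Step 3 (optional): train the group only if a trainer was provided.
    if train_group is not None:
        train_group(group_id)

    return group_id
```

Passing the API calls in as functions keeps the sketch independent of any particular endpoint or response format, and makes the workflow easy to exercise with stand-ins before wiring up real requests.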

Question 3

Can I generate AI avatars from text descriptions without uploading any photos?

Accepted Answer

Yes, the API supports pure text-to-avatar generation through a structured descriptive framework that removes the need for external image assets. By providing specific parameters across eight required fields—including age, gender, ethnicity, and style—the AI creates a unique, high-resolution persona. For example, selecting "East Asian" ethnicity with a "Professional" style and "Cinematic" lighting will prompt the engine to return a selection of unique Avatars and Looks, effectively enabling enterprises to scale diverse cast libraries that do not exist in the real world.

For details, follow the prompt-to-avatar guide.

Question 4

How does the template system work for batch personalised videos?

Accepted Answer

The template system is designed for high‑efficiency “mail‑merge” style video production, where a master layout acts as a container for dynamic data. Users first create or select a template via the Dashboard or API, then identify specific placeholders for text, images, or audio. By sending a single POST request to the template’s generate endpoint with a JSON payload of variables, the system automatically renders unique video files for each recipient, making it a leading solution for personalised sales outreach and customised customer onboarding at scale.
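
As a rough illustration of the "mail-merge" idea, the Python sketch below builds one JSON payload per recipient. The function name build_template_payloads and the exact shape of each variable entry (name, type, properties.content) are assumptions made for this example; check the template documentation for the real schema before sending anything.

```python
import json
from typing import Dict, Iterable, List


def build_template_payloads(
    recipients: Iterable[Dict[str, str]],
    title_prefix: str = "Welcome",
) -> List[dict]:
    """Turn rows of recipient data into one payload per video.

    The variable structure used here is an assumed shape, not a confirmed
    schema: each placeholder is treated as a text variable.
    """
    payloads = []
    for recipient in recipients:
        # One entry per template placeholder, keyed by placeholder name.
        variables = {
            key: {"name": key, "type": "text", "properties": {"content": value}}
            for key, value in recipient.items()
        }
        title = f"{title_prefix} {recipient.get('first_name', '')}".strip()
        payloads.append({"title": title, "variables": variables})
    return payloads


if __name__ == "__main__":
    rows = [
        {"first_name": "Ada", "company": "Acme"},
        {"first_name": "Lin", "company": "Globex"},
    ]
    for payload in build_template_payloads(rows):
        # Each payload would be the body of one POST to the generate endpoint.
        print(json.dumps(payload))
```

Each payload corresponds to a single POST request, so a batch of N recipients results in N requests and N rendered videos.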

Question 5

What is the recommended way to match voices with avatars?

Accepted Answer

To ensure the highest level of realism and lip-sync accuracy, the recommended "Golden Path" is to programmatically retrieve and utilise the default_voice_id associated with a specific avatar. This method guarantees that the vocal characteristics—such as gender, tone, and regional accent—are already optimised for that avatar's visual persona, significantly reducing the risk of "uncanny valley" effects. If a custom voice is required, developers should always filter the v2/voices list to match the avatar’s metadata to maintain audio-visual consistency.
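
The selection logic described above could look like the following Python sketch. Only default_voice_id comes from the answer itself; the function name choose_voice_id and the other field names (gender, language, voice_id) are assumptions about what avatar and voice records might contain.

```python
from typing import Dict, List, Optional


def choose_voice_id(
    avatar: Dict[str, str],
    voices: List[Dict[str, str]],
) -> Optional[str]:
    """Prefer the avatar's default voice, else fall back to a metadata match.

    The "gender", "language" and "voice_id" keys are assumed field names.
    """
    # Preferred path: use the voice already associated with the avatar.
    default = avatar.get("default_voice_id")
    if default:
        return default

    # Fallback: pick the first voice whose metadata matches the avatar.
    wanted_gender = (avatar.get("gender") or "").lower()
    wanted_language = avatar.get("language")
    for voice in voices:
        same_gender = (voice.get("gender") or "").lower() == wanted_gender
        same_language = (
            wanted_language is None or voice.get("language") == wanted_language
        )
        if wanted_gender and same_gender and same_language:
            return voice.get("voice_id")

    # No suitable voice found; the caller decides how to handle this.
    return None
```

Returning None instead of an arbitrary voice makes a mismatch visible to the caller rather than silently pairing an avatar with an unsuitable voice.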

Question 6

How should I manage long video generation times in a production environment?

Accepted Answer

Because high-fidelity AI video rendering is a resource-intensive process that can take several minutes, the API is designed to be used with an asynchronous, event-driven architecture via Webhooks. Instead of holding an open connection (which can lead to timeouts), your application should register a webhook URL to receive an automated "push" notification once the avatar_video.success event is triggered. This allows your backend to remain performant and process the video—via the provided video_url—only when it becomes available.
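
A webhook receiver mostly needs to check the event name and pull out the video location. The Python sketch below shows that core step without any web framework. The avatar_video.success event and the video_url field are taken from the answer above; the function name handle_webhook_event and the surrounding payload keys (event_type, event_data) are assumptions about the notification's layout.

```python
from typing import Any, Dict, Optional


def handle_webhook_event(payload: Dict[str, Any]) -> Optional[str]:
    """Return the video URL for a success event, or None for anything else.

    "event_type" and "event_data" are assumed key names for this sketch.
    """
    # Ignore events other than a successfully rendered avatar video.
    if payload.get("event_type") != "avatar_video.success":
        return None

    # Read the URL defensively in case the event data is missing.
    event_data = payload.get("event_data") or {}
    return event_data.get("video_url")
```

In a real service this function would sit behind an HTTP endpoint that acknowledges the notification quickly and hands the returned URL to a background job for download or further processing.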

Question 7

What localisation capabilities does the API offer?

Accepted Answer

The API offers extensive global reach, supporting over 40 languages and a library of more than 300 diverse voices, enabling smooth cross-border communication. Beyond simple text-to-speech in different languages, the platform provides "Video Translation" features that can take an existing video and translate the audio while simultaneously re-syncing the avatar’s lip movements to the new language. This ensures that the visual performance remains just as authentic in Spanish or Japanese as it was in the original English recording.

HeyGen Video Translation supports 175 languages and dialects (according to the linked source).

Build scalable video infrastructure with the HeyGen API

Trusted by 80% of Fortune 100 companies for scale, security, and speed

Enterprise-grade video intelligence

Proofreading API

Video Translation API

Video Agent API

Video Generation API

Text-to-Speech API

Template API

Simple for developers. Fast for teams to deliver

Programmatic video for every department, tailored to your needs

Learning & Development

Sales enablement

Marketing

Training & support

Enterprise API security, reliability, and control

SOC 2 Type II and GDPR

99.8% uptime

Dedicated support

Role-based access control

HeyGen Skills: Train your AI agent on best practices for using the HeyGen API & MCP

Integrate AI video generation into creator workflows