A guide to getting the most out of HeyGen's Video Agent through prompting.
The HeyGen Video Agent is built for Knowledge-Based Creators to create the best explainer videos.
This Prompt Guide shows you how to prompt Video Agent for the best results. It covers the basic information every prompt should include, the more advanced controls you can add through prompting, and a few example prompts you can reuse.
The basics: what Video Agent actually needs
Before you type a single word, understand the three controls at your fingertips:
Avatar Selection: Pick a specific avatar or let Auto mode find one that fits your content. Pro tip: You can also go avatar-free with voice-over only, but you must explicitly say "no avatar" in your prompt.
Duration: Let Auto decide based on your content, or choose 30s, 1min, 2min, etc. Note that the agent still follows your prompt or script when it comes to length, so this setting is a preference rather than a hard limit.
Aspect Ratio: Portrait, landscape, or Auto.
These are your baseline. The magic happens in the prompt itself.
The prompt: more context = better videos
For Video Agent to build high-quality videos, at a minimum use the prompt box to describe the content you want to deliver. Here's what basic prompts look like:
"Introduce HeyGen to knowledge workers, talk about its Talking Avatar models, how people use it, and mention Video Agent at the end."
"Make a compliance training video and explain phishing in detail. Use some examples and list the top watch-outs."
The more context and intent you provide, the better the Video Agent can structure scenes, pacing, and visuals.
The pro move: use your script directly
This is the single biggest upgrade most people miss. You can paste a full video script into the prompt. The Video Agent will largely follow it scene‑by‑scene, while improving flow, timing, and visuals.
Here's a script-driven prompt in action:
Intro (A-roll, motion graphics overlay)
VO: "If your work is mostly explaining things (updates, ideas, decisions), video usually helps, but making it takes too much time."
What is HeyGen (motion-graphics B-roll)
VO: "HeyGen helps introverts turn ideas into production-ready videos, without cameras, editing, or studios."
Talking Avatars (A-roll + demo cut)
VO: "Our talking avatar models offer realistic and natural delivery using your own digital identity."
Use-cases (Motion Graphics list)
VO: "Teams use it for internal training, online education, product explanations, and knowledge sharing."
Introducing Video Agent (end beat)
VO: "And with our new Video Agent, one prompt becomes a structured, animated video, end to end."
End card: HeyGen · Empower Knowledge-Based Creators
Note: The agent may make small edits (grammar, pacing) to improve clarity and video flow.
Attachments: give Video Agent reference material
You can upload files to help Video Agent understand your content:
Images & Videos: Product screenshots, existing assets, diagrams, or any media you want included.
Pro tip: Upload your own photo and ask the agent to use it as your talking avatar.
PDFs & Documents: Training materials, research papers, or product docs. The agent will extract key information.
Pro tip: When uploading references, add context about how you want them used. For example:
- "Use the attached product screenshots as B-roll when discussing features"
- "Reference the attached PDF for accurate technical specifications"
Advanced prompting
Pro tip: Here's my personal favorite prompt addition. I add this to almost everything:
"Use minimal, clean styled visuals. Blue, black, and white as main colors.
Leverage motion graphics as B-rolls and A-roll overlays. Use AI videos when necessary. When real-world footage is needed, use Stock Media. Include an intro sequence, outro sequence, and chapter breaks using Motion Graphics."
Try adding this to your prompts and see if you like the results!
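If you keep your prompts in a script or notes file, the reusable style addition above can be stored once and appended to each content description. This is a minimal Python sketch that only builds the text you paste into the prompt box. The names `STYLE_ADDITION` and `with_style` are hypothetical and are not part of any HeyGen API.

```python
# Hypothetical helper: append a reusable style block to any content prompt.
# Nothing here calls HeyGen; it only assembles prompt text.

STYLE_ADDITION = (
    "Use minimal, clean styled visuals. Blue, black, and white as main colors. "
    "Leverage motion graphics as B-rolls and A-roll overlays. "
    "Use AI videos when necessary. "
    "When real-world footage is needed, use Stock Media. "
    "Include an intro sequence, outro sequence, and chapter breaks using Motion Graphics."
)


def with_style(content_prompt: str, style: str = STYLE_ADDITION) -> str:
    """Return the content prompt followed by the style block, separated by a blank line."""
    return f"{content_prompt.strip()}\n\n{style}"


if __name__ == "__main__":
    print(with_style("Make a compliance training video and explain phishing in detail."))
```

Keeping the style text in one place makes it easy to tweak colors or media preferences once and reuse them across prompts.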
But why does this work? Let's break it down!
Define your visual style & colors
Our Video Agent is capable of executing your style requirements consistently. Use style descriptors to guide the visual direction of your entire video.
Example style descriptors:

Defining colors:
You can specify exact color codes and font families for consistent branding:
"Use #1E40AF as primary blue, #F8FAFC as background white, and #0F172A for text. Use Inter font family throughout."
"Stick to our brand colors: coral (#FF6B6B), navy (#2C3E50), and cream (#FFF5E6)"
Why this matters: Defining a visual style lets the agent produce a consistently styled video from start to finish. Without it, visuals can drift from scene to scene.
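A mistyped color code is easy to miss in a long prompt. This is a small Python sketch, assuming you prepare brand instructions yourself before pasting them in. It checks that each color is a 6-digit hex code and then builds one style sentence in the same spirit as the examples above. The function name `brand_style_line` is hypothetical, and rejecting 3-digit shorthand codes is an assumption made for this sketch.

```python
import re

# 6-digit hex colors such as #1E40AF (assumption: 3-digit shorthand is rejected).
HEX_COLOR = re.compile(r"#[0-9A-Fa-f]{6}")


def brand_style_line(colors: dict[str, str], font: str) -> str:
    """Build one style sentence from role -> hex color pairs and a font family."""
    for role, code in colors.items():
        if not HEX_COLOR.fullmatch(code):
            raise ValueError(f"{role!r} has an invalid hex color: {code!r}")
    parts = [f"{code} for {role}" for role, code in colors.items()]
    return f"Use {', '.join(parts)}. Use {font} font family throughout."


if __name__ == "__main__":
    print(
        brand_style_line(
            {"primary blue": "#1E40AF", "background white": "#F8FAFC", "text": "#0F172A"},
            "Inter",
        )
    )
```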
Motion Graphics, AI Images & Videos, and Stock Media
Motion Graphics
Animated graphic elements: text animations, icons, charts, shapes, transitions.
Best for:
- A-roll overlays: Lower thirds, bullet points alongside avatar, animated callouts
- B-roll scenes: Full-screen animated explanations, data visualizations
- Chapter cards: Section breaks, intros, outros
- Information display: Statistics, comparisons, timelines
Example: "Use motion graphics to display the 5 key benefits as animated bullet points appearing one by one while the avatar speaks."
AI-Generated Images & Videos
Created by generative AI based on your descriptions.
Best for:
- Conceptual illustrations
- Custom scenarios that stock footage won't cover
- Stylized visuals in a particular artistic style
- Product mockups in various contexts
Example: "Generate an AI image showing a futuristic office where humans and AI work together. Use this as B-roll for the 'future of work' section."
Stock Media
Real-world footage from stock libraries.
Best for:
- Authentic scenes (real offices, cities, people)
- Industry-specific content (medical, manufacturing, retail)
- Emotional moments
- Establishing shots
Example: "Use stock footage of a busy corporate office for B-roll when discussing workplace productivity."
Quick Reference: The Media Type Matrix

Scene-by-Scene Prompting: Maximum Control
When you need precise output, prompt each scene individually.
Basic structure:
Scene 1: [Scene Type]
Visual: [Describe exact visual]
VO/Script: "[What the avatar says]"
Duration: [Approximate length]
Here's a detailed product launch video example:
Scene 1: Intro (Motion Graphics)
Visual: Animated logo reveal with particle effects, brand colors sweep
Duration: 3 seconds
Scene 2: Hook (A-roll with overlay)
Visual: Avatar on branded background, text overlay "The Future is Here"
VO: "What if I told you that creating professional videos just got 10x easier?"
Duration: 5 seconds
Scene 3: Problem Statement (Stock Media B-roll)
Visual: Stock footage of frustrated person at computer, then clock ticking
VO: "Traditional video production takes weeks. Coordinating schedules, booking studios, endless editing rounds..."
Duration: 8 seconds
Scene 4: Solution Introduction (A-roll + Motion Graphics overlay)
Visual: Avatar speaking, animated product logo appears beside them
VO: "Introducing HeyGen Video Agent: your AI-powered video creation partner"
Duration: 6 seconds
Scene 5: Feature Showcase (Motion Graphics B-roll)
Visual: Animated screen recording style, showing interface with callouts
VO: "Simply describe what you want, and watch your video come to life"
Duration: 10 seconds
Scene 6: Benefits (Motion Graphics list)
Visual: 3 benefits animate in one by one with icons
VO: "Save time. Maintain consistency. Scale your content."
Duration: 8 seconds
Scene 7: CTA (A-roll)
Visual: Avatar, confident pose, CTA text overlay
VO: "Try HeyGen Video Agent today and transform how you create videos"
Duration: 5 seconds
Scene 8: End Card (Motion Graphics)
Visual: Logo, tagline, website URL, social handles
Duration: 4 seconds
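For longer videos, it can help to keep scenes as structured data and generate the prompt text from them. This is a minimal Python sketch, assuming you draft scenes outside Video Agent and paste the rendered text into the prompt box. The `Scene` class and both functions are hypothetical names, not a HeyGen API. The sketch also adds up the per-scene durations so you can compare the total against the Duration control.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Scene:
    title: str                 # e.g. "Hook"
    scene_type: str            # e.g. "A-roll with overlay"
    visual: str                # what should appear on screen
    seconds: int               # approximate scene length
    vo: Optional[str] = None   # some scenes (logo reveal, end card) have no voice-over


def render_scenes(scenes: list[Scene]) -> str:
    """Render scenes as numbered Scene / Visual / VO / Duration blocks."""
    blocks = []
    for number, scene in enumerate(scenes, start=1):
        lines = [
            f"Scene {number}: {scene.title} ({scene.scene_type})",
            f"Visual: {scene.visual}",
        ]
        if scene.vo:
            lines.append(f'VO: "{scene.vo}"')
        lines.append(f"Duration: {scene.seconds} seconds")
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)


def total_seconds(scenes: list[Scene]) -> int:
    """Sum the approximate per-scene durations."""
    return sum(scene.seconds for scene in scenes)


if __name__ == "__main__":
    draft = [
        Scene("Intro", "Motion Graphics", "Animated logo reveal with particle effects", 3),
        Scene(
            "Hook",
            "A-roll with overlay",
            'Avatar on branded background, text overlay "The Future is Here"',
            5,
            vo="What if I told you that creating professional videos just got 10x easier?",
        ),
    ]
    print(render_scenes(draft))
    print(f"Total: {total_seconds(draft)} seconds")
```

Applied to the eight scenes in the product launch example, the durations add up to 3 + 5 + 8 + 6 + 10 + 8 + 5 + 4 = 49 seconds, so a 1min or Auto duration setting would be a reasonable match.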
Real Prompts You Can Steal
Compliance Training:
Use a professional female avatar. Make a compliance training video explaining phishing in detail. Use examples and list top watch-outs. Leverage motion graphics as A-roll overlay and B-roll to help explain core concepts.
Educational Explainer (Voice-Over Only):
Create a 1-minute video about camera aperture. Use minimal science diagrams and visualizations. No avatar needed, only voice-over. Cool neutrals (navy, cyan), thin-line diagrams, and slow elegant motion. B-roll is abstract scientific illustrations. Sequencing: definition → diagram expansion → conceptual layering, with fade-through transitions.
Brand Story (Animated):
Make a video telling the story of how Twitch got started. Use cartoon-style animations and overlays. I want Twitch's iconic colors and fonts. Use motion graphics overlays and AI-generated B-roll.
The Bottom Line
Video Agent isn't magic; it's a production partner that executes your creative direction.
The more specific you are about content, style, media types, and scene structure, the closer you'll get to exactly what you envision.
Start with a script. Define your visual style. Match media types to content types. Prompt scene-by-scene when precision matters.
You had the message. Now you own the production.
Try these prompts with your next video. Then tell us what worked.







