AI that connects, adapts, and inspires
At HeyGen, our AI research is driven by a mission to make visual storytelling accessible to everyone.



Meet our AI leaders
Innovating at the intersection of creativity, communication, and technology.
Charly Hong
Head of AI Research
Charly Hong brings over a decade of expertise in computer vision and AI, with a focus on human modeling, understanding, and video generation. He has authored over 60 publications and patents, underscoring his commitment to innovation and impactful solutions. At HeyGen, Charly drives advancements in AI technology that seamlessly bridge research and product.


Rong Yan
Chief Technology Officer
Rong Yan is the CTO of HeyGen, dedicated to making visual storytelling accessible to all. Previously, he was VP of Engineering at HubSpot, leading Data and Intelligence products, and held leadership roles at Snapchat, Square, and Facebook. Rong earned his M.Sc. (2004) and Ph.D. (2006) from Carnegie Mellon. A prolific researcher, he has over 60 publications and 35 patents, with expertise in AI, data mining, and computer vision.


Joshua Xu
Chief Executive Officer
Joshua Xu is the Co-Founder and CEO of HeyGen, driving the mission to transform visual storytelling with AI-powered content creation. Previously, he was a lead engineer at Snapchat (2014–2020), spearheading innovations in ads ranking, machine learning, and computational photography. With a Master’s in Computer Science from Carnegie Mellon, Joshua brings deep expertise in machine learning, computer vision, and generative AI.


Jun-Yan Zhu
Advisor
Jun-Yan Zhu is the Michael B. Donohue Assistant Professor of Computer Science and Robotics at Carnegie Mellon University, where he leads the Generative Intelligence Lab. His research focuses on generative models, computer vision, and graphics, with the mission of empowering creators through generative models. He has received the Samsung AI Researcher of the Year award, the Packard Fellowship, and the NSF CAREER Award, among other honors.



Our research pillars: shaping tomorrow’s AI
Redefining digital identity with precision and quality
Our focus on avatar generation emphasizes controllability, consistency, and unparalleled quality. By advancing AI-driven creation, we enable avatars to mirror human expressions and behaviors seamlessly, bridging the gap between reality and the digital world.
Multimodal language models powering video intelligence
We build multimodal language models that jointly reason over text, audio, and visual signals to better understand intent and context. This foundation powers video translation with improved semantic fidelity, enables avatar modeling that stays consistent across scenes, and unlocks video agents that can interpret goals and generate end-to-end content with higher reliability.
Breaking language barriers with multimodal solutions
Leveraging AI to create multimodal video translation solutions, we aim to make global communication more accessible. By seamlessly integrating text, voice, and visuals, we transform videos into universally understandable content, empowering cross-cultural connection.
Real-time engagement through multimodal innovation
Enabled by real-time rendering and advanced multimodal solutions, our interactive avatars bring conversations to life. These avatars not only respond dynamically but also redefine user interaction, making technology more engaging and human-like.
Emotion AI for expressive, realistic digital humans
Emotion AI helps our systems go beyond “talking” to truly communicating by aligning what the script means with how it should feel. By coordinating tone and prosody in voice with fitting gestures and facial expressions, we generate avatars that maintain emotional coherence over time, closing the realism gap and pushing the frontier of human-like presence.
Agentic systems enabling video agents at scale
We develop agentic systems that turn video creation into a goal-driven workflow: planning, tool use, iteration, and verification. These capabilities power our video agent, allowing it to break down user intent into steps, make informed decisions along the way, and produce more controllable, consistent outcomes, while supporting safety and quality constraints in real production settings.


"We’re engineering AI that is not only powerful but also trustworthy and easy to use. Our goal is to redefine what’s possible with AI video generation, making it indispensable for businesses and delightful for users."



