The launch of HeyGen’s AI Studio marks a transformative moment in AI-powered video creation, not just for users but also for the engineering team that built it. Unlike typical product upgrades, AI Studio required an architectural reinvention. At its core is a shift from traditional timeline-based editing to a fully text-centric paradigm. The engineering behind this transition required solving complex challenges while maintaining usability and scalability.
We spoke with Nikunj Yadav, HeyGen’s Head of Product Engineering, to understand what it took to build HeyGen’s most ambitious product to date.
A paradigm shift from timeline to text
HeyGen’s previous studio relied on a traditional timeline-based editor, similar to Final Cut or Adobe Premiere. While effective for manual editing, it clashed with the dynamic nature of AI-generated voiceovers. Since most of HeyGen’s speech is produced using text-to-speech (TTS), even minor script changes could disrupt the timing and misalign visuals like images and animations.
To resolve this, the new AI Studio was built around a text-first editing paradigm. Instead of associating media with fixed timestamps, users now anchor assets directly to specific words in the script. For example, if an image should appear when the avatar says “HeyGen,” the system detects the exact timing of that word in the audio and syncs the visual accordingly.
This required a hybrid architecture: the user-facing interface is word-based, but video rendering remains time-based. To bridge the two, the team engineered a real-time translation layer that maps text-based interactions to time-based outputs. Early efforts involved reusing parts of the legacy canvas, but the fundamental differences between the old and new data models introduced complexity. Adapter layers were created to maintain state and ensure compatibility, but eventually, the team rewrote the canvas to align natively with the text-centric approach.
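To make the idea concrete, here is a minimal TypeScript sketch of how such a translation layer might work. Everything here is hypothetical rather than HeyGen's actual code: it assumes the TTS engine returns per-word timings alongside the audio, that assets are anchored to word occurrences, and that a time-based render plan is re-derived whenever the script changes.

```typescript
// Hypothetical shapes: TTS engines commonly return per-word
// start/end times alongside the synthesized audio.
interface WordTiming {
  word: string;
  startMs: number;
  endMs: number;
}

// An asset anchored to the Nth occurrence of a word in the script,
// rather than to a fixed timestamp.
interface WordAnchor {
  word: string;
  occurrence: number; // 0-based index among matching words
}

interface AnchoredAsset {
  assetId: string;
  anchor: WordAnchor;
  durationMs: number; // how long the asset stays on screen
}

// What a time-based renderer consumes.
interface RenderCue {
  assetId: string;
  startMs: number;
  endMs: number;
}

// Resolve a word anchor to the moment the avatar speaks that word.
// Returns null if the anchor word was edited out of the script.
function resolveAnchor(anchor: WordAnchor, timings: WordTiming[]): number | null {
  const matches = timings.filter(
    (t) => t.word.toLowerCase() === anchor.word.toLowerCase(),
  );
  return matches[anchor.occurrence]?.startMs ?? null;
}

// The translation layer: re-run after every script edit, so the
// word-based editor state is re-projected onto the render timeline.
function buildRenderPlan(assets: AnchoredAsset[], timings: WordTiming[]): RenderCue[] {
  const cues: RenderCue[] = [];
  for (const asset of assets) {
    const startMs = resolveAnchor(asset.anchor, timings);
    if (startMs === null) continue; // anchor lost; drop the cue
    cues.push({ assetId: asset.assetId, startMs, endMs: startMs + asset.durationMs });
  }
  return cues.sort((a, b) => a.startMs - b.startMs);
}

// Example: show a logo when the avatar says "HeyGen".
const timings: WordTiming[] = [
  { word: "Welcome", startMs: 0, endMs: 420 },
  { word: "to", startMs: 420, endMs: 560 },
  { word: "HeyGen", startMs: 560, endMs: 1100 },
];
const plan = buildRenderPlan(
  [{ assetId: "logo", anchor: { word: "HeyGen", occurrence: 0 }, durationMs: 1500 }],
  timings,
);
console.log(plan); // [{ assetId: "logo", startMs: 560, endMs: 2060 }]
```

Because the render plan is recomputed from the current alignment rather than stored as fixed timestamps, a script edit that shifts the audio automatically shifts every dependent visual with it.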
Frequent rewrites weren’t setbacks. Instead, they were part of a deliberate engineering strategy to refine architecture, boost performance, and prepare the platform for future growth.
Iterative design guided by user behavior
Rather than replicating the old studio feature-for-feature, the team analyzed usage data from the legacy editor and prioritized the most essential functions. This enabled them to ship an internal version in record time, followed by a limited external release for real-world testing.
User feedback was central to development. Screen recordings and analytics revealed pain points, such as users instinctively trying to drag the playhead, prompting the team to add a timeline scrubber. To streamline debugging and accelerate fixes, they implemented LogRocket and introduced an in-app feedback button, giving developers direct insight into user sessions and enabling rapid issue resolution.
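For illustration, a minimal sketch of what that wiring might look like in a browser app; the app ID, user traits, and feedback endpoint below are placeholders, not HeyGen's actual setup.

```typescript
import LogRocket from "logrocket";

// Placeholder app ID; LogRocket.init() starts session recording so
// engineers can replay the exact session behind a bug report.
LogRocket.init("your-org/your-app");

// Tagging the session with a (hypothetical) user ID makes it possible
// to jump from a feedback submission straight to the recording.
LogRocket.identify("user-123", { plan: "pro" });

// Hypothetical feedback sender, standing in for whatever endpoint
// an in-app feedback button would post to.
function sendFeedback(payload: { message: string; sessionURL: string }) {
  void fetch("/api/feedback", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  });
}

// Attach the session URL to the feedback payload so a report links
// directly to the replay of the session that produced it.
LogRocket.getSessionURL((sessionURL) => {
  sendFeedback({ message: "Playhead won't drag", sessionURL });
});
```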
Accessibility was a core principle. The editor was designed to serve both casual users and professionals, intentionally avoiding complex timeline tools in the early versions. These were introduced gradually as user behavior warranted them.
The team also made forward-looking decisions. Although animations hadn’t been widely used in the legacy editor, they were prioritized early to expand storytelling potential and give the engineering team hands-on experience with the animation stack, laying the groundwork for more advanced features.
Innovation through voice, gestures, and control
Several standout features in AI Studio required deep technical innovation. Voice mirroring lets users record their voice and replicate its tone and cadence using AI, eliminating the need to select or fine-tune pre-recorded voice options. This feature, initially suggested internally, addressed a major pain point around achieving natural voice delivery.
Gesture control was another major breakthrough. With the text-first model, users can now assign gestures like pointing or nodding to specific words, enabling avatars to sync physical expressions precisely with speech. This wasn’t feasible in the timeline-based system.
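Conceptually, gesture cues can ride on the same word-anchoring machinery sketched earlier. The sketch below is again hypothetical, reusing the WordAnchor, WordTiming, and resolveAnchor shapes from the previous example.

```typescript
// Hypothetical gesture cue: a gesture is attached to a word, then
// resolved to a timestamp via the same TTS alignment data.
type Gesture = "point" | "nod" | "wave";

interface GestureCue {
  gesture: Gesture;
  anchor: WordAnchor;
}

// Turn word-anchored gesture cues into timed avatar directives,
// dropping any cue whose anchor word was edited out of the script.
function scheduleGestures(
  cues: GestureCue[],
  timings: WordTiming[],
): { gesture: Gesture; atMs: number }[] {
  return cues.flatMap((cue) => {
    const atMs = resolveAnchor(cue.anchor, timings);
    return atMs === null ? [] : [{ gesture: cue.gesture, atMs }];
  });
}
```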
Voice Director, built using new OpenAI primitives and HeyGen’s proprietary systems, gives users granular control over voice outputs, allowing them to fine-tune delivery with greater specificity. These innovations were the result of cross-functional collaboration between engineering squads focused on avatars, voice modeling, and the text editor itself.
A scalable foundation for long-term growth
Much of what makes AI Studio successful lies beneath the surface. The modularization and rewriting of the canvas were foundational investments, enabling support for complex animations, media-heavy scenes, and faster render times. As developers gained fluency with the new architecture, feature velocity increased, unlocking enhancements like B-roll alignment, background music, and trimming of imported video.
LogRocket, Datadog, and other observability tools strengthened the team’s ability to respond quickly to bugs and user issues. These systems transformed debugging from a reactive task into a real-time feedback loop, closing the gap between problem discovery and resolution.
Looking ahead, AI Studio is built for longevity and scale. Its intuitive interface opens video creation to non-experts, supporting HeyGen’s goal of making video storytelling accessible to all. At the same time, its modular, extensible architecture allows the platform to evolve quickly, with new features and UI updates shipping weekly.
As Nikunj put it, “If you were to take snapshots of the text editor along the way, you’d see how much it has changed visually, functionally, and architecturally.” That evolution is a testament to HeyGen’s commitment to building not just a product, but a powerful creative engine for the AI video generation era.