The Autoplay Roadmap: How Our Thinking Evolved
When we started Autoplay, the dream was simple: real-time automation in software.
A chatbot that could take a command - “Schedule this report for me” - and then go do it, directly in the UI.
But we ran into a hard truth: copilots don’t fail because the AI can’t click the right buttons. They fail because people don’t know what to ask.
Users don’t know what they don’t know. They don’t know which features exist, what’s possible, or how others in their org are using the product. And even if they did, they only want answers that are relevant to them in that moment. These were the core hypotheses Autoplay was built on:
Users don’t know what they don’t know
Users only want to be shown or told what is relevant to them and what they don’t already understand
Users want to understand how others in their organization/team use the same software
Users will enjoy using the product more if it’s presented through a gamified experience
That’s why we shifted from “do-it-for-me” copilots → “show-me-why” intent detection.
Step One: Recording the Right Data
At first, we tried the brute force route: screenshots and computer vision.
It was slow and expensive, and it missed crucial context like hover time, mouse drift, and hesitation.
Then Monday.com gave us a better idea:
“Why not plug into our session replays instead? See if AI can do more there.”
That unlocked everything.
Atlas: Why Session Replays Became Our Wedge
We realized something obvious in hindsight:
Product teams hate watching session replays, but they have to, because that’s where the “why” lives.
So we built Autoplay Atlas.
It takes raw replays and turns them into high-impact insights:
Where users hesitate
What they know vs. don’t know
What goals they’re trying to achieve
Instead of hours of replay watching, you get immediate clarity.
Building the Models
We started by converting video into frames and training models to detect hesitation, knowledge, and intent.
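For illustration only, here’s a minimal sketch of that kind of frame pipeline: it samples frames from a replay video with OpenCV and hands each one to a placeholder scorer standing in for the hesitation, knowledge, and intent models. The file name, sampling rate, and score_frame stub are assumptions for this post, not our production code.

```python
# Minimal sketch (not Autoplay's actual pipeline): sample frames from a replay
# video and score each one. score_frame is a stand-in for the real models.
import cv2  # pip install opencv-python


def score_frame(frame) -> dict:
    """Placeholder for the hesitation / knowledge / intent models."""
    return {"hesitation": 0.0, "knowledge": 0.0, "intent": "unknown"}


def frames_from_replay(path: str, every_n: int = 30):
    """Yield (timestamp_seconds, frame) pairs, keeping one frame per `every_n`."""
    cap = cv2.VideoCapture(path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % every_n == 0:
            yield idx / fps, frame
        idx += 1
    cap.release()


if __name__ == "__main__":
    # "session.mp4" is a hypothetical exported replay, used here for illustration.
    for ts, frame in frames_from_replay("session.mp4"):
        print(ts, score_frame(frame))
```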
Early versions meant uploading FullStory or Sentry links and getting back a summary.
From there, we layered on conversational intelligence into individual sessions: a chatbot you could ask,
“What can you tell me about what the user is trying to do?”
By November, our models could reliably detect intent. Genie (our UI assistant) was starting to adapt in real time. We began asking customers whether they wanted us to integrate with existing replay tools (PostHog, FullStory, Hotjar) or replace them outright.
Scaling the Pipeline
February
Shipped smarter NLP - questions like “Show me hesitation during onboarding filters” just worked.
Knowledge and hesitation models got sharper.
Added search across sessions + hypothesis testing.
April
Redesigned the timeline for clarity.
Launched collection summaries.
Search was upgraded with unsupervised clustering - surfacing emergent goals and golden path deviations (a toy sketch of the idea follows this list).
PostHog came on board to help us scale infrastructure and co-sell.
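For the curious, here’s a toy version of what “unsupervised clustering over sessions” means in practice. The random embeddings and the KMeans choice below are stand-ins for explanation, not our actual models or pipeline.

```python
# Toy illustration: cluster per-session embeddings so sessions that share a goal
# land together, then flag the sessions furthest from any cluster center as
# candidate "golden path deviations" worth a human look.
import numpy as np
from sklearn.cluster import KMeans  # pip install scikit-learn

# Pretend embeddings: one vector per session, produced by an upstream model.
rng = np.random.default_rng(0)
session_embeddings = rng.normal(size=(200, 32))

kmeans = KMeans(n_clusters=5, n_init=10, random_state=0)
labels = kmeans.fit_predict(session_embeddings)

# Distance from each session to its own cluster center.
distances = np.linalg.norm(
    session_embeddings - kmeans.cluster_centers_[labels], axis=1
)
outliers = np.argsort(distances)[-10:]
print("cluster sizes:", np.bincount(labels), "review first:", outliers)
```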
May
Golden Path shipped: define your ideal workflow, then instantly see where users deviate.
We refined our ICP: PLG martech tools with clear adoption paths (campaigns, integrations, automations). Complex enough for AI to matter, narrow enough to measure impact.
Outbound focused here → first commit came in from Sendlane.
Thinking in First Principles: TERRA
As we mapped intent at scale, we developed TERRA:
Task, Event, Representation, Reasoning Architecture.
It’s the most efficient way we’ve found to connect raw clicks → meaningful goals.
And it powers everything from hypothesis testing to real-time copilots.
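To make the layering concrete, here’s an illustrative sketch of how TERRA-style objects could be modeled in code. The class and field names are shorthand for this post, not the production schema.

```python
# Illustrative-only sketch of TERRA-style layers: raw Events are compressed into
# Representations, which support an inferred Task, tied together by Reasoning.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Event:
    """A raw interaction captured from a session replay (click, hover, input)."""
    timestamp: float
    kind: str    # e.g. "click", "hover", "input"
    target: str  # e.g. a selector or element label


@dataclass
class Representation:
    """A compressed, model-friendly view of a run of events."""
    events: List[Event]
    summary: str


@dataclass
class Task:
    """The goal the user appears to be pursuing."""
    name: str
    confidence: float


@dataclass
class Reasoning:
    """The explanation linking events -> representation -> task."""
    task: Task
    evidence: List[Representation] = field(default_factory=list)
    narrative: str = ""
```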
Hypothesis Testing: One Flow, Not Ten Tools
The current adoption problem:
Time-to-value is too slow.
Teams stitch together multiple tools to ask simple questions: Is this drop-off caused by a bug, a bad UX pattern, or a user decision?
Our answer: turn everything into tags.
An Issue = tag:issue_onboarding.
A Golden Path = tag:golden_path_activation.
A Hypothesis = tag:hypothesis_filters_confusion.
Instead of bouncing between views, you stay in one unified search canvas. Every step is traceable, shareable, repeatable. The agent can run the flow end-to-end: check for bugs, compare hesitation before/after, classify outcomes as decision vs. UX.
It’s hypothesis testing, but agentic.
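A toy example of the tag idea, with made-up session IDs and an in-memory inverted index. The real product obviously isn’t a Python dict, but the query shape is the point: one search over one tag space.

```python
# Toy sketch: issues, golden paths, and hypotheses all live in one tag space,
# so any question becomes a single tag query instead of a tour of ten tools.
from collections import defaultdict

# session_id -> tags attached by models or by the team (illustrative data)
sessions = {
    "s1": {"tag:issue_onboarding", "tag:hypothesis_filters_confusion"},
    "s2": {"tag:golden_path_activation"},
    "s3": {"tag:issue_onboarding", "tag:golden_path_activation"},
}

# Inverted index: tag -> sessions carrying it.
index = defaultdict(set)
for session_id, tags in sessions.items():
    for tag in tags:
        index[tag].add(session_id)


def search(*tags: str) -> set:
    """Sessions carrying every requested tag."""
    results = [index[t] for t in tags]
    return set.intersection(*results) if results else set()


# "Which onboarding issues also touch the activation golden path?"
print(search("tag:issue_onboarding", "tag:golden_path_activation"))  # {'s3'}
```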
Where We’re Headed
We’re pushing into agent-driven workflows, where Autoplay doesn’t just surface friction but tests it automatically.
Future R&D:
Labeling data via chatbot interactions → supervised learning + RLHF.
Session replay–powered copilots that intervene in real time.
Tight integrations with PostHog error tracking, cohorts, and feature flags.
The long-term vision loops back to where we started: real-time copilots. But this time, grounded in user intent, context, and data.