The Reason Co-Pilots Keep Getting Complex Products Wrong
Why Co-Pilots Need Actual UI Understanding, Not Just Good Language Models
I was using PostHog’s co-pilot recently and noticed something interesting.
Normally, when AI gets things wrong, it tends to hallucinate, confidently giving you answers that are completely off.
But this time, it wasn’t hallucinating. It was doing the opposite. It kept telling me what workflows “weren’t possible.”
Not wrong answers. Not made-up instructions. Just false negatives: the AI confidently claiming something couldn't be done, even though it absolutely could.
This revealed a new problem: the co-pilot wasn’t actually reasoning about the product. It was reasoning about the documentation.
Co-pilots rely almost entirely on knowledge bases - documentation, forums, API pages, support articles - and those don’t always reflect how the products work.
This isn’t specific to PostHog.
It happens in every complex platform I’ve tried using a co-pilot with.
And the more I run into this, the more obvious the gap becomes:
co-pilots don’t understand the UI, the workflow, or the situation you’re actually in.
They understand text.
Here’s the breakdown of where things fall apart.
Co-pilots rely on documentation and forums, not the real product
Documentation tells you the “official” way to use a product.
Power users rely on workarounds, hacks, and unexpected combinations of features and workflows to reach their own goals.
Docs rarely cover these.
So the co-pilot gives answers that are correct in theory, but too restricted to work in practice.
In my PostHog example, the co-pilot repeatedly said the workflow I wanted was “not yet possible” because the documentation didn’t describe that workflow.
I later explained the problem to a PostHog engineer, and he proposed a completely different approach.
Because he understood the actual goal (not the literal workflow I had described), he could see paths I didn’t know existed.
He wasn’t confined to the questions I asked. He wasn’t limited by my understanding of the feature set. He wasn’t restricted to what’s written down. He could reason across the product, not just answer inside it.
And when I asked the co-pilot how to follow the engineer’s instructions, it had no problem explaining the steps.
That’s the fundamental gap:
Co-pilots answer the prompt, humans answer the intent.
If it’s not written down, the co-pilot assumes it doesn’t exist.
Complex platforms can do far more than what’s written
Everyone who uses tools like PostHog, Datadog, HubSpot, Notion, Zapier, or Retool knows this:
There are a hundred things the product can do that are never officially documented.
You learn them by clicking around, testing settings, seeing how things behave under certain conditions, chaining features together, manipulating existing flows, and watching how the UI reacts.
This is the type of knowledge humans discover naturally, but co-pilots can’t access at all.
They don’t explore the UI. They don’t test. They don’t observe behaviour. They don’t learn from interaction.
Which means they fundamentally misunderstand how these products are actually used.
You can’t problem-solve with a UI you don’t understand
Documentation doesn’t always get updated. Old support articles don’t get replaced. Forum posts contradict each other.
The co-pilot is stuck in a snapshot of how the product used to work.
The deeper issue isn't just that documentation gets stale. It's that a co-pilot relying on static text can't reason about what the product can do, only about what the docs say it can do.
It has no ability to explore the UI, test different orders of steps, combine features, or try alternative paths to reach the same goal. So if the documentation only describes one “official” workflow, the co-pilot assumes that’s the only workflow that exists.
Humans do this instinctively. We try things. We click around. We chain features together. We change the order. We combine steps. We see what happens.
If you give someone a list of ingredients and one recipe, they’ll only ever make that one dish.
A real cook sees the same ingredients and immediately knows ten things they can make.
Full automation makes the problem worse
If the agent fully automates a workflow and something is wrong (and something will be wrong), the user ends up debugging the AI's mistake without understanding the underlying system.
This is especially painful when:
the account has custom configurations
the user has special permissions
the data is messy
the UI behaves differently for that workspace
the workflow has branching logic
If something breaks, the user is blind.
This is the opposite of “help.”
Users don’t want to re-explain everything every time
Another practical issue:
If you repeat the workflow a week later, you don’t want to:
rewrite the entire prompt
specify every detail
describe your setup again
define the data sources
re-explain the exact workflow
You want the co-pilot to:
know your context
remember your configuration
understand your workspace
and help you perform the next step
not restart from scratch.
This is where UI understanding matters.
The context lives in the interface, not the prompt.
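As a rough illustration of that idea, here is a minimal sketch of context that persists with the workspace instead of being re-typed into every prompt. Everything in it is hypothetical (the `WorkspaceContext` and `ContextStore` names, the fields, the JSON file); it's not any vendor's API, just the shape of "resume where the user left off".

```python
# Hypothetical sketch: persist what the co-pilot already knows about a workspace,
# so a repeated request a week later doesn't start from scratch.
import json
from dataclasses import dataclass, field, asdict
from pathlib import Path

@dataclass
class WorkspaceContext:
    workspace_id: str
    data_sources: list[str] = field(default_factory=list)
    configuration: dict[str, str] = field(default_factory=dict)
    last_workflow: list[str] = field(default_factory=list)  # steps the user actually performed

class ContextStore:
    """Stores per-workspace context in a local JSON file (placeholder persistence)."""
    def __init__(self, path: str = "copilot_context.json"):
        self.path = Path(path)

    def load(self, workspace_id: str) -> WorkspaceContext:
        if self.path.exists():
            raw = json.loads(self.path.read_text()).get(workspace_id)
            if raw:
                return WorkspaceContext(**raw)
        return WorkspaceContext(workspace_id=workspace_id)

    def save(self, ctx: WorkspaceContext) -> None:
        data = json.loads(self.path.read_text()) if self.path.exists() else {}
        data[ctx.workspace_id] = asdict(ctx)
        self.path.write_text(json.dumps(data, indent=2))

# Usage: the co-pilot resumes from what it already observed,
# instead of asking the user to re-describe their setup.
store = ContextStore()
ctx = store.load("acme-analytics")
ctx.data_sources = ["events", "session_recordings"]
ctx.last_workflow = ["create insight", "filter by cohort", "add to dashboard"]
store.save(ctx)
```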
Co-pilots need to assist, not take over
Most people don’t want the AI to run off and complete everything.
They want:
suggestions
guardrails
UI highlighting
“this is where you went wrong”
“this is the next step”
partial automation
shared control
Co-pilots should help with actions, not replace them entirely.
The real solution: let the co-pilot learn by interacting with the UI
The only way around all these problems is for co-pilots to understand software the way humans do - by using it.
Something closer to OpenAI’s Operator-style agents that:
click through the interface
test actions
inspect elements
understand actual UI states
watch how the product behaves
identify errors visually
confirm steps through interaction
This would give the co-pilot access to real product knowledge, not second-hand descriptions.
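To make that concrete, here is a minimal sketch of the "learn by interacting" loop, written with Playwright for browser control. The URL, selectors, candidate actions, and success check are placeholders for a hypothetical product, not a description of how Operator or any shipping co-pilot works; the point is that the agent checks what actually exists on screen before declaring anything impossible.

```python
# Hypothetical sketch of an agent that explores a UI instead of quoting docs.
from playwright.sync_api import sync_playwright

# Candidate paths toward the user's goal (placeholders, not a real product's UI).
CANDIDATE_ACTIONS = [
    ("open settings", "text=Settings"),
    ("open integrations", "text=Integrations"),
    ("enable export", "text=Export"),
]

def goal_reached(page) -> bool:
    # Placeholder check: a real agent would inspect actual UI state
    # (toasts, table rows, toggles) to confirm the step worked.
    return page.locator("text=Export enabled").count() > 0

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://app.example.com")  # hypothetical product URL

    for label, selector in CANDIDATE_ACTIONS:
        if page.locator(selector).count() == 0:
            # The agent *observes* that this path isn't visible in this workspace,
            # rather than asserting it's impossible because the docs omit it.
            print(f"skip: '{label}' not visible here")
            continue
        page.locator(selector).first.click()
        page.wait_for_load_state("networkidle")
        print(f"tried: {label} -> goal reached: {goal_reached(page)}")

    browser.close()
```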
The next generation of co-pilots won't win on bigger models; they'll win on better product understanding.
Until then, we’ll keep running into the same thing:
co-pilots that sound right but don’t actually help.


