What Building with AI Taught Us About Software
Robustness, Composability, and the Limits of Agents
We’ve been wrestling with a question that keeps coming up in every serious AI conversation: will AI replace software? More specifically, will agents replace workflows?
At first glance, it looks like that future is already here. New AI-native “replacements” for SaaS appear every week: AI CRMs, AI recruiting tools, AI design tools, AI everything. But once you actually use them, a pattern emerges: almost all of them lack robustness.
And robustness is the thing that matters most.
By robustness, we don’t mean polish or speed. We mean the ability to support real-world complexity: deep customization, edge cases, multiple stakeholders, evolving use cases, and scale over time. The kind of robustness you see in tools like CAD suites, professional photo-editing software, or enterprise CRMs. These tools work not because they automate everything, but because they let humans shape them endlessly around their specific needs.
Robustness is what allows software to scale horizontally and vertically. Instead of building a tool that solves one narrow problem extremely well, you build a system that can be adapted to many problems, and then further adapted within each of those problems. That’s why one piece of software can power thousands of companies, each using it in slightly (or wildly) different ways.
VCs understand this instinctively. That’s why, over the past decade, they’ve bet heavily on vertical SaaS: software that deeply understands a domain but still preserves flexibility. What we’re seeing now is the same bet being made again, just rebranded: vertical agents instead of vertical SaaS. The hope is that agents will finally be able to encode that flexibility automatically.
So far, they haven’t.
A big contributor to this problem is what people loosely call “vibe coding.” Yes, it’s impressive that you can spin up a prototype, or even a simple production app, in hours instead of months. But there’s a cost. Everything starts to look the same. The same layouts, the same interactions, the same mental models.
Vibe-coded software is limited by what the model thinks software should look and feel like, which means genuinely clever, opinionated, or novel UI/UX gets flattened into safe, repeatable patterns. That’s not just a taste issue; it’s a data issue. The model can only remix what it has seen.
As a result, most AI products end up differentiated by only two things:
1. The context data they’re plugged into
2. Minor UI variations on top of the same underlying interaction model
But here’s the problem: the moment you try to make that software more powerful (by adding features, supporting edge cases, or enabling deeper customization), you start removing AI from the critical path. You fall back to explicit controls, configuration, logic, and structure. In other words, software.
The most robust piece of software most people have ever used is probably Excel. At its core, it’s almost offensively simple. No opinions. No automation. Just cells and functions. Yet it works as both a backend and a frontend. You can model a business, run analytics, build workflows, or create something completely unintended by its creators.
Excel doesn’t succeed despite its lack of AI; it succeeds because of its raw, composable nature. AI can sit on top of it, but it’s not what gives Excel power.
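A minimal TypeScript sketch of that idea, with entirely hypothetical names rather than any real spreadsheet API: an AI-backed function is registered beside deterministic ones and returns plain values, so it composes with everything else instead of owning the workflow.

```typescript
// A tiny, spreadsheet-like registry of composable functions.
// All names here are hypothetical; the point is that an AI-backed
// function registers like any other primitive and returns plain numbers.

type Fn = (...args: number[]) => number;

// Placeholder standing in for a real model call; it returns a number so the
// result stays composable with everything else in the registry.
function aiRelevanceScore(xs: number[]): number {
  return xs.length > 0 ? xs[0] : 0;
}

const registry: Record<string, Fn> = {
  SUM: (...xs) => xs.reduce((a, b) => a + b, 0),
  AVG: (...xs) => (xs.length ? xs.reduce((a, b) => a + b, 0) / xs.length : 0),
  // The AI-backed function has the same signature and contract as SUM or AVG,
  // so its output can feed any other function or filter.
  AI_RELEVANCE: (...xs) => aiRelevanceScore(xs),
};

// Composition: the AI output is an input to ordinary functions, not the system.
const score = registry.AVG(registry.AI_RELEVANCE(3, 1), registry.SUM(2, 2));
console.log(score); // 3.5
```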
This leads to an uncomfortable conclusion: a lot of AI today is being used to solve problems that don’t actually need to be solved.
Take AI in recruitment. In practice, most “AI recruiting agents” are just large language models wrapped around filters and keyword searches: things that already exist and work very well in LinkedIn Recruiter or Sales Navigator. The pitch is that it feels magical the first time (type what you want, get results), but the magic doesn’t scale. It breaks down the moment you care about nuance, tradeoffs, or consistency.
They’re party tricks, not systems.
In these cases, AI doesn’t produce better outcomes. It just produces outcomes faster. And speed without robustness is rarely a long-term advantage.
This distinction matters a lot for how we think about AI at Autoplay.
Our AI features (golden path deviation, unsupervised clustering, hesitation detection, etc.) are intentionally treated like functions. They’re closer to filters, tags, or functions/columns in a Google Sheet than they are to autonomous agents. Each one does something specific, deterministic, and composable. On their own, they’re not the product.
The product is the experience around them.
Autoplay should feel as robust as any serious analytics or product tool. That means our AI features must work in combination with non-AI features. They need to respect filters, cohorts, URLs, time ranges, and organizational context. They need to be debuggable, inspectable, and controllable.
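To make “treated like functions” concrete, here is a minimal TypeScript sketch. Every name in it (SessionScope, detectHesitation, matchesScope) is hypothetical, not Autoplay’s actual API; the point is the shape: an AI feature takes the same scoped, filtered input any non-AI feature would, and returns plain data that ordinary filters and views can consume.

```typescript
// Hypothetical types: a scoped query and a session record.
interface SessionScope {
  cohort: string;     // e.g. "trial-users"
  urlPattern: string; // e.g. "/checkout/*"
  from: Date;
  to: Date;
}

interface Session {
  id: string;
  url: string;
  events: { type: string; timestampMs: number }[];
}

// An "AI feature" is just a function: scoped sessions in, plain tags out.
// It never decides what to show or do next; it produces a column of data.
function detectHesitation(sessions: Session[]): Map<string, boolean> {
  const tags = new Map<string, boolean>();
  for (const s of sessions) {
    // Stand-in heuristic for a model: long gaps between events count as hesitation.
    const gaps = s.events
      .slice(1)
      .map((e, i) => e.timestampMs - s.events[i].timestampMs);
    tags.set(s.id, gaps.some((g) => g > 10_000));
  }
  return tags;
}

// Composition with non-AI features: the same scope constrains both.
function hesitatingSessions(all: Session[], scope: SessionScope): Session[] {
  const scoped = all.filter((s) => matchesScope(s, scope)); // ordinary filtering
  const tags = detectHesitation(scoped);                    // AI-ish column
  return scoped.filter((s) => tags.get(s.id));              // ordinary filtering again
}

// Hypothetical scope matcher (URL + time range); cohort lookup omitted,
// and the wildcard handling is deliberately simplistic.
function matchesScope(s: Session, scope: SessionScope): boolean {
  const inRange = s.events.every(
    (e) => e.timestampMs >= scope.from.getTime() && e.timestampMs <= scope.to.getTime()
  );
  return inRange && new RegExp(scope.urlPattern.replace("*", ".*")).test(s.url);
}
```

The design choice is that nothing downstream needs to know whether a tag came from a model or a rule; both are just columns the rest of the product can filter, inspect, and debug.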
Our first real mistake was leaning too hard into a semantic search / ChatGPT-style interface, asking questions like “Where do users get stuck?” That sounds powerful, but it’s fundamentally flawed. Without isolating control factors (specific cohorts, specific flows, specific dates), you don’t get truth. You get plausible answers.
And plausible answers are dangerous.
Real insight comes from constraint, not abstraction. Cohorts. URLs. Time windows. Comparisons. AI can help compute, surface, and suggest, but robustness comes from giving humans the tools to reason precisely.
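As a rough sketch (again with hypothetical names, not a real API): instead of answering a free-form prompt directly, the system turns every question into an explicit, inspectable query object, and AI only suggests or ranks within those constraints.

```typescript
// Free-form: "Where do users get stuck?" -> a plausible, unverifiable answer.
// Constrained: an explicit query object the user can read, edit, and re-run.

interface InsightQuery {
  cohort: string;                   // who, exactly
  urlPattern: string;               // where, exactly
  window: { from: Date; to: Date }; // when, exactly
  compareTo?: InsightQuery;         // optional baseline for comparison
}

// Hypothetical helper: the model proposes candidate queries, humans pick and
// refine them. The AI never answers directly; it only fills in a structure
// that the rest of the (non-AI) pipeline computes deterministically.
function suggestQueries(_freeFormQuestion: string): InsightQuery[] {
  // A real implementation might call a model here; this stub just shows the
  // contract: text in, constrained queries out.
  return [
    {
      cohort: "new-signups",
      urlPattern: "/onboarding/*",
      window: { from: new Date("2025-01-01"), to: new Date("2025-01-31") },
    },
  ];
}
```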
The future isn’t AI replacing software. It’s AI becoming a primitive inside robust software. A function, not an agent. A building block, not the building.
And the companies that win won’t be the ones with the best demo; they’ll be the ones that understand that robustness is still the hard part.

