AI Agents Don't Fail Because of Code—They Fail Because of People

Commentary on SaaStr

Everyone knows AI agents need maintenance. The real question is what kind of maintenance they actually need.

Jason Lemkin over at SaaStr recently put numbers to it with his new weekly show about running 20+ agents in production. He shares the preview environment outages, the micro-hallucinations, the model-based regressions that break without code changes. All good stuff. All true.

But here's what he's missing: those technical failures are symptoms, not the disease.

Our data shows something different. We're tracking 47 problems in the 'Software Development & Engineering' category with an average severity of 3.8/5, and here's the pattern: the highest-severity issues aren't about code breaking. They're about people not knowing what to do when code works differently than expected. They're about organizational resistance to changing workflows. They're about teams building agents that solve technical problems but create human ones.

Take that Clay example Jason mentions. The Sculptor agent quoted 5x the normal cost because it defaulted to the most expensive enrichment model. Most customers wouldn't have known to push back. That's not a technical failure—that's a design failure. Someone built an agent that optimizes for something (maybe accuracy, maybe speed) without considering what the user actually needs (cost-effective results).

We see this pattern everywhere. Builders create agents that technically work but fail to account for how real people interact with them. The agent that blames third-party integrations instead of diagnosing the actual problem? That's not just bad code—it's bad communication design. The agent that needs 15 minutes of daily maintenance to prevent drift? That's not a model problem—it's a workflow integration problem.

Jason's absolutely right about one thing: "set and forget does not work with agents." But our data suggests the reason isn't just that models update or code breaks. It's that businesses change. Pricing changes. Customer expectations change. Internal processes change. And if your agents aren't designed to adapt to those human and organizational changes, they'll fail even if the code runs perfectly.

Look at the idle-time statistic Jason drops: his agents sit idle 90% of the time. He frames this as a capacity optimization problem—how do we use all that extra capacity? But our data from tracking workflow integration problems suggests a different interpretation: maybe the agents are idle because they're not solving the right problems. Maybe they're technically capable but organizationally misaligned.

When we look at problems across industries, the pattern is clear. Healthcare companies struggle with compliance requirements that agents can't navigate without human oversight. Financial services firms hit regulatory walls. Marketing teams build agents that generate content but can't align with brand voice guidelines without constant tuning. These aren't technical limitations—they're human-in-the-loop requirements that most agent designs ignore.

Jason's show will probably focus on the technical maintenance stories—the model updates, the integration failures, the cost overruns. And those are important. But if you're building agents, you need to think bigger. You need to design for the human systems your agents will operate within.

Here's what our data says actually works:

Design agents that fail gracefully. Not just technically, but communicatively. When Jason's agent blamed Qualified for a problem that wasn't Qualified's fault, that created confusion and wasted time. Better design would have the agent say "I'm not sure what's wrong, but here are the three most likely causes based on similar past incidents" rather than pointing fingers.
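To make that concrete, here's a minimal sketch of the idea in Python. Everything in it is illustrative: the incident format, the keyword-overlap heuristic, and the response wording are our assumptions, not anything from Jason's actual stack.

```python
def report_failure(error_log: str, incident_history: list[dict]) -> str:
    """Answer honestly when the root cause is unknown, instead of
    defaulting to blaming a third-party integration."""
    def overlap(incident: dict) -> int:
        # Crude keyword overlap between the current error and a past incident.
        return len(set(error_log.lower().split())
                   & set(incident["summary"].lower().split()))

    ranked = sorted(incident_history, key=overlap, reverse=True)
    likely = [i["cause"] for i in ranked[:3] if overlap(i) > 0]

    if not likely:
        return ("I'm not sure what's wrong, and I found no similar past "
                "incidents. Escalating to a human.")
    return ("I'm not sure what's wrong, but based on similar past incidents "
            "the most likely causes are: " + "; ".join(likely) + ".")

# Example usage with a toy incident history.
history = [
    {"summary": "webhook delivery timeout on lead sync", "cause": "stale API token"},
    {"summary": "lead sync dropped records after schema change", "cause": "schema drift"},
]
print(report_failure("lead sync timeout from webhook", history))
```

The point isn't the matching logic, which is deliberately crude. It's that the agent's default output admits uncertainty and offers ranked hypotheses instead of a confident accusation.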

Build feedback loops into everything. Not just error reporting, but user satisfaction tracking. Not just technical metrics, but business outcome metrics. When that pitch deck analyzer started hallucinating revenue numbers, the problem wasn't detected until someone noticed the pattern. Automated quality checks could have caught it sooner.
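A check like that can be embarrassingly simple. Here's a hedged sketch of one: the regex and the pitch-deck framing are our own illustration, not the actual analyzer Jason describes.

```python
import re

def dollar_figures(text: str) -> set[str]:
    # Illustrative pattern only: catches figures like "$450,000" or "$1.2M".
    return set(re.findall(r"\$\d[\d,]*(?:\.\d+)?[MBKmbk]?", text))

def ungrounded_figures(agent_output: str, source_text: str) -> set[str]:
    """Return every dollar figure the agent asserts that never appears in
    the source it summarizes: a cheap tripwire for hallucinated numbers."""
    return dollar_figures(agent_output) - dollar_figures(source_text)

# Example: hold a summary for human review before it ships.
deck = "Q3 revenue was $450,000 against a $1.2M target."
summary = "The company reported $450,000 in Q3 revenue, up from $900,000."
suspect = ungrounded_figures(summary, deck)
if suspect:
    print(f"Hold for review; possibly hallucinated figures: {sorted(suspect)}")
```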

Assume your agents will be wrong sometimes. Design for correction, not perfection. When Clay's agent misunderstood new pricing, the failure wasn't just technical—it was a training gap. But more importantly, the design allowed that misunderstanding to lead directly to an unnecessary upgrade purchase. Better design would require confirmation or human review for significant changes.
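One way to build that correction path is a spend-gated execution step, sketched below. The threshold, the action shape, and the review hook are all hypothetical placeholders, not a real product's API.

```python
from dataclasses import dataclass
from typing import Callable

APPROVAL_THRESHOLD_USD = 100.0  # hypothetical policy line; tune per team

@dataclass
class ProposedAction:
    description: str
    estimated_cost_usd: float

def execute_with_review(action: ProposedAction,
                        run: Callable[[ProposedAction], None],
                        approve: Callable[[ProposedAction], bool]) -> bool:
    """Run low-stakes actions directly; pause significant ones for a human.
    If the reviewer says no (or nothing), the safe default is to do nothing."""
    if action.estimated_cost_usd > APPROVAL_THRESHOLD_USD and not approve(action):
        return False
    run(action)
    return True

# Example: a pricing-tier upgrade is exactly the kind of change to gate.
upgrade = ProposedAction("Upgrade plan to cover enrichment credits", 499.0)
execute_with_review(
    upgrade,
    run=lambda a: print(f"Executing: {a.description}"),
    approve=lambda a: input(f"Approve '{a.description}' "
                            f"(${a.estimated_cost_usd})? [y/N] ").strip().lower() == "y",
)
```

The design choice worth noting: when no human answers, the gate fails closed. An agent that misreads new pricing then produces a stalled request, not a surprise purchase.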

Solve for organizational adoption, not just technical capability. Jason mentions that Amelia's learning curve with AgentForce was steeper than with Qualified. That's not a technical problem—that's a user experience problem. The agent that's technically superior but harder to use will fail where the simpler, more focused agent succeeds.

This is why we're tracking 12 problems in 'AI/ML Cost Management' with an average severity of 4.1/5. It's not just about optimizing credit usage—it's about designing systems that make cost transparent and controllable. It's about building agents that help users make better decisions, not just execute commands.
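Circling back to the Clay-style cost surprise, here's one pattern that makes cost visible before anything runs. The model names and per-record prices are invented for illustration; real numbers vary by vendor.

```python
# Invented per-record prices; real numbers vary by vendor and model.
MODEL_COST_PER_RECORD = {"premium": 0.25, "standard": 0.05, "budget": 0.01}

def pick_model(record_count: int, budget_usd: float) -> tuple[str, float]:
    """Choose the best model that fits the budget, instead of silently
    defaulting to the most expensive one, and surface the estimate."""
    for model in ("premium", "standard", "budget"):  # ordered best-first
        estimate = record_count * MODEL_COST_PER_RECORD[model]
        if estimate <= budget_usd:
            return model, estimate
    raise ValueError(
        f"No model fits budget ${budget_usd:.2f} for {record_count} records"
    )

model, estimate = pick_model(record_count=10_000, budget_usd=600.0)
print(f"Plan: {model} model, estimated ${estimate:,.2f}. Confirm before running.")
```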

Jason's show will be valuable because it documents what running agents in production actually looks like. The maintenance requirements are real. The technical challenges are real. But if you're building agents, don't stop there. Look at the human systems. Look at the organizational barriers. Look at how real people actually interact with your creation.

Because here's the truth our data reveals: you can fix every technical bug, optimize every model, and still fail if you don't solve for the people using your agents. The most successful agents we track aren't the most technically sophisticated—they're the ones that fit seamlessly into existing workflows, that communicate clearly when they're uncertain, that enhance human capabilities rather than trying to replace them.

Jason's right that "no lead left behind" might be the simplest unlock. But achieving that requires more than just technical execution. It requires designing agents that understand what leads actually need, that can adapt to different industries and contexts, that work with human teams rather than in isolation.

So watch the show. Learn from the technical stories. But then look deeper. Because the real maintenance your agents need might not be in the code—it might be in how you design them to work with people.

This article is commentary on the original article by Jason Lemkin at SaaStr. We encourage you to read the original.
