Moving fast with agents without losing comprehension
Posted 23 March 2026 · 6 min read
Addy Osmani wrote a great post last week on comprehension debt, the hidden cost of AI-generated code. The core idea: AI generates code far faster than humans can evaluate it, and that gap quietly hollows out the team's understanding of their own codebase.
It resonated with me, but what struck me most is a specific asymmetry in how the industry is responding. Most guidance around working with agents optimises for agent comprehension: context files, MCP servers, documented skills, feeding in the right information so the agent can reason about your codebase. There's far less conversation about the equally important problem: making sure humans still understand the system the agent is changing.
We're optimising for agent comprehension while human comprehension quietly erodes. That gap is what's made me think carefully about how I've been working, and what actually needs to be in place before you can move fast without losing the understanding that keeps a codebase healthy.
The thing reviews were actually doing
Reviews aren't just quality assurance. They're how understanding spreads across a team. When someone reads your code carefully enough to approve it, they're building a mental model of what changed and why. That's the mechanism by which a team stays collectively oriented to its own codebase.
Agents put this mechanism under pressure, not by making code worse, but by generating it faster than the review process was designed to handle. Sometimes moving fast and trusting the agent is the right call, especially in well-covered, well-understood parts of the codebase. But when it goes wrong the consequences compound. Each poorly-understood change makes the next review less meaningful as you're reasoning about new code against a mental model that's already drifting.
What I've learned from trying
My initial instinct when I ran into this was process. Break large agent changesets into smaller sequenced MRs, each telling a coherent part of the story, each individually deployable, like a slow-motion replay after a fast-forward session. There's something to it. A large MR where I reorganised commits to be reviewed one by one got merged without friction. Making changes legible and telling a coherent story is always the right instinct.
But I also have five stacked MRs on a legacy codebase sitting in draft. I understand what the changes do, but I don't trust the existing test coverage to catch the side effects and functional behaviour that could break. Without that confidence there's an implicit expectation of manual verification underneath the whole thing, and that's asking a reviewer to carry the risk you haven't dealt with.
Process can make changes more legible. It can't substitute for a safety net that isn't there.
What comprehension actually needs to look like now
It's not line-by-line, that's not feasible anymore, and pretending otherwise just means some reviews are theatre. But it's not nothing either. I think it works at three levels.
The first is behavioural: does it work as expected? This is where test coverage becomes the most important investment a team can make. Real coverage that covers real behaviour across paths users actually take, alongside type safety that catches type errors at compile time. If the compiler and test suite are doing their job, reviewers don't need to trace every line. The places where coverage is thin, or where teams have been relying on manual testing, are exactly the places where agent velocity stops being speed and starts being negligence.
The second is architectural: do we broadly understand how the changes work, and can we update our mental model of the system? This is something agents can help with directly. Ask the agent to summarise the meaningful decisions in a changeset, not the mechanical changes but the choices a human needs to evaluate: what alternatives were considered, where the non-obvious decisions are, what the author would flag in a code walkthrough. Use that as the basis for your MR description. I've packaged this into an agent skill you can drop into your own workflow, it produces a structured MR description and a commit structure recommendation you can review and use to help make agent-generated changesets more legible to reviewers.
The third is standards: does the code meet the conventions the team has agreed on? Linting handles a lot of this automatically and anything you can push into a linter is one less thing a human reviewer needs to spend attention on. For the things linting can't catch, I've written before about agent skills. If your standards are documented well enough to guide the agent writing the code, they're documented well enough to guide an agent reviewing it too.
Show your working
Good authorship has always mattered. It matters more now. The reviewer wasn't in your agent session and they have no ambient understanding of what you were trying to do, what tradeoffs you considered, what decisions the agent made that you consciously kept. That context doesn't transfer through the diff, you have to transfer it deliberately.
That means flagging the architectural decisions that actually need human eyes, not just describing what changed but why. It means thinking carefully about commit structure so the story of the change is legible before someone even reads the code. It means writing a description that demonstrates you understood what the agent produced, because if you can't explain it clearly there's a risk you've switched to passive delegation.
The Anthropic study Addy cites found that engineers who used AI for passive delegation, just letting it produce code without staying actively engaged, scored significantly lower on comprehension tests than those who used it as a thinking tool. The agent doesn't replace the engineer. It's a tool, and you still need to understand what it's doing and why, not just that it works. That understanding is what your reviewer deserves: guide them toward it rather than leaving them to reconstruct it from scratch.
Not every change carries the same risk or requires the same depth of review, and being explicit about that is part of good authorship too. Ship / Show / Ask is a useful frame for this, calibrating the level of review based on the nature of the change and the trust already established with your team.
What fast actually requires
The five MRs sitting in draft aren't blocked by process or by my understanding of the code. They're blocked because the safety net isn't there. That's the first obligation, fix it before you ship, not after.
But a solid test suite without the authorship work just means your reviewer can confirm nothing broke. That's not the same as understanding what changed, or why, or what the agent decided that you consciously kept. The agent gives you velocity. What makes that velocity real is being able to explain what you built and why, not just that it works.