AI agents and platform teams

Posted 12 March 2026 · 12 min read


There's a lot of noise and hype around AI at the moment, much of it focused on job displacement and what engineers stand to lose. I want to write about the opposite: what I've gained over the past several months by leaning into AI agents as part of my day-to-day work on a platform team.

I'm not going to pretend it's magic: agents make mistakes, and they don't replace the engineering knowledge required to give them clear instructions. But the honest truth is that some of the work I've been doing with agents has felt like a genuine step change - not just in how fast I can write code, but in the scope of what I can actually take on.

The Platform Team Problem

If you've read my earlier post on platform team challenges, you'll know this context: a small team, responsible for a large surface area. At any given time we're maintaining shared services & libraries, a micro-frontend architecture spanning 28 projects, CI pipelines and more.

The frustrating reality of platform work is that the hardest problems aren't always technical. Often they're organisational. You can identify something that needs fixing across 20+ projects, know exactly what the fix looks like, and still have it take months. This is because every team has their own priorities that don't always align with platform team initiatives. The challenge for platform teams is bridging the gap between "this should be done" and "teams have capacity to do it".

Agents are starting to close that gap in ways I didn't expect.

Research Tasks: Making Cross-Cutting Decisions Viable

On a platform team, the decisions that matter most are often the ones that are hardest to make confidently. Before you can commit to deprecating a service, replacing a library, or enforcing a new standard across the organisation, you need to understand who's affected and what the edge cases look like. That investigation work has always been expensive, often leading teams to skip it and make the call with incomplete information or delay the decision indefinitely because nobody has the capacity to do it properly.

Understanding a Legacy Service

We had a service that had been around for years. No clear owner, minimal documentation, and a question that kept coming up: who is actually using this, and how? We felt the service needed to be retired, but we couldn't make a clear decision about migration until we understood real-world usage. Getting that answer meant manually tracing dependencies across GitLab, digging through a legacy SVN repository, and piecing together a picture from code written years ago by people who've long since moved on.

With an agent, I was able to hand off the bulk of that discovery work. It traversed consuming projects across both GitLab and SVN, identified call sites, summarised how the service was being used in different contexts, and flagged edge cases I should be aware of. What could have taken a couple of days became something I could review and build on in a fraction of the time.
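To make the shape of that discovery work concrete, here's a minimal sketch of the kind of cross-repo call-site search the agent was doing. The service name, file extensions and repo layout are all hypothetical; in practice the agent drove GitLab and SVN tooling rather than a local scan.

```python
import re
from pathlib import Path

# Hypothetical pattern for the legacy service's URL/identifier.
SERVICE_PATTERN = re.compile(r"legacy-reports-api[\w/.-]*")


def find_call_sites(repos_root: str) -> dict[str, list[str]]:
    """Map each project directory under repos_root to the lines that
    reference the legacy service."""
    call_sites: dict[str, list[str]] = {}
    for path in Path(repos_root).rglob("*"):
        if not path.is_file() or path.suffix not in {".py", ".js", ".ts"}:
            continue
        project = path.relative_to(repos_root).parts[0]
        for line in path.read_text(errors="ignore").splitlines():
            if SERVICE_PATTERN.search(line):
                call_sites.setdefault(project, []).append(line.strip())
    return call_sites
```

The agent's real value was in the step after this: summarising how each call site used the service and flagging the edge cases, not just locating them.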

It doesn't replace the need for human judgement and following up with the teams affected, but it does make the investigation work much more tractable.

Icon Usage and WCAG Compliance

Another example was in our work to assess our compliance with WCAG AA standards, part of which was assessing whether we were using icons consistently and correctly across our applications. To understand the scope of the problem and how we'd fix it, we needed to know which icon components were being used, whether they were being used correctly in context, and where the real WCAG AA gaps were.

Going project by project manually would have been tedious. Instead, I used an agent to evaluate each icon usage in context, driven by our design system usage data, and populate a spreadsheet with its findings.

I reviewed and corrected as needed, but the time investment from my side was dramatically lower. More importantly, we could make the decision about what to do next based on the actual data.

Migrations and Upgrades: Doing the Work, Not Just Designing It

Once you can quickly establish the scope of a problem, the natural next question is whether you can fix it the same way. This is where agents have had the most tangible impact for me, allowing us to go beyond sharing migration guides to actually pushing changes out.

React Upgrades Across 28 Micro-Frontends

If you read my post on the micro-frontend React upgrade, you'll know how much coordination that kind of work involves. Each project is owned by a different team with different priorities. Getting them all to action a migration, even a well-documented one with clear steps, is a significant challenge.

After the most engaged teams had completed the migration, we were left with a significant number of projects blocking the final release. Rather than waiting for teams to action the migration, I used an agent to work through each project individually: follow the migration guide, make the necessary config changes, resolve the issues specific to that project, and open an MR. For each one, the agent handled the mechanical parts of the migration and the teams received ready-to-merge code changes rather than a request for action.

Before: the platform team sends each team a migration guide and waits. Team A merges weeks later, maybe; Team B has no capacity and never starts; Team C makes partial progress.

After: an agent prepares a ready-to-merge MR for each project. Days later, Teams A, B and C have all merged.

Every single MR was merged. The one issue we hit across the entire rollout came from a team who had done the migration manually before my MRs went out. The agent-generated changes had a cleaner record than the human-driven ones.

The conversion rate from "MR opened" to "MR merged" is dramatically higher than from "migration guide sent" to "work completed". Teams still review, still merge, still own the code - but the activation energy required on their side is much lower.

CI Pipeline Refactor

A similar story with a recent GitLab CI refactor. We needed to move a shared pipeline away from the deprecated only keyword to the rules syntax. The change itself is well-understood, but it needed to happen across every project consuming the pipeline, each of which had its own quirks and occasionally breaking changes to deal with.

Typically these sorts of changes are unevenly adopted across projects, with teams delaying adoption until they're forced to. This sort of inconsistency can lead to confusion when different projects' pipelines are behaving differently.

By using an agent to identify all affected projects programmatically, work through the migration for each one, handle the breaking changes it encountered, and submit MRs, I was able to deliver the change across all projects in a fraction of the time it would have taken manually.
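The mechanical core of that change is small. As an illustrative sketch (the job name and conditions here are hypothetical, not one of our actual pipelines), the migration looks like this:

```yaml
# Illustrative only - job name and conditions are hypothetical.
# The deprecated form:
#   only:
#     - merge_requests
#     - main
# becomes:
test:
  script: npm test
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
    - if: $CI_COMMIT_BRANCH == "main"
```

The difficulty was never the syntax; it was applying it consistently across every consuming project, each with its own quirks.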

What Actually Makes This Work: Practical Tips

Reading back through the examples above, they might sound straightforward. In practice there was a fair amount of trial and error before these tasks started going reliably well. Here's what has helped me get the most out of agents.

Define a skill for your tooling

The single most impactful thing I've done is write a custom agent skill for the glab CLI - GitLab's command-line tool. A skill is a markdown file that gives the agent clear, opinionated instructions for how to do a specific thing: in this case, how to search across projects, how to open MRs, and how to make file changes via the REST API.

Without this, agents will either refuse to interact with GitLab or produce inconsistent, fragile bash one-liners. With a well-defined skill, they have a reliable playbook. Agent Skills are an open standard, supported across tools like Cursor, Claude Code and GitHub Copilot.
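As a sketch of what such a skill can look like - the frontmatter follows the Agent Skills format, but the name, wording and commands here are illustrative rather than our actual skill file:

```markdown
---
name: gitlab-cli
description: Use the glab CLI to search projects, read files and open MRs
---

Always use `glab` rather than raw git remotes or ad-hoc API calls.

- To open a merge request from the current branch:
  `glab mr create --title "<title>" --description "<body>"`
- To call the REST API directly (for example, for file changes):
  `glab api projects/<id>/repository/files/<path> ...`
```

The point is to be opinionated: one blessed way to do each operation, so the agent stops improvising.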

Give agents access to your internal context via MCP

Out-of-the-box agents know nothing about your internal systems. They can read code, but they don't know what your components are supposed to do or what the intended API of a shared library looks like.

I've addressed this with our design system MCP server. MCP (Model Context Protocol) is an open standard that lets you expose internal data and documentation in a structured, queryable way that agents can actually use. Ours is built on top of our Storybook, giving agents access to component props, usage guidelines, and examples. When the agent is reviewing icon usage for accessibility it can look up how icon-related components from our design system are intended to be used.

Because MCP is an open standard, the same server works across tools. That said, it's worth being honest about the overhead: maintaining an MCP server is an ongoing cost, not a one-time investment. For us it's been worth it, but you can also just point agents at some local markdown files or use something like the Atlassian MCP if you've got internal documentation in Confluence.

Feed your audit data in as a starting point

Rather than asking an agent to discover which projects use a given library from scratch, give it the answer. We have usage analytics for our design system that tells us exactly which projects are using which components and at what version. Feeding this data to an agent at the start of a task means it's working from ground truth rather than trying to infer it, which is both faster and more accurate.
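As a hedged sketch of what "feeding in ground truth" means in practice - the data shape, project names and version policy below are all hypothetical - the agent starts from a query like this rather than from discovery:

```python
# Hypothetical usage-analytics records: which projects use which
# design-system component, and at what library version.
USAGE = [
    {"project": "team-a-app", "component": "Icon", "version": "3.2.0"},
    {"project": "team-b-app", "component": "Icon", "version": "4.1.0"},
    {"project": "team-c-app", "component": "Button", "version": "4.1.0"},
]


def projects_using(component: str, below_major: int) -> list[str]:
    """Projects using `component` on a major version older than `below_major` -
    i.e. the ones that still need migrating."""
    return sorted(
        row["project"]
        for row in USAGE
        if row["component"] == component
        and int(row["version"].split(".")[0]) < below_major
    )
```

Handing the agent this list up front means its task starts at "migrate these projects", not "work out which projects exist".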

I've been experimenting with a project to help scan and parse dependency usage across an organisation, providing agents with MCP tools for querying the data directly.

Upfront structure reduces mistakes on research tasks

On a platform team, research tasks usually feed decisions that affect 20+ projects, so accuracy matters. A vague summary that misses edge cases isn't just unhelpful - it leads to bad decisions.

For research tasks I've found it's worth investing time upfront in defining the output format. Rather than asking an agent to "research icon usage across our projects", give it a spreadsheet template with one row per project and specific columns to complete.

This forces the agent to be methodical. It works project by project, column by column, and there's a clear definition of done for each row. When agents have an open-ended research task with no structure, they tend to summarise at a higher level of abstraction than you want, or stop early because they think they're done. Structured output artifacts act as a kind of forcing function that helps the agent stay on track.
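To illustrate what that upfront structure looks like - the column names here are hypothetical, loosely based on the icon audit - the template is just a fixed schema with one row per project for the agent to complete:

```python
import csv
import io

# Hypothetical audit schema: fixed columns the agent must fill,
# one row per project.
COLUMNS = [
    "project",
    "icon_components_used",
    "decorative_icons_hidden",
    "wcag_aa_gaps",
    "notes",
]


def make_template(projects: list[str]) -> str:
    """Build a CSV research template with the project column pre-filled."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS)
    writer.writeheader()
    for project in projects:
        writer.writerow({"project": project})  # agent fills the rest
    return buf.getvalue()
```

The empty cells are the point: each one is a concrete, checkable unit of work, which is what keeps the agent from summarising at too high a level.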

Write the migration guide anyway and use it as your prompt

For platform teams, you'll write a migration guide to communicate the change to consuming teams regardless. What I've found is that feeding the same guide to an agent is a useful validation step in its own right.

If the agent gets confused or takes a wrong turn, it often reveals a gap or ambiguity in the instructions. Edge cases that weren't covered in the guide get surfaced when the agent encounters them in real projects. By the time you've run the agent across a handful of projects, you've also stress-tested your documentation, which makes the guide better for teams who end up needing to review the changes or follow the steps manually.

On tools

I've primarily been using Cursor for this kind of work, with some experimentation in Claude Code and GitHub Copilot. Honestly, the differences between tools matter less than the quality of the context you give them. A well-defined skill and a good MCP server will get you further than switching tools.

The open standards here (MCP for context, markdown-based skill files for agent behaviour) mean that investment you make in one tool largely transfers to others. Worth keeping that in mind rather than getting too locked in to any one workflow as pricing fluctuates across providers and models.

What This Actually Means

I want to be honest about what this is and isn't.

It isn't magic. Agents make mistakes. They need clear context and well-defined tasks. You still need to review the output, understand what they've done, and apply your own judgement. There are tasks where they struggle, and I still need a few iterations in the planning phase to get the results I want.

But I think the framing of "AI replaces engineers" fundamentally misses what's interesting here. What I've experienced is closer to this: the ceiling on what one engineer can take on has gone up.

Before, I might have identified a problem affecting 28 projects and accepted that fixing it would take months of coordination. Now, I can just fix it and let teams review the work rather than asking them to make the changes themselves. Before, a research task spanning dozens of repos might have meant a week's investment. Now it might mean an afternoon.

This matters because platform teams are almost always under-resourced relative to the surface area they're responsible for. We've always had to make hard calls about what's worth tackling. AI agents are expanding the set of things that are economically viable to do, not by replacing the human judgement involved, but by handling the mechanical execution that previously made those tasks prohibitively expensive.

There's also something more personal to this that I think is specific to platform work. One of the persistent frustrations of being on a platform team is a feeling of distance from impact. You make a change but adoption is slow, rollout is uneven, and by the time you find out something wasn't quite right, months have passed. It can feel reactive: responding to requests, unblocking teams, waiting to be needed.

What I've noticed is that agents can shift that dynamic. Delivering a ready-to-merge MR to every consuming team isn't just faster, it also means I see the results immediately. Blockers surface in days rather than months. There's a sense of genuine end-to-end ownership that's been hard to achieve in platform roles before, where so much of delivery depends on other teams prioritising your work.

There's a narrative that AI is primarily about cutting costs, about doing the same work with fewer people. Maybe that's true in some contexts. But in my experience, the more interesting story is about doing things that weren't worth doing before: covering more ground, maintaining higher standards, taking on the migrations that would otherwise stay on the backlog indefinitely, and finally closing the loop between platform decisions and real-world outcomes.

That's what has me genuinely excited. Not the fear of what gets replaced, but the question of what becomes possible.


Related posts

React design system library MCP - Exposing Storybook documentation to AI agents via an MCP server

Platform team challenges - Challenges faced when introducing a platform team

Creating your own React design system analytics tool - Using react-scanner to implement an Omlet alternative



Thanks for reading

I'm Alex O'Callaghan and this is my personal website, where I write about software development and do my best to learn in public. I currently work at Mintel as a Principal Engineer, primarily with React, TypeScript & Python.

I've been leading one of our platform teams maintaining a collection of shared libraries, services and a micro-frontend architecture.

I'm from London.