A year ago, the most ambitious thing AI did for our engineering team was tab complete. We had Copilot on, like everyone else. It saved us thirty seconds at a time, hundreds of times a day, which adds up to a real number but does not change anyone's life.
A year later, I open Slack in the morning and read a summary message from an AI that ran overnight. It read every Sentry alert from the previous day, decided which were already tracked in Linear, opened tickets for the rest, ranked them by risk, decided which ones it could safely fix on its own, and tagged those for an automation that opens a pull request before anyone is awake.
The change between those two pictures was not the model. What changed was what we did around it.
We did not plan this arc. It happened in steps, each one obvious only after the previous one was working. By the spring, autocomplete was no longer enough. We wanted the AI to reason across multiple files at once, so we moved to Cursor and Claude. By the summer, the team was using AI for everything, and we wrote down a set of usage guidelines because we wanted to be honest with each other about what was working and what wasn't. By late summer we had AI reviewing pull requests as a first pass, before any human looked at them. That is table stakes today; it wasn't then.
The interesting part started in the fall. Three things happened at once.
The first was that we started using sub-agents. Not one giant model trying to hold our entire codebase in its head, but a fleet of smaller agents, each with a job description: this one writes database migrations, this one writes tests, this one investigates errors. The orchestrating agent picked the right one for the task. We had been trying to fit all our context into one prompt, and it had been getting worse, not better. Specialization fixed that.
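The shape of that specialization can be sketched in a few lines. Everything here is hypothetical (the agent names, job descriptions, and the keyword router stand in for whatever a real orchestrator would do, which is usually asking a model to route); the point is only that each sub-agent carries a narrow job description and a narrow slice of context.

```python
from dataclasses import dataclass

# Hypothetical sketch: each sub-agent gets a narrow "job description"
# and only the context relevant to that job, instead of one giant
# prompt trying to hold the whole codebase.
@dataclass
class SubAgent:
    name: str
    job: str            # short job description, used here for routing
    context: list[str]  # only the files/dirs this agent needs to see

AGENTS = [
    SubAgent("migrations", "writes database migrations", ["db/schema.sql"]),
    SubAgent("tests", "writes tests", ["tests/"]),
    SubAgent("triage", "investigates errors", ["src/", "logs/"]),
]

def route(task: str) -> SubAgent:
    """Naive keyword router; a real orchestrator would ask a model."""
    for agent in AGENTS:
        if any(word in task.lower() for word in agent.job.split()):
            return agent
    return AGENTS[-1]  # unknown tasks fall back to the investigator

print(route("write tests for the billing module").name)  # -> tests
```

The detail that mattered for us was the `context` field: routing a task to a specialist is also an excuse to shrink what that specialist has to read.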
The second was that we adopted a plan-then-execute methodology. The agent would write a plan first — what files it intended to touch, what approach it would take — and we would review the plan before any code was written. The number of "AI wrote a beautiful PR that solved the wrong problem" incidents dropped sharply. It is much cheaper to redirect a plan than to redirect a finished diff.
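The gate itself is simple enough to sketch. This is a minimal illustration, not our actual tooling: the `Plan` fields and function names are made up, but the invariant is the real one, and no diff exists until a human has approved the plan.

```python
from dataclasses import dataclass

# Hypothetical plan-then-execute gate: the agent writes a plan
# (files to touch, approach), and execution is blocked until a
# human reviews it. Redirecting here is cheap; redirecting a
# finished diff is not.
@dataclass
class Plan:
    goal: str
    files_to_touch: list[str]
    approach: str
    approved: bool = False

def review(plan: Plan, approve: bool) -> Plan:
    plan.approved = approve
    return plan

def execute(plan: Plan) -> str:
    if not plan.approved:
        raise PermissionError("plan not approved: no code gets written")
    return f"generating diff touching {len(plan.files_to_touch)} files"

plan = Plan(
    goal="fix flaky retry logic",
    files_to_touch=["src/retry.py", "tests/test_retry.py"],
    approach="add jitter to backoff; pin random seed in tests",
)
print(execute(review(plan, approve=True)))  # -> generating diff touching 2 files
```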
The third was that we started running AI inside our CI pipeline. We pointed it at our weekly dependency update PRs and asked it to actually read every changelog, cross-reference how each package was used in our codebase, and produce a risk assessment for the human reviewer. We connected it to Linear so it could pick up a tagged ticket and produce a draft PR before anyone got around to assigning it. Each of these on its own would have felt like a stunt. Together, they shifted what the engineer's day looked like.
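The dependency-update piece can be approximated with a toy heuristic. This is an illustration only, with invented names and a deliberately crude scoring rule; in practice the cross-referencing and changelog reading is done by a model, not a regex, but the inputs and the output (a per-package risk level for the human reviewer) are the same shape.

```python
import re

def usage_count(package: str, sources: dict[str, str]) -> int:
    """Count source files that import the package (crude heuristic)."""
    pattern = re.compile(rf"^\s*(import|from)\s+{re.escape(package)}\b", re.M)
    return sum(1 for text in sources.values() if pattern.search(text))

def assess(package: str, changelog: str, sources: dict[str, str]) -> str:
    """Toy risk score: breaking changes in a package we actually use
    rank highest; unused or benign bumps rank lowest."""
    breaking = "breaking" in changelog.lower()
    used_in = usage_count(package, sources)
    if breaking and used_in:
        return "high"
    if breaking or used_in > 3:
        return "medium"
    return "low"

sources = {
    "app.py": "import requests\n",
    "jobs.py": "from requests import Session\n",
}
print(assess("requests", "BREAKING: removed Session.verify default", sources))
# -> high
```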
By winter, the daily Sentry workflow I described above was running. It was the moment AI stopped being a tool the engineer reached for and started being a process the engineer arrived to.
If I try to summarize what we actually learned from all of this, it is not really about AI. It is about where the bottleneck in software engineering lives. We used to think the bottleneck was writing code. So we pointed AI at the writing, and got modest gains. The bigger gains showed up when we pointed AI at the parts of the job engineers tolerate but do not love — reading changelogs, triaging errors, hunting through unfamiliar code to understand a stack trace. That work is mechanical and high-volume, and it silently eats an engineer's week. It is also the work an AI does cheerfully.
Across a year we ended up with AI involved at every layer of how we ship software. Planning, when it helps draft what to build. Implementation, when it writes the first draft of code. Review, when it is the first pass on a pull request. Debugging, when it investigates an error overnight. Each layer needed its own guardrails — what AI was allowed to do without human review, where it had to stop and ask, what blast radius it was permitted to operate in. The guardrails turned out to be as important as the AI. Usually more.
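A guardrail, concretely, is just a policy table the automation consults before acting. The actions and limits below are hypothetical examples, not our production config, but they show the three questions each rule answers: can the agent act unattended, where must it stop and ask, and how large a blast radius is it permitted.

```python
# Hypothetical guardrail policy: per action, whether the agent may
# act without human review, and the blast radius it is allowed.
GUARDRAILS = {
    "open_ticket":   {"autonomous": True,  "max_files": 0},
    "draft_pr":      {"autonomous": True,  "max_files": 5},
    "merge_pr":      {"autonomous": False, "max_files": 0},
    "run_migration": {"autonomous": False, "max_files": 0},
}

def allowed(action: str, files_touched: int = 0) -> bool:
    rule = GUARDRAILS.get(action)
    if rule is None:
        return False  # unknown actions always stop and ask
    if not rule["autonomous"]:
        return False  # requires a human in the loop
    return files_touched <= rule["max_files"]

print(allowed("draft_pr", files_touched=3))  # -> True
print(allowed("merge_pr"))                   # -> False
```

The useful property of writing it down like this is the default: anything not explicitly listed is denied, which is what makes it safe to keep adding capabilities.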
I would not have predicted any of this a year ago. I will not predict next year either. Software development is being rebuilt around AI in real time, and most of what I have just described will look quaint inside twelve months. The team that figures out each next version of this first will not be the team with the best model. It will be the team that built the best context, the best guardrails, and the best habits around what is by now a commodity — and that built them in a way that survives the next thing changing underneath. That is the part that compounds. That is the part nobody can rent.