DORA Metrics and AI as an Amplifier
AI magnifies whatever your organization already is, for better or worse
The most useful finding from recent DORA research on AI-assisted development is also the most humbling: AI is an amplifier. It magnifies the strengths of healthy organizations and the dysfunctions of struggling ones. If your delivery system is solid, AI makes it faster. If it's broken, AI helps you produce broken things more quickly. The returns come from the surrounding system, not the tools.
The four metrics still anchor everything
No matter how much the tooling shifts, the four classic DORA delivery metrics remain the dashboard worth watching:
- Lead time for changes: how long a change takes to go from commit to running in production.
- Deployment frequency: how often you ship.
- Change fail rate: the percentage of deploys that cause a problem.
- Failed-deployment recovery time: how fast you recover when a deploy fails.
These four haven't been replaced by anything AI introduced. They're still the truest read on delivery performance you can get, and AI-assisted work should be measured against them rather than against some new, shinier proxy.
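As a rough illustration, the four metrics can be derived from a plain list of deploy records. This is a minimal sketch, not a reference implementation: the `Deploy` record, its field names, and the choice of medians and a per-week rate are all assumptions made for the example, and your own definition of "failed" will depend on how you track incidents.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median
from typing import Optional


@dataclass
class Deploy:
    """One production deploy. All field names here are hypothetical."""
    committed_at: datetime                   # when the change was committed
    deployed_at: datetime                    # when it reached production
    failed: bool = False                     # did it cause a problem in production?
    recovered_at: Optional[datetime] = None  # when service was restored, if it failed


def dora_metrics(deploys: list[Deploy], period_days: int) -> dict[str, Optional[float]]:
    """Summarize the four delivery metrics for one period of `period_days` days."""
    if not deploys:
        raise ValueError("no deploys in this period")

    # Lead time for changes: commit to production, in hours.
    lead_times = [
        (d.deployed_at - d.committed_at).total_seconds() / 3600 for d in deploys
    ]

    failures = [d for d in deploys if d.failed]

    # Recovery time: failed deploy to restored service, in hours.
    recoveries = [
        (d.recovered_at - d.deployed_at).total_seconds() / 3600
        for d in failures
        if d.recovered_at is not None
    ]

    return {
        "lead_time_hours_median": median(lead_times),
        "deploys_per_week": len(deploys) / (period_days / 7),
        "change_fail_rate": len(failures) / len(deploys),
        # None rather than 0 when nothing needed recovering in the period.
        "recovery_hours_median": median(recoveries) if recoveries else None,
    }
```

Medians are used here because a single slow change or long outage would otherwise dominate the average; that is a design choice for the sketch, not something the four metrics require.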
Change fail rate is the canary
Among the four, watch change fail rate most closely when adopting AI assistance. It's the canary in the coal mine for quality. AI can help you write and ship code faster, but faster output is worthless if more of it breaks in production. Several teams find that AI adoption increases their velocity and their instability at the same time, and only the change fail rate makes that visible. If it creeps up as AI usage grows, you're trading speed for fragility, and you need to know before customers tell you.
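One way to make that trade visible is to check whether deploy volume and change fail rate rose together between two periods. The sketch below is illustrative only: the `Period` record, the function name, and the two-percentage-point `tolerance` are hypothetical choices, and a real threshold should come from your own history.

```python
from dataclasses import dataclass


@dataclass
class Period:
    """Deploy counts for one reporting period (hypothetical shape)."""
    label: str
    deploys: int
    failed_deploys: int

    @property
    def change_fail_rate(self) -> float:
        # Share of deploys that caused a problem; 0.0 if nothing shipped.
        return self.failed_deploys / self.deploys if self.deploys else 0.0


def trading_speed_for_fragility(
    before: Period, after: Period, tolerance: float = 0.02
) -> bool:
    """True when deploy volume rose AND change fail rate rose by more than
    `tolerance` (an absolute difference, e.g. 0.02 = two percentage points)."""
    shipped_more = after.deploys > before.deploys
    broke_more = after.change_fail_rate - before.change_fail_rate > tolerance
    return shipped_more and broke_more


if __name__ == "__main__":
    q1 = Period("before AI rollout", deploys=40, failed_deploys=4)
    q2 = Period("after AI rollout", deploys=60, failed_deploys=9)
    print(trading_speed_for_fragility(q1, q2))
```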
Deployment frequency expectations, for context, vary by what you're building. Prototypes and experiments can and should deploy continuously. Production applications should target at least weekly. There's no universal right number, only a right number for your context.
What AI does not fix
It's tempting to hope AI will smooth over organizational pain. It won't.
- AI does not reduce process friction. Only organizational change reduces friction. If approvals, handoffs, and queues slow you down today, AI-generated code arrives at those same queues and waits in the same lines.
- AI does not cause or cure burnout. Burnout is driven by culture, not tooling. A team drowning in unrealistic expectations and constant interruption will stay burned out no matter how good the autocomplete gets.
These limits are exactly why the amplifier framing holds. AI accelerates the work, but it leaves the system that surrounds the work untouched.
Measure against yourself, not the industry
Compare each team to its own prior period, not to industry benchmarks. Benchmarks tell you where some abstract median sits; they don't tell you whether your team got better this quarter. Your context, domain, and constraints make cross-company comparison mostly noise. Trend against your own history and you'll actually learn something.
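Trending against your own history can be as simple as a period-over-period comparison for a single team. The following is a sketch under assumptions: the metric names and the `LOWER_IS_BETTER` set are invented for the example, and the inputs are whatever per-period summaries you already collect.

```python
# Metrics where a smaller number is the better outcome (assumed names).
LOWER_IS_BETTER = {"lead_time_hours", "change_fail_rate", "recovery_hours"}


def compare_to_prior(
    current: dict[str, float], prior: dict[str, float]
) -> dict[str, dict]:
    """Compare one team's current period with the same team's prior period.

    Returns, per metric, the relative change and whether it moved in the
    direction that counts as an improvement.
    """
    report = {}
    for name, before in prior.items():
        if name not in current or before == 0:
            continue  # nothing to compare against, so skip the metric
        relative = (current[name] - before) / before
        if name in LOWER_IS_BETTER:
            improved = relative < 0
        else:
            improved = relative > 0
        report[name] = {"relative_change": relative, "improved": improved}
    return report


if __name__ == "__main__":
    last_quarter = {"lead_time_hours": 20.0, "deploys_per_week": 4.0, "change_fail_rate": 0.10}
    this_quarter = {"lead_time_hours": 15.0, "deploys_per_week": 5.0, "change_fail_rate": 0.15}
    for metric, row in compare_to_prior(this_quarter, last_quarter).items():
        print(metric, f"{row['relative_change']:+.0%}", "improved" if row["improved"] else "worse or flat")
```

In this made-up example the team ships faster and more often, yet its change fail rate has risen by half, which is the pattern worth catching early.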
And be deeply skeptical of self-reported productivity. One study found that developers who were actually about 19 percent slower when using AI still believed they were roughly 20 percent faster. That gap is not a rounding error; it's a warning. Perception of speed is not speed. If you measure how productive people feel, you'll measure their optimism, not their output.
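To put a number on that gap, compare measured task time with what people report. The sketch below assumes both figures are expressed as a relative change in task completion time (positive means slower), which is one reading of the study's numbers rather than its stated method; the function names and the 10-hour baseline are hypothetical.

```python
def time_change(baseline_hours: float, with_ai_hours: float) -> float:
    """Relative change in completion time; positive means the task got slower."""
    return (with_ai_hours - baseline_hours) / baseline_hours


def perception_gap(measured_change: float, perceived_change: float) -> float:
    """How far belief sits from measurement, as a difference of relative changes."""
    return measured_change - perceived_change


if __name__ == "__main__":
    # Hypothetical task: 10 hours without AI, 11.9 hours with it (about 19% slower).
    measured = time_change(baseline_hours=10.0, with_ai_hours=11.9)
    # Developers believed they were about 20% faster, i.e. a -20% change in time.
    perceived = -0.20
    gap = perception_gap(measured, perceived)
    print(f"measured {measured:+.0%}, perceived {perceived:+.0%}, gap {gap:.0%}")
```

Under that reading, belief and measurement sit roughly 39 percentage points apart, which is why surveys alone cannot stand in for delivery data.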
The seven amplifying capabilities
The research points to seven organizational capabilities that determine whether AI amplifies up or down. Invest here and AI pays off:
- A clear, documented AI policy. This is the strongest amplifier of all. Teams that know what's sanctioned, what's prohibited, and where the guardrails sit get far more out of AI than teams improvising.
- Working in small batches, so changes are easy to review and reason about.
- Strong version control practices, the foundation everything else rests on.
- A quality internal platform with solid CI/CD.
- A healthy data ecosystem, where data is trustworthy and accessible.
- AI-accessible internal data, so the tools can ground their work in your actual context.
- A user-centric focus, keeping the team pointed at real outcomes rather than output for its own sake.
Notice that almost none of these are about AI itself. They're about being a healthy engineering organization. AI just turns up the volume on what's already there.
What not to measure
Finally, a short list of metrics that will mislead you if you treat them as goals:
- Rework rate on its own, which lacks the context to mean anything in isolation.
- AI adoption percentage, which measures usage, not value. High adoption of a tool that's hurting you is worse, not better.
- Lines of code or commit volume, the classic vanity metrics that reward bulk over benefit.
- Perceived productivity in isolation, for the self-deception reason above.
Keep your eyes on the four delivery metrics, watch change fail rate like a hawk, build the capabilities that amplify upward, and measure your team against its own past. The tools will keep changing. The discipline is what compounds.