What DORA metrics actually tell you — and what they don't

10 March 2026 · 7 min read

If you've spent any time in engineering leadership over the last five years, you've almost certainly been handed a slide with four metrics on it: deployment frequency, lead time for changes, change failure rate, and mean time to restore. These are the DORA metrics — and they are genuinely useful. The problem is how most organisations use them.

What the metrics actually measure

Before getting into the pitfalls, it's worth being precise about what each metric is measuring:

  • Deployment frequency — how often you successfully release to production. High performers deploy multiple times per day; low performers deploy monthly or less.
  • Lead time for changes — the time from a code commit to that code running in production. This captures your entire pipeline: review, build, test, deploy.
  • Change failure rate — the percentage of deployments that cause a degradation requiring a hotfix, rollback, or patch. High performers sit below 15%.
  • Mean time to restore (MTTR) — how long it takes to recover from an incident. This measures both your detection capability and your recovery processes.

Together they give you a picture of throughput (the first two) and stability (the second two). The insight from the original DORA research is that high-performing teams are simultaneously faster and more stable — the trade-off between speed and quality is largely a myth.
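For concreteness, here's a minimal Python sketch of how the four numbers could be computed from deployment and incident records. The record shapes and field names (commit_at, caused_failure, and so on) are assumptions for illustration, not a standard schema from any particular tool.

    from dataclasses import dataclass
    from datetime import datetime, timedelta
    from statistics import median

    @dataclass
    class Deployment:
        commit_at: datetime      # earliest commit included in the release
        deployed_at: datetime    # when it reached production
        caused_failure: bool     # needed a hotfix, rollback, or patch

    @dataclass
    class Incident:
        detected_at: datetime
        restored_at: datetime

    def dora_metrics(deployments: list[Deployment], incidents: list[Incident], days: int):
        # Deployment frequency: successful production releases per day.
        frequency = len(deployments) / days
        # Lead time for changes: commit to running in production.
        # The median is more robust to outlier changes than the mean.
        lead_time = median(d.deployed_at - d.commit_at for d in deployments)
        # Change failure rate: share of deployments causing a degradation.
        failure_rate = sum(d.caused_failure for d in deployments) / len(deployments)
        # Mean time to restore: average detection-to-recovery duration.
        mttr = sum((i.restored_at - i.detected_at for i in incidents),
                   timedelta()) / len(incidents)
        return frequency, lead_time, failure_rate, mttr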

How organisations get this wrong

The most common mistake is treating DORA metrics as a performance management tool rather than a diagnostic one. I've seen this pattern repeatedly: leadership sets targets ("we need to be deploying weekly by Q3"), dashboards go up, and within a month teams are gaming the numbers.

Deployment frequency goes up because teams split their work into smaller batches — which is actually a good thing — but also because they start counting internal deployments to staging environments. Lead time comes down on paper because engineers start the clock at PR merge rather than at the first commit. The metrics look better; nothing has actually changed.

The second mistake is benchmarking against the DORA elite tier as a starting point. The research bands are useful context, but an organisation deploying fortnightly that moves to weekly has made a meaningful improvement regardless of where that sits in the global distribution. Progress matters more than classification.

How to use them properly

Start with an honest baseline. The first conversation I have with any engineering team is about what the numbers actually are today, measured consistently, before any improvement work begins. This matters because it's the only way to tell if anything is working.

Define your measurements precisely and don't change them. Decide exactly what counts as a deployment, exactly where you start the lead time clock, exactly what qualifies as a change failure. Write it down. If the definition shifts, your trend line becomes meaningless.
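One way to make that durable is to encode the definitions somewhere executable and version-controlled, so they can't drift silently. A hypothetical sketch — the event fields are illustrative, not from any particular tool:

    # Pinned metric definitions. Treat changes to these as breaking changes:
    # if a definition shifts, the trend lines before and after are not comparable.

    def is_deployment(event) -> bool:
        # Only successful releases to production count; staging and
        # canary deploys are explicitly excluded.
        return event.environment == "production" and event.status == "succeeded"

    def lead_time_start(change):
        # The clock starts at the first commit, not at PR merge.
        return change.first_commit_at

    def is_change_failure(deployment) -> bool:
        # A change failure is any deployment followed within 48 hours by a
        # hotfix, rollback, or patch that names it as the cause.
        return (deployment.hours_to_remediation is not None
                and deployment.hours_to_remediation <= 48)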

Use the metrics to identify bottlenecks, not to assign blame. If lead time is long, is it in code review — work sitting unreviewed? In the build pipeline — slow tests? In the deployment process itself — manual steps, approval gates? The metric tells you there's a problem; it doesn't tell you where. You need to break it down further.
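For example, if your tooling records a timestamp at each handoff, a rough decomposition of lead time might look like the following; the field names are assumptions about what your pipeline captures.

    def lead_time_breakdown(change) -> dict:
        # Split total lead time into the stages named above.
        return {
            "review": change.merged_at - change.first_commit_at,     # commit to merge
            "build":  change.build_finished_at - change.merged_at,   # CI build and tests
            "deploy": change.deployed_at - change.build_finished_at, # release process
        }

    # The three durations sum to the total lead time, so the largest
    # bucket tells you where to look first.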

Pair them with qualitative signals. DORA metrics are lagging indicators — they tell you how the system has been performing, not why. Engineer satisfaction surveys, incident retrospectives, and team conversations give you the context to interpret what you're seeing.

What high performance actually looks like

The teams I've seen that genuinely score well on DORA metrics share a few characteristics that have nothing to do with the metrics themselves:

  • They have strong automated test coverage at all levels, which means they trust their pipeline.
  • They deploy small changes, which reduces the blast radius of anything that goes wrong.
  • They have on-call rotations that include the engineers who wrote the code, meaning incidents are resolved by people who understand the system.
  • They do blameless post-mortems and actually act on the findings.

The metrics are downstream of these behaviours. Focus on the behaviours and the metrics follow; focus on the metrics and you get measurement theatre.

Used as a diagnostic tool with an honest baseline and consistent measurement, DORA metrics are one of the most actionable frameworks available to engineering leaders. The research behind them is solid, and the four-metric model is simple enough to explain to non-technical stakeholders. Just don't let them become the goal — they're a map, not the territory.

