Scaling What Already Works

Diagnose before intervening, measure before optimizing, and leave the team stronger.

This is the play for a product in production with real, evidenced usage, where the question is no longer "does it work" but "does it work at 10x load, with a larger team, while shipping faster than competitors." The system is proven; now it has to hold up.

When to use this play

Run this play when:

The product is in production with evidence of real usage.
There is an identifiable problem worth fixing, such as scaling pain, a security gap, or slow velocity.
There is a commitment of roughly three months or more, long enough for impact to be measurable.
There is at least one client-side technical owner.

A short, vague engagement cannot credibly move latency, reliability, or velocity metrics, so the time commitment is part of the gate.

How to run it

1. Diagnose before intervening. Spend one to two weeks gathering data before committing to any intervention, so that every later optimization has a measured baseline to beat. The diagnostic covers six dimensions:

Performance: p50, p95, and p99 latency, error rates, throughput, and Core Web Vitals.
Capacity: current load versus headroom, and where the bottlenecks actually are.
Reliability: incident frequency, deploy success rate, and time-to-recovery.
Security: the gap between current posture and what the system needs.
Cost: the unit economics, such as cost per request.
Velocity: the DORA metrics (deploy frequency, lead time, change failure rate, time to restore).
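As a rough illustration of the performance and cost dimensions, the sketch below computes latency percentiles, error rate, and cost per request from raw request samples. The function names, the `(latency_ms, status_code)` tuple shape, the choice to count only 5xx responses as errors, and the sample data are all assumptions made for the example, not part of the play.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def summarize(requests, total_cost):
    """Summarize one measurement window.

    requests: list of (latency_ms, status_code) tuples (assumed shape).
    total_cost: infrastructure spend for the same window, in any currency.
    """
    latencies = [latency for latency, _ in requests]
    # Assumption: only 5xx responses count as errors.
    errors = sum(1 for _, status in requests if status >= 500)
    return {
        "p50_ms": percentile(latencies, 50),
        "p95_ms": percentile(latencies, 95),
        "p99_ms": percentile(latencies, 99),
        "error_rate": errors / len(requests),
        "cost_per_request": total_cost / len(requests),
    }

# Hypothetical sample: 100 requests with latencies of 1..100 ms, two of which failed.
sample = [(ms, 500 if ms in (40, 90) else 200) for ms in range(1, 101)]
baseline = summarize(sample, total_cost=5.0)
```

For this sample the baseline is p50 = 50 ms, p95 = 95 ms, p99 = 99 ms, and an error rate of 0.02. Recording a summary like this before any intervention gives later changes a number to beat.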

2. Choose interventions from the data. Pick from the menu based on what the diagnostic showed, not on instinct.

Performance interventions:

Cache at the right layer. Most teams cache too late and in the wrong place.
Add read replicas for query-heavy workloads.
Move synchronous bottlenecks to async processing.
Push session state out of the app process.
Tune the database, starting with the slowest queries and any missing indexes.
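One possible shape for the caching intervention is a small read-through cache with a time-to-live in front of an expensive lookup. This is a minimal in-process sketch. `TTLCache` and `load_profile` are hypothetical names, and a real system would more likely put the cache in a shared store so that every app instance benefits.

```python
import time

class TTLCache:
    """Read-through cache: serve a stored value until it expires, then reload it."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self._loader = loader      # function called on a miss
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so expiry can be tested
        self._entries = {}         # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]        # fresh hit: skip the loader
        value = self._loader(key)  # miss or expired: reload and store
        self._entries[key] = (now + self._ttl, value)
        return value

calls = []

def load_profile(user_id):
    """Hypothetical stand-in for a slow database query."""
    calls.append(user_id)
    return {"id": user_id}

cache = TTLCache(load_profile, ttl_seconds=30)
cache.get(7)
cache.get(7)  # served from the cache, so load_profile has run only once
```

The TTL is the knob to set from the diagnostic data: it trades staleness against load on the backing store.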

Reliability interventions:

Define SLOs and error budgets first.
Add observability before adding infrastructure.
Improve deploy safety with feature flags, canary releases, and automated rollback.
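To make the SLO and error budget idea concrete, here is a minimal sketch of the arithmetic for a request-based availability SLO. The function name, the returned field names, and the example numbers are assumptions for illustration.

```python
def error_budget(slo_target, total_requests, failed_requests):
    """Return how many failures the SLO allows in a window and how much budget is left.

    slo_target is the fraction of requests that must succeed, for example 0.999.
    """
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    allowed_failures = (1 - slo_target) * total_requests
    remaining = allowed_failures - failed_requests
    return {
        "allowed_failures": allowed_failures,
        "remaining_failures": remaining,
        "budget_consumed": failed_requests / allowed_failures,
        "exhausted": remaining <= 0,
    }

# Hypothetical window: 99.9% target, 1,000,000 requests, 250 failures.
budget = error_budget(0.999, 1_000_000, 250)
```

In this example the SLO allows about 1,000 failures and roughly a quarter of the budget is spent. An exhausted budget is a reasonable trigger for pausing feature work in favor of reliability work.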

3. Resist premature horizontal scaling. A single instance usually has more headroom than the team assumes. Exhaust the cheap wins, like caching and query tuning, before reaching for more boxes.

4. Protect a debt and reliability allocation. Do not run feature-velocity sprints back to back. A reasonable default split is about 70 percent features, 20 percent debt and reliability, and 10 percent exploration. The reliability slice is what keeps the velocity slice sustainable.
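The split is simple arithmetic, but writing it down per sprint makes the allocation harder to erode quietly. A small sketch, assuming capacity is measured in whole story points (the function name, the bucket names, and the rounding rule are illustrative choices):

```python
def split_capacity(total_points, weights=None):
    """Split whole-number sprint capacity by percentage weights, keeping the total exact."""
    weights = weights or {"features": 70, "debt_and_reliability": 20, "exploration": 10}
    weight_sum = sum(weights.values())
    # Round every bucket down first.
    allocation = {name: total_points * w // weight_sum for name, w in weights.items()}
    # Hand the points lost to rounding to the buckets with the largest remainders.
    leftover = total_points - sum(allocation.values())
    by_remainder = sorted(
        weights,
        key=lambda name: (total_points * weights[name]) % weight_sum,
        reverse=True,
    )
    for name in by_remainder[:leftover]:
        allocation[name] += 1
    return allocation

plan = split_capacity(40)
```

With 40 points of capacity this gives 28 for features, 8 for debt and reliability, and 4 for exploration.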

5. Build the team's capability, not its dependency on you. Success is the client team growing more capable, not more reliant. Pair, document, and hand over ownership as you go.

Common traps

Intervening before diagnosing. Optimizing without data means optimizing the wrong thing convincingly.
Scaling horizontally too soon. It is expensive and it hides the bottleneck rather than fixing it.
Stacking feature sprints with no reliability allocation. Velocity bought this way is borrowed against future incidents.
Creating dependency. If the team cannot run the system without you, the engagement has not succeeded.

Signals it's working

Latency, error rate, deploy frequency, lead time, and cost are moving in the right direction.
The single instance is doing more work before anyone adds infrastructure.
The client team is taking on interventions themselves.
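The velocity signals can be tracked with the same discipline as latency. The sketch below derives the four DORA metrics from a list of deploy records. The record fields (`committed_at`, `deployed_at`, `failed`, `restored_at`) are an assumed schema, and medians are used here as one reasonable summary, not the only one.

```python
from datetime import datetime, timedelta
from statistics import median

def hours_between(start, end):
    return (end - start).total_seconds() / 3600

def dora_summary(deploys, window_days):
    """Summarize deploy records for one window.

    Each record is a dict with 'committed_at' and 'deployed_at' (datetimes),
    'failed' (bool), and 'restored_at' (datetime, or None if not applicable).
    """
    if not deploys:
        raise ValueError("no deploys in window")
    lead_times = [hours_between(d["committed_at"], d["deployed_at"]) for d in deploys]
    failures = [d for d in deploys if d["failed"]]
    restores = [
        hours_between(d["deployed_at"], d["restored_at"])
        for d in failures
        if d["restored_at"] is not None
    ]
    return {
        "deploys_per_week": len(deploys) / (window_days / 7),
        "median_lead_time_hours": median(lead_times),
        "change_failure_rate": len(failures) / len(deploys),
        "median_hours_to_restore": median(restores) if restores else None,
    }

# Hypothetical two-week window with four deploys, one of which failed.
t0 = datetime(2024, 1, 1, 9, 0)

def record(day, lead_hours, failed=False, restore_hours=None):
    committed = t0 + timedelta(days=day)
    deployed = committed + timedelta(hours=lead_hours)
    restored = deployed + timedelta(hours=restore_hours) if restore_hours else None
    return {"committed_at": committed, "deployed_at": deployed,
            "failed": failed, "restored_at": restored}

deploys = [record(0, 2), record(3, 4), record(7, 6, failed=True, restore_hours=1), record(10, 8)]
summary = dora_summary(deploys, window_days=14)
```

Comparing a summary like this against the one taken during the diagnostic shows whether deploy frequency and lead time are actually moving.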

How it ends

This play ends when the diagnostic targets are met or deliberately re-baselined, there is a documented runbook, and there is measurable improvement in latency, error rate, deploy frequency, lead time, or cost. The clearest sign of a good ending: the client team is more capable than when you arrived, and the roadmap is owned by them.