Your metrics look healthy:
- Average response time: 50ms
- P50 (median): 45ms
- P99: 8 seconds
That P99 is about to kill your acquisition.
After 50+ infrastructure audits, these five P99 patterns appear in nearly every technical due diligence. They're invisible at current scale, devastating at 10x.
1. The Connection Pool Time Bomb
What I find:
PostgreSQL max_connections set to 100. Application pool set to 95.
Current usage: 40-60 connections. "Plenty of headroom."
Connection management is tricky with databases in general. More connections doesn't necessarily mean more performance; if it did, we'd just allow unlimited connections. More connections means more context switching, which means more overhead and more CPU utilization, until your database's CPU is maxed out.
What happens at scale:
At 2x traffic, connections hit 90+. P99 requests wait for available connections.
At 3x traffic, total gridlock. P99 goes from 200ms to 30+ seconds.
The metrics that reveal it:
- P99 spikes correlate with connection pool saturation
- Database CPU at 20% but response times through the roof
- Error logs show "timeout waiting for connection pool"
- This is usually a "tipping point": things are fine, then suddenly degrade
Real example from DD:
SaaS platform raising Series B. Everything looked fine at 10K concurrent users.
Modeled 50K users: complete connection starvation. P99 hit 45 seconds.
Fix required major architectural changes. Deal repriced down $3M.
What I check:
- Pool sizing vs database max_connections
- Connection lifecycle (how long held)
- Prepared statement cache vs connection limits
- Query execution speed
- PgBouncer or connection pooling layer
The question that catches it:
"Show me P99 latency grouped by connection pools wait time."
Most teams have never run this query. When they do, the correlation is undeniable. If you see P99 spikes that don't correlate with CPU or memory usage, you're likely looking at connection starvation; if they do correlate with high CPU, you're using too many connections and need to optimize your queries. One way to capture that wait time is sketched below.
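As a rough illustration (not taken from any audited system), here's a minimal Python sketch of that measurement: a SQLAlchemy pool with an explicit size and checkout timeout, timing how long each request waits for a connection so the P99 of the wait can be compared with the P99 of the request. The DSN, pool numbers, and query are placeholders.

```python
# Hedged sketch: time connection checkout from a bounded SQLAlchemy pool and
# compare its P99 against overall request P99. All values are placeholders.
import time
from sqlalchemy import create_engine, text

engine = create_engine(
    "postgresql+psycopg2://app:secret@db/app",  # placeholder DSN
    pool_size=20,       # steady-state connections
    max_overflow=10,    # burst headroom
    pool_timeout=2,     # seconds to wait for a free connection before erroring
)

pool_waits = []         # seconds spent waiting to check out a connection
request_latencies = []  # seconds for the whole request

def handle_request():
    request_start = time.monotonic()
    wait_start = time.monotonic()
    with engine.connect() as conn:              # blocks until the pool frees a slot
        pool_waits.append(time.monotonic() - wait_start)
        conn.execute(text("SELECT 1"))          # stand-in for the real query
    request_latencies.append(time.monotonic() - request_start)

def p99(samples):
    ordered = sorted(samples)
    return ordered[max(0, int(len(ordered) * 0.99) - 1)]

# After a load test: if p99(pool_waits) accounts for most of
# p99(request_latencies), the pool - not the database - is the bottleneck.
```

If the wait P99 stays near zero while request P99 spikes, the bottleneck is elsewhere; if the two track each other, you've found the time bomb.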
2. The Cache Expiration Stampede
What I find:
Beautiful cache hit ratios: 95%+. Response times under 50ms.
Cache TTL: 1 hour. Looks perfect.
What happens at scale:
Popular key expires. 1,000 simultaneous requests hit the database.
Database generates the same expensive query 1,000 times.
P99 spikes to 10+ seconds. If using Redis, potential memory explosion.
The metrics that reveal it:
- P99 spikes exactly every hour (or whatever TTL is)
- Database CPU spikes align with cache expiration
- Redis memory usage spikes during regeneration
Real example from DD:
E-commerce platform preparing for Black Friday. Normal traffic: fine.
Simulated Black Friday: cache expiration caused 45-second P99 spikes.
Every expired product page triggered 500+ database queries.
What I check:
- Cache warming strategies
- Probabilistic early expiration
- Distributed lock on cache regeneration
- Background refresh vs synchronous regeneration
- Request coalescing (if many requests go out for the same id, only one fetch hits the upstream store)
The question that catches it:
"What happens when your top 10 cache keys expire simultaneously under peak load?"
The answer usually involves nervous laughter. Smart teams use jittered TTLs and background refresh. Most teams haven't thought about it until their first Black Friday.
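For illustration only, here's a minimal Python sketch of two of those mitigations with a redis-py client: jittered TTLs so hot keys don't expire in lockstep, and a per-key regeneration lock so an expired entry is rebuilt once rather than a thousand times. Key names, TTLs, and the rebuild callable are assumptions, not anyone's production code.

```python
# Hedged sketch: jittered TTL plus a per-key lock so an expired hot key is
# rebuilt once, not 1,000 times. Names, TTLs, and timeouts are illustrative.
import json
import random
import time

import redis

r = redis.Redis()
BASE_TTL = 3600        # the 1-hour TTL from the example above
JITTER = 300           # +/- 5 minutes so hot keys don't expire together

def get_cached(key: str, rebuild, retries: int = 50):
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)

    # Only one caller per key regenerates; the rest back off and re-check.
    if r.set(f"lock:{key}", "1", nx=True, ex=30):
        try:
            value = rebuild()                     # the expensive query, once
            ttl = BASE_TTL + random.randint(-JITTER, JITTER)
            r.set(key, json.dumps(value), ex=ttl)
            return value
        finally:
            r.delete(f"lock:{key}")

    if retries == 0:                              # give up: fall through to the source
        return rebuild()
    time.sleep(0.05)                              # brief backoff, then re-check cache
    return get_cached(key, rebuild, retries - 1)
```

Background refresh goes one step further: a worker rewrites hot keys before their TTL is reached, so user-facing reads never see a miss at all.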
3. The Timeout Death Chain
What I find:
Service A → Service B → Service C
Each with 5-second timeouts. "Conservative" timeout strategy.
Total possible wait: 15 seconds. Math checks out, right?
What happens at scale:
Service C slows down (P99 hits 4 seconds).
Service B waits, queues backup (P99 hits 8 seconds).
Service A waits for B, connections exhausted (P99 hits 15+ seconds).
Users experience random 30-second page loads.
The metrics that reveal it:
- P99 latency multiplication across service boundaries
- Thread pool exhaustion in upstream services
- Correlation between downstream P99 and upstream P99
Real example from DD:
Fintech platform with 12 microservices. Payment service P99 of 2 seconds cascaded into 25-second checkout times. The DD load test uncovered $2M of required circuit breaker work.
What I check:
- Timeout settings vs actual P99 of dependencies
- Circuit breaker implementation
- Bulkhead patterns
- Async vs sync communication patterns
- Retry amplification effects
The question that catches it:
"Map out your timeout chain for your critical user path - what's the worst case?"
When teams actually do the math, they realize their "5-second timeout" can turn into 30+ seconds of user wait time: add a single retry at each hop and the worst case doubles. The fix isn't shorter timeouts - it's circuit breakers and async patterns, as in the sketch below.
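As a sketch of the circuit-breaker half of that fix (a simplified Python stand-in, not the implementation any of these teams shipped; production systems usually reach for a battle-tested library):

```python
# Hedged sketch: after enough consecutive failures the circuit opens and calls
# fail fast for a cooldown, instead of stacking multi-second timeouts upstream.
import time

class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None       # half-open: let one probe call through
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0
        return result

# Usage sketch: wrap the downstream call and keep its own timeout tight, e.g.
#   payments = CircuitBreaker()
#   payments.call(requests.get, "https://payments.internal/charge", timeout=1.5)
```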
4. The Memory Pressure P99 Killer
What I find:
JVM heap set to 8GB. Normal usage: 4-6GB. "We have buffer."
GC pauses: 50-200ms. "Acceptable."
P99: 2 seconds. "Not great, but livable."
What happens at scale:
10x traffic = 10x object creation.
Minor GCs every 5 seconds become every 500ms.
Major GCs go from 200ms to 2+ seconds.
P99 becomes P50. Every other request hits a GC pause.
The metrics that reveal it:
- P99 spikes correlate with GC events
- Sawtooth memory pattern gets steeper
- CPU usage includes high system time (GC threads)
Real example from DD:
Real-time analytics platform. P99 at 1.5 seconds seemed OK.
Under 5x load test: P99 hit 12 seconds purely from GC.
Required complete move to off-heap memory. 6-month project.
What I check:
- GC log analysis (frequency and duration)
- Heap size vs actual need
- Object creation rate
- Memory leak patterns
- Native memory usage (off-heap)
The question that catches it:
"Show me P99 with GC events overlaid - what percentage of P99 is GC?"
The answer is usually 60-80% for Java services under pressure. Modern GCs help, but physics wins. More objects = more collection = worse P99.
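One way to answer that question, sketched in Python under the assumption that request spans and GC pause windows have already been parsed (say, from an access log and gc.log) into (start, duration) pairs:

```python
# Hedged sketch: what share of the slowest 1% of requests overlaps a GC pause?
def overlaps(a_start, a_end, b_start, b_end):
    return a_start < b_end and b_start < a_end

def p99_gc_share(request_spans, gc_pauses):
    """Both arguments: lists of (start_seconds, duration_seconds) tuples."""
    by_latency = sorted(request_spans, key=lambda span: span[1])
    tail = by_latency[int(len(by_latency) * 0.99):]        # the P99 tail
    hit = sum(
        1
        for start, duration in tail
        if any(overlaps(start, start + duration, gc_start, gc_start + gc_len)
               for gc_start, gc_len in gc_pauses)
    )
    return hit / len(tail) if tail else 0.0

# If this returns 0.6-0.8, query tuning won't move P99; the tail is garbage
# collection, and the fix is allocation rate, heap sizing, or off-heap storage.
```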
5. The Cold Start Traffic Jam
What I find:
Auto-scaling works great! Scales from 10 to 100 pods in 2 minutes.
Lambda functions handle burst traffic. "Infinitely scalable."
Cold start: 3-5 seconds. "Only on first request."
What happens at scale:
Traffic spike triggers 50 new pods/lambdas.
Each serves first request in 5 seconds (cold start).
Those 50 users experience 5-second response times.
At Black Friday scale: 1,000s of cold starts = mass user abandonment.
The metrics that reveal it:
- P99 spikes correlate with scaling events
- First request to new instances always slow
- Uneven distribution of latency across instance pool
Real example from DD:
Serverless API startup. Average response: 100ms. Beautiful.
P99: 6 seconds. All from cold starts.
At projected scale: 30% of requests would hit cold starts.
What I check:
- Cold start frequency vs traffic patterns
- Pre-warming strategies
- Container/function size vs startup time
- Scale-up triggers vs actual need
- Keep-alive patterns
The question that catches it:
"What percentage of your P99 requests are first requests to new instances?"
Teams rarely track this. When they do, they discover that their "elastic infrastructure" is punishing their most valuable traffic spikes with terrible performance.
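Tracking it is usually a tiny change per service. Here's an illustrative Python sketch for an AWS Lambda-style handler (the handler body and field names are placeholders): a module-level flag marks the first invocation on each fresh instance.

```python
# Hedged sketch: tag each invocation with whether it is the first on a fresh
# instance, so the cold-start share of traffic - and of the P99 tail - is queryable.
import json
import time

_COLD_START = True              # module scope survives across warm invocations

def handler(event, context):
    global _COLD_START
    is_cold, _COLD_START = _COLD_START, False
    started = time.monotonic()

    response = do_work(event)   # placeholder for the real handler body

    # Emit a structured log line; count cold_start=true over total for the rate,
    # and group latency percentiles by the flag. (The platform's own REPORT
    # lines also carry init duration for cold invocations.)
    print(json.dumps({
        "cold_start": is_cold,
        "latency_ms": round((time.monotonic() - started) * 1000, 1),
    }))
    return response

def do_work(event):
    return {"statusCode": 200}
```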
My P99 Technical DD Checklist
After finding these patterns repeatedly, I've developed a systematic approach:
Quick P99 Health Check (2 hours)
- P99/P50 ratio - should be under 10x (see the sketch after this list)
- P99 stability during deploys
- P99 correlation with scaling events
- P99 breakdown by endpoint
- P99 trend over last 90 days
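To make the first check concrete, a small stdlib-only Python sketch (the sample numbers and the 10x threshold come from this article, not a universal standard):

```python
# Hedged sketch: P99/P50 ratio from raw latency samples, e.g. an hour of
# request latencies exported from your APM.
def percentile(samples, pct):
    ordered = sorted(samples)
    return ordered[max(0, int(len(ordered) * pct / 100) - 1)]

def p99_health(latencies_ms):
    p50 = percentile(latencies_ms, 50)
    p99 = percentile(latencies_ms, 99)
    ratio = p99 / p50 if p50 else float("inf")
    return {"p50_ms": p50, "p99_ms": p99, "p99_p50_ratio": ratio,
            "healthy": ratio < 10}

# The opening numbers of this article (P50 45ms, P99 8,000ms) give a ratio of
# roughly 178x: nowhere near healthy, whatever the average says.
```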
Deep P99 Analysis (2 days)
- Database connection pool modeling
- Cache expiration simulation
- Timeout chain mapping
- GC impact analysis
- Cold start percentage calculation
P99 Load Test (1 week)
- 10x current traffic simulation
- Cache invalidation under load
- Cascading failure testing
- Memory pressure testing
- Scaling event impact
The P99 Questions That Save Deals
If you're evaluating a technical acquisition, ask:
- "What's your P99 during your busiest hour?"
- "Show me P99 grouped by database wait time"
- "What happens to P99 when cache expires?"
- "How does P99 change during deployments?"
- "What percentage of P99 is from cold starts?"
If they can't answer these, you have a problem.
Need P99 Analysis for Your Acquisition?
These patterns hide in every infrastructure. They're invisible until you hit scale - or until someone who knows where to look finds them during due diligence.