Your metrics look healthy:
- Average response time: 50ms
- P50 (median): 45ms
- P99: 8 seconds
That P99 is about to kill your acquisition.
After 50+ infrastructure audits, these five P99 patterns appear in nearly every technical due diligence. They're invisible at current scale, devastating at 10x.
1. The Connection Pool Time Bomb
What I find:
PostgreSQL max_connections set to 100. Application pool set to 95.
Current usage: 40-60 connections. "Plenty of headroom."
Connection management is tricky with databases in general. More connections doesn't necessarily mean more performance; if it did, we'd just allow unlimited connections. More connections means more context switching, which means more overhead and more CPU utilization, until your database's CPU is maxed out.
What happens at scale:
At 2x traffic, connections hit 90+. P99 requests wait for available connections.
At 3x traffic, total gridlock. P99 goes from 200ms to 30+ seconds.
The metrics that reveal it:
- P99 spikes correlate with connection pool saturation
- Database CPU at 20% but response times through the roof
- Error logs show "timeout waiting for connection pool"
- This is usually a "tipping point": things are fine, then suddenly degrade
Real example from DD:
SaaS platform raising Series B. Everything looked fine at 10K concurrent users.
Modeled 50K users: complete connection starvation. P99 hit 45 seconds.
Fix required major architectural changes. Deal repriced down $3M.
What I check:
- Pool sizing vs database max_connections
- Connection lifecycle (how long held)
- Prepared statement cache vs connection limits
- Query execution speed
- PgBouncer or connection pooling layer
The question that catches it:
"Show me P99 latency grouped by connection pools wait time."
Most teams have never run this query. When they do, the correlation is undeniable. If you see P99 spikes that don't correlate with CPU or memory usage, you're likely looking at connection starvation; if they do correlate with high CPU, you're using too many connections and need to optimize your queries. One way to capture that wait time is sketched below.
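As a rough illustration (not taken from any audited system), here's a minimal Python sketch of that measurement: a SQLAlchemy pool with an explicit size and checkout timeout, timing how long each request waits for a connection so the P99 of the wait can be compared with the P99 of the request. The DSN, pool numbers, and query are placeholders.

```python
# Hedged sketch: time connection checkout from a bounded SQLAlchemy pool and
# compare its P99 against overall request P99. All values are placeholders.
import time
from sqlalchemy import create_engine, text

engine = create_engine(
    "postgresql+psycopg2://app:secret@db/app",  # placeholder DSN
    pool_size=20,       # steady-state connections
    max_overflow=10,    # burst headroom
    pool_timeout=2,     # seconds to wait for a free connection before erroring
)

pool_waits = []         # seconds spent waiting to check out a connection
request_latencies = []  # seconds for the whole request

def handle_request():
    request_start = time.monotonic()
    wait_start = time.monotonic()
    with engine.connect() as conn:              # blocks until the pool frees a slot
        pool_waits.append(time.monotonic() - wait_start)
        conn.execute(text("SELECT 1"))          # stand-in for the real query
    request_latencies.append(time.monotonic() - request_start)

def p99(samples):
    ordered = sorted(samples)
    return ordered[max(0, int(len(ordered) * 0.99) - 1)]

# After a load test: if p99(pool_waits) accounts for most of
# p99(request_latencies), the pool - not the database - is the bottleneck.
```

If the wait P99 stays near zero while request P99 spikes, the bottleneck is elsewhere; if the two track each other, you've found the time bomb.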
2. The Cache Expiration Stampede
What I find:
Beautiful cache hit ratios: 95%+. Response times under 50ms.
Cache TTL: 1 hour. Looks perfect.
What happens at scale:
Popular key expires. 1,000 simultaneous requests hit the database.
Database generates the same expensive query 1,000 times.
P99 spikes to 10+ seconds. If using Redis, potential memory explosion.
The metrics that reveal it:
- P99 spikes exactly every hour (or whatever TTL is)
- Database CPU spikes align with cache expiration
- Redis memory usage spikes during regeneration
Real example from DD:
E-commerce platform preparing for Black Friday. Normal traffic: fine.
Simulated Black Friday: cache expiration caused 45-second P99 spikes.
Every expired product page triggered 500+ database queries.
What I check:
- Cache warming strategies
- Probabilistic early expiration
- Distributed lock on cache regeneration
- Background refresh vs synchronous regeneration
- Request coalescing (if many requests go out for the same id, only one fetch hits the upstream store)
The question that catches it:
"What happens when your top 10 cache keys expire simultaneously under peak load?"
The answer usually involves nervous laughter. Smart teams use jittered TTLs and background refresh. Most teams haven't thought about it until their first Black Friday.
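For illustration only, here's a minimal Python sketch of two of those mitigations with a redis-py client: jittered TTLs so hot keys don't expire in lockstep, and a per-key regeneration lock so an expired entry is rebuilt once rather than a thousand times. Key names, TTLs, and the rebuild callable are assumptions, not anyone's production code.

```python
# Hedged sketch: jittered TTL plus a per-key lock so an expired hot key is
# rebuilt once, not 1,000 times. Names, TTLs, and timeouts are illustrative.
import json
import random
import time

import redis

r = redis.Redis()
BASE_TTL = 3600        # the 1-hour TTL from the example above
JITTER = 300           # +/- 5 minutes so hot keys don't expire together

def get_cached(key: str, rebuild, retries: int = 50):
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)

    # Only one caller per key regenerates; the rest back off and re-check.
    if r.set(f"lock:{key}", "1", nx=True, ex=30):
        try:
            value = rebuild()                     # the expensive query, once
            ttl = BASE_TTL + random.randint(-JITTER, JITTER)
            r.set(key, json.dumps(value), ex=ttl)
            return value
        finally:
            r.delete(f"lock:{key}")

    if retries == 0:                              # give up: fall through to the source
        return rebuild()
    time.sleep(0.05)                              # brief backoff, then re-check cache
    return get_cached(key, rebuild, retries - 1)
```

Background refresh goes one step further: a worker rewrites hot keys before their TTL is reached, so user-facing reads never see a miss at all.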
3. The Timeout Death Chain
What I find:
Service A → Service B → Service C
Each with 5-second timeouts. "Conservative" timeout strategy.
Total possible wait: 15 seconds. Math checks out, right?
What happens at scale:
Service C slows down (P99 hits 4 seconds).
Service B waits, queues backup (P99 hits 8 seconds).
Service A waits for B, connections exhausted (P99 hits 15+ seconds).
Users experience random 30-second page loads.
The metrics that reveal it:
- P99 latency multiplication across service boundaries
- Thread pool exhaustion in upstream services
- Correlation between downstream P99 and upstream P99
Real example from DD:
Fintech platform with 12 microservices. Payment service P99 of 2 seconds cascaded into 25-second checkout times. The DD load test uncovered $2M of required circuit breaker work.
What I check:
- Timeout settings vs actual P99 of dependencies
- Circuit breaker implementation
- Bulkhead patterns
- Async vs sync communication patterns
- Retry amplification effects
The question that catches it:
"Map out your timeout chain for your critical user path - what's the worst case?"
When teams actually do the math, they realize their "5-second timeout" can turn into 30+ seconds of user wait time: add a single retry at each hop and the worst case doubles. The fix isn't shorter timeouts - it's circuit breakers and async patterns, as in the sketch below.
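As a sketch of the circuit-breaker half of that fix (a simplified Python stand-in, not the implementation any of these teams shipped; production systems usually reach for a battle-tested library):

```python
# Hedged sketch: after enough consecutive failures the circuit opens and calls
# fail fast for a cooldown, instead of stacking multi-second timeouts upstream.
import time

class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None       # half-open: let one probe call through
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0
        return result

# Usage sketch: wrap the downstream call and keep its own timeout tight, e.g.
#   payments = CircuitBreaker()
#   payments.call(requests.get, "https://payments.internal/charge", timeout=1.5)
```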
4. The Memory Pressure P99 Killer
What I find:
JVM heap set to 8GB. Normal usage: 4-6GB. "We have buffer."
GC pauses: 50-200ms. "Acceptable."
P99: 2 seconds. "Not great, but livable."
What happens at scale:
10x traffic = 10x object creation.
Minor GCs every 5 seconds become every 500ms.
Major GCs go from 200ms to 2+ seconds.
P99 becomes P50. Every other request hits a GC pause.
The metrics that reveal it:
- P99 spikes correlate with GC events
- Sawtooth memory pattern gets steeper
- CPU usage includes high system time (GC threads)
Real example from DD:
Real-time analytics platform. P99 at 1.5 seconds seemed OK.
Under 5x load test: P99 hit 12 seconds purely from GC.
Required complete move to off-heap memory. 6-month project.
What I check:
- GC log analysis (frequency and duration)
- Heap size vs actual need
- Object creation rate
- Memory leak patterns
- Native memory usage (off-heap)
The question that catches it:
"Show me P99 with GC events overlaid - what percentage of P99 is GC?"
The answer is usually 60-80% for Java services under pressure. Modern GCs help, but physics wins. More objects = more collection = worse P99.
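One way to answer that question, sketched in Python under the assumption that request spans and GC pause windows have already been parsed (say, from an access log and gc.log) into (start, duration) pairs:

```python
# Hedged sketch: what share of the slowest 1% of requests overlaps a GC pause?
def overlaps(a_start, a_end, b_start, b_end):
    return a_start < b_end and b_start < a_end

def p99_gc_share(request_spans, gc_pauses):
    """Both arguments: lists of (start_seconds, duration_seconds) tuples."""
    by_latency = sorted(request_spans, key=lambda span: span[1])
    tail = by_latency[int(len(by_latency) * 0.99):]        # the P99 tail
    hit = sum(
        1
        for start, duration in tail
        if any(overlaps(start, start + duration, gc_start, gc_start + gc_len)
               for gc_start, gc_len in gc_pauses)
    )
    return hit / len(tail) if tail else 0.0

# If this returns 0.6-0.8, query tuning won't move P99; the tail is garbage
# collection, and the fix is allocation rate, heap sizing, or off-heap storage.
```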
5. The Cold Start Traffic Jam
What I find:
Auto-scaling works great! Scales from 10 to 100 pods in 2 minutes.
Lambda functions handle burst traffic. "Infinitely scalable."
Cold start: 3-5 seconds. "Only on first request."
What happens at scale:
Traffic spike triggers 50 new pods/lambdas.
Each serves first request in 5 seconds (cold start).
Those 50 users experience 5-second response times.
At Black Friday scale: 1,000s of cold starts = mass user abandonment.
The metrics that reveal it:
- P99 spikes correlate with scaling events
- First request to new instances always slow
- Uneven distribution of latency across instance pool
Real example from DD:
Serverless API startup. Average response: 100ms. Beautiful.
P99: 6 seconds. All from cold starts.
At projected scale: 30% of requests would hit cold starts.
What I check:
- Cold start frequency vs traffic patterns
- Pre-warming strategies
- Container/function size vs startup time
- Scale-up triggers vs actual need
- Keep-alive patterns
The question that catches it:
"What percentage of your P99 requests are first requests to new instances?"
Teams rarely track this. When they do, they discover that their "elastic infrastructure" is punishing their most valuable traffic spikes with terrible performance.
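Tracking it is usually a tiny change per service. Here's an illustrative Python sketch for an AWS Lambda-style handler (the handler body and field names are placeholders): a module-level flag marks the first invocation on each fresh instance.

```python
# Hedged sketch: tag each invocation with whether it is the first on a fresh
# instance, so the cold-start share of traffic - and of the P99 tail - is queryable.
import json
import time

_COLD_START = True              # module scope survives across warm invocations

def handler(event, context):
    global _COLD_START
    is_cold, _COLD_START = _COLD_START, False
    started = time.monotonic()

    response = do_work(event)   # placeholder for the real handler body

    # Emit a structured log line; count cold_start=true over total for the rate,
    # and group latency percentiles by the flag. (The platform's own REPORT
    # lines also carry init duration for cold invocations.)
    print(json.dumps({
        "cold_start": is_cold,
        "latency_ms": round((time.monotonic() - started) * 1000, 1),
    }))
    return response

def do_work(event):
    return {"statusCode": 200}
```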
My P99 Technical DD Checklist
After finding these patterns repeatedly, I've developed a systematic approach:
Quick P99 Health Check (2 hours)
- P99/P50 ratio - should be under 10x (see the sketch after this list)
- P99 stability during deploys
- P99 correlation with scaling events
- P99 breakdown by endpoint
- P99 trend over last 90 days
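To make the first check concrete, a small stdlib-only Python sketch (the sample numbers and the 10x threshold come from this article, not a universal standard):

```python
# Hedged sketch: P99/P50 ratio from raw latency samples, e.g. an hour of
# request latencies exported from your APM.
def percentile(samples, pct):
    ordered = sorted(samples)
    return ordered[max(0, int(len(ordered) * pct / 100) - 1)]

def p99_health(latencies_ms):
    p50 = percentile(latencies_ms, 50)
    p99 = percentile(latencies_ms, 99)
    ratio = p99 / p50 if p50 else float("inf")
    return {"p50_ms": p50, "p99_ms": p99, "p99_p50_ratio": ratio,
            "healthy": ratio < 10}

# The opening numbers of this article (P50 45ms, P99 8,000ms) give a ratio of
# roughly 178x: nowhere near healthy, whatever the average says.
```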
Deep P99 Analysis (2 days)
- Database connection pool modeling
- Cache expiration simulation
- Timeout chain mapping
- GC impact analysis
- Cold start percentage calculation
P99 Load Test (1 week)
- 10x current traffic simulation
- Cache invalidation under load
- Cascading failure testing
- Memory pressure testing
- Scaling event impact
The P99 Questions That Save Deals
If you're evaluating a technical acquisition, ask:
- "What's your P99 during your busiest hour?"
- "Show me P99 grouped by database wait time"
- "What happens to P99 when cache expires?"
- "How does P99 change during deployments?"
- "What percentage of P99 is from cold starts?"
If they can't answer these, you have a problem.
Need P99 Analysis for Your Acquisition?
These patterns hide in every infrastructure. They're invisible until you hit scale - or until someone who knows where to look finds them during due diligence.