On March 15th at 2:47 AM, our system processed its one millionth call. I know the exact time because I was awake, watching our dashboards, convinced something would break. Nothing did. But getting there? That's a story worth telling.
This isn't a victory lap. It's an honest account of what it takes to build voice AI infrastructure that actually works at scale, including the spectacular failures that taught us more than our successes ever could.
The Day Everything Fell Apart
Let me start with our worst day: October 3rd, 2023.
We'd just signed our biggest customer: a national retail chain expecting 50,000 calls during their holiday promotion. We were confident. Our load tests looked great. We'd provisioned extra capacity.
At 9:03 AM on launch day, our average latency spiked from 180ms to 2.4 seconds. Conversations became impossible. Customers were hanging up. Our customer's support lines were melting down. My phone wouldn't stop buzzing.
What went wrong? Our architecture had a hidden bottleneck we'd never seen in testing.
"Load tests lie. They tell you how your system handles synthetic traffic. Real traffic is messier, more correlated, and always finds the weakness you didn't know existed."
A very expensive lesson: it cost us $47,000 in credits and very nearly cost us the customer.
The Architecture That Finally Worked
After October 3rd, we rebuilt from first principles. Here's what our production system looks like today:
- Edge Layer: Global PoPs for <50ms latency to any caller; WebRTC termination and initial audio processing.
- Processing Layer: Distributed STT/NLU processing on auto-scaling Kubernetes clusters, with regional failover.
- Intelligence Layer: LLM inference with response caching, context management, and decision routing.
- Integration Layer: CRM connectors, action execution, and human handoff orchestration.
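If you prefer code to diagrams, here's a minimal sketch of that call path as a staged pipeline. The layer names mirror the architecture above; everything inside the handlers (the context fields, the placeholder lookups) is illustrative, not our production code.

```python
# A simplified sketch of the four-layer call path described above.
# Handler internals are placeholders, not production logic.
import asyncio
from dataclasses import dataclass, field


@dataclass
class CallContext:
    call_id: str
    audio_frames: list = field(default_factory=list)  # filled by the edge layer
    transcript: str = ""                               # filled by STT/NLU
    intent: str = ""                                   # filled by the intelligence layer
    actions: list = field(default_factory=list)        # executed by the integration layer


async def edge_layer(ctx: CallContext) -> CallContext:
    # WebRTC termination + initial audio processing at the nearest PoP.
    ctx.audio_frames.append(b"\x00" * 320)  # stand-in for a 20 ms audio frame
    return ctx


async def processing_layer(ctx: CallContext) -> CallContext:
    # Distributed STT/NLU; in production this fans out to a regional cluster.
    ctx.transcript = "where is my order"
    return ctx


async def intelligence_layer(ctx: CallContext) -> CallContext:
    # LLM inference with response caching and decision routing.
    ctx.intent = "order_status"
    return ctx


async def integration_layer(ctx: CallContext) -> CallContext:
    # CRM lookups, action execution, human handoff if needed.
    ctx.actions.append({"type": "crm_lookup", "intent": ctx.intent})
    return ctx


async def handle_call(call_id: str) -> CallContext:
    ctx = CallContext(call_id=call_id)
    for layer in (edge_layer, processing_layer, intelligence_layer, integration_layer):
        ctx = await layer(ctx)
    return ctx


if __name__ == "__main__":
    print(asyncio.run(handle_call("demo-call-1")).intent)
```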
The Five Lessons That Changed Everything
Lesson 1: Latency Is a Feature, Not a Metric
In web applications, the difference between 200ms and 400ms response time is barely noticeable. In voice AI, it's the difference between natural conversation and awkward silence.
We obsess over P99 latency, not averages. A system with 150ms average but 800ms P99 will frustrate 1 in 100 users consistently. They'll never trust it.
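To see why the average hides that 1-in-100 experience, here's a toy example with a synthetic latency distribution. The numbers are invented for illustration, not pulled from our telemetry.

```python
# Why we alert on P99, not the average: a toy latency distribution.
import random


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile; good enough for monitoring."""
    ordered = sorted(samples)
    rank = max(0, round(p / 100 * len(ordered)) - 1)
    return ordered[rank]


random.seed(7)
# Most calls land around 150 ms; a small tail gets stuck behind a slow dependency.
latencies_ms = (
    [random.gauss(150, 20) for _ in range(985)]
    + [random.gauss(800, 100) for _ in range(15)]
)

avg = sum(latencies_ms) / len(latencies_ms)
p99 = percentile(latencies_ms, 99)

print(f"average: {avg:.0f} ms")  # looks perfectly healthy
print(f"P99:     {p99:.0f} ms")  # what your unluckiest callers actually get
```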
Lesson 2: Cache Everything That Doesn't Change Mid-Conversation
Here's a secret: about 40% of what our AI "thinks about" during a call doesn't actually require real-time computation.
Customer history? Cached. Product information? Cached. Common response patterns? Cached. We only hit our LLM for genuine reasoning tasks.
This caching strategy reduced our compute costs by 34% and improved average latency by 28%.
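The pattern itself is simple. Here's a minimal sketch of a call-scoped TTL cache in that spirit; the TTLs, lookup functions, and LLM call are placeholders, not our production stack.

```python
# A minimal sketch of "cache everything that doesn't change mid-conversation".
import time
from typing import Any, Callable


class TTLCache:
    def __init__(self) -> None:
        self._store: dict[str, tuple[float, Any]] = {}

    def get_or_load(self, key: str, loader: Callable[[], Any], ttl_s: float) -> Any:
        now = time.monotonic()
        hit = self._store.get(key)
        if hit and hit[0] > now:
            return hit[1]                       # cache hit: no recompute, no network
        value = loader()                        # slow path: CRM, catalog, or model
        self._store[key] = (now + ttl_s, value)
        return value


def fetch_crm_history(customer_id: str) -> list[str]:
    return [f"previous order for {customer_id}"]   # placeholder CRM lookup


def fetch_product_info() -> dict[str, str]:
    return {"sku-1": "wireless headphones"}        # placeholder catalog fetch


def call_llm(utterance: str, history: list[str], catalog: dict[str, str]) -> str:
    # Placeholder for the only step that genuinely needs real-time reasoning.
    return f"Handling '{utterance}' with {len(history)} history items in context."


cache = TTLCache()


def handle_turn(customer_id: str, utterance: str) -> str:
    # Static-for-the-call context comes out of the cache...
    history = cache.get_or_load(
        f"history:{customer_id}", lambda: fetch_crm_history(customer_id), ttl_s=300
    )
    catalog = cache.get_or_load("catalog:active", fetch_product_info, ttl_s=3600)
    # ...and only the genuine reasoning step pays for an LLM call.
    return call_llm(utterance, history, catalog)


if __name__ == "__main__":
    print(handle_turn("cust-42", "where is my order"))
```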
Lesson 3: Graceful Degradation Isn't Optional
Systems fail. Networks have bad days. Cloud providers have outages. The question isn't if something will break; it's what happens when it does.
Our degradation hierarchy:
- Primary region down: Auto-failover to secondary in <3 seconds
- LLM latency spike: Fall back to cached responses for common intents
- Integration failure: Queue actions, complete call, retry async
- Complete outage: Graceful handoff to human queue with context preserved
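Here's a minimal sketch of how a fallback chain like this looks in code, covering the LLM-latency-spike and human-handoff steps; the timeout, intents, and handoff queue are illustrative stand-ins.

```python
# A sketch of graceful degradation: try the primary path, degrade step by step,
# and never strand the caller.
import queue

CACHED_RESPONSES = {"order_status": "Your order is on its way."}
handoff_queue = queue.Queue()


def llm_response(intent: str, context: dict, timeout_s: float) -> str:
    # Placeholder for the real inference call; it "times out" here so the
    # example exercises the degradation path.
    raise TimeoutError(f"no response within {timeout_s}s")


def respond(intent: str, context: dict) -> str:
    try:
        return llm_response(intent, context, timeout_s=1.5)
    except TimeoutError:
        # LLM latency spike: fall back to cached responses for common intents.
        if intent in CACHED_RESPONSES:
            return CACHED_RESPONSES[intent]
        # Nothing sensible cached: hand off to a human with context preserved.
        handoff_queue.put({"intent": intent, "context": context})
        return "Let me connect you with someone who can help."


if __name__ == "__main__":
    print(respond("order_status", {"call_id": "demo"}))
    print(respond("warranty_claim", {"call_id": "demo"}))
```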
Lesson 4: Observability Is Your Immune System
We track 847 distinct metrics across our infrastructure. That sounds excessive until you realize that our October 3rd incident would have been caught by metric #312 (connection pool saturation rate) if we'd been watching it.
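For the curious, here's a sketch of the kind of check that metric implies: watch the saturation rate directly instead of waiting for latency to tell you. The pool sizes and thresholds are hypothetical.

```python
# Watch connection pool saturation, not just latency.
def pool_saturation(in_use: int, pool_size: int) -> float:
    return in_use / pool_size if pool_size else 1.0


def check_pool(in_use: int, pool_size: int,
               warn_at: float = 0.80, page_at: float = 0.95) -> str:
    s = pool_saturation(in_use, pool_size)
    if s >= page_at:
        return f"PAGE: connection pool {s:.0%} saturated"
    if s >= warn_at:
        return f"WARN: connection pool {s:.0%} saturated"
    return "ok"


if __name__ == "__main__":
    # 47 of 50 connections in use is 94% saturated: past warning, close to paging.
    print(check_pool(in_use=47, pool_size=50))
```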
Lesson 5: The Best Architecture Is One You Can Change
Our system looks nothing like it did 18 months ago. That's not technical debt; it's evolution.
Want to Join Our Engineering Team?
We're hiring engineers who love solving hard problems at scale.
View Open Roles →