The Voice AI Fast Enough to Pass as Human (Cartesia)

Voice AI has a latency problem. In conversation, humans notice response delays above roughly 200 milliseconds. Most AI voice systems, until recently, couldn't get anywhere close to that threshold. The result was that AI-powered voice products felt stilted, unnatural, and frustrating to use in real time.

Cartesia set out to solve that. The company builds real-time voice AI infrastructure, and its flagship model, Sonic, delivers audio responses with sub-90ms latency. That’s fast enough for genuinely natural conversation. More than 50,000 companies are now using it.
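The sub-90ms figure refers to time-to-first-audio: how long a client waits between sending a request and hearing the first chunk of the streamed response. Below is a minimal sketch of how a client might measure that, with a stand-in generator (`fake_stream`) in place of any real API call; the function names and the simulated 50ms delay are illustrative assumptions, not Cartesia's API.

```python
import time

def time_to_first_chunk(chunks):
    """Measure time-to-first-audio: the delay between starting to consume
    a streamed response and receiving its first audio chunk, in ms."""
    start = time.perf_counter()
    first = next(iter(chunks))  # blocks until the first chunk arrives
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return first, elapsed_ms

def fake_stream(delay_s=0.05):
    """Stand-in for a streaming TTS response (hypothetical, for illustration):
    yields raw audio frames after a simulated model + network delay."""
    time.sleep(delay_s)
    yield b"\x00" * 320  # one 20 ms frame of 8 kHz 16-bit PCM
    yield b"\x00" * 320

chunk, latency_ms = time_to_first_chunk(fake_stream())
print(f"time to first audio: {latency_ms:.1f} ms")
```

Against a real streaming endpoint, the same timing wrapper applies unchanged: the generator is simply replaced by the iterator over the HTTP or WebSocket response chunks.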

Built on Google Cloud

To hit sub-90ms latency at scale, Cartesia needed infrastructure that could keep up. The company built on Google Cloud, using its GPU infrastructure and global network to run inference fast enough that the latency stays imperceptible. When you’re serving real-time voice to tens of thousands of companies, the compute and networking underneath it matter enormously.

The Google Cloud case study describes Cartesia as having built “the world’s fastest voice AI” on Google Cloud infrastructure, with Sonic reaching production quality in human evaluations.

What This Looks Like in Practice

Cartesia is a good example of what the ISV opportunity around GCP AI actually looks like at its best. The company built a product that wouldn’t exist without Google Cloud’s AI infrastructure, and that product is now embedded in tens of thousands of other products.

Every company using Cartesia’s API to build a voice assistant, an AI phone agent, or a real-time translation tool is running on top of infrastructure Cartesia built on Google Cloud. That’s the compounding nature of the ISV model: the infrastructure investment Cartesia made flows downstream to every company in its ecosystem.

Sub-90ms latency was the hard technical requirement that made real conversation possible. Google Cloud is where Cartesia achieved it.
