300 Petabytes, Zero Downtime, and AI Velocity (PayPal)

PayPal had a problem that sounds like a good problem to have. After 25 years of growth, acquiring Venmo, Braintree, and other businesses, and processing billions of transactions, the company had accumulated 400 petabytes of data spread across a dozen siloed systems. What started as a mountain of valuable customer insight had become a mountain range with no trails between the peaks.

The consequences were practical and painful. Building a unified view of a single customer, such as a small business owner using PayPal for online sales and Venmo for local transactions, required complex, costly processes. Fraud detection models, personalization engines, and real-time analytics were all constrained by the same fragmentation. As PayPal described it, their own growth had created complexity that threatened their next evolution.

The Migration

PayPal consolidated multiple platforms, including what is believed to have been the world’s largest Teradata deployment, along with Hadoop clusters, Redshift, Snowflake, and other systems. They chose BigQuery as the destination: fully managed, with compute and storage that scale independently, and native AI integrations. The familiar SQL interface mattered too, given the scale of PayPal’s engineering organization.
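To make the payoff of consolidation concrete, here is a minimal sketch of the kind of query a unified warehouse enables: a single SQL statement spanning activity that previously lived in separate silos. The dataset, table, and column names (`analytics.customers`, `analytics.transactions`, the `source` field) are hypothetical, invented for illustration; nothing here reflects PayPal’s actual schema.

```python
# Illustrative sketch only: all table and column names below are
# hypothetical. The point is that after consolidation, one SQL query
# can combine activity (e.g. PayPal and Venmo) that formerly required
# stitching together separate systems.
UNIFIED_VIEW_SQL = """
SELECT
  c.customer_id,
  SUM(IF(t.source = 'paypal', t.amount, 0)) AS paypal_volume,
  SUM(IF(t.source = 'venmo',  t.amount, 0)) AS venmo_volume
FROM `analytics.customers` AS c
JOIN `analytics.transactions` AS t
  ON t.customer_id = c.customer_id
GROUP BY c.customer_id
"""

def run_unified_query(sql: str):
    """Submit the query to BigQuery.

    Requires credentials and a GCP project; not runnable as-is
    without them. Uses the official google-cloud-bigquery client.
    """
    from google.cloud import bigquery  # pip install google-cloud-bigquery
    client = bigquery.Client()
    return list(client.query(sql).result())
```

Before the migration, a report like this meant moving data between systems; afterward, it is a single query against one governed store.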

The execution team, working with Google Cloud Consulting, migrated more than 300 petabytes of data, decommissioned around 25% of workloads, and did it all with zero downtime. No customer impact. No service interruption. For a payments company operating across approximately 200 markets with billions of transactions in flight, that constraint was non-negotiable.

Speed to AI Is the Real Prize

The migration unlocked something that’s hard to put a number on but easy to feel in a product roadmap: velocity. When data for model training is 16x fresher and feature engineering has instant access to clean, governed data, the time between “we want to build this AI feature” and “this AI feature is in production” compresses dramatically. Vertex AI is now optimizing logistics planning across more than 5,000 daily shipments. Queries run 2.5x to 10x faster, including the complex queries used by data scientists. Every new fraud detection model, every personalization improvement, every real-time analytics feature now builds on the same foundation instead of fighting infrastructure before the work even starts.

If you’ve been inside a company that has grown through acquisition and carries a fragmented data landscape, you know the internal conversation: too much data, too complex, can’t afford any downtime. That was PayPal’s situation, at a scale most organizations will never approach. One of the world’s largest payment companies, operating across 200 markets, moved 300 petabytes with zero customer impact. The question isn’t whether it’s possible. The question is whether you want to be building AI features on a unified foundation two years from now, or still explaining why your models are running on stale, siloed data.

Want to go deeper?