Major League Baseball generates terabytes of data per game. For years, the vast majority of that data died in a silo. We’ve entered an era where we have entirely too much data and nowhere near enough context. Scoreboards will gladly flash a 100-mph pitch velocity, but they won’t tell you that the batter hasn’t hit a slider for a strike in three weeks, or that the pitcher’s grandfather played for the exact same franchise in 1954. Fans don’t need a firehose of raw decimals; they need a digital scout translating data into an interesting narrative in real time.
The Cost of Cold Telemetry
For years, we’ve had rows of data that could tell us exactly how many inches a curveball dropped, but translating that into why it mattered for the current at-bat required expert extrapolation. If you weren’t a professional scout, the magic of the game could get lost in math. This is the classic enterprise IT trap: you build a massive lake of data, and then you only give the users a very small, very confusing straw to drink from.
Major League Baseball knew they had a latency problem that wasn’t about the network. It was cognitive latency. A fan might look up a stat to understand why a play was significant. By then, the next batter was already walking to the plate. In the world of live sports, an insight that’s three seconds late is effectively worthless. Most generative AI projects stumble here because “real-time” usually means “sometime before the sun burns out.” MLB needed something that could parse petabytes of history. It had to look at the live scenario and spit out a clever insight in under two seconds.
Surprisal: The Math of Being Interesting
MLB turned to Google Cloud and Gemini models to build “Scout Insights,” whose goal was to be interesting, not merely accurate. The engineering team used a mathematical concept called “surprisal” to train the AI. It’s essentially a measure of how unexpected a piece of information is. Let’s be honest. Nobody needs an AI to tell them that a home run is a good thing. We need to know that the home run was the hardest-hit ball in American Family Field since 2014. That’s the kind of trivia that wins bar bets. It makes you feel like the smartest person in the stands.
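For readers who want the math, here is a minimal sketch of surprisal as information theory usually defines it: the negative base-2 logarithm of an event's probability, measured in bits. How MLB actually scores its insights isn't public in this piece, so treat the function name, the candidate facts, and the probabilities below as made-up illustrations rather than the team's method.

```python
import math


def surprisal_bits(probability: float) -> float:
    """Surprisal (self-information) of an event in bits: log2(1 / p)."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return math.log2(1.0 / probability)


# Hypothetical probabilities, invented purely for illustration.
candidates = {
    "a home run helps the batting team": 1.0,  # certain, so zero surprise
    "the pitch was a fastball": 0.5,  # a coin flip, so one bit
    "hardest-hit ball at this park in a decade": 1 / 1024,  # rare, so ten bits
}

# Rank candidate facts from most to least surprising.
ranked = sorted(
    candidates,
    key=lambda fact: surprisal_bits(candidates[fact]),
    reverse=True,
)

for fact in ranked:
    print(f"{surprisal_bits(candidates[fact]):5.1f} bits  {fact}")
```

The rarer the event, the higher the score, which is why the obvious fact lands at the bottom of the list and the once-a-decade one lands at the top.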
The architecture behind this is where things get truly nerdy in the best way possible. The system doesn’t wait for a play to happen and then ask a model to think about it. Instead, it uses BigQuery and AlloyDB to anticipate possible scenarios before they occur. Yes! It looks at the lineups, the historical matchups, and the stadium context. Then it pre-generates potential insights. When the live data hits the system, it doesn’t have to “think” from scratch. It just matches the reality of the diamond to the best pre-baked insight in the library. It’s the ultimate “this meeting could have been an email” move for data processing. This rapid-fire match keeps the digital commentary in lockstep with the physical game.
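The pre-bake-then-match idea is easy to sketch in a few lines. What follows is a toy under loud assumptions: the `Scenario` fields, the `pregenerate` and `match` helpers, and the stub writer are all hypothetical, and a plain in-memory dictionary stands in for whatever MLB actually runs on BigQuery and AlloyDB. The point is only the shape of the trick: do the slow work before the pitch, and leave a cheap lookup for game time.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional


@dataclass(frozen=True)
class Scenario:
    """A possible game situation (hypothetical fields, for illustration)."""
    batter: str
    pitcher: str
    outcome: str


def pregenerate(
    batters: Iterable[str],
    pitcher: str,
    outcomes: Iterable[str],
    write_insight: Callable[[Scenario], Optional[str]],
) -> Dict[Scenario, str]:
    """Run the slow insight writer ahead of time for every plausible scenario."""
    outcomes = list(outcomes)
    library: Dict[Scenario, str] = {}
    for batter in batters:
        for outcome in outcomes:
            scenario = Scenario(batter, pitcher, outcome)
            text = write_insight(scenario)  # stand-in for a slow model call
            if text is not None:
                library[scenario] = text
    return library


def match(library: Dict[Scenario, str], live_event: Scenario) -> Optional[str]:
    """At game time, a dictionary lookup replaces thinking from scratch."""
    return library.get(live_event)


def stub_writer(scenario: Scenario) -> Optional[str]:
    """Toy writer: only home runs are deemed worth an insight here."""
    if scenario.outcome == "home_run":
        return f"{scenario.batter} takes {scenario.pitcher} deep!"
    return None


library = pregenerate(["Batter A", "Batter B"], "Pitcher X",
                      ["home_run", "strikeout"], stub_writer)
live = Scenario("Batter B", "Pitcher X", "home_run")
print(match(library, live))
```

Scenarios the writer declines to cover simply miss the lookup and return nothing, so the feed stays quiet instead of stalling while a model catches up.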
The Outcomes of Digital Color Commentary
Opening Week 2026 served as the ultimate trial by fire. The tech didn’t just hold up; it absolutely crushed it, spitting out roughly one high-value insight every single inning across fifteen daily games. We weren’t just staring at dry batting averages anymore. Gemini started digging up real gold. For instance, it flagged that two opponents were born just 47 miles apart in the same corner of Texas. It even spotted that a specific outfielder hadn’t been on the injured list in nine straight seasons. This isn’t just basic data processing; it’s color commentary that actually scales. You can’t hire enough experts and announcers to do what this system does autonomously every night.
This transformation matters because it shifts the role of technology. Tools move from being recording devices to being interpreters. For years, the promise of “big data” in sports was that we’d eventually know everything. But knowing everything is exhausting! What we actually want is to understand what matters. By using Gemini to filter for “surprisal,” MLB has created a filter for relevance. They’ve taken a hundred years of tradition and a decade of high-frequency sensors and turned them into a conversation. It turns out the best use for generative AI isn’t replacing the game. It’s helping us remember why we loved it in the first place.
The Six-Season Partnership
Building this wasn’t just about plugging in an API and hoping for the best. A six-season partnership saw Google Cloud engineers embed with the MLB team. This effort mirrored a tech spring training camp. Everyone had to learn how to play together. They had to tune the AI’s personality. This ensured it was playful but still respected the “canonical truth” of the game. If the AI started making jokes about the mascot during a crucial play, fans would’ve revolted. It’s a delicate balance. You have to be a helpful scout without being that guy in the bleachers who won’t stop talking.
Ultimately, the Scout Insights project is a blueprint for any enterprise. It shows how to make sense of a massive data footprint. Find the “surprisal” in your data if you want users to care. Move past the dashboard and into the narrative. You might be selling season tickets or software licenses. Your customers don’t want a firehose of information. They want a clever tidbit that makes them feel like part of the story. If you can deliver that in under two seconds, you’ve already won.
