Apigee Got a New Job: The Control Plane for Your AI.

For most of the last decade, an API gateway was a simple idea. Take an HTTP request, authenticate it, maybe rate limit it, send it to the right backend. Boring, important, and mostly invisible. Apigee was really good at this. Then generative AI happened, and suddenly the API gateway has a much more interesting job.

Google Cloud has been quietly turning Apigee into a control plane for AI. The scope of what it now handles is worth understanding.

The Problem with Calling LLMs Like They’re Regular APIs

Calling an LLM isn’t like making a database query. A database query costs roughly the same every time. An LLM call can cost a penny or a dollar, depending largely on how many tokens are in the prompt and the response. Tokens are the unit of work for LLMs, roughly 4 characters of English text each. You pay for every one, on both the input side and the output side.
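To make the cost spread concrete, here is a minimal sketch of the arithmetic. It assumes the rough 4-characters-per-token rule above, and the two per-1,000-token prices are made-up placeholders, not any provider's real rates.

```python
import math

CHARS_PER_TOKEN = 4  # rough rule of thumb, not an exact tokenizer

# Hypothetical prices in USD per 1,000 tokens (assumptions for illustration).
INPUT_PRICE_PER_1K = 0.003
OUTPUT_PRICE_PER_1K = 0.015


def estimate_tokens(text: str) -> int:
    """Estimate the token count of a string from its character length."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one call from its input and output token counts."""
    input_cost = (input_tokens / 1000) * INPUT_PRICE_PER_1K
    output_cost = (output_tokens / 1000) * OUTPUT_PRICE_PER_1K
    return input_cost + output_cost


# A short question with a short answer versus a long prompt with a long answer.
short_call = estimate_cost(input_tokens=50, output_tokens=100)
long_call = estimate_cost(input_tokens=10_000, output_tokens=2_000)

print(f"short call: ${short_call:.5f}")
print(f"long call:  ${long_call:.5f}")
print(f"ratio:      {long_call / short_call:.0f}x")
```

Both calls count as one request to a rate limiter, yet under these placeholder prices the second costs dozens of times more than the first.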

This creates a governance problem that standard API management tools weren’t built for. Limiting requests per minute does nothing to stop a single verbose user from sending a 10,000-token prompt every minute. Standard auth keeps unauthorized callers out. It doesn’t cover a prompt injection attack, where an authorized user tricks the model into ignoring your instructions. And logging the request tells you an API call happened, not what was actually sent or returned.

Apigee now handles all of this natively in the proxy layer, between the caller and the model.

What It Actually Does Now

The centerpiece of the AI management story is the LLM gateway capability. Apigee sits in front of any model endpoint, including Vertex AI, Gemini, OpenAI, Anthropic, and self-hosted models, and applies policies to every call. Token quotas enforce per-user or per-tenant spend limits on both input and output tokens. Semantic caching uses Vertex AI embeddings to detect when two different prompts are asking essentially the same question, and returns the cached answer instead of making a redundant model call. Model Armor runs natively in the proxy layer. It validates prompts and filters outputs for prompt injection, jailbreak attempts, and sensitive data exposure.
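The token quota idea is easy to sketch outside of any gateway. The class below is a hypothetical in-memory illustration, not Apigee's policy or its configuration: it charges each tenant for input plus output tokens and refuses a call that would push the tenant over its limit. Window resets and shared storage are left out.

```python
from dataclasses import dataclass, field


@dataclass
class TokenBudget:
    """Per-tenant token budget for a single quota window (illustrative only)."""

    limit: int  # total tokens each tenant may spend in the window
    used: dict[str, int] = field(default_factory=dict)

    def try_consume(self, tenant: str, input_tokens: int, output_tokens: int) -> bool:
        """Record the call and return True, or return False if it exceeds the budget."""
        total = input_tokens + output_tokens
        spent = self.used.get(tenant, 0)
        if spent + total > self.limit:
            return False  # over budget: reject without recording anything
        self.used[tenant] = spent + total
        return True


budget = TokenBudget(limit=20_000)

# Two requests from the same tenant: the same count of calls, very different spend.
first = budget.try_consume("tenant-a", input_tokens=10_000, output_tokens=2_000)
second = budget.try_consume("tenant-a", input_tokens=10_000, output_tokens=2_000)

print(first, second, budget.used["tenant-a"])
```

A requests-per-minute limit would have allowed both calls. A token budget stops the second one because it meters the thing that actually costs money.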

The semantic caching piece is worth dwelling on for a second. Traditional caching is exact-match: the same string returns the cached result. Semantic caching is different. If one user asks “what is the refund policy” and another asks “how do I get my money back,” those are different wordings of the same question. Semantic caching catches that, returns the same answer, and saves the model call. For products with high query overlap, this can cut LLM costs significantly without touching a line of application code.
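Here is a minimal sketch of the lookup, assuming cosine similarity with a fixed threshold as the matching rule. The `SemanticCache` class, the 0.9 threshold, and the three-dimensional toy vectors are all assumptions for illustration. A real deployment would get its vectors from an embedding model, and nothing here describes how Apigee implements the feature internally.

```python
import math
from typing import Callable, Optional, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class SemanticCache:
    """Returns a stored answer when a new prompt is close enough to an old one."""

    def __init__(self, embed: Callable[[str], Vector], threshold: float = 0.9):
        self.embed = embed          # caller-supplied embedding function
        self.threshold = threshold  # minimum similarity that counts as a hit
        self.entries: list[tuple[Vector, str]] = []

    def put(self, prompt: str, answer: str) -> None:
        self.entries.append((self.embed(prompt), answer))

    def get(self, prompt: str) -> Optional[str]:
        query = self.embed(prompt)
        best_score, best_answer = 0.0, None
        for vector, answer in self.entries:
            score = cosine_similarity(query, vector)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None


# Hand-written stand-ins for real embeddings (assumed values, not model output).
TOY_EMBEDDINGS = {
    "what is the refund policy": (0.9, 0.1, 0.0),
    "how do I get my money back": (0.85, 0.2, 0.05),
    "what are your opening hours": (0.0, 0.1, 0.95),
}

cache = SemanticCache(embed=TOY_EMBEDDINGS.__getitem__)
cache.put("what is the refund policy", "cached refund answer")

print(cache.get("how do I get my money back"))   # close to the stored prompt
print(cache.get("what are your opening hours"))  # unrelated prompt
```

The threshold is the design choice that matters. Set it too low and users get answers to questions they did not ask. Set it too high and the cache behaves like exact-match caching again.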

The Agentic API Problem

The more interesting frontier is what happens when AI agents start calling your APIs. Agents are not passive. They reason, plan, and take actions: calling external systems, retrieving data, triggering workflows. Model Context Protocol (MCP) is the emerging standard for how agents describe and call external tools. Apigee now manages MCP servers natively.

In practice, an ISV with an existing REST API catalog can expose those APIs as MCP tools through Apigee. No rewriting required. The agent ecosystem can discover and call them. Apigee handles authentication, rate limiting, and observability on every tool call, the same way it handles every other API call. The ISV’s existing API surface becomes an agentic integration layer by default.
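To see what “exposing a REST API as an MCP tool” means at the data level, consider the sketch below. An MCP tool is described by a name, a description, and a JSON Schema for its inputs, and an agent invokes it with a JSON-RPC `tools/call` request. The REST operation descriptor and both helper functions are hypothetical, and this is not a description of how Apigee performs the mapping.

```python
import json
from typing import Any


def rest_operation_to_mcp_tool(operation: dict[str, Any]) -> dict[str, Any]:
    """Map a simplified REST operation descriptor to an MCP tool definition."""
    properties = {
        p["name"]: {"type": p["type"], "description": p.get("description", "")}
        for p in operation["parameters"]
    }
    required = [p["name"] for p in operation["parameters"] if p.get("required")]
    return {
        "name": operation["operation_id"],
        "description": operation["summary"],
        "inputSchema": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }


def build_tool_call(tool_name: str, arguments: dict[str, Any], request_id: int = 1) -> dict[str, Any]:
    """Build the JSON-RPC 2.0 request an agent sends to invoke a tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical REST operation from an existing API catalog.
operation = {
    "operation_id": "get_order_status",
    "summary": "Look up the current status of an order.",
    "parameters": [
        {"name": "order_id", "type": "string", "required": True,
         "description": "Identifier of the order."},
    ],
}

tool = rest_operation_to_mcp_tool(operation)
call = build_tool_call(tool["name"], {"order_id": "hypothetical-order-id"})

print(json.dumps(tool, indent=2))
print(json.dumps(call, indent=2))
```

The point of putting a gateway in front of this exchange is that every `tools/call` request is just another API call, so the same authentication, rate limiting, and logging policies can apply to it.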

This matters because agent-driven traffic behaves differently from human-driven traffic. An agent making autonomous decisions can generate bursts of rapid, sequential API calls that look nothing like a human user session. Without policy enforcement at the infrastructure layer, a misbehaving agent can hammer a backend system in ways that are hard to detect and expensive to recover from.
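One common way to contain that kind of burst is a sliding-window limit per caller. The sketch below is a generic illustration of the technique, with a hypothetical class name and made-up limits. It is not Apigee's rate limiting policy. Timestamps are passed in explicitly so the behavior is easy to follow.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_calls per caller within any window_seconds span."""

    def __init__(self, max_calls: int, window_seconds: float):
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self.calls: dict[str, deque] = defaultdict(deque)  # caller -> recent call times

    def allow(self, caller: str, now: float) -> bool:
        window = self.calls[caller]
        # Drop timestamps that have aged out of the window.
        while window and now - window[0] >= self.window_seconds:
            window.popleft()
        if len(window) >= self.max_calls:
            return False  # burst detected: reject and do not record the call
        window.append(now)
        return True


limiter = SlidingWindowLimiter(max_calls=3, window_seconds=1.0)

# An agent fires four calls within 300 ms, then one more about a second later.
burst = [limiter.allow("agent-1", t) for t in (0.0, 0.1, 0.2, 0.3)]
later = limiter.allow("agent-1", 1.05)

print(burst, later)
```

The fourth call in the burst is rejected. The later call goes through because the oldest timestamp has aged out of the window by then.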

Why This Matters

For software vendors selling into regulated industries, the compliance question around AI features is real. When a healthcare company or bank asks how sensitive data stays out of the model, “we train developers to be careful” doesn’t close the deal. The answer “we enforce prompt sanitization and output filtering at the infrastructure layer, with audit logs on every call” is a different conversation.

Apigee makes that second answer possible without requiring the application team to build it themselves. The governance layer is in the proxy. Audit trails live in Cloud Logging. Token budget enforcement sits in the policy. None of it requires application code changes.

A few things worth thinking about: If your AI features scale to 10x current usage tomorrow, do you have visibility into which customers are consuming what? If an agent in your product starts looping and making thousands of API calls, what stops it? And when your next enterprise prospect asks how you govern your AI infrastructure, what is your answer?

Want to go deeper?

  • Apigee AI Gateway overview. The full capability set: token quotas, semantic caching, Model Armor, multi-model routing, and observability.
  • MCP support for Apigee. How Apigee manages remote MCP servers and surfaces existing APIs as agent tools.
  • Using Apigee for AI. Engineering deep dive on multi-model routing, RAG integration, and agentic API patterns.