AI Agents Are Calling Your APIs. Your Infrastructure Wasn't Built for Them.

AI agents are hitting enterprise APIs at scale in 2026 — and breaking them. Here's what fails at the auth, rate limiting, observability, and security layers, and how to fix it.

  • ai
  • security
  • mcp
  • governance
  • api-management
Zerq team

TL;DR

  • Nearly half of enterprises (48.9%) are completely blind to non-human AI agent traffic consuming their APIs right now (Salt Security, April 2026)
  • AI agents break API infrastructure in four specific, fixable ways: authentication, rate limiting, observability, and security
  • MCP has become the de facto connectivity standard — but it shipped without mandatory auth and has produced 30+ CVEs in two months
  • The fix isn't a new gateway. It's enforcing the same security controls on agent traffic that you already enforce on human traffic
  • This is the infrastructure problem of 2026, and most platform teams are behind

Your API gateway is probably handling traffic right now that it was never designed for.

Not a new partner integration. Not a mobile app. Autonomous AI agents — running inside Copilots, inside IDEs, inside customer-facing chatbots — are discovering your endpoints, authenticating with whatever credentials they were handed, and calling your APIs at a rate and pattern that your rate limits, audit logs, and authentication flows weren't designed to handle.

According to Gartner, 40% of enterprise applications will integrate task-specific AI agents by end of 2026 — up from less than 5% in 2025. IDC goes further: by 2027, agent use among Global 2000 companies will increase tenfold, with token and API call loads rising by a factor of 1,000.

Your infrastructure has months, not years, to get ready. Here's exactly what breaks and what to do about it.


The Four Ways AI Agents Break Your API Layer

1. Authentication: OAuth Was Designed for Humans Clicking "Approve"

The OAuth 2.0 Authorization Code + PKCE flow — the one your APIs almost certainly use — requires a browser, a user consent screen, and a human clicking a button. Autonomous agents can't do any of that.

Teams respond in predictable ways. They issue a service account with broad scopes and share the credentials across multiple agents. Or they use long-lived API keys because they're simpler. Both approaches produce the same result: when an agent is compromised or misconfigured, the blast radius is enormous, because the agent has been granted the full permission set of a user or service account that was scoped for convenience, not least privilege.

The structural problem is what security practitioners call the Confused Deputy problem: without audience binding, an agent can be tricked into surrendering its credentials to a malicious service. OAuth 2.1 addresses this via audience binding (RFC 8707) and sender-constrained tokens — but most API platforms haven't enforced it yet, and most teams haven't asked.

The right model is delegation with attribution: tokens that cryptographically encode both the user identity and the agent identity, so every API call carries the answer to "who authorized this, and which agent executed it." RFC 8693 (OAuth 2.0 Token Exchange) defines this. Your gateway should be enforcing it.

What this looks like in practice: short-lived tokens (15-minute expiry), automatic rotation every 24 hours, action-level scopes rather than broad grants, and a token vault — a centralized service that manages credential storage, rotation, and retrieval for every agent in your fleet. The same way a password manager works for humans, but built for autonomous software.
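As a concrete sketch, here is roughly what an RFC 8693 token-exchange request body looks like. The audience URL, client ID, and scope name below are illustrative placeholders, not values from any real deployment; only the `grant_type` and token-type URNs come from the RFC itself.

```python
def build_token_exchange_request(user_token: str, agent_client_id: str,
                                 audience: str) -> dict:
    """Parameters for exchanging a user's token for a short-lived,
    agent-attributed token scoped to one API audience."""
    return {
        # Fixed URN identifying the token-exchange grant (RFC 8693, section 2.1)
        "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
        # The user's token being delegated to the agent
        "subject_token": user_token,
        "subject_token_type": "urn:ietf:params:oauth:token-type:access_token",
        # Bind the issued token to a single API audience (RFC 8707)
        "audience": audience,
        # The agent's own identity, so the issued token carries both parties
        "client_id": agent_client_id,
        # Action-level scope rather than a broad grant
        "scope": "reports:read",
    }

params = build_token_exchange_request("eyJhbGci...", "agent-research-01",
                                      "https://api.example.com/reports")
```

The resulting token encodes both identities, so the gateway can answer "who authorized this, and which agent executed it" from the token alone.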

This is not a new problem. It's standard OAuth hygiene applied to a new class of clients. The infrastructure to solve it exists. Most teams just haven't applied it to agents yet.


2. Rate Limiting: Agents Don't Call Like Apps, and They Don't Stop

Traditional rate limiting is built around one assumption: a predictable request rate from a client that will back off when throttled. AI agents violate both halves of that assumption.

Consider a research agent tasked with "analyze our top five competitors." It might make 5 API calls. Or 500. The agent decides at runtime based on what it finds in each response. No rate limit you set in advance can anticipate this, because the call volume is a function of reasoning, not traffic.

And when you do rate-limit an agent, it doesn't give up. A human hitting a 429 navigates away. An agent with a goal retries with exponential backoff — forever, or until it exhausts its context window. You haven't throttled it. You've just made it patient.

The deeper problem is that request count is the wrong unit of measure. A single AI agent request can consume 100 times the backend resources of a typical human request, but a traditional rate limiter gives both the same tick. One chat completion burning through 8,000 tokens gets the same "1 request" count as a metadata lookup. The real cost is invisible until the invoice arrives.

The worst-case scenario is documented: a $47,000 LangChain recursive loop where two agents ping-ponged requests continuously for 11 days before anyone noticed. No spend limits, no monitoring on inter-agent communication, no stop conditions. The only thing that eventually slowed it down was an external API's rate limiter — not the team's own controls.

The fix has two parts. First, move from request-based to token-based rate limiting — count tokens consumed, assign cost weights per model and per operation, and track against budget envelopes rather than request counts. Second, set dollar-denominated governance boundaries: "$50 per agent per day" with a structured response when the budget is exhausted. This is deterministic and auditable in a way that "1,000 RPM" is not.
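A minimal sketch of both parts together — a per-agent dollar budget charged by token consumption rather than request count. The model names, rates, and budget figure are made up for illustration; a production limiter would persist this state and reset it daily.

```python
class AgentBudgetLimiter:
    """Per-agent cost budget: each call is charged by tokens consumed and
    a per-model rate, instead of counting as '1 request'."""

    def __init__(self, daily_budget_usd: float, cost_per_1k_tokens: dict):
        self.daily_budget = daily_budget_usd
        self.rates = cost_per_1k_tokens          # e.g. {"gpt-large": 0.03}
        self.spent: dict = {}                    # agent_id -> dollars today

    def charge(self, agent_id: str, model: str, tokens: int) -> dict:
        cost = tokens / 1000 * self.rates[model]
        spent = self.spent.get(agent_id, 0.0) + cost
        if spent > self.daily_budget:
            # Structured refusal: deterministic and auditable, unlike a bare 429
            return {"status": 429, "error": "budget_exhausted",
                    "agent_id": agent_id, "budget_usd": self.daily_budget}
        self.spent[agent_id] = spent
        return {"status": 200, "spent_usd": round(spent, 4)}

limiter = AgentBudgetLimiter(daily_budget_usd=50.0,
                             cost_per_1k_tokens={"gpt-large": 0.03})
```

A metadata lookup and an 8,000-token completion now carry different costs against the same envelope, which is the whole point.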

Your gateway should be enforcing this at the per-agent-identity level, not just per API key. If it can't differentiate one agent from another inside the same service account, you're governing a population you can't see.


3. Observability: You Can't Trace What You Can't See

Here's what your current logs show when an AI agent makes an API call: an API key, a timestamp, an endpoint, and a status code.

Here's what they don't show: which human authorized the action, what the agent's goal was, what reasoning led to this specific call, whether the call was part of a multi-step delegated workflow, and whether the outcome was correct or silently wrong.

The structural issue is that LLM-based agents are stateless by design. Each model invocation is independent. The "state" — the chain of reasoning, the context, the sequence of tool calls — exists inside the prompt context window, not in any system your logging infrastructure can observe. Traditional APM tools will tell you that POST /api/chat returned 200 in 4.2 seconds. They won't tell you that inside that request, the agent made five LLM calls, the third one selected the wrong tool, the tool returned stale data, and the model faithfully summarized garbage. That's a silent quality failure — a 200 response with wrong content — and it's invisible to standard monitoring.

The audit gap has five layers: identity (who), input (what triggered it), reasoning (why the agent decided), action (what API call it made), and outcome (what happened as a result). Most organizations can log the identity and the action — the credential and the HTTP request. The input, reasoning, and outcome layers are typically missing.

For regulated environments, this isn't just an ops problem — it's a compliance problem. When an auditor asks "who authorized this transaction and what was the chain of custody," you need to be able to answer for both human and agent-originated calls, in the same audit log.

The emerging standard is OpenTelemetry's GenAI Semantic Conventions: structured trace attributes including gen_ai.agent.id, gen_ai.request.model, gen_ai.usage.input_tokens, and gen_ai.usage.output_tokens. Instrument agent traces at 100% sampling — unlike conventional request traffic, a sampled-out agent span drops an entire workflow execution, not one interchangeable request. The key architectural pattern is workflow-scoped tokens: credentials issued per agent session, so every API call made under that token is automatically attributed to a specific user intent, agent, and workflow.
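A sketch of what those attributes look like for one agent call. The gen_ai.* keys follow the published semantic conventions; the workflow.id key is an assumed local convention for correlation, and the OTel SDK calls are deliberately omitted (in production these would be set via span.set_attribute) to keep the example dependency-free.

```python
import json
import uuid

def agent_span_attributes(agent_id: str, model: str,
                          input_tokens: int, output_tokens: int,
                          workflow_id: str) -> dict:
    """Attributes per the OpenTelemetry GenAI semantic conventions, plus a
    workflow correlation ID linking every span in a run to one user intent."""
    return {
        "gen_ai.agent.id": agent_id,
        "gen_ai.request.model": model,
        "gen_ai.usage.input_tokens": input_tokens,
        "gen_ai.usage.output_tokens": output_tokens,
        # Correlation key is our own convention, not part of the spec
        "workflow.id": workflow_id,
    }

attrs = agent_span_attributes("agent-research-01", "gpt-large",
                              8_000, 512, str(uuid.uuid4()))
print(json.dumps(attrs))  # one structured log line per agent call
```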

Your gateway should be emitting structured JSON logs for every agent-originated request, with correlation IDs that trace the full call chain. If your current gateway can't distinguish agent traffic from app traffic, start there.


4. Security: MCP's Real Problem Is That It Shipped Without Auth

The Model Context Protocol has become the de facto standard for AI-to-API connectivity. Backed by Anthropic, then adopted by OpenAI, Google DeepMind, Microsoft, and the Linux Foundation's Agentic AI Foundation, MCP now has 97 million monthly SDK downloads, 1,600+ servers in its official registry, and support from every major AI platform. Gartner predicts 75% of API gateway vendors will have MCP features by end of 2026.

Here is the problem: MCP was designed for connectivity, not governance. OAuth 2.1 authentication was added to the spec in June 2025 — as optional. Many implementers haven't enforced it. Between January and February 2026 alone, approximately 30 CVEs were filed against MCP servers, clients, and infrastructure.

The most consequential: CVE-2025-6514 (CVSS 9.6), a remote code execution vulnerability in mcp-remote — an OAuth proxy npm package with 437,000+ downloads. A malicious MCP server could return a crafted authorization_endpoint containing embedded shell commands. Full system compromise via MCP infrastructure, documented.

Beyond credentials, there's tool poisoning: malicious instructions hidden inside MCP tool descriptions that are visible to the LLM but invisible to users. Invariant Labs demonstrated an 84.2% success rate against agents with auto-approval enabled. Palo Alto's Unit 42 team identified three additional attack vectors through MCP's sampling mechanism: resource theft, conversation hijacking, and covert tool invocation. OWASP now lists prompt injection as the top risk in their LLM Top 10.

And there are real incidents. A GitHub MCP integration was hijacked via a malicious public issue — exfiltrating private repository contents into a public pull request. A Supabase MCP integration running inside Cursor executed SQL to exfiltrate sensitive tokens after an attacker submitted a crafted support ticket. Pynt's research puts it bluntly: deploying just 10 MCP plugins creates a 92% probability of exploitation.

The structural vulnerability, as Simon Willison identified, is what he calls the "Lethal Trifecta": an AI agent is fundamentally exploitable when it simultaneously has access to private data, processes untrusted content, and can communicate externally. Most production deployments satisfy all three conditions. The vulnerability is not a bug. It's the value proposition.

What this means for your API layer: every AI agent accessing your APIs via MCP is a client that may have been compromised at the tool layer before it ever reaches your gateway. Your gateway needs to enforce the same controls on MCP-originated traffic that it enforces on REST traffic — the same authentication, the same rate limits, the same audit trail — regardless of how the agent arrived.

If you're running a separate "AI endpoint" with looser controls because it's "just for internal agents," you have a security gap. One gateway, one set of rules, one audit log.
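A gateway policy check in that spirit might look like the sketch below. The field names (token, agent_id, spent_usd, budget_usd) are hypothetical; the point is only that the protocol field gets recorded for the audit log but never relaxes a check.

```python
def enforce_gateway_policy(request: dict) -> dict:
    """One policy path for every client. Whether the request arrived via
    REST or MCP changes nothing about which checks run."""
    checks = {
        "authenticated": bool(request.get("token")),
        "agent_attributed": bool(request.get("agent_id")),
        "within_budget": request.get("spent_usd", 0.0)
                         < request.get("budget_usd", 0.0),
    }
    return {
        "allowed": all(checks.values()),
        "protocol": request.get("protocol"),   # logged, never branched on
        "failed": [name for name, ok in checks.items() if not ok],
    }

decision = enforce_gateway_policy({"protocol": "mcp", "token": "eyJhbGci...",
                                   "agent_id": "agent-research-01",
                                   "spent_usd": 12.5, "budget_usd": 50.0})
```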


What Agent-Ready API Infrastructure Actually Looks Like

The good news: you don't need to build a new gateway. You need to enforce existing controls on a new class of client.

The checklist is concrete:

Authentication layer:

  • Require OAuth 2.1 with audience binding for all agent clients
  • Issue short-lived, action-scoped tokens — not broad service account keys
  • Enforce client_id + agent_id attribution on every token so logs answer "which agent made this call"
  • Rotate credentials automatically; store secrets in Vault or equivalent, never in agent config

Rate limiting and cost governance:

  • Move to token-based rate limiting — count tokens consumed, not requests sent
  • Set per-agent budget envelopes with hard ceilings and structured 429 responses
  • Add circuit breakers for quality failures (200 OK responses with wrong content), not just availability failures
  • Set maximum API calls per agent workflow, not just per time window
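The last item in that list reduces to a counter keyed by workflow rather than by time window — a hypothetical cap, sketched in-memory here, where a runaway loop trips the ceiling even at a slow request rate that no RPM limit would catch.

```python
class WorkflowCallCap:
    """Hard ceiling on API calls per agent workflow, independent of any
    time window."""

    def __init__(self, max_calls: int):
        self.max_calls = max_calls
        self.counts: dict = {}      # workflow_id -> calls made so far

    def allow(self, workflow_id: str) -> bool:
        n = self.counts.get(workflow_id, 0) + 1
        self.counts[workflow_id] = n
        return n <= self.max_calls
```

With a cap like this, the $47,000 recursive loop stops at call N on day one, not when an external party's rate limiter happens to intervene on day eleven.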

Observability:

  • Emit structured JSON logs with correlation IDs for every agent-originated request
  • Include agent identity, parent workflow ID, user delegation chain, and token consumption
  • Instrument at 100% sampling — don't sample away entire agent execution traces
  • Route logs to your SIEM; agent traffic needs the same security monitoring as human traffic

Security:

  • Enforce MCP OAuth 2.1 — treat "optional" as mandatory at your gateway
  • Apply per-partner access controls: each agent client sees only the API products it's been explicitly granted
  • Pin MCP server package versions in production and run mcp-scan in CI
  • Map every agent action to a risk level; require human-in-the-loop approval for high and critical operations
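That last control reduces to a lookup table plus a fail-closed default. The action names and risk tiers below are illustrative, not from any standard.

```python
# Hypothetical action-to-risk mapping; populate from your own API catalog.
RISK_LEVELS = {
    "read_report": "low",
    "update_record": "medium",
    "delete_dataset": "high",
    "transfer_funds": "critical",
}

def requires_human_approval(action: str) -> bool:
    """High and critical actions pause for human-in-the-loop sign-off.
    Unknown actions default to critical, i.e. the gate fails closed."""
    return RISK_LEVELS.get(action, "critical") in ("high", "critical")
```

The fail-closed default matters: an agent invoking an action nobody classified should hit the approval gate, not slip through it.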

The Governance Gap Is the Actual Problem

The technology to solve all four failure modes exists today. Standard OAuth flows, token-based rate limits, OpenTelemetry instrumentation, and gateway-level access control are not novel concepts.

What's missing is enforcement. Forrester found that 71% of enterprises deploying AI agents lack a formal governance framework for them — even as 64% plan to increase agent autonomy in the next 12 months. Salt Security found that 48.3% of organizations cannot differentiate legitimate AI agents from malicious bots in their API traffic. McKinsey found that only 30% of organizations have reached maturity level 3 or above in AI governance for agentic systems.

The gap between "we have agents in production" and "we can audit, rate-limit, and secure agent traffic the same way we do human traffic" is where the incidents happen. The $47,000 recursive loop. The data leak that triggered a Sev-1 alert. The MCP integration that got hijacked via a crafted support ticket.

The right architecture isn't a separate AI gateway with its own rules, its own credentials, its own logs. It's one gateway, with one control plane, enforcing consistent policy on every client — human-initiated or agent-initiated — with a complete audit trail that doesn't distinguish between the two.

Zerq is built for exactly this model. Every AI agent accessing your APIs via Gateway MCP uses the same client credentials as your REST apps, goes through the same access control and rate limiting policies, and appears in the same audit log. There's no second deployment, no second set of credentials, no separate audit trail for "AI traffic." One gateway path, one set of rules, one place to answer when your auditor or your incident responder asks who called what and when.

If you're working through what this looks like for your infrastructure, the Zerq documentation on Gateway MCP walks through the architecture in detail. Or request a demo to walk through your specific environment.


The Window Is Narrow

AI agent adoption is not gradual. Gartner's estimate — 5% of enterprise apps with agents in 2025, 40% by end of 2026 — represents one of the fastest adoption curves in enterprise software history. IDC's "1,000x API call load by 2027" is not hyperbole. It's the math when you multiply agent count by the number of API calls per agent workflow by the number of workflows per day.

Your API infrastructure will carry that load whether you've prepared for it or not. The agents are already calling. The question is whether you can see them, govern them, and stop them when something goes wrong.

The four failure modes above — authentication, rate limiting, observability, security — are not theoretical. They are documented, with dollar amounts and incident reports attached. The fixes are not exotic. They are standard infrastructure work applied to a new class of client.

The teams who treat this as routine platform work — assign the tickets, enforce the policies, close the gaps — will be invisible in the incident reports next year. The teams who treat it as "something to deal with later" will not.


Zerq is an enterprise API gateway that gives platform engineers one control plane for APIs and AI agents — with the same authentication, rate limits, and audit trail for both. On-prem, hybrid, or cloud. No vendor lock-in. See how it works →