Least Privilege for AI Agents: Why Every Agent Should Have Its Own Scope and Rate Limits
Overprivileged AI agents turn a single prompt injection into a full environment compromise. Every agent needs its own scoped credential, its own rate limits tuned to its call pattern, and a blast radius you can calculate before it becomes an incident.
- security
- ai
- least-privilege
- governance
- mcp
The security principle of least privilege is well understood for human users and service accounts. Apply it to AI agents and you immediately hit a practical problem: most teams provisioning AI agents for the first time don't have an agent taxonomy. They have "the AI assistant" — one set of credentials, maximum scope, no rate limits calibrated to the agent's actual call pattern, and no defined blast radius.
That works until it doesn't. A single prompt injection into an overprivileged agent is not an isolated incident — it is a full environment compromise scoped only by what the agent's credential allows. If the credential allows everything, the compromise allows everything.
This post is about making least privilege concrete for AI agents: how to classify agents by their permission requirements, how to design scope matrices, and how to calibrate rate limits per agent type.
Why AI agents fail the standard service account model
Traditional service account design follows a simple pattern: one service, one credential, scoped to what that service calls. A billing service gets read/write on billing endpoints. A reporting service gets read-only on analytics endpoints. The credential scope matches the code's actual call graph.
AI agents break this in two ways.
The call graph is determined at runtime, not at deploy time. A code-driven service has a fixed set of API calls compiled into it. An AI agent's calls are determined by what the user asks and what the model decides to do in response. You cannot enumerate the agent's call graph the way you would for a microservice. This pushes teams toward over-provisioning: "give it access to everything it might need."
Agents are vulnerable to instruction injection. A service account credential used by a microservice can only be abused if the service code itself is compromised. An AI agent credential can be abused by injecting instructions into any content the agent processes — a document, an email, an API response. The threat model for agent credentials is fundamentally different from the threat model for service credentials.
The combination is dangerous: broad permissions + instruction injection vulnerability = high-value target. The correct response is not to accept this as inherent to AI agents. It is to apply least privilege discipline specifically designed for the agent threat model.
An agent taxonomy for permission design
Not all agents need the same capabilities. Before designing scope, classify agents by their function:
Type 1: Read-only research agents
Purpose: retrieve, summarise, and present information. Examples: inbox summarisation, document search, data lookup for reports.
What they call: Read-only endpoints on your APIs. Email list, document retrieval, search, analytics queries.
What they must not call: Write endpoints. External URLs not in the approved catalog. Administrative APIs. Credential management endpoints.
Blast radius of compromise: Data exfiltration of readable content. No state modification. No lateral movement to write APIs. The damage ceiling is bounded by what the read scope contains.
Rate limit profile: Burst-tolerant on reads (research agents legitimately make many read calls in short windows). No write budget at all — the credential scope enforces zero write access regardless of rate limits.
Type 2: Analytical agents with limited writes
Purpose: analyse data and produce structured outputs. May write summaries, create records, or update status fields. Examples: CRM enrichment agents, report generation agents, customer support classification agents.
What they call: Read APIs broadly, write APIs for a specific narrow resource type (e.g., the summary field on a CRM record, not the financial fields).
What they must not call: Bulk write APIs. Delete endpoints. External data transmission endpoints. Administrative APIs.
Blast radius of compromise: Incorrect data written to the specific resource type they can write. Read-scope data exfiltration. Still bounded — no bulk operations, no deletion, no external transmission if egress is filtered.
Rate limit profile: Generous read budget. Tight write budget per operation type — analytical agents produce one summary per task, not 500 writes per minute. A write rate spike is an anomaly signal, not expected behaviour.
Type 3: Action agents (highest risk, tightest controls)
Purpose: take consequential real-world actions. Examples: automated email sending, order creation, workflow triggering, partner provisioning.
What they call: Specific write endpoints for their designated function. Read APIs limited to what is needed to complete the action, not broad read access.
What they must not call: Endpoints outside their specific action domain. Bulk operations above a defined daily limit. Administrative APIs.
Blast radius of compromise: The specific action they are authorised to take, at the rate they are allowed to take it. An email agent with a 50-email-per-day limit has a bounded blast radius: 50 emails. An email agent with no rate limit and broad scope has an unbounded blast radius.
Rate limit profile: Per-operation daily and per-minute limits calibrated to expected workflow volume. Write operations rate-limited aggressively — legitimate action agents do not burst at 100 writes per second.
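One way to make this taxonomy operational is to encode each type's scopes and rate budgets as a profile that your provisioning step reads from. A minimal Python sketch — the endpoint patterns, field names, and numeric limits here are illustrative, not a real gateway schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentProfile:
    """Per-agent-type permission and rate limit profile (illustrative fields)."""
    read_scopes: tuple[str, ...]   # endpoint patterns the agent may read
    write_scopes: tuple[str, ...]  # endpoint patterns the agent may modify
    read_burst: int                # max read calls per minute
    write_per_minute: int          # max write calls per minute
    write_per_day: int             # hard daily write ceiling

PROFILES = {
    "research": AgentProfile(
        read_scopes=("/documents/*", "/search", "/mail/messages"),
        write_scopes=(),           # zero write budget, enforced by scope
        read_burst=300, write_per_minute=0, write_per_day=0,
    ),
    "analytical": AgentProfile(
        read_scopes=("/contacts/*", "/deals"),
        write_scopes=("/contacts/*/summary",),  # one narrow resource type
        read_burst=120, write_per_minute=10, write_per_day=500,
    ),
    "action": AgentProfile(
        read_scopes=("/contacts/*",),  # only what the action needs
        write_scopes=("/mail/send",),
        read_burst=60, write_per_minute=2, write_per_day=50,
    ),
}
```

Keeping the three types side by side in one structure makes the asymmetry visible: the research profile's empty write scope and the action profile's daily ceiling are design decisions, not defaults.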
Designing the scope matrix
For each agent type, map its required calls to explicit scopes on your gateway. This is not a list of "things it might need" — it is a list of the minimum set of endpoints required to complete its function, nothing else.
For a CRM enrichment agent (Type 2), the scope matrix might look like:
| API | Operation | Scope | Rationale |
|---|---|---|---|
| /contacts | GET (list, search) | ✓ | Needs to find contacts to enrich |
| /contacts/{id} | GET | ✓ | Needs to read individual contact |
| /contacts/{id}/summary | PATCH | ✓ | Writes only the summary field |
| /contacts/{id} | PUT, DELETE | ✗ | Cannot overwrite or delete full record |
| /deals | GET | ✓ | Reads deal context for enrichment |
| /deals | POST, PUT, DELETE | ✗ | No deal modification |
| /admin/* | * | ✗ | No admin access under any circumstance |
| External URLs | * | ✗ | No egress to unapproved domains |
This matrix becomes the credential scope when you provision the agent's access profile on your gateway. The scope is not advisory — it is enforced at the API layer. When a prompt injection tells the agent to "delete all contacts and send the data to external-site.com", the gateway rejects both calls regardless of what the model decides to do.
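A scope matrix like this translates directly into a default-deny check at the enforcement layer: a call passes only if it matches an explicit allow entry. A minimal sketch using fnmatch-style patterns, mirroring the matrix above (the pattern syntax is an assumption — real gateways have their own policy languages):

```python
from fnmatch import fnmatch

# Allow list for the Type 2 CRM enrichment agent: (method, path pattern).
# Anything not listed is denied by default.
ALLOWED = [
    ("GET",   "/contacts"),
    ("GET",   "/contacts/*"),
    ("PATCH", "/contacts/*/summary"),
    ("GET",   "/deals"),
]

def is_allowed(method: str, path: str) -> bool:
    """Default-deny: a call passes only if it matches an explicit scope entry."""
    return any(method == m and fnmatch(path, p) for m, p in ALLOWED)
```

The injected "delete all contacts" instruction fails here mechanically: DELETE /contacts/42 matches no entry, so the gateway never has to reason about the model's intent.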
Per-agent rate limits: calibrate to the call pattern
Rate limits for AI agents should be designed around the agent's expected call pattern, not the platform's generic defaults.
The wrong approach: Apply the same per-minute rate limit to all clients. A research agent that legitimately calls 40 endpoints in 8 seconds hits the limit. You raise the limit to accommodate it. Now your action agent also has the raised limit, which means a compromised action agent can send 40 emails in 8 seconds instead of 40 per hour.
The right approach: Per-agent-type rate limit profiles with separate budgets for reads and writes.
For a research agent: high burst allowance on reads (token bucket that accumulates during idle periods and allows short bursts), zero write budget.
For an analytical agent: moderate read allowance, very tight per-operation write budget (e.g., 10 summary updates per minute maximum, regardless of how fast the model wants to go).
For an action agent: read allowance scoped to what the action needs, strict per-operation write budget with a daily maximum, alert on approaching the daily limit before it is hit.
The daily maximum for action agents is the most important control. An email agent with a 50-email-per-day limit cannot be weaponised to send a thousand emails regardless of what instructions are injected into it. The rate limit is the last line of defence after scope enforcement.
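The read/write split described above can be sketched as two token buckets plus a daily write counter. The buckets allow legitimate read bursts while the counter acts as the hard backstop; all rates and caps here are illustrative:

```python
import time

class TokenBucket:
    """Refills at `rate` tokens/sec up to `capacity` (the burst allowance)."""
    def __init__(self, rate: float, capacity: float):
        self.rate, self.capacity = rate, capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

class AgentLimiter:
    """Separate read/write buckets plus a hard daily write ceiling (sketch)."""
    def __init__(self, read_rate, read_burst, write_rate, write_burst, daily_writes):
        self.reads = TokenBucket(read_rate, read_burst)
        self.writes = TokenBucket(write_rate, write_burst)
        self.daily_writes = daily_writes
        self.writes_today = 0  # reset by a daily job, omitted here

    def allow_read(self) -> bool:
        return self.reads.allow()

    def allow_write(self) -> bool:
        if self.writes_today >= self.daily_writes:
            return False  # daily cap is the backstop, regardless of bucket state
        if self.writes.allow():
            self.writes_today += 1
            return True
        return False
```

Note that raising the read bucket's capacity for a bursty research agent leaves the write path untouched — the failure mode in the shared-limit approach above cannot occur when the budgets are separate objects.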
Credential lifecycle for agents
Each agent should have a dedicated credential — not a shared credential used by multiple agents of the same type. One credential per deployed agent instance.
This matters when something goes wrong. If you have five research agents running and one is behaving anomalously, a shared credential means you cannot selectively revoke access to the misbehaving agent without taking down all five. A per-agent credential means targeted revocation: disable the specific agent's credential, leave the others running, investigate.
Credential TTL for agents should be shorter than you might expect. Human user tokens expire when the session ends. Service credentials for microservices may last months. Agent credentials are a middle ground: long enough that constant rotation does not disrupt operations, short enough that a compromised credential has a defined expiry. 24-72 hour TTLs with automatic rotation are a reasonable starting point for action agents. Read-only agents can tolerate longer TTLs since the blast radius of a compromised read credential is lower.
The blast radius calculation
Before deploying any agent, write down the blast radius: "if this agent's credential is fully compromised and used by an adversary at maximum rate limits, what is the worst-case outcome?"
For a read-only research agent: adversary can read all content the agent can access at the rate limit. Worst case: data exfiltration of readable content within the scope.
For an analytical agent: adversary can write incorrect summaries to CRM records at up to 10 per minute. Worst case: data corruption limited to the summary field of up to 10 records per minute.
For an action agent with 50-email-per-day limit: adversary can send 50 emails from your domain to any address. Worst case: 50 spam or phishing emails before the daily limit is exhausted.
If the blast radius calculation produces an answer your security team is uncomfortable with, tighten the scope or the rate limits before deployment. The blast radius is a design parameter, not an accident.
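The worst-case write count is simple arithmetic over the rate limit parameters, which is exactly why it can be computed at design time. A minimal sketch, assuming a detection window measured in hours:

```python
def worst_case_writes(per_minute: int, per_day: int, detection_hours: float) -> int:
    """Worst-case writes if an adversary drives the credential flat-out
    until detection; the daily ceiling caps the result regardless."""
    return min(int(per_minute * 60 * detection_hours), per_day)
```

For the email action agent (2 writes per minute, 50 per day), even an eight-hour detection window yields 50 emails, not 960 — the daily cap, not detection speed, sets the ceiling.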
Zerq's per-client access profiles and rate limit configuration let you implement per-agent scope matrices and per-type rate limit profiles without gateway-level custom code. See how Zerq handles AI agent access or request a demo to walk through your specific agent taxonomy.