How to Do Canary Deployments at the API Layer Without Touching Your Backend Code
Most canary deployment stories are about infrastructure. This one is about shipping a new backend version while the API layer handles the routing, rollback, and observation — and your backend team never touches a load balancer config.
- workflows
- deployment
- api-management
- developer-experience
When a backend team ships a new version of a service, the deployment story typically involves someone — the backend team, or a platform team they have to wait for — adjusting load balancer weights, changing Kubernetes manifests, or modifying a service mesh config. The backend team writes the code. Someone else controls when it takes traffic.
The API gateway model inverts this. The backend team ships their new version. It sits at a new URL. The gateway workflow controls which requests go where, and the backend team — or whoever owns the workflow — adjusts the split without touching the backend code or the underlying infrastructure.
This is what that looks like end to end.
The deployment model
The key shift is treating the new backend version as just another upstream URL. Your current backend is at https://api-internal/payments/v1. Your new version ships to https://api-internal/payments/v2. Both are running. Neither knows about the other.
The gateway workflow sees all inbound traffic to /api/payments. It decides, per request, which upstream to call. The split starts at 0% v2. You move it to 1%, 5%, 25%, 100% — or back to 0% — by changing the workflow configuration and publishing. No infrastructure coordination. No manifest change. No kubectl command.
The backends just serve requests. They don't know about the split.
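The per-request decision described above can be sketched as a weighted pick between upstream URLs. This is a minimal illustration, not any particular gateway's API; the config dict and function names are assumptions, and the URLs mirror the article's example:

```python
import random

# Hypothetical workflow config: each upstream URL and its traffic weight.
CANARY_CONFIG = {
    "https://api-internal/payments/v1": 99,
    "https://api-internal/payments/v2": 1,
}

def choose_upstream(config, rng=random.random):
    """Pick an upstream by weighted random split; the backends never see this."""
    total = sum(config.values())
    point = rng() * total
    for upstream, weight in config.items():
        point -= weight
        if point < 0:
            return upstream
    return next(iter(config))  # floating-point edge case: fall back to first upstream
```

Moving the canary from 1% to 25% is just editing the weights and republishing; neither backend changes.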
Starting the canary: 0% to internal only
Before any customer traffic hits v2, validate it with internal traffic. A header-based branch in the workflow routes requests with X-Version: beta to v2, everything else to v1.
Your team, your staging environment, and any automated integration tests that set this header hit the new backend. Real production traffic continues on v1.
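A header-based branch like this is a one-line conditional in the workflow. The sketch below assumes headers arrive as a plain dict with the names already normalized, which real gateways typically handle for you:

```python
def route_by_header(headers, v1_url, v2_url,
                    beta_header="X-Version", beta_value="beta"):
    """Requests carrying X-Version: beta go to v2; all other traffic stays on v1."""
    if headers.get(beta_header, "").lower() == beta_value:
        return v2_url
    return v1_url
```

Internal clients and integration tests set the header; no customer request matches the branch unless it opts in explicitly.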
This is the canary's zero-risk phase. V2 runs under real production conditions — it is behind the real gateway, calling the real downstream services, logging to the real audit pipeline — but no customer sees a v2 response unless they explicitly request it.
When this phase passes, you move to percentage traffic.
1% traffic split: the real validation
The first percentage split is where you confirm that v2 behaves correctly under real production traffic, not just the requests your tests generate.
Set the workflow to route 1% of requests to v2, keyed on request ID or user ID so a given caller is routed consistently. The other 99% continues on v1.
What to watch during this phase:
Error rate parity. V2's error rate should be equal to or better than v1's. If v2 is producing more 5xx responses, roll back immediately. Rolling back means changing 1% to 0% and publishing — seconds, not minutes.
Latency parity. V2's p50, p95, and p99 latency should not be worse than v1's. A performance regression in v2 that looks acceptable at 1% becomes significant at 50%.
Functional correctness. If v2 has schema changes — new fields, renamed fields, changed types — validate that clients handling v2 responses do not break. This is where the header-based routing from the first phase is useful: you can send internal clients to v2 while external clients stay on v1, and validate the schema change before exposing it externally.
Log divergence. Structured logs from v1 and v2 should be comparable. If v2 logs show unexpected error types, unusual call patterns to downstream services, or missing fields that v1 always produces, investigate before increasing the split.
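The first two checks above are mechanical enough to automate. A minimal parity check, assuming you can pull aggregate stats per version from your metrics system (the dict shape and margin parameters here are illustrative):

```python
def parity_check(v1_stats, v2_stats, error_margin=0.0, latency_margin_ms=0.0):
    """Compare v2 against the v1 baseline; return the list of regressed metrics.

    Each stats dict carries 'error_rate' (fraction of 5xx) and 'p99_ms'.
    An empty result means v2 is at parity or better.
    """
    regressions = []
    if v2_stats["error_rate"] > v1_stats["error_rate"] + error_margin:
        regressions.append("error_rate")
    if v2_stats["p99_ms"] > v1_stats["p99_ms"] + latency_margin_ms:
        regressions.append("p99_ms")
    return regressions
```

Wiring this into a scheduled job that triggers the 0% rollback on any non-empty result turns the watch phase into an automated gate.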
Graduating the split
There is no universal graduation schedule, but a reasonable pattern for a non-breaking change:
- Internal only (header-based): Until your integration tests are green against v2
- 1%: 24 hours with no error or latency regressions
- 10%: 48 hours with all metrics healthy
- 50%: Until you are confident in v2's full production performance profile
- 100%: Decommission v1, remove the canary branch from the workflow
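The schedule above is easy to encode as data, which keeps graduation decisions auditable. A sketch under assumed names, with soak times illustrative (the 50% stage is open-ended, per the schedule):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Stage:
    v2_percent: int
    min_soak_hours: Optional[int]  # None = open-ended, advance on judgment

# Mirrors the non-breaking-change schedule above.
SCHEDULE = [
    Stage(0, None),   # header-based internal traffic only
    Stage(1, 24),
    Stage(10, 48),
    Stage(50, None),  # hold until confident in the full performance profile
    Stage(100, 0),    # decommission v1 afterwards
]

def next_stage(current: int, healthy: bool) -> int:
    """Advance one stage when metrics are healthy; any regression drops to 0%."""
    if not healthy:
        return 0
    return min(current + 1, len(SCHEDULE) - 1)
```

Note the asymmetry: graduation is gradual, but rollback always jumps straight back to the zero-traffic stage.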
For breaking changes — schema changes, removed fields, changed semantics — the graduation depends on consumer migration, not just backend stability. You may run at a partial split for weeks, gradually moving consumers to v2 while v1 serves legacy clients. The gateway workflow handles both concurrently with no backend changes.
Per-consumer routing during migration
Breaking API changes are the hard case. You cannot move all consumers to v2 simultaneously because some of them need migration time. The gateway workflow handles this with client-based routing:
The workflow checks the client credential (API key or OAuth client ID) against a list of clients migrated to v2. Migrated clients go to v2. Non-migrated clients go to v1. Both backends run in parallel, serving their respective clients, until the migration is complete.
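The client-based branch is a set-membership check. A minimal sketch, with hypothetical client IDs; in practice the migrated set would live in the workflow's configuration rather than in code:

```python
# Hypothetical identifiers for clients that have completed migration to v2.
MIGRATED_CLIENTS = {"client-acme", "client-globex"}

def route_by_client(client_id, v1_url, v2_url, migrated=MIGRATED_CLIENTS):
    """Migrated API keys / OAuth client IDs go to v2; everyone else stays on v1."""
    return v2_url if client_id in migrated else v1_url
```

Migrating one more consumer is adding one entry to the set and republishing — the backends, and every other consumer, are untouched.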
This eliminates the need for the backend team to version their API surface (the /v1 and /v2 URL paths) for every consumer. The gateway owns the routing; the backend team can consolidate to a single version once all consumers have migrated.
Rollback without drama
The highest-value property of gateway-level canary is the rollback speed. When you see a regression in v2:
- Open the workflow editor
- Change the v2 percentage to 0 (or remove the v2 branch entirely)
- Publish
Traffic returns to v1. The time between "this is bad" and "all traffic is on the stable version" is under a minute, with no infrastructure access required and no coordination with a platform team.
Compare this to rolling back a Kubernetes Deployment or a load balancer weight change: you need cluster access, the right RBAC permissions, and the confidence that kubectl rollout undo will not interact badly with your HPA or PodDisruptionBudget settings.
The gateway rollback is reversible and observable: the workflow audit log records who made the change, at what time, and what the previous configuration was.
What the backend team actually needs to do
The backend team's job in this model:
- Ship the new version to a new URL (a new container image tag, a new Kubernetes Deployment with a Service, or a new Lambda function version)
- Confirm the new URL is reachable from the gateway
- Hand the URL to whoever manages the workflow
That's it. They do not configure load balancer weights. They do not edit Kubernetes manifests for traffic management. They do not wait for a platform team to schedule a change window.
The backend team owns the code. The gateway workflow owns the routing. The separation means neither team is blocked by the other.
Zerq's workflow designer lets you configure canary splits, header-based routing, and per-client migration with a workflow publish — no infrastructure changes required. See the conditional routing use case or request a demo to walk through your deployment and migration workflow.