API Rate Limiting Strategies That Don't Hate Your Customers

Most rate-limiting implementations are designed to protect the server; few are designed to be usable by API consumers. These are the patterns we keep recommending, and the customer-facing rate-limit policy that survives integration review.

By Arjun Raghavan, Security & Systems Lead, BIPI · August 18, 2025 · 7 min read

Every API needs rate limiting. Almost no API ships rate limiting that the teams integrating with it actually like. Most implementations are designed for one customer, the server, and treat the consumer's experience as someone else's problem. The cost shows up later as integration churn, support tickets, and engineering time spent on retry logic that should have been unnecessary.

These are the patterns we recommend on engagements where a public or partner API is in scope.

Pick the right algorithm for the threat

Fixed-window limiters (the default in most quick implementations) reset on a clock boundary. They are simple, but they allow bursts at the boundary. A consumer can use up the whole limit at 11:59:59 and use it up again at 12:00:01, briefly doubling the effective rate. For abuse protection that is a real gap.
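
The boundary problem is easier to see in code. What follows is a minimal in-memory sketch, not a production limiter: the class name, the `allow` method, and the 100-per-minute figure are our own illustrative choices, and the clock is passed in as `now` (seconds) so the behaviour is reproducible.

```python
from collections import defaultdict


class FixedWindowLimiter:
    """Counts requests per key in clock-aligned windows (illustrative only)."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window_seconds = window_seconds
        self.counts = defaultdict(int)  # (key, window index) -> accepted requests

    def allow(self, key, now):
        # Every timestamp inside the same window maps to the same window index.
        window = int(now // self.window_seconds)
        if self.counts[(key, window)] >= self.limit:
            return False
        self.counts[(key, window)] += 1
        return True


limiter = FixedWindowLimiter(limit=100, window_seconds=60)

# 100 requests in the last second of one window, 100 more just after the reset.
late = sum(limiter.allow("client-a", now=59.0) for _ in range(100))
early = sum(limiter.allow("client-a", now=61.0) for _ in range(100))

# All 200 are accepted within about two seconds, twice the nominal limit.
accepted_in_two_seconds = late + early
```

A real deployment would also expire old window counters; this sketch keeps them forever to stay short.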

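Sliding-window limiters keep a rolling count. Smoother behaviour, slightly more state. Check what your reverse proxy actually implements before relying on it: nginx limit_req is a leaky bucket and Envoy's local_ratelimit filter is a token bucket, so neither gives you a true sliding window out of the box.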
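
One way to keep a rolling count is a sliding-window log: store the timestamp of every accepted request and discard those older than the window. The sketch below assumes that variant (a counter-based approximation is a common alternative with less state); the names and the 100-per-minute limit are illustrative, and `now` is injected so the example is deterministic.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Sliding-window log: one stored timestamp per accepted request."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window_seconds = window_seconds
        self.events = defaultdict(deque)  # key -> timestamps of accepted requests

    def allow(self, key, now):
        timestamps = self.events[key]
        # Drop timestamps that have aged out of the rolling window.
        while timestamps and timestamps[0] <= now - self.window_seconds:
            timestamps.popleft()
        if len(timestamps) >= self.limit:
            return False
        timestamps.append(now)
        return True


limiter = SlidingWindowLimiter(limit=100, window_seconds=60)

# The same traffic that slips past a clock-aligned window: a burst just
# before the minute boundary and another just after it.
late = sum(limiter.allow("client-a", now=59.0) for _ in range(100))
early = sum(limiter.allow("client-a", now=61.0) for _ in range(100))

# Sixty seconds after the first burst, the old timestamps have expired.
later = limiter.allow("client-a", now=119.0)
```

Here the second burst is rejected in full, because the first 100 requests are still inside the rolling 60 seconds. The extra state is the per-request timestamp log.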

Token bucket limiters handle bursts cleanly. The bucket fills at a constant rate; each request consumes a token; bursts up to the bucket size are tolerated. This is the right default for almost all customer-facing APIs because real traffic is bursty (page-load spikes, batch jobs, retries) and a strict sliding window will reject legitimate behaviour.
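
The refill-and-consume logic described above can be sketched in a few lines. This is a single-process illustration under our own assumptions: the class name, the `cost` parameter, and the choice to start with a full bucket are not taken from any particular library, and the clock is passed in as `now` (seconds).

```python
class TokenBucket:
    """Token bucket: refills at a constant rate, tolerates bursts up to capacity."""

    def __init__(self, rate_per_second, capacity, now=0.0):
        self.rate = rate_per_second
        self.capacity = float(capacity)
        self.tokens = float(capacity)  # assumption: start full so an idle client can burst
        self.updated_at = now

    def allow(self, now, cost=1.0):
        # Refill for the time elapsed since the last call, capped at capacity.
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated_at = max(self.updated_at, now)
        if self.tokens < cost:
            return False
        self.tokens -= cost
        return True


bucket = TokenBucket(rate_per_second=2.0, capacity=10)

# A burst of 15 requests at once: the first 10 drain the bucket, the rest are rejected.
burst = sum(bucket.allow(now=0.0) for _ in range(15))

# One second later, two tokens have been refilled.
after_one_second = sum(bucket.allow(now=1.0) for _ in range(5))
```

The two knobs map directly onto a contract: `rate_per_second` is the sustained rate the customer is sold, `capacity` is the burst they are allowed on top of it.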

Scope: per-key, per-IP, per-user, per-endpoint

Most APIs need at least three layers.

Per-IP at the edge (5,000 requests per minute, say). Catches scrapers and broken clients that have not authenticated yet.
Per-API-key for authenticated requests (1,000 per minute, say, with bursts to 2,000). The customer's contract.
Per-endpoint, per-key for expensive routes (e.g., 10 per minute on a bulk-export endpoint). Stops one customer from monopolising a heavy resource.

Skipping the per-endpoint layer is the most common mistake. A single endpoint with 100x the cost of an average request can take down the API even if the customer is well within their global rate limit.
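
A rough sketch of how the three layers can be composed, using the example numbers above. Several things here are assumptions for illustration: the route `/v1/exports/bulk`, the per-IP burst size (set equal to the per-minute limit), the scope-key layout, and the decision that a rejected request consumes no tokens from any layer.

```python
class Bucket:
    """Compact token bucket: refills at per_minute / 60 tokens per second."""

    def __init__(self, per_minute, burst):
        self.rate = per_minute / 60.0
        self.burst = float(burst)
        self.tokens = float(burst)
        self.updated_at = 0.0

    def has_token(self, now):
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.updated_at = max(self.updated_at, now)
        return self.tokens >= 1.0


# Hypothetical expensive route -> (requests per minute, burst).
EXPENSIVE_ROUTES = {"/v1/exports/bulk": (10, 10)}


class LayeredLimiter:
    def __init__(self):
        self.buckets = {}

    def _bucket(self, scope, per_minute, burst):
        if scope not in self.buckets:
            self.buckets[scope] = Bucket(per_minute, burst)
        return self.buckets[scope]

    def check(self, ip, api_key, route, now):
        """Returns None if allowed, else the name of the layer that rejected."""
        layers = [("per-ip", self._bucket(("ip", ip), 5000, 5000))]
        if api_key is not None:
            layers.append(("per-key", self._bucket(("key", api_key), 1000, 2000)))
            if route in EXPENSIVE_ROUTES:
                per_minute, burst = EXPENSIVE_ROUTES[route]
                scope = ("route", api_key, route)
                layers.append(("per-endpoint", self._bucket(scope, per_minute, burst)))
        for name, bucket in layers:
            if not bucket.has_token(now):
                return name
        # Only charge the layers once every one of them has agreed.
        for _, bucket in layers:
            bucket.tokens -= 1.0
        return None


limiter = LayeredLimiter()
results = [
    limiter.check("203.0.113.7", "key-1", "/v1/exports/bulk", now=0.0)
    for _ in range(12)
]
cheap = limiter.check("203.0.113.7", "key-1", "/v1/items", now=0.0)
```

After ten bulk exports the per-endpoint layer rejects the customer on that route only; the cheap route still works because the per-key budget is barely touched. Returning the name of the rejecting layer also makes it possible to tell the consumer which limit they hit.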

Communicate the limits clearly

Every rate-limited response should answer three questions for the consumer.

What was the limit? Send X-RateLimit-Limit on every response.
What is the current usage? Send X-RateLimit-Remaining on every response.
When can the consumer retry? Send Retry-After on a 429, in seconds.

Without these headers the consumer has to guess, which means they implement exponential backoff with jitter on every error and your API ends up looking flakier than it is. Status codes matter too. Use 429 for rate limiting, never 503 (which means service unavailable and triggers different retry semantics in client SDKs).
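
As a framework-neutral sketch, the headers above can be produced by one small helper that every response path calls. The function names and the `(status, headers)` return shape are our own assumptions; wire them into whatever response object your framework uses.

```python
import math


def rate_limit_headers(limit, remaining, retry_after_seconds=None):
    """Builds the rate-limit headers for a single response."""
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
    }
    if retry_after_seconds is not None:
        # Round up so a client that waits exactly this long is not rejected again.
        headers["Retry-After"] = str(math.ceil(retry_after_seconds))
    return headers


def build_response(allowed, limit, remaining, retry_after_seconds=None):
    """Returns (status, headers): 200 when allowed, 429 (never 503) when limited."""
    if allowed:
        return 200, rate_limit_headers(limit, remaining)
    return 429, rate_limit_headers(limit, 0, retry_after_seconds)


ok = build_response(True, limit=1000, remaining=412)
limited = build_response(False, limit=1000, remaining=0, retry_after_seconds=12.3)
```

In this sketch `ok` carries the limit and remaining count with a 200, and `limited` carries the same two headers plus `Retry-After: 13` with a 429.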

Soft limits and graceful degradation

For premium customers, hard 429s are a bad customer experience. The pattern we deploy in those cases is a soft limit: above the limit, the API still serves the request, but with longer latency (queued for processing) or in degraded mode (cached results, fewer fields). The customer's workload completes and the server stays protected.

Implementing this is easy: a token bucket per customer plus a flag in the request handler. The hard part is the policy. Which endpoints can be queued? Which fields can be dropped? Decide that with product, not in the rate limiter.
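
A minimal sketch of the "token bucket plus a flag" idea, assuming the degraded mode is a cached result. Everything named here is hypothetical: `CustomerBucket`, `handle_report`, the `soft_limit_enabled` flag, and the `load_fresh` / `load_cached` callables stand in for your own handler and data access.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"
    DEGRADED = "degraded"
    REJECTED = "rejected"


@dataclass
class CustomerBucket:
    rate_per_second: float
    capacity: float
    tokens: float
    updated_at: float = 0.0

    def try_take(self, now):
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate_per_second)
        self.updated_at = max(self.updated_at, now)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def decide(bucket, now, soft_limit_enabled):
    """Over the limit, premium customers are degraded instead of rejected."""
    if bucket.try_take(now):
        return Mode.NORMAL
    return Mode.DEGRADED if soft_limit_enabled else Mode.REJECTED


def handle_report(bucket, now, soft_limit_enabled, load_fresh, load_cached):
    mode = decide(bucket, now, soft_limit_enabled)
    if mode is Mode.REJECTED:
        return 429, None
    if mode is Mode.DEGRADED:
        # Cheap path: serve a cached result and say so in the payload.
        return 200, {"data": load_cached(), "degraded": True}
    return 200, {"data": load_fresh(), "degraded": False}


premium = CustomerBucket(rate_per_second=1.0, capacity=1.0, tokens=1.0)
standard = CustomerBucket(rate_per_second=1.0, capacity=1.0, tokens=1.0)

first = handle_report(premium, 0.0, True, lambda: "fresh", lambda: "cached")
second = handle_report(premium, 0.0, True, lambda: "fresh", lambda: "cached")
handle_report(standard, 0.0, False, lambda: "fresh", lambda: "cached")
rejected = handle_report(standard, 0.0, False, lambda: "fresh", lambda: "cached")
```

The limiter only reports a mode; what "degraded" means for each endpoint lives in the handler, which is where the product decision belongs.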

Idempotency keys

If your API has any write endpoints, idempotency keys are the companion to rate limiting. When a consumer retries after a 429 (or a network blip), an idempotency key prevents duplicate writes. Stripe popularised the pattern; the same Idempotency-Key header is the de facto standard now. Honour it on every state-changing endpoint.
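
The core of the pattern is small: remember the response for each key and replay it on a retry. The sketch below is an in-memory illustration with hypothetical names (`IdempotentWrites`, `execute`, `create_payment`); a real implementation would also need expiry, a check that the retried request body matches the original, and handling for concurrent requests with the same key.

```python
class IdempotentWrites:
    """Replays the stored response when an Idempotency-Key is seen again."""

    def __init__(self):
        self.responses = {}  # (api_key, idempotency_key) -> stored response

    def execute(self, api_key, idempotency_key, operation):
        if idempotency_key is None:
            # No key supplied: nothing to deduplicate against.
            return operation()
        cache_key = (api_key, idempotency_key)
        if cache_key in self.responses:
            return self.responses[cache_key]
        response = operation()
        self.responses[cache_key] = response
        return response


ledger = []


def create_payment():
    ledger.append({"amount": 500})
    return {"status": 201, "payment_id": len(ledger)}


writes = IdempotentWrites()

# The first attempt succeeds, but the client times out and retries with the same key.
first = writes.execute("key-1", "a1b2c3", create_payment)
retry = writes.execute("key-1", "a1b2c3", create_payment)
```

Both calls return the same response and the ledger holds one payment, not two. Scoping the stored key by API key keeps one customer's keys from colliding with another's.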

Closing

Rate limiting is not just a defensive control. It is part of the contract you offer the consumer. That contract needs algorithms that handle bursts, scopes that protect expensive endpoints, headers that tell consumers what is happening, and policies that are documented in numbers, not adjectives. Get those right and your API stops feeling adversarial. Get them wrong and every integration becomes a slow argument.