Concurrency Limiter
Unlike time-based strategies, a concurrency limiter caps how many requests are in flight simultaneously rather than how many arrive per window. It is useful for protecting downstream services from being overwhelmed by concurrent load, regardless of arrival rate.
How It Works
- Maintain a counter of currently active (in-flight) requests.
- On `allow()`: if `active < max_concurrent`, increment and allow.
- The caller must signal completion (via `release()`) when the request finishes, so the counter is decremented.
```mermaid
flowchart TD
    Start[Request arrives] --> Check{"active < max_concurrent?"}
    Check -->|Yes| Inc["Increment active counter"] --> Allow["ALLOW"]
    Check -->|No| Deny["DENY"]
    Allow --> Process["Process request..."]
    Process --> Release["release: decrement active counter"]
```
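The steps above can be sketched as a counter guarded by a lock. This is a minimal illustration following the `allow()`/`release()` interface described above, not a production implementation:

```python
import threading

class ConcurrencyLimiter:
    """Caps the number of simultaneously in-flight requests."""

    def __init__(self, max_concurrent: int):
        self.max_concurrent = max_concurrent
        self.active = 0
        self._lock = threading.Lock()  # counter updates must be atomic

    def allow(self) -> bool:
        """Try to claim a slot; returns False when at capacity."""
        with self._lock:
            if self.active < self.max_concurrent:
                self.active += 1
                return True
            return False

    def release(self) -> None:
        """Signal completion so the slot becomes available again."""
        with self._lock:
            self.active -= 1


limiter = ConcurrencyLimiter(max_concurrent=2)
print(limiter.allow())  # True  (1 in flight)
print(limiter.allow())  # True  (2 in flight)
print(limiter.allow())  # False (at capacity)
limiter.release()       # one request finished
print(limiter.allow())  # True  (slot freed)
```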
A context-manager pattern is natural here, since it guarantees the release even when the request handler raises.
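A minimal sketch of that pattern, using `threading.BoundedSemaphore` as the counter; the `slot()` method name is an assumption for illustration:

```python
import threading
from contextlib import contextmanager

class ConcurrencyLimiter:
    """Counter-based limiter exposed as a context manager."""

    def __init__(self, max_concurrent: int):
        self._sem = threading.BoundedSemaphore(max_concurrent)

    @contextmanager
    def slot(self):
        """Claim a slot for the duration of a `with` block."""
        # Non-blocking acquire: reject immediately instead of queueing.
        if not self._sem.acquire(blocking=False):
            raise RuntimeError("concurrency limit reached")
        try:
            yield
        finally:
            self._sem.release()  # runs even if the body raises


limiter = ConcurrencyLimiter(max_concurrent=1)
with limiter.slot():
    pass  # do work; the slot is released automatically on exit
```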
Parameters
| Name | Type | Description |
|---|---|---|
| `max_concurrent` | `int` | Maximum number of simultaneous in-flight requests |
Trade-offs
Pros:
- Directly prevents overload — caps actual parallel work
- No time tracking or memory proportional to request volume
Cons:
- Requires caller cooperation to release — a missed release leaks a slot permanently (timeout-based auto-release can mitigate this)
- Does not limit request rate — 1000 fast sequential requests per second all pass if each finishes before the next starts
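To illustrate the auto-release mitigation mentioned above, here is a sketch that treats each slot as a lease that expires after a timeout, so a caller that never releases cannot leak a slot forever. Class and method names are assumptions for illustration:

```python
import threading
import time

class LeasedLimiter:
    """Concurrency limiter whose slots expire after `timeout` seconds."""

    def __init__(self, max_concurrent: int, timeout: float):
        self.max_concurrent = max_concurrent
        self.timeout = timeout
        self._leases = {}   # lease id -> expiry timestamp
        self._next_id = 0
        self._lock = threading.Lock()

    def allow(self):
        """Return a lease id if a slot is free, else None."""
        now = time.monotonic()
        with self._lock:
            # Reap leases whose holders never released in time.
            self._leases = {i: t for i, t in self._leases.items() if t > now}
            if len(self._leases) < self.max_concurrent:
                self._next_id += 1
                self._leases[self._next_id] = now + self.timeout
                return self._next_id
            return None

    def release(self, lease_id) -> None:
        """Explicit release; a no-op if the lease already expired."""
        with self._lock:
            self._leases.pop(lease_id, None)
```

The trade-off is that a slow-but-healthy request whose lease expires can push the system over the intended limit, so the timeout must exceed the worst expected request duration.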
Comparison
vs Token Bucket / Fixed Window: Time-based strategies limit the rate of requests regardless of duration. Concurrency Limiter limits the parallelism regardless of rate. They address different problems and are often combined.
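To make the "often combined" point concrete, a hedged sketch that admits a request only when both a token bucket (rate budget) and a concurrency counter (parallelism budget) allow it. The class names and the `admit()` helper are assumptions for illustration:

```python
import time

class TokenBucket:
    """Simple token bucket: refills at `rate` tokens/sec up to `capacity`."""

    def __init__(self, rate: float, capacity: float):
        self.rate, self.capacity = rate, capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

class ConcurrencyLimiter:
    def __init__(self, max_concurrent: int):
        self.max_concurrent, self.active = max_concurrent, 0

    def allow(self) -> bool:
        if self.active < self.max_concurrent:
            self.active += 1
            return True
        return False

    def release(self) -> None:
        self.active -= 1


bucket = TokenBucket(rate=100, capacity=10)      # caps arrival rate
limiter = ConcurrencyLimiter(max_concurrent=4)   # caps parallelism

def admit() -> bool:
    """A request passes only if both budgets allow it."""
    if not bucket.allow():
        return False
    if not limiter.allow():
        return False  # note: the token is still consumed in this sketch
    return True
```

A burst of slow requests is stopped by the concurrency limiter even while tokens remain, and a flood of fast requests is stopped by the bucket even while slots are free.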