Token Bucket

Starts with a bucket full of tokens (up to capacity). Each allowed request consumes one token. Tokens are replenished at a fixed rate over time, up to the maximum capacity.

How It Works

Compute elapsed time since the last check.
Add elapsed * rate tokens to the bucket (capped at capacity).
If tokens >= 1, consume one token and allow.
Otherwise, deny.

flowchart TD
    Start[Request arrives] --> Refill["Add elapsed x rate tokens"]
    Refill --> Cap["Cap at max capacity"]
    Cap --> Check{"tokens >= 1?"}
    Check -->|Yes| Consume["Consume 1 token"] --> Allow["ALLOW"]
    Check -->|No| Deny["DENY"]

Parameters

Name	Type	Description
`capacity`	`int`	Maximum number of tokens the bucket can hold; also the burst size
`rate`	`int`	Tokens added per second

Trade-offs

Pros:

Allows controlled bursts up to capacity while enforcing a long-term average rate
O(1) time and memory per request — just arithmetic
Smooth refill — no sudden counter resets like Fixed Window

Cons:

Burst size and refill rate are separate knobs that must be tuned together; misconfiguration can allow unwanted spikes

Comparison

vs Leaky Bucket: Token Bucket permits bursts (spend all tokens at once), while Leaky Bucket enforces a steady drain rate and absorbs bursts into the queue. Token Bucket is more permissive for bursty workloads; Leaky Bucket is better for smoothing output rate.

View Source on GitHub