The basic idea of rate limiting algorithms is simple. At a high level, we need a counter that tracks how many requests are sent from the same user, IP address, etc. If the counter exceeds the limit, the request is disallowed.
Where should we store the counters? Using a database is not a good idea because disk access is slow. An in-memory cache is chosen because it is fast and supports a time-based expiration strategy. For instance, Redis [11] is a popular option for implementing rate limiting. It is an in-memory store that offers two commands suited to this task, INCR and EXPIRE (a usage sketch follows the command descriptions):
INCR: It increases the stored counter by 1.
EXPIRE: It sets a timeout for the counter. If the timeout expires, the counter is automatically deleted.
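To illustrate, here is a minimal sketch of a fixed-window counter built from these two commands, using the redis-py client. The key naming scheme, limit, and window length are arbitrary choices for the example, not part of the design:

```python
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def is_allowed(client_id: str, limit: int = 10, window_seconds: int = 60) -> bool:
    """Allow at most `limit` requests per `window_seconds` per client."""
    key = f"rate:{client_id}"   # hypothetical key naming scheme
    count = r.incr(key)         # INCR creates the key at 1 if it is absent
    if count == 1:
        # First request in a new window: set the timeout so the counter
        # is automatically deleted when the window ends (EXPIRE).
        r.expire(key, window_seconds)
    return count <= limit
```

Incrementing first and then comparing keeps the hot path to a single round trip; a production version would typically wrap INCR and EXPIRE in a pipeline or Lua script so the two steps stay atomic.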
Figure 12 shows the high-level architecture for rate limiting, which works as follows:
Figure 12
The client sends a request to the rate limiting middleware.
The rate limiting middleware fetches the counter from the corresponding bucket in Redis and checks whether the limit has been reached.
If the limit is reached, the request is rejected.
If the limit is not reached, the request is forwarded to the API servers. Meanwhile, the system increments the counter and saves it back to Redis. A sketch of this middleware logic follows the steps.
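To make the flow concrete, below is a minimal sketch of such a middleware as a Flask before_request hook. Flask, the endpoint, and the limit values are assumptions for illustration; the sketch increments before comparing, since a single atomic INCR avoids the race of reading the counter and writing it back in two steps:

```python
import redis
from flask import Flask, jsonify, request

app = Flask(__name__)
r = redis.Redis(host="localhost", port=6379, decode_responses=True)

LIMIT = 10           # requests allowed per window (example values)
WINDOW_SECONDS = 60

@app.before_request
def rate_limit():
    # Identify the client; a real system might key on a user ID or API key.
    key = f"rate:{request.remote_addr}"
    count = r.incr(key)                    # increment the counter in Redis
    if count == 1:
        r.expire(key, WINDOW_SECONDS)      # start the window on the first request
    if count > LIMIT:
        # Limit reached: reject the request (HTTP 429 Too Many Requests).
        return jsonify(error="too many requests"), 429
    # Returning None lets Flask forward the request to the route handler,
    # which stands in for the API servers in Figure 12.

@app.route("/api/resource")
def resource():
    return jsonify(status="ok")
```

Because the counters live in a shared Redis instance rather than in application memory, every server behind the middleware enforces the same per-client limit.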