Rate Limits
Rate limits provide control over how frequently users and applications can call Workplace AI's APIs. They help protect the application from accidental overload and excessive automated usage while maintaining a fair experience for all users.
This capability is implemented in application middleware using the Token Bucket algorithm and behaves consistently across all deployment models. More information about this type of rate limiting, and rate limiting in general, can be found here.
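To make the Token Bucket algorithm concrete, here is a minimal sketch in Python. The class and method names are illustrative, not the product's actual implementation: a bucket starts full (allowing an initial burst), tokens are replenished each period, and a request is rejected when no token is available.

```python
import time

class TokenBucket:
    """Minimal token bucket sketch. A bucket holds up to `limit` tokens;
    `tokens_per_period` tokens are added every `period_seconds` seconds.
    Each request consumes one token; with no tokens left, it is rejected."""

    def __init__(self, limit, tokens_per_period, period_seconds):
        self.limit = limit
        self.tokens_per_period = tokens_per_period
        self.period = period_seconds
        self.tokens = limit          # buckets start full, allowing an initial burst
        self.last_refill = time.monotonic()

    def _refill(self):
        now = time.monotonic()
        elapsed_periods = int((now - self.last_refill) / self.period)
        if elapsed_periods > 0:
            self.tokens = min(self.limit,
                              self.tokens + elapsed_periods * self.tokens_per_period)
            self.last_refill += elapsed_periods * self.period

    def try_acquire(self):
        """Return True if a token was available (request allowed)."""
        self._refill()
        if self.tokens > 0:
            self.tokens -= 1
            return True
        return False
```

With a token limit of 3, the first three calls in a burst succeed and the fourth is rejected until the next replenishment.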
These controls do not replace a dedicated gateway or edge protection (DoS/DDoS). They operate after a request has already reached the application.
Configuration
Rate Limiting is optional and is disabled by default.
It can be enabled and configured for the following endpoints:
Search - User requests
Search - Programmatic requests made via an API key
Control Hub - User requests
Data Science API
OData API
For instance, this allows you to rate limit Data Science and OData requests without impacting the interactive user experience.
For each endpoint, you can choose to enable or disable the rate limiter. If enabled, you can configure:
Token limit: This defines the maximum number of requests a user or application can issue in a quick burst.
Replenishment period: How often new tokens are added to each bucket.
Tokens per period: The number of tokens added in each replenishment period.
Queue limit: The number of requests to queue when no tokens are available; queued requests wait until a token becomes available. Requests issued beyond this capacity fail as described below.
For programmatic access, it is recommended to set the Queue limit to 0 so the calling application can take immediate action.
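A configuration for two endpoints might look like the sketch below. The key names and values are hypothetical and only illustrate how the four settings relate; the exact schema and endpoint names depend on your deployment.

```jsonc
{
  "Search-UserRequests": {
    "Enabled": true,
    "TokenLimit": 100,              // up to 100 requests in a burst
    "ReplenishmentPeriodSeconds": 10,
    "TokensPerPeriod": 20,          // sustained rate: 20 requests per 10 s
    "QueueLimit": 5                 // interactive traffic may queue briefly
  },
  "ODataAPI": {
    "Enabled": true,
    "TokenLimit": 50,
    "ReplenishmentPeriodSeconds": 60,
    "TokensPerPeriod": 50,
    "QueueLimit": 0                 // programmatic: fail fast with 429
  }
}
```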

Impact
If the rate limiter is triggered, the triggering request and any subsequent requests will fail with a 429 "Too Many Requests" status code until more tokens become available.

The Retry-After header will be set to the replenishment period.
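A well-behaved client can honour this header rather than retrying immediately. The sketch below assumes a `send_request` callable returning an object with `status_code` and `headers` attributes (as the popular `requests` library provides); the function name and defaults are illustrative.

```python
import time

def call_with_retry(send_request, max_attempts=3, sleep=time.sleep):
    """Call `send_request()` and, on a 429, wait for the Retry-After
    interval (the replenishment period) before trying again.

    `send_request` must return an object with `.status_code` and `.headers`.
    """
    for attempt in range(max_attempts):
        response = send_request()
        if response.status_code != 429:
            return response
        # Retry-After is set to the replenishment period, in seconds.
        delay = int(response.headers.get("Retry-After", "1"))
        if attempt < max_attempts - 1:
            sleep(delay)
    return response
```

The `sleep` parameter is injectable so the backoff behaviour can be tested without real delays.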
Logging
When the rate limiter is triggered, the following warning will be logged:
Rate Limiter rejected request for {partition} to {http-method} {http-path}
partition is either the username (authenticated) or the originating hostname (unauthenticated) that initiated the request
http-method is GET/POST/etc.
http-path is the endpoint. Query string parameters and POST bodies are not logged.
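If you aggregate these warnings for monitoring, the fields can be recovered from the template above. This parser is a sketch, assuming partitions and paths contain no spaces (usernames and hostnames normally do not); the function name is hypothetical.

```python
import re

# Matches the warning template:
#   Rate Limiter rejected request for {partition} to {http-method} {http-path}
LOG_PATTERN = re.compile(
    r"Rate Limiter rejected request for (?P<partition>\S+) "
    r"to (?P<method>\S+) (?P<path>\S+)"
)

def parse_rate_limit_warning(line):
    """Return a dict with partition, method and path,
    or None if the line is not a rate-limiter warning."""
    match = LOG_PATTERN.search(line)
    return match.groupdict() if match else None
```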
Other considerations
The rate limits are local to each API instance and held only in memory. This means that:
Restarting an endpoint (IIS reset, restarting the Application Pool, restarting the API processes, etc.) will reset the rate limiters.
In environments with load balancers, each instance of an endpoint will have its own limiter for each user - for consistency, sticky sessions are needed at the load balancer / application gateway.
When using a load balancer, the source header must be set correctly; otherwise all unauthenticated requests will be attributed to the load balancer rather than the originating machine.
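The effect of per-instance limiters can be quantified with a small sketch. Without sticky sessions, a user's requests are spread across instances, each holding its own bucket, so the worst-case burst a single user can achieve grows with the instance count; the numbers below are illustrative.

```python
def worst_case_burst(token_limit, instances, sticky_sessions):
    """Worst-case burst a single user can issue before being throttled.

    With sticky sessions every request from a user hits one instance,
    so a single bucket applies. Without them, the user can drain the
    bucket on every instance.
    """
    return token_limit if sticky_sessions else token_limit * instances

# e.g. a token limit of 100 across 4 instances
print(worst_case_burst(100, 4, sticky_sessions=True))   # one shared view: 100
print(worst_case_burst(100, 4, sticky_sessions=False))  # one bucket each: 400
```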