When handling large numbers of requests from requesting entities such as clients, data services need to provide enough capacity to handle peak demands. One way that the capacity is typically increased is by caching data in relatively fast memory so that servers often need not access the underlying physical data sources (e.g., data stores and/or other databases) to respond to requests.
Caching has its limitations, however, including that cache misses can often occur. Data requests that result in cache misses need to be handled below the data caching level, by sending the request down to the underlying (e.g., physical) data store level. Further, cached data is associated with an expiration value (e.g., a timestamp or time-to-live, or TTL, value) and thus expires, whereby requests for data that has expired in a cache similarly need to be handled below the data caching level.
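The miss and expiration behavior described above can be illustrated with a minimal sketch. It is not taken from any particular data service; the class name `TTLCache`, the `fetch_from_store` callback, the `ttl_seconds` parameter, and the `store_reads` counter are all hypothetical names chosen for illustration. The sketch assumes a single-process, in-memory cache and shows only that both a missing entry and an expired entry cause a read at the underlying data store level.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Illustrative in-memory cache in front of a slower data store.

    All names here are hypothetical; this is a sketch, not a real API.
    """

    def __init__(
        self,
        fetch_from_store: Callable[[Hashable], Any],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch_from_store   # stands in for the underlying data store
        self._ttl = ttl_seconds          # lifetime assigned to each cached entry
        self._clock = clock              # injectable so expiry can be simulated
        self._entries: Dict[Hashable, Tuple[Any, float]] = {}
        self.store_reads = 0             # requests handled below the caching level

    def get(self, key: Hashable) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None:
            value, expires_at = entry
            if now < expires_at:
                # Cache hit: the data store is not accessed.
                return value
        # Cache miss or expired entry: fall through to the data store,
        # then cache the result with a fresh expiration time.
        self.store_reads += 1
        value = self._fetch(key)
        self._entries[key] = (value, now + self._ttl)
        return value
```

In this sketch, `store_reads` grows by one on every miss and on every first request after an entry expires, which is the load that reaches the data store level regardless of how fast the cache itself is.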
When there is a spike in the number of requests, the data service may fail to keep up. To avoid failing, a typical solution is to add capacity at a level below the caching level, e.g., add larger and/or more data stores/databases operating in parallel. However, adding such additional capacity is expensive.