Predictable performance is often an important design goal in several cloud and datacenter services, including search engines, data analytics, machine learning, and social media. Each of these services tends to be extremely latency sensitive and generally operates under strict service level agreements (SLAs). Specifically, coarse-grained metrics like average response time are often not representative of overall performance, and worst-case latencies are frequently much more of a concern. Variability of response times causes high tail latency in components of a service, leading to violation of SLAs and, more importantly, to longer response times for users. Tail latency is the latency experienced by a small fraction of operations, namely the slowest ones. The longest latency defines, for each service, the end of its tail.
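By way of a non-limiting illustration, the gap between an average response time and the tail may be sketched as follows. The sample latencies, the nearest-rank percentile method, and the function names are assumptions chosen for the example and are not taken from any particular service.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must be non-empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


def summarize(latencies_ms):
    """Report the mean next to tail-oriented metrics for comparison."""
    return {
        "mean": sum(latencies_ms) / len(latencies_ms),
        "p50": percentile(latencies_ms, 50),
        "p99": percentile(latencies_ms, 99),
        "max": max(latencies_ms),
    }


# Hypothetical workload: 980 operations complete in 1 ms,
# while 20 operations are delayed to 50 ms.
latencies_ms = [1.0] * 980 + [50.0] * 20
summary = summarize(latencies_ms)
print(summary)
```

In this hypothetical workload the mean stays below 2 ms even though the 99th percentile and the maximum are both 50 ms, which is why an average may hide the operations that violate an SLA.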
Flash or solid state memories tend to have quicker response times than traditional memory devices. However, because flash memories are generally derived from electrically erasable programmable read-only memory (EEPROM) technology, their memory cells generally have to be erased before they can be written or re-written (i.e., flash is not generally an update-in-place technology). This causes irregularities in flash performance, as an externally initiated operation (e.g., a read or a write) may arrive while an internally initiated operation (e.g., an erase operation, move operation, garbage collection, etc.) is occurring. The externally initiated operation may then stall until the maintenance-based operation completes. Often these maintenance operations (specifically the erase operation) tend to be comparatively very slow, exacerbating any wait or delay.
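The stall described above may be sketched with a simplified model in which a single flash die services one operation at a time. The class name, the first-come-first-served behavior, and the timing values (100 microseconds per read, 3000 microseconds per erase) are illustrative assumptions only and do not describe any specific device.

```python
from dataclasses import dataclass


@dataclass
class FlashDie:
    """Toy model of one flash die that can run a single operation at a time.
    All durations are hypothetical and expressed in microseconds."""
    read_us: int = 100
    erase_us: int = 3000
    busy_until_us: int = 0

    def _run(self, arrival_us, duration_us):
        # An operation cannot start until the die finishes earlier work.
        start_us = max(arrival_us, self.busy_until_us)
        self.busy_until_us = start_us + duration_us
        # Latency as seen by whoever issued the operation.
        return self.busy_until_us - arrival_us

    def read(self, arrival_us):
        """Externally initiated read."""
        return self._run(arrival_us, self.read_us)

    def erase(self, arrival_us):
        """Internally initiated maintenance erase."""
        return self._run(arrival_us, self.erase_us)


die = FlashDie()
idle_read_us = die.read(0)          # die is idle, so no waiting
die.erase(10_000)                   # maintenance occupies the die
stalled_read_us = die.read(10_500)  # arrives while the erase is running
print(idle_read_us, stalled_read_us)
```

Under these assumed timings, the read that arrives during the erase waits for the remainder of the erase before its own service time begins, so its latency is many times that of the read issued to an idle die.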
Currently, replication is frequently employed to deal with tail latency inconsistencies. The same memory access may be issued to multiple storage devices, wherein the storage devices are often mirrors of one another. Frequently, whichever device returns a result first (e.g., because it is on a different internal maintenance schedule) is the device whose result is used. The results from the other devices are discarded as no longer important. This generally involves more servers and bandwidth, and is generally wasteful and expensive. Further, the software (e.g., operating system, drivers, etc.) must be complex enough to handle the parallel nature of the replicated scheme. It may be desirable to alter the technology to allow for more consistent and predictable performance.
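A minimal sketch of such a replicated read is given below, assuming two mirrored devices that are simulated with sleeps. The names mirrored_read and make_replica, the delays, and the stored data are hypothetical; the sketch uses Python's standard concurrent.futures module and is not intended to represent any particular storage stack.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def make_replica(delay_s, store):
    """Build a simulated mirror whose reads take delay_s seconds."""
    def read(address):
        time.sleep(delay_s)  # stands in for device latency
        return store[address]
    return read


def mirrored_read(replicas, address):
    """Issue the same read to every mirror and use whichever answers first."""
    pool = ThreadPoolExecutor(max_workers=len(replicas))
    futures = {pool.submit(read, address): name
               for name, read in replicas.items()}
    done, pending = wait(list(futures), return_when=FIRST_COMPLETED)
    winner = next(iter(done))
    # The slower reads keep running and consume resources,
    # but their results are never used.
    pool.shutdown(wait=False)
    return futures[winner], winner.result(), len(pending)


store = {0x10: b"payload"}
replicas = {
    "mirror_a": make_replica(0.2, store),   # e.g., busy with maintenance
    "mirror_b": make_replica(0.01, store),  # currently idle
}
name, value, discarded = mirrored_read(replicas, 0x10)
print(name, value, discarded)
```

Even in this small sketch, every read consumes a worker and a device access per mirror, and the caller needs concurrency machinery merely to pick a winner, which reflects the cost and software complexity noted above.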