Non-volatile memory systems, such as flash memory, have been widely adopted for use in consumer products. Flash memory may be found in different forms, for example as a portable memory card that can be carried between host devices or as a solid state disk (“SSD”) embedded in a host device. The SSD may be throttled for various reasons. The purpose of throttling may be to limit power consumption, monitor or control temperature, extend memory endurance, or achieve more consistent memory performance. Accordingly, performance variations, high temperatures, or power overages may each be a reason to throttle. The throttling may include slower command handling, extra command handling, or reducing endurance capabilities of the SSD. The SSD controller receives commands, such as read commands or write/program commands, from a host. Commands that are waiting for execution by the controller may be held in a command queue. However, when the queue is saturated with too many commands (e.g., during throttling), performance of the SSD may suffer because the commands are not executed promptly.
Merely reducing the number of commands sent to the memory device may be one way to throttle the SSD. In particular, the number of commands provided to the back end that executes the commands may be reduced, which reduces total throughput. However, that prevents the memory device from performing as quickly and efficiently as possible. Both the number and the rate of commands passed to the back end processor may be limited. Choking the queue depth (“QD”) revealed to the back end may have several adverse consequences. First, the stalling of commands within the host processor decreases bandwidth and throughput, which means fewer commands are sent on a continual basis to the memory. The reduced throughput of the memory device may reduce the resulting temperature and power, but the reduced queue depth available for examination may result in non-optimal decisions and increased outliers affecting command quality of service (“QoS”). This approach also does not extend to beginning of life (“BOL”) performance variation. The BOL performance variation may be due to the variation in bad blocks from drive to drive, such that some drives have fewer bad blocks than others. A drive with an unusually poor bad block distribution may therefore have lower over-provisioning, higher write amplification, and lower performance. Throttling the performance of the good drives may be necessary to produce drives with minimal performance variation. Throttling by choking the queue depth to the back end means that the commands stalled in the front end take a hit to their latencies. This may directly impact QoS, which may result in unacceptable performance. Reducing the BOL performance variation may require equal QoS and bandwidth on all metrics.
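The latency penalty of choking the queue depth revealed to the back end may be illustrated with a minimal simulation. The sketch below is not taken from any actual controller firmware; the function name `simulate_choked_queue`, the tick-based timing, the fixed service time, and the assumption that each revealed queue slot services one command independently are all hypothetical simplifications chosen for illustration.

```python
import heapq


def simulate_choked_queue(arrivals, service_time, qd_limit):
    """Model a front end that reveals at most qd_limit commands to the back end.

    Hypothetical model: every command takes service_time ticks, and the
    back end works on up to qd_limit commands at once. Commands are
    dispatched in arrival order (FIFO).

    Returns a list of (stall, latency) tuples, one per command, where
    stall is the time spent waiting in the front end and latency is the
    time from arrival to completion.
    """
    # Each heap entry is the tick at which one back-end slot becomes free.
    slots = [0] * qd_limit
    heapq.heapify(slots)

    results = []
    for arrival in arrivals:
        free_at = heapq.heappop(slots)      # earliest available slot
        start = max(arrival, free_at)       # stall if no slot is free yet
        finish = start + service_time
        heapq.heappush(slots, finish)
        results.append((start - arrival, finish - arrival))
    return results


if __name__ == "__main__":
    # Assumed workload: eight commands, one arriving per tick,
    # each needing four ticks of back-end service.
    arrivals = list(range(8))
    for qd in (4, 2):
        stats = simulate_choked_queue(arrivals, service_time=4, qd_limit=qd)
        stalls = [stall for stall, _ in stats]
        worst = max(latency for _, latency in stats)
        print(f"QD={qd}: front-end stalls={stalls}, worst latency={worst}")
```

Under these assumed numbers, a revealed queue depth of four keeps pace with the arrival rate and no command stalls, whereas a queue depth of two causes the front-end stall to grow with each pair of commands (0, 0, 2, 2, 4, 4, 6, 6 ticks), so the later commands become latency outliers even though the service time of each individual command is unchanged.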