Modern application-specific integrated circuits (ASICs), such as System-on-Chip (SoC) devices, commonly need to operate at very high data rates. To achieve such high-speed operation, ASIC designs often include sophisticated hardware automation in addition to firmware running on a processor. One example of an ASIC with a high level of hardware automation is a memory controller in a non-volatile solid-state memory drive. In devices with a high level of hardware automation, errors in executing commands from an external host should be handled so that the command in error has little or no impact on other commands. Common error handling schemes either send interrupts to the device's firmware and halt operation of the hardware block experiencing the error, or pass errors between hardware blocks. Both approaches add significant complexity to the ASIC design. Such error handling schemes also allow one hardware block experiencing an error in a command to “back pressure” other hardware blocks involved in executing tasks associated with that same command. For example, if a hardware block halts operation because of a command that experienced an error, completion of every other command that requires a task from that hardware block will be delayed until the error is cleared, causing a latency spike.
Typically, each hardware block that experiences an error is held in an “error state” until the firmware clears the error. If two hardware blocks are in an error state at the same time, a multi-error corner case, both of those hardware blocks will cause back pressure in the system. Error handling schemes designed to deal with such corner cases add significant complexity to both the system's hardware and firmware. This complexity requires extensive verification testing of the system's design before manufacturing, which can delay the system's time to market.
A reset in an ASIC commonly involves aborting or erroring out one or more commands. A reset can occur in response to a command or signal from a host, a power loss, or a decision by the ASIC's firmware. A full system reset involves aborting all commands currently active in the ASIC, and lower-level resets, such as a sub-system reset, typically involve aborting a significant number of commands. For example, a reset of a virtual controller in a solid-state storage drive may involve aborting all commands in one or more queues associated with that virtual controller, which can affect multiple hardware blocks simultaneously. Resetting a queue by aborting or erroring out all of the commands in that queue ideally should not interfere with the processing of other commands, but when multiple hardware blocks are each handling commands in error, they can cause back pressure in the system. Thus, there is a long-felt need for an improved technique for reset and error handling in ASICs.