The computing field includes many examples of a host (master) controller controlling one or more peripheral (slave) devices via a serial data link. Some of the most familiar examples are links between “desk-scale” devices such as laptop computers, smartphones, tablets, printers, keyboards, mice, storage components, scanners, cameras, microphones, speakers, and the like using such serial protocols as Peripheral Component Interconnect Express (PCIe), Universal Serial Bus (USB), or Serial Advanced Technology Attachment (SATA). However, similar scenarios occur on the board scale (connections between chips on the same board) and on the chip scale (connections between different functional components fabricated on the same chip) using protocols such as Mobile Industry Processor Interface (MIPI). Such control connections may be important for satisfactory performance of system-on-chip (SoC) platforms.
Besides the physical connection (e.g., coaxial cable, copper wire, conductive trace, or optical waveguide), physical and logical network components at the nodes or termini (for example, the communication ports of the host and peripheral) are also considered part of the link. These physical and logical network components may include one or more “link-layer components” (e.g., bridges or switches) that implement one or more link protocols (sets of methods and standards for transmitting and receiving messages over the physical connection). Media access control (MAC) components are examples of link-layer components. Each link-layer component may be associated with a physical layer (e.g., PHY), which is an interface component between the link-layer component and the physical connection and which may include a transmitter, receiver, or transceiver. The physical layer may encode data for transmission, decode data upon reception, and automatically negotiate data rates and other transmission parameters with its counterpart at the connected node, for example using a physical coding sublayer (PCS). The physical layer may also be responsible for controlling the timing of the transmission or reception of the individual bits of data and interacting with the physical connection in a way that takes account of its properties, for example using a physical medium-dependent (PMD) sublayer.
Occasionally, a peripheral connected to a host may malfunction or lose its synchronization with the host. Resulting operational errors may include attempting to send a message before completing the required handshake procedure; omitting an “end-transmission” signal such as End Of Burst (EOB), causing the communication line to remain open and the receiver to continue to wait for more data when in fact the message has been completely transmitted; premature waking from a suspended state (e.g., stall, sleep, or hibernate) in response to a false wake indication or a false “incoming burst” indication; or a handshake procedure that “hangs” or “freezes” when a fabric error or addressing error causes incorrect memory access for the physical layer. The peripheral may be spontaneously disconnected from the host or experience a loss of power. The communication line may be left unterminated and “floating,” burdened with a high differential impedance on the signal paths, preventing the host or the peripheral from returning to a low-power state.
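As an illustration only, the omitted end-of-burst case can be modeled with a toy receiver state machine. The sketch below is not taken from any particular protocol or from the present disclosure; the names RxState, SOB, EOB, and run_receiver are hypothetical, and the symbol stream is a simplified stand-in for real line signaling. It shows how a receiver that sees a start of burst but never an EOB is left waiting in its receiving state even though the whole payload has arrived.

```python
from enum import Enum, auto


class RxState(Enum):
    IDLE = auto()       # line closed, receiver may enter a low-power state
    RECEIVING = auto()  # line open, receiver waits for more data


# Hypothetical marker symbols, assumed for this sketch only.
SOB = "SOB"  # start of burst
EOB = "EOB"  # end of burst


def run_receiver(symbols):
    """Feed a sequence of symbols to a toy receiver.

    Returns the collected payload and the state the receiver ends in.
    """
    state = RxState.IDLE
    payload = []
    for sym in symbols:
        if state is RxState.IDLE:
            if sym == SOB:
                state = RxState.RECEIVING  # burst opened
            # anything else is ignored while idle
        elif sym == EOB:
            state = RxState.IDLE           # burst closed normally
        else:
            payload.append(sym)            # data symbol inside a burst
    return payload, state


# Well-formed burst: the receiver returns to IDLE.
ok_payload, ok_state = run_receiver([SOB, 0x12, 0x34, EOB])

# Peripheral omits EOB: same payload, but the receiver stays in RECEIVING.
bad_payload, bad_state = run_receiver([SOB, 0x12, 0x34])
```

In this model both streams deliver the same payload, yet only the first lets the receiver leave the receiving state, which corresponds to the line remaining open in the scenario described above.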
Often, such errors disrupt the functioning of the host substantially more than they disrupt the peripheral. Most typically, the disruption affects the receiver interface of the host's downstream-facing port. Even if the peripheral recovers without a reset, the host may not be able to, resulting in loss of the connection session. The errors can cause problems whether the host is in an operating state or in a suspended state. In some instances, both the host and the peripheral may be in the intended state, but a physical layer may be in an unintended state.
In some cases, uncorrected errors may trigger a cascade of other errors. A failed link may trigger failures at one or both endpoints, driver notification, or another kind of multilevel failure.
Previous recovery methods for disconnects or synchronization loss have involved recoding through the link drivers and other protocol-based or system-level approaches. These processes could take as long as 1 to 10 seconds to restore normal operation. If a link failure is allowed to become a multilevel failure before correction, recovery may take an especially long time.
Recovery time is part of an overall “protocol overhead” metric. Other contributors include the loss of context of the peripheral driver(s) and the power consumed by the recovery process. If timeouts are long enough and frequent enough, they may negatively impact the energy efficiency or computing power of the overall system as well as the user experience.
Therefore, a need exists for a recovery process with lower protocol overhead, particularly one that shortens the recovery time. The present disclosure addresses this need.