Hardware devices operating in a computing environment must properly communicate to achieve desired results. Well-known levels of topology may exist in a Serial Attached Small Computer System Interface (Serial Attached SCSI, or SAS) architecture. At the controller level, operational control exists for devices operating at the device or expander level. A physical device that connects two hardware communication points of such hardware devices (for example, a controller connecting to an expander) may be commonly referred to as a physical link or PHY. A single PHY may connect via a port to allow communication between controller and expander. Alternatively, multiple PHYs may be grouped through a single port, in which case the port is known as a wide port. This wide port architecture may allow a controller to communicate with multiple expanders or devices. For example, multiple PHYs may communicate through a single wide port to connect a controller with multiple expanders. A PHY at the controller level connected to a PHY at the expander level may form a PHY pair.
Over time, one or more PHY pairs may become unreliable, resulting in degraded communication between the controller and the device. Such unreliability may occur due to a damaged cable, abnormal operating temperature, electromagnetic interference (EMI), degraded transceivers, and the like. While the wide port in general may remain operational, any input/output (I/O) transiting the unreliable PHY pair may suffer errors. These I/O errors may lead to significantly unstable activity such as frequent buffer flushes and retries. Further, this unstable activity may manifest itself in large I/O data transfers in which only a portion of the data goes through the unreliable PHY pair.
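The relationship described above between a wide port, its PHY pairs, and the portion of a large transfer exposed to one unreliable pair may be illustrated with a minimal sketch. The class names `PhyPair` and `WidePort`, the frame counts, and the round-robin distribution of frames are all hypothetical and chosen only for illustration; an actual SAS implementation may schedule traffic across PHYs quite differently.

```python
from dataclasses import dataclass, field


@dataclass
class PhyPair:
    """One controller-side PHY linked to one expander-side PHY (hypothetical model)."""
    controller_phy: int
    expander_phy: int
    reliable: bool = True


@dataclass
class WidePort:
    """A single port grouping several PHY pairs (hypothetical model)."""
    pairs: list = field(default_factory=list)

    def distribute(self, frame_count):
        """Spread frames over the PHY pairs round-robin (illustrative assumption).

        Returns a dict mapping controller PHY id to frames carried.
        """
        counts = {p.controller_phy: 0 for p in self.pairs}
        for i in range(frame_count):
            pair = self.pairs[i % len(self.pairs)]
            counts[pair.controller_phy] += 1
        return counts

    def frames_at_risk(self, frame_count):
        """Count the frames that would transit an unreliable PHY pair."""
        counts = self.distribute(frame_count)
        return sum(counts[p.controller_phy] for p in self.pairs if not p.reliable)


# A four-PHY wide port in which one PHY pair has become unreliable.
port = WidePort([PhyPair(controller_phy=i, expander_phy=i) for i in range(4)])
port.pairs[2].reliable = False

# Only a portion of a large transfer crosses the unreliable pair,
# yet that portion may be enough to trigger retries for the whole I/O.
at_risk = port.frames_at_risk(1000)
```

Under these assumptions the wide port as a whole stays operational, while a fixed fraction of every large transfer remains exposed to the unreliable PHY pair.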
PHY pairs may, as connection points, maintain a controller side and an expander side. The controller side may refer to the dominant side of the communication link, while the expander side may refer to the side of the PHY pair of lesser dominance, or the side in a controlled state. Preservation of bandwidth between controller and expander may be a common goal in order to increase the speed of data travel across the PHY pair. Consequently, it may be desirable to determine PHY status over an alternate communication path rather than over the path carrying the data.
Previous attempts at isolating an unreliable PHY pair have focused on the expander side of the PHY pair. U.S. Pat. No. 7,738,366 to Uddenberg, et al. discloses disabling a PHY pair in an expander port based on error readings taken on the expander side itself. The intelligence exists on the expander side, as the decision of whether or when to disable an expander port is made on the expander. While the Uddenberg patent may disable a PHY, it does not disclose the criteria used to make the disabling decision. Further, Uddenberg's use of intelligence existing on the expander side may create conflict between multiple expander devices attempting to disable one or more PHYs connecting through the same wide port. Such conflict within a wide port may cause additional errors.
Similarly, U.S. Pat. No. 7,912,995 to Long, et al. discloses intelligence on the SAS device side to determine a probationary state for a PHY as PHY errors reach a threshold. Again, decisions made on the expander/device side of the PHY pair may lead to additional errors in the wide port overall. Further, Long's disclosure is limited to partial depowering of a PHY pair, as indicated by the ability of the PHY pair to sense an unplugged cable.
With no existing solution, the reliability of the I/O transiting the unreliable PHY pair may be impaired. This impairment may trigger various faults such as random I/O timeouts, retries of I/O transmission, and overall performance loss due to buffer flushes and retries of the I/O.
Therefore, it would be advantageous if a method and system existed providing for controller-level detection and correction of a degraded I/O signal transmitted or received across a PHY pair.
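As a purely illustrative sketch of what controller-level detection might look like, the following assumes the controller can read a cumulative error count for each PHY pair in a wide port and compares two successive readings. The function names, the counter values, and the threshold are hypothetical; the text above does not specify any detection criteria, and this is not presented as the disclosed method.

```python
def find_degraded_pairs(previous, current, max_new_errors):
    """Return the PHY ids whose error count grew by more than max_new_errors.

    previous and current map PHY id to a cumulative error count
    (hypothetical readings gathered on the controller side).
    """
    degraded = []
    for phy_id, count in current.items():
        delta = count - previous.get(phy_id, 0)
        if delta > max_new_errors:
            degraded.append(phy_id)
    return sorted(degraded)


def usable_pairs(all_phy_ids, degraded):
    """Return the PHY ids the controller might keep routing I/O through."""
    excluded = set(degraded)
    return [phy_id for phy_id in all_phy_ids if phy_id not in excluded]


# Two successive readings for a four-PHY wide port (made-up values).
earlier = {0: 5, 1: 2, 2: 40, 3: 0}
later = {0: 6, 1: 2, 2: 190, 3: 1}

degraded = find_degraded_pairs(earlier, later, max_new_errors=100)
remaining = usable_pairs(sorted(later), degraded)
```

Because the decision in this sketch is made in one place on the controller side, it would avoid the situation described above in which multiple expanders independently attempt to disable PHYs in the same wide port.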