A parallel computer system includes, for example, a plurality of information processing apparatuses (hereinafter, also referred to as nodes) which process data. The information processing apparatuses of the parallel computer system are connected to each other through transmission paths (hereinafter, also referred to as lanes). For example, an information processing apparatus using 100 Gbps Ethernet (trademark), which is standardized in the IEEE802.3ba standard, transfers data to an information processing apparatus which is a communication destination by using a plurality of lanes. Hereinafter, 100 Gbps Ethernet is also referred to as 100 Gb Ethernet. In the 100 Gb Ethernet, for example, a single link is realized by the plurality of lanes.
For example, when optical transmission is applied to a link in which nodes of the parallel computer system are connected to each other by the plurality of lanes, an optical module is used which includes a light emitting element which converts an electrical signal into an optical signal, a light receiving element which converts an optical signal into an electrical signal, and the like. The failure rate of an optical module, which includes components such as a light emitting element, is higher than that of an electrical component. For example, when one of the plurality of lanes becomes defective due to a failure of the optical module or the like, the link between the nodes is disconnected. In this case, it is not possible to perform processing such as parallel computation using a node including a defective component (for example, a defective optical module). Thus, the reliability of the parallel computer system decreases as the failure rate of the lanes increases.
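The dependence of link reliability on the lane failure rate can be illustrated numerically. The following is a minimal sketch, assuming that each lane fails independently with the same probability and that the link is disconnected when any one lane fails. The function name and the example values are hypothetical and are not taken from the IEEE802.3ba standard.

```python
def link_failure_probability(lane_failure_prob: float, num_lanes: int) -> float:
    """Probability that at least one of num_lanes lanes fails.

    Assumption (not from the standard): lanes fail independently and
    a single defective lane disconnects the whole link.
    """
    if not 0.0 <= lane_failure_prob <= 1.0:
        raise ValueError("lane_failure_prob must be in [0, 1]")
    if num_lanes < 1:
        raise ValueError("num_lanes must be at least 1")
    # The link survives only if every lane survives.
    return 1.0 - (1.0 - lane_failure_prob) ** num_lanes
```

Under these assumptions, the link failure probability grows with both the lane failure probability and the number of lanes, which is why a link made of many optical lanes is more exposed than a single lane.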
Conversely, if the connection of the link is maintained by degenerating a lane having a defect (hereinafter, also referred to as a defective lane), that is, by continuing communication over the remaining lanes, the reliability of the parallel computer system is improved. However, the physical layer (hereinafter, also referred to as a physical layer PHY) of the IEEE802.3ba standard is provided with neither a function of specifying a defective lane nor a function of realizing lane degeneration.
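The idea of maintaining a link by lane degeneration can be sketched as follows. This is only an illustration, assuming that data blocks are distributed round-robin over the lanes that are still healthy. The function `stripe_blocks` and the `lane_ok` flags are hypothetical names, and the sketch does not represent a mechanism defined in the IEEE802.3ba standard.

```python
from typing import Dict, List, Sequence


def stripe_blocks(blocks: Sequence[bytes],
                  lane_ok: Sequence[bool]) -> Dict[int, List[bytes]]:
    """Distribute blocks round-robin over the lanes marked healthy.

    lane_ok[i] is True when lane i is usable (hypothetical status flag).
    Defective lanes are skipped, so the link stays up at reduced width.
    """
    active = [lane for lane, ok in enumerate(lane_ok) if ok]
    if not active:
        raise RuntimeError("no healthy lane: the link cannot be maintained")
    per_lane: Dict[int, List[bytes]] = {lane: [] for lane in active}
    for n, block in enumerate(blocks):
        per_lane[active[n % len(active)]].append(block)
    return per_lane
```

With four lanes of which lane 1 is defective, the blocks are carried by lanes 0, 2, and 3 only, so the link remains connected at a reduced bandwidth instead of being disconnected.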
Accordingly, a physical layer architecture capable of maintaining a link by avoiding a failure occurring in some lanes has been proposed (for example, Akihiro Kanbe, Masashi Kono, Hidehiro Toyoda, “Lane Degeneration Technology for 100 Gbit Ethernet”, IEICE technical report, CS2010-39, pp. 13-18, November 2010). For example, in order to realize lane degeneration, a function of embedding lane switching control information is added to the alignment marker insertion and extraction protocol of the 100 Gb Ethernet standard specification. A failure of each lane is detected by monitoring, for example, the 2-bit header of each 64B/66B code block.
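Failure detection based on the 2-bit header can be sketched as follows. In 64B/66B coding, the header values 01 and 10 are valid, whereas 00 and 11 are invalid. The sketch assumes that each header is available as an integer from 0 to 3 and that a lane is treated as suspect when the number of invalid headers in an observation window exceeds a threshold. The function names and the threshold `max_invalid` are hypothetical and do not reproduce the criteria of the cited report or of the standard.

```python
from typing import Iterable, List, Sequence

# Valid 2-bit headers of a 64B/66B code block (01 and 10).
VALID_SYNC_HEADERS = {0b01, 0b10}


def count_invalid_headers(headers: Iterable[int]) -> int:
    """Count the headers that are neither 01 nor 10."""
    return sum(1 for h in headers if h not in VALID_SYNC_HEADERS)


def find_suspect_lanes(headers_per_lane: Sequence[Sequence[int]],
                       max_invalid: int) -> List[int]:
    """Return the indices of lanes whose invalid-header count exceeds
    max_invalid (a hypothetical threshold) within the observed window."""
    return [lane for lane, headers in enumerate(headers_per_lane)
            if count_invalid_headers(headers) > max_invalid]
```

For example, with `max_invalid=1`, a lane that shows three invalid headers in the window is reported, while a lane that shows a single invalid header is not.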
In addition, as a method of specifying a transmission path in a failure state from among a plurality of transmission paths, a method has been proposed in which a fixed data pattern for specifying a lane in which a failure has occurred is generated and the generated fixed data pattern is transmitted to an apparatus which is a connection destination (for example, Japanese Laid-open Patent Publication No. 2006-186527). For example, the apparatus having received the fixed data pattern specifies the transmission path in the failure state from the received fixed data pattern.
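The exchange of a fixed data pattern can be sketched as follows. The actual pattern format of the cited publication is not described here, so the layout below (one marker byte followed by a bitmask of failed lanes) is purely hypothetical, as are the names `PATTERN_MARKER`, `build_failure_pattern`, and `parse_failure_pattern`.

```python
from typing import Iterable, List

PATTERN_MARKER = 0xA5  # hypothetical marker byte identifying the pattern


def build_failure_pattern(failed_lanes: Iterable[int], num_lanes: int) -> bytes:
    """Encode the failed lanes as a marker byte plus a lane bitmask."""
    mask = 0
    for lane in failed_lanes:
        if not 0 <= lane < num_lanes:
            raise ValueError(f"lane {lane} is out of range")
        mask |= 1 << lane
    width = (num_lanes + 7) // 8  # bytes needed for one bit per lane
    return bytes([PATTERN_MARKER]) + mask.to_bytes(width, "big")


def parse_failure_pattern(pattern: bytes, num_lanes: int) -> List[int]:
    """Recover the failed lane numbers from a received pattern."""
    width = (num_lanes + 7) // 8
    if len(pattern) != 1 + width or pattern[0] != PATTERN_MARKER:
        raise ValueError("not a failure pattern")
    mask = int.from_bytes(pattern[1:], "big")
    return [lane for lane in range(num_lanes) if (mask >> lane) & 1]
```

In this sketch, the transmitting side calls `build_failure_pattern` and the receiving side calls `parse_failure_pattern` with the same lane count, so that both sides agree on which lanes are in a failure state.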
In an information processing apparatus which assumes use of a physical layer based on the IEEE802.3ba standard, a physical layer that does not have a function of specifying a defective lane is used, and thus it is difficult to specify a defective lane. In a method such as that of adding a function of embedding lane switching control information to the alignment marker insertion and extraction protocol, the physical layer protocol itself is modified, and thus there is a concern that the amount of changes from the physical layer of the IEEE802.3ba standard may increase. When the amount of changes from the standard specification increases, there is a concern that versatility may decrease.
In one aspect, an information processing apparatus, a parallel computer system, and a method of controlling the parallel computer system of the present disclosure make it possible to specify a lane to be degenerated even when a physical layer that does not have a function of specifying a defective lane is used.