With developments in server virtualization and cloud computing, there has been demand for information processing devices in which a large-capacity memory can be installed. To increase the memory capacity, many information processing devices of recent years are provided with slots for installing a large number of dual inline memory modules (DIMMs). In some information processing devices, as many as 48 DIMMs can be installed on a single system board.
In an information processing device having a large number of DIMM slots, the DIMMs are installed in predetermined slots according to a DIMM installation candidate pattern, which defines on a slot-by-slot basis whether or not installation is possible. Depending on the combination of the number of central processing units (CPUs), the DIMM types, and the DIMM operation modes in an information processing device, a number of DIMM installation candidate patterns are available.
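As an illustration only, a DIMM installation candidate pattern can be pictured as a table keyed by the configuration, with one installable/not-installable flag per slot. The following minimal sketch assumes a hypothetical 8-slot board; the names `Config`, `CANDIDATE_PATTERNS`, and `installable_slots`, the DIMM type and mode strings, and the slot numbers are all invented for the example and do not come from any actual device.

```python
from typing import Dict, FrozenSet, List, NamedTuple


class Config(NamedTuple):
    """Hypothetical key that selects one installation candidate pattern."""
    cpu_count: int
    dimm_type: str
    operation_mode: str


# Assumption: a small 8-slot board, chosen only to keep the example short.
SLOT_COUNT = 8

# Assumption: illustrative patterns only. Each entry lists the slot numbers
# in which installation is allowed for that configuration.
CANDIDATE_PATTERNS: Dict[Config, FrozenSet[int]] = {
    Config(1, "RDIMM", "independent"): frozenset({0, 2}),
    Config(2, "RDIMM", "independent"): frozenset({0, 2, 4, 6}),
    Config(2, "RDIMM", "mirror"): frozenset({0, 1, 4, 5}),
}


def installable_slots(config: Config) -> List[bool]:
    """Return, slot by slot, whether installation is possible."""
    allowed = CANDIDATE_PATTERNS[config]
    return [slot in allowed for slot in range(SLOT_COUNT)]


if __name__ == "__main__":
    print(installable_slots(Config(2, "RDIMM", "mirror")))
```

Keying the table on the CPU count, DIMM type, and operation mode mirrors the statement that the available patterns depend on the combination of those three items.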
At the time of installing DIMMs in a decided pattern, the following issues may arise. First, a DIMM may be installed in an incorrect slot position, which may result in a DIMM being installed in a slot in which installation is not allowed. If DIMMs are incorrectly installed in this way, a power-on self-test (POST) processing unit of the information processing device may detect a defect during DIMM diagnosis and bring the information processing device to a halt. The POST is a hardware diagnosis test performed when an information processing device is powered ON and when a hardware reset is performed. The test program for implementing the POST is recorded in a basic input/output system read only memory (BIOSROM). Immediately after the device is powered ON, the test program is executed so that the POST is performed.
Secondly, installation inadequacy may occur, in which DIMMs are not installed in some of the decided slots. This includes a case in which installation of a DIMM is forgotten and a case in which incomplete insertion of a DIMM causes a poor connection, so that the DIMM appears not to be installed. As in the case of incorrect installation, in the case of installation inadequacy the POST processing unit of the information processing device may detect a defect during DIMM diagnosis and bring the information processing device to a halt.
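The two issues described above, incorrect installation and installation inadequacy, can be told apart by comparing the decided pattern with the slots in which a DIMM is actually detected. The following is a minimal sketch under stated assumptions: the function name `compare_with_pattern` and both input lists are hypothetical, and every slot marked in the decided pattern is assumed to require a DIMM. It is not a description of how any existing device performs this check.

```python
from typing import List, Tuple


def compare_with_pattern(
    decided: List[bool], present: List[bool]
) -> Tuple[List[int], List[int]]:
    """Compare a decided pattern with the detected DIMM presence.

    decided[i] is True if a DIMM is supposed to be installed in slot i.
    present[i] is True if a DIMM is detected in slot i.
    Returns (erroneous, missing):
      erroneous: slots that hold a DIMM although installation is not allowed
      missing:   decided slots in which no DIMM is detected (forgotten
                 installation, or poor connection that looks like an
                 empty slot)
    """
    if len(decided) != len(present):
        raise ValueError("both lists must cover the same slots")
    pairs = list(zip(decided, present))
    erroneous = [i for i, (d, p) in enumerate(pairs) if p and not d]
    missing = [i for i, (d, p) in enumerate(pairs) if d and not p]
    return erroneous, missing


if __name__ == "__main__":
    # Hypothetical 4-slot example: slot 1 is wrongly populated,
    # slot 2 should hold a DIMM but appears empty.
    print(compare_with_pattern([True, False, True, False],
                               [True, True, False, False]))
```

A DIMM placed in the wrong slot typically shows up in both lists at once: the slot it occupies is reported as erroneous, and the slot it should have occupied is reported as missing.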
Meanwhile, conventional technologies are available for obtaining DIMM-related information. For example, a conventional technology is known in which the information stored in a serial presence detect (SPD) mounted in each DIMM is read using a dedicated hardware interface and a dedicated system program, and the DIMM-related information is checked. Herein, the SPD is a type of ROM chip mounted in a DIMM, and is used to store DIMM specification information such as the capacity, the maximum clock frequency, and the signal type of the DIMM. The information processing device decides on a DIMM control program according to the information obtained from the SPDs.
Moreover, a conventional technology is known in which a service processor reads the configuration information settings of a processor, a memory device, and an input-output device, and displays the state of each device.
Japanese Laid-open Patent Publication No. 2002-259227
Japanese Laid-open Patent Publication No. 2003-029998
In the conventional technology for reading information from SPDs, a defect in an SPD can be detected using an application specific integrated circuit (ASIC). However, it is difficult to detect non-insertion or erroneous insertion of DIMMs. In the conventional technology for reading the settings using a service processor, the information on the memory is likewise obtained from SPDs, so it is equally difficult to detect non-insertion or erroneous insertion of DIMMs.
In this way, in the conventional technology, a defect attributed to erroneous insertion or non-insertion of DIMMs is difficult to detect before the device is booted. For that reason, the operator boots the device without noticing such a defect, which leads to malfunctioning of the device followed by a system crash. When a device stops working, it is common practice to perform the following three steps in order to restore the system. First, using the displayed malfunction notification as a clue, the operator investigates the factors causing the failure in the device. Then, once a factor has been identified, the operator eliminates that factor. Lastly, the operator reboots the device and restores the system. Thus, because it is difficult to detect a defect such as erroneous insertion or non-insertion of DIMMs before the device is booted, the defect leads to the device being stopped and rebooted. That results in an increase in the man-hours of the operator while building the system and while operating and maintaining the memory.