In 1981, International Business Machines (IBM) introduced the IBM PC. Originally, PC was an acronym for “personal computer”. Other personal computers were, or are, the Apple II, the MITS Altair, and the Macintosh. The PC architecture, however, was unique because it was not proprietary. As such, the PC architecture became an unintentional standard. Any company could design and build a PC compatible computer. A computer was PC compatible if it could use the same parts and run the same software as other PC compatible computers.
From 1981 until 1987, IBM controlled the PC architecture. Other companies, such as Compaq, built computers that were compatible with those that IBM designed and released. In 1987, IBM introduced the PS/2 line of personal computers, which did not conform to the PC architecture that IBM itself had created. As such, in 1987 IBM lost control of the PC architecture. Until that time, “IBM compatible” meant that a computer used the standard PC architecture. After that time, a computer using the PC architecture was simply called a “PC” or “PC compatible”.
PCs were so popular because many manufacturers competed to make parts that could be assembled to produce computers that, more likely than not, would run Microsoft's Windows™ or MS-DOS™ operating systems as well as a plethora of useful programs. The PC architecture allowed the hardware manufacturers to build computers while the software companies produced software. This separation of roles helped produce the PC revolution.
The PC architecture has evolved since 1981 with different manufacturers introducing variations on the basic design. The market did not accept some of the variations and they disappeared. The market accepted other variations and they flourished until supplanted. No single manufacturer or standards body has successfully and continuously controlled the PC architecture. It is constantly evolving with different groups competing to introduce variations and the market choosing the winners.
Microsoft and Intel collaborated to document the PC architecture and to drive it forward. The result was a series of design guides beginning with “Hardware Design Guide for Microsoft Windows 95”. The “PC 2001 System Design Guide” is the last one of the series. Since then, the series has been supplanted by the Windows Logo certification program in which Microsoft documents how Windows™ and other programs can successfully interact with the evolving PC architecture. Due to their market dominance, Microsoft and Intel have been fairly successful in defining and guiding the PC architecture.
Just as IBM lost control of the PC architecture in 1987 because PC manufacturers did not produce PS/2 compatible computers, Microsoft and Intel may also be losing control. The reason is that Microsoft is meeting credible competition from open source operating systems and applications while Intel has lost significant market share to other manufacturers.
Regardless of who or what controls it, all PC architecture computers are direct descendants of the 1981 IBM PC. As such, they have certain things in common. They all have central processing units (CPUs) with instruction sets derived from the Intel 8088, which was the IBM PC's CPU. If there is a first line printer port (LPT1), it is located at I/O address 0x378. A second line printer port (LPT2), if present, is at 0x278. The line printer ports are often called legacy ports because certain “legacy free” PCs do not have them. Regardless, the defining characteristic of the PC and the PC architecture is the clear evolutionary path from the original IBM PC. This evolutionary path is not shared by any other computer architecture.
FIG. 4, labeled as “prior art”, illustrates a high level block diagram of a PC compatible motherboard, meaning it is based on the PC architecture, in accordance with aspects of the embodiments. The components on the motherboard are divided into two groups: those that require high bandwidth communications and those that do not. The north bridge 102 coordinates communications between high bandwidth components such as the CPU 101, the system memory 103, the video subsystem 405, the Ethernet controller 406, and the south bridge 104. The south bridge 104 controls communications between low bandwidth devices and the north bridge 102. Some of the low bandwidth devices are the BIOS 105, LPT1 108, LPT2 409, the first serial port 410, the second serial port 411, the floppy drive 412, the PCI bus 413, the keyboard controller 414, the mouse controller 415, the FireWire controller 416, the EIDE controller 417, the ATA controller 418, the USB controller 419, and the PCI Express controller 420. Many of the low bandwidth devices are actually contained within the same computer chip as the south bridge 104.
In this document, a PC motherboard's LPT ports, serial ports, USB ports, FireWire ports, and Ethernet port are all considered communications ports whereas its other components are not.
It is interesting to note that some components that were considered high bandwidth in the 1981 IBM PC architecture are now considered low bandwidth devices. For example, the 1981 IBM PC had an ISA bus that was considered quite fast. The ISA bus is no longer present, but the much faster PCI bus is. The PC architecture is currently evolving from the PCI bus to the much faster PCI Express bus. In 1981, the ISA bus was high bandwidth. In 2005, even the PCI Express bus is not. As the PC architecture continues to evolve, other buses and components will appear, disappear, and shift around.
FIG. 5, labeled as “prior art”, illustrates memory and memory slots in accordance with aspects of the embodiments. A system memory 103 is shown in FIG. 4. The system memory 103 is made of one or more memories. A first memory 501, second memory 502, and third memory 503 are shown in FIG. 5. The first memory 501 is held in a first memory slot 117. The second memory 502 is held in a second memory slot 118. The third memory 503 is held in a third memory slot 119. The three memories 501, 502, 503 combined are the system memory. The three slots 117, 118, 119 are used to hold the memories 501, 502, 503 and to electrically connect them to the motherboard.
Those skilled in the arts of PC hardware maintenance, PC design, or PC manufacture are familiar with all the aspects of the PC architecture discussed herein. In particular, they are familiar with PC motherboards, PC motherboard components, PC memory, and PC memory slots.
When a PC boots, it accesses instructions in the BIOS and executes them. The instructions in the BIOS can test the PC hardware, prepare it to run the operating system, and then run the operating system. Occasionally, the BIOS tests discover malfunctioning hardware, such as a bad memory. When a hardware malfunction is serious enough, many BIOS implementations cause the computer to emit a series of sounds called beep codes. A technician can listen to the beep codes and diagnose the malfunction.
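The boot sequence described above can be sketched as a short routine. This is only an illustration, not any vendor's BIOS: the function name `run_post`, the component names, and the beep patterns in `BEEP_CODES` are all invented for the example, since real BIOS implementations define their own codes.

```python
# Hypothetical beep patterns: (number of long beeps, number of short beeps).
# Real BIOS vendors define their own codes; these values are invented.
BEEP_CODES = {
    "memory": (1, 3),
    "video": (1, 2),
}


def run_post(tests):
    """Run hardware tests in order and stop at the first failure.

    `tests` is a list of (component_name, test_function) pairs, where each
    test function returns True when the component passes.
    Returns ("boot", None) when every test passes, otherwise
    ("beep", pattern) for the first failing component.
    """
    for name, test in tests:
        if not test():
            # Components without a listed code fall back to a generic,
            # equally hypothetical pattern.
            return ("beep", BEEP_CODES.get(name, (0, 1)))
    return ("boot", None)
```

For example, `run_post([("video", lambda: True), ("memory", lambda: False)])` reports the memory pattern instead of continuing to the operating system.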
A PC memory can fail in a variety of ways. Three of the ways are: a serial presence detect (SPD) misprogramming failure, a data strobe (DQS) calibration failure, and a retention failure.
Each dual in-line memory module (DIMM), or similar memory used in a PC architecture, has a non-volatile memory, the SPD, containing the information necessary for the BIOS to program the north bridge memory controller. The information, which can include size, speed, number of columns, rows, ranks, banks, manufacturer, and serial number, is vital. An SPD misprogramming failure occurs when a manufacturer fails to store information in the SPD or stores the wrong information in the SPD. Such failures prevent the DIMM from being usable at all.
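A check for the first kind of SPD misprogramming, missing or obviously invalid information, might look like the following sketch. It assumes the SPD contents have already been decoded into a dictionary. The field names in `REQUIRED_FIELDS` and the function `spd_problems` are hypothetical and do not correspond to any actual SPD byte layout.

```python
# Subset of the SPD information named in the text that the BIOS needs in
# order to program the memory controller. The key names are hypothetical.
REQUIRED_FIELDS = ("size_mb", "speed_mhz", "rows", "columns", "ranks", "banks")


def spd_problems(spd):
    """Return a list of problems found in a decoded SPD record (a dict).

    An empty list means no problem was detected. A field that is plausible
    but wrong cannot be caught by this kind of check.
    """
    problems = []
    for field in REQUIRED_FIELDS:
        value = spd.get(field)
        if value is None:
            problems.append(f"{field}: missing")
        elif not isinstance(value, int) or value <= 0:
            problems.append(f"{field}: implausible value {value!r}")
    return problems
```

A check of this kind only detects blank or nonsensical entries. Information that is well formed but does not match the actual DIMM would pass it and surface later, for example as a failure during memory initialization.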
When a memory controller wants to read data from a DIMM, it sends a read command, delays a calibrated amount of time, and then latches the received data. The calibrated amount of time is the time for the read command to travel from the memory controller to the DIMM plus the time for the DIMM to fetch the data plus the time for the data to travel from the DIMM to the memory controller. These timings are so critical that the north bridge manufacturer will usually specify, down to the board level, how the board needs to be designed. The specification includes printed circuit trace lengths and similar low-level, precise details. As with all manufacturing processes, some variation is expected. The memory controller uses a data strobe, often carried by a wire with a DQS label, to control the precise timing for latching the data. The amount of time between receiving the data strobe and latching the data is set by a process called DQS calibration. DQS calibration ensures that the data is latched at the center of the data window. DQS calibration, however, can only compensate for a certain range of timing variation. If the amount of time required by a particular DIMM is outside the DQS calibration window, then the DIMM is unusable. This is called a DQS calibration failure.
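The timing relationship above can be illustrated with a deliberately simplified model that treats calibration as choosing a single delay that must fall inside a fixed range. The function names, the use of integer picoseconds, and the window limits are assumptions made for the example, and actual memory controllers calibrate in hardware-specific ways.

```python
def round_trip_delay(t_command, t_fetch, t_data):
    """Total read delay: command flight time to the DIMM, plus the DIMM's
    fetch time, plus data flight time back to the memory controller.
    All times are integers in picoseconds (an assumed unit)."""
    return t_command + t_fetch + t_data


def dqs_calibrate(required_delay, window_min, window_max):
    """Return the delay to use, or None on a DQS calibration failure.

    `window_min` and `window_max` bound the range of delays that the
    calibration can compensate for (hypothetical parameters).
    """
    if window_min <= required_delay <= window_max:
        return required_delay
    return None
```

With an assumed window of 15000 to 17000 ps, a DIMM needing `round_trip_delay(500, 15000, 500)`, that is 16000 ps, calibrates successfully, while one needing 18000 ps yields `None`, which corresponds to a DQS calibration failure.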
A retention failure occurs when a memory cannot retain data for a long enough time period. When data is written to a memory and then immediately read back from the memory, the data read is usually equal to the data written. Over time, however, the data stored in any memory becomes corrupted. Under current architectural guidelines, a PC memory that cannot store data for more than a few milliseconds is deemed to suffer a retention failure.
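A retention test follows directly from this description: write a known pattern, wait, and read it back. The sketch below runs against a toy simulated memory rather than real hardware. The class `SimulatedMemory`, its decay model, the test pattern, and the wait time are all hypothetical choices made for illustration.

```python
class SimulatedMemory:
    """Toy memory whose contents decay after `retention_ms` milliseconds.
    This is an invented model, not a description of real DRAM behavior."""

    def __init__(self, retention_ms):
        self.retention_ms = retention_ms
        self.now_ms = 0  # simulated clock
        self._data = {}
        self._written_at = {}

    def write(self, address, value):
        self._data[address] = value
        self._written_at[address] = self.now_ms

    def read(self, address):
        age = self.now_ms - self._written_at[address]
        if age > self.retention_ms:
            # Model corruption by inverting every bit of the stored byte.
            return self._data[address] ^ 0xFF
        return self._data[address]


def has_retention_failure(memory, wait_ms, pattern=0xA5, addresses=range(4)):
    """Write `pattern`, let `wait_ms` of simulated time pass, then read back.
    Returns True if any address no longer holds the pattern."""
    for address in addresses:
        memory.write(address, pattern)
    memory.now_ms += wait_ms
    return any(memory.read(address) != pattern for address in addresses)
```

Under this model, a memory that holds data for only 2 ms fails a test with an 8 ms wait, while one that holds data for 64 ms passes it.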
Beep codes are used because they still work when nearly every other PC component is failing or cannot be used. For example, malfunctioning system memory prevents the computer from accessing the Ethernet port, the video subsystem, and other PC components. Beep codes allow a PC to report diagnostic results even if the PC has no system memory at all.
The problem with beep codes is that many PCs are installed in noisy environments and some do not have speakers. For example, compute clusters can have many PCs installed in close proximity. Compute cluster environments are often very noisy and crowded. Many compute clusters lack speakers because the speakers can make even more noise and because the speakers often get in the way.
Another problem with beep codes is that, upon encountering a bad memory, they simply report that the system memory is bad. Recalling FIG. 5, any one of the three memories 501, 502, 503 can be bad. A technician can diagnose which memory is bad by pulling out memories, swapping them in and out of the slots 117, 118, 119, and observing the boot up sequence. This can take considerable time and can break additional hardware through handling. Handling damage can be aggravated when the technician is working in a cramped and unstable position, which often happens because the diagnosis must often be performed in place. Another option is for the technician to simply replace all the memory and send the suspect memory elsewhere for diagnosis and testing. This option also involves unnecessary effort, logistics problems, and arguments if the diagnosis that takes place elsewhere does not match the environment in which the memory failed.