The present invention relates generally to the field of testing computer system components, and more specifically, to a system and method for testing memory in a computer system while an operating system is active.
Conventional schemes for performing diagnostic tests on computer memory systems are well known. The importance of testing a computer's memory cannot be overemphasized. The costs associated with memory defects are relatively high due to user downtime and loss of information. Many software applications are memory intensive, and it is important to ensure that defective memory which impacts the performance of such software applications is detected and removed.
While most software applications are stored on the hard disk, an application must first be loaded into memory in order to be executed. Initially, the operating system assigns a memory block (e.g., 64 KB) for the application. The operating system then copies the application from the hard disk to the allocated memory block. The application is thereafter executed within the allocated memory. While the application is active, it may require more memory blocks, which it obtains from the operating system by calling a memory allocation routine that is part of the operating system; from time to time, the application releases the memory blocks thus allocated using a memory deallocation routine that is also part of the operating system. Thereafter, upon completion of the application, the operating system reclaims the memory block that it initially allocated for the application, along with any memory blocks that the application allocated while it was active and still holds, and makes them available for use by other applications. This allocation and deallocation of memory occurs constantly within the computer system.
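The allocation lifecycle described above can be illustrated with a minimal sketch. It is a toy model only, not the allocator of any actual operating system: the class ToyAllocator, its method names, the application name "editor", and the four-block memory size are all hypothetical and chosen purely for illustration.

```python
class ToyAllocator:
    """Toy model of an operating system handing out fixed-size memory blocks."""

    def __init__(self, total_blocks):
        self.free_blocks = set(range(total_blocks))  # block numbers not in use
        self.owned = {}  # application name -> set of block numbers it holds

    def allocate(self, app):
        """Give one free block to `app`; raise MemoryError if none remain."""
        if not self.free_blocks:
            raise MemoryError("no free blocks")
        block = min(self.free_blocks)
        self.free_blocks.remove(block)
        self.owned.setdefault(app, set()).add(block)
        return block

    def deallocate(self, app, block):
        """Return a block that `app` currently holds to the free pool."""
        self.owned[app].remove(block)
        self.free_blocks.add(block)

    def terminate(self, app):
        """Reclaim every block still held by `app` when it exits."""
        for block in self.owned.pop(app, set()):
            self.free_blocks.add(block)


os_memory = ToyAllocator(total_blocks=4)
initial = os_memory.allocate("editor")  # block assigned when the application is loaded
extra = os_memory.allocate("editor")    # additional block requested while running
os_memory.deallocate("editor", extra)   # released by the application itself
os_memory.terminate("editor")           # remaining blocks reclaimed on exit
```

After the final call, every block is back in the free pool and available to other applications, which mirrors the last step of the lifecycle.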
Operating systems typically allocate memory at two levels. At a lower level the operating system allocates physical memory, while virtual memory is allocated at a higher level. Further, with today's faster and more efficient processors, operating systems are capable of multitasking and can execute multiple programs simultaneously. This results in increased memory requirements that typically exceed the available amount of physical memory. Consequently, virtual memory is employed to ensure that adequate memory is available. Virtual memory is a memory allocation scheme in which a computer with a lesser amount of physical memory appears as if it had a much larger amount of memory. Typically, when applications allocate memory, the operating system automatically allocates virtual memory. When an application starts to read or write to virtual memory, the operating system and the CPU detect the read or write operation and automatically verify whether physical memory has been allocated to the virtual memory address that the application is accessing; if not, the operating system allocates physical memory to the virtual memory address that is being accessed. Although an application may be allocated a large amount of virtual memory with a large number of virtual memory addresses, those addresses are mapped to physical memory only as needed. Physical memory is allocated in units known as pages. The size of a physical memory page can vary with the capabilities of the CPU. In this manner, the memory allocation/deallocation routine of the operating system ensures that large memory requirements for software applications are met.
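The on-demand mapping of virtual addresses to physical pages can be sketched as follows. This is a simplified model under stated assumptions, not the behavior of any particular operating system or CPU: the 4096-byte page size, the class ToyAddressSpace, and its method names are hypothetical, and the sketch simply raises an error when physical pages run out rather than modeling what a real system would do in that case.

```python
PAGE_SIZE = 4096  # assumed page size in bytes; real sizes depend on the CPU


class ToyAddressSpace:
    """Toy model: virtual pages receive a physical page only on first access."""

    def __init__(self, physical_pages):
        self.free_frames = list(range(physical_pages))  # unused physical pages
        self.page_table = {}   # virtual page number -> physical page number
        self.reserved = set()  # virtual page numbers handed to the application

    def reserve(self, start, length):
        """Allocate virtual addresses without committing physical memory."""
        first = start // PAGE_SIZE
        last = (start + length - 1) // PAGE_SIZE
        self.reserved.update(range(first, last + 1))

    def access(self, address):
        """Translate a virtual address, mapping a physical page if needed."""
        page = address // PAGE_SIZE
        if page not in self.reserved:
            raise ValueError("address was never allocated")
        if page not in self.page_table:
            if not self.free_frames:
                raise MemoryError("out of physical pages")
            self.page_table[page] = self.free_frames.pop(0)
        return self.page_table[page] * PAGE_SIZE + address % PAGE_SIZE


space = ToyAddressSpace(physical_pages=2)
space.reserve(start=0, length=16 * PAGE_SIZE)  # 16 virtual pages, 2 physical pages
mapped_before = len(space.page_table)          # no physical page committed yet
physical = space.access(5 * PAGE_SIZE + 12)    # first touch maps virtual page 5
mapped_after = len(space.page_table)           # exactly one mapping now exists
```

Here the application holds sixteen virtual pages while only two physical pages exist, and a physical page is consumed only when an address inside a virtual page is first read or written.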
As noted, conventional schemes for performing diagnostic tests on memory in a computer system are well known. One example of such a scheme is a POST (power-on self-test) computer program embedded in system memory. Because POST executes every time a computer is booted, there is a desire to minimize the time that a user has to spend waiting for the computer to boot. Therefore, POST runs only quick diagnostic tests on the computer memory as well as other system components. Rather than limit testing to system memory, POST typically tests all of the components within the computer system. Moreover, the sophistication of POST is limited in order to reduce the booting period when the computer system is turned on. In any event, POST is limited to the booting process, and failures occurring while the operating system is active are not detected.
Other schemes include various diagnostic programs typically stored on media such as the computer's hard disk drive or a floppy disk. Such diagnostic programs are commercially available for purchase by users, and are employed to detect faults related to computer components, such as memory, video, optical storage, hard disk drive, serial ports and virtual memory. In some instances, the user can select the components on which the diagnostic programs should be run. Typically, diagnostic programs test memory by writing specific data patterns to memory and then reading back these patterns for verification. That is, a deviation from the expected data pattern indicates that the corresponding portion of memory is defective.
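The write-then-read-back approach can be sketched as follows. This is an illustration only, not the method of any commercial diagnostic program: the four byte patterns, the function run_pattern_test, and the class StuckBitMemory (which simulates a defect in which bit 0 of one byte always reads back as 1) are hypothetical.

```python
PATTERNS = (0x00, 0xFF, 0x55, 0xAA)  # illustrative patterns, chosen for this sketch


def run_pattern_test(memory, patterns=PATTERNS):
    """Write each pattern to every byte, read it back, return failing offsets."""
    bad = set()
    for pattern in patterns:
        for offset in range(len(memory)):
            memory[offset] = pattern
        for offset in range(len(memory)):
            if memory[offset] != pattern:  # deviation from the expected pattern
                bad.add(offset)
    return sorted(bad)


class StuckBitMemory:
    """Simulated faulty memory: bit 0 of one byte always reads back as 1."""

    def __init__(self, size, stuck_offset):
        self.cells = bytearray(size)
        self.stuck_offset = stuck_offset

    def __len__(self):
        return len(self.cells)

    def __setitem__(self, offset, value):
        self.cells[offset] = value

    def __getitem__(self, offset):
        value = self.cells[offset]
        if offset == self.stuck_offset:
            value |= 0x01  # the defective bit ignores what was written
        return value


good_result = run_pattern_test(bytearray(64))                       # healthy buffer
bad_result = run_pattern_test(StuckBitMemory(64, stuck_offset=10))  # one bad byte
```

Note that the simulated defect is visible only for patterns whose bit 0 is 0 (here 0x00 and 0xAA), which suggests why such tests generally use more than one pattern.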
Disadvantageously, if processes are running which occupy portions of system memory, many diagnostic programs cannot test the occupied portions. Attempts to access these portions of memory will result in a system crash. Further, diagnostic programs are typically complicated and cannot be run by a computer novice.
Another disadvantage relates to the length and complexity of diagnostic programs which have significant processor performance and memory requirements. In addition, many diagnostic programs will have a significant impact on system performance, and thus it is not advantageous to run these diagnostic programs in anticipation of failures, but rather only after it appears that a failure has occurred and only the exact cause of the failure remains to be determined.
Further, the accuracy of the results of some diagnostic programs is somewhat doubtful because the diagnostic programs, when executed, do not simulate the full range of operating environments in which the computers are employed. A further disadvantage of such diagnostic programs is that memory failures are not automatically detected. In fact, by the time a system failure occurs, data loss has already taken place and it is almost always too late to manually execute a diagnostic program to prevent loss of valuable information.
Moreover, computers traditionally rely on hardware components such as parity error checking and Error Checking and Correcting (ECC) mechanisms to monitor system memory for errors while the system is running. These solutions increase system cost, as they require extra hardware components, and extra memory to store error detection information.
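The storage overhead of such error detection can be illustrated with a simple per-byte parity scheme, sketched below in software. This is an assumption-laden illustration, not a description of any specific memory hardware: the function names and sample data are hypothetical, and even parity over each 8-bit value is assumed. Storing one parity bit for every eight data bits adds 12.5% to the memory required, and parity of this kind detects a single flipped bit but cannot correct it.

```python
def parity_bit(byte):
    """Return the even-parity bit for an 8-bit value (1 if its count of 1 bits is odd)."""
    return bin(byte & 0xFF).count("1") % 2


def store_with_parity(data):
    """Model memory that keeps one extra parity bit alongside each data byte."""
    return [(byte, parity_bit(byte)) for byte in data]


def find_parity_errors(stored):
    """Return the offsets whose data no longer matches the stored parity bit."""
    return [i for i, (byte, parity) in enumerate(stored) if parity_bit(byte) != parity]


stored = store_with_parity(b"\x0f\xa0\x01")
clean = find_parity_errors(stored)                # nothing has been corrupted yet
stored[1] = (stored[1][0] ^ 0x04, stored[1][1])   # simulate one flipped data bit
errors = find_parity_errors(stored)               # the corrupted offset is reported
```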
Therefore, it would be desirable to provide a system and method capable of resolving the aforementioned problems relating to the conventional approaches for performing diagnostic tests on system memory.