1. Field of the Present Invention
The present invention generally relates to the field of data processing systems and more particularly to a non-uniform memory architecture (NUMA) system in which main memory data is backed up on one or more other nodes of the NUMA system using RAID-like techniques to improve fault tolerance and reduce the amount of time spent storing data to permanent storage.
2. History of Related Art
In the field of microprocessor-based data processing systems, the use of multiple processors to improve the performance of a computer system is well known. In a typical multi-processor arrangement commonly referred to as a symmetric multi-processor (SMP) system, a set of processors access a system memory via a shared bus referred to herein as the system or local bus. The use of a shared bus presents a scalability limitation. More specifically, the shared bus architecture ultimately limits the ability to improve performance by connecting additional processors to the system bus. After a certain point, the limiting factor in the performance of a multiprocessor system is the bandwidth of the system bus. Roughly speaking, the system bus bandwidth is typically saturated after four processors have been attached to the bus. Incorporating additional processors beyond four generally results in little, if any, performance improvement.
To combat the bandwidth limitations of shared bus systems, distributed memory systems, in which two or more SMP systems (referred to as nodes) are connected to form a larger system, have been proposed and implemented. One example of such a system is referred to as a non-uniform memory architecture (NUMA) system. A NUMA system comprises multiple nodes, each of which may include its own processors, local memory, and corresponding system bus. The memory local to each node is accessible to the other nodes via an interconnect network (referred to herein as the NUMA fabric) that links the various nodes. The use of multiple system busses (one for each node) enables NUMA systems to employ additional processors without incurring the system bus bandwidth limitation experienced by single bus systems.
For many data processing applications, reliably maintaining the application's data is of paramount importance. The reliability of data is conventionally maintained by periodically backing up the data in main memory to persistent or non-volatile memory. Referring to FIG. 1, a data processing system 100 is illustrated in block diagram format. Data processing system 100 may include one or more nodes 102. Each node 102 includes one or more processors 104 that access a local memory 108 via a memory controller 106. A cache memory (not explicitly shown in FIG. 1) may reside between a processor and the memory controller. Nodes 102 may share a common, persistent mass storage device or devices identified in FIG. 1 as disk 112. If multiple disks are used, they may be arranged as a redundant array of inexpensive disks (RAID) to assure high availability of the data. RAID designs are described in Source, which is incorporated by reference herein.
Local memory 108 is typically implemented with dynamic random access memory (DRAM) that is susceptible to power loss, but has a significantly faster access time than disk 112. The application data stored in local memory 108 is periodically written back to disk 112 to protect against data loss from an unexpected event such as a power outage or node crash. The frequency with which data in local memory 108 is written back to disk 112 is a function of the particular application and the rate at which data accumulates in local memory 108. Data intensive applications may require frequent disk backups to guard against loss of a large amount of data. The time required to write data to or retrieve data from disk 112 (the disk access time) is characteristically orders of magnitude greater than the access time of local memory 108. Application performance may, therefore, suffer in data intensive applications requiring frequent disk backup. It would be highly desirable, therefore, to implement a system in which data is maintained with sufficient reliability in a high-speed memory to enable less frequent disk backup, thereby enhancing system performance.
The problem identified above is addressed by a method and system for managing data in a data processing system as disclosed herein. Initially, data is stored in a first portion of the main memory of the system. Responsive to storing the data in the first portion of main memory, information is then stored in a second portion of the main memory. The information stored in the second portion of main memory is indicative of the data stored in the first portion. In an embodiment in which the data processing system is implemented as a multi-node system such as a NUMA system, the first portion of the main memory is in the main memory of a first node of the system and the second portion of the main memory is in the main memory of a second node of the system. In one embodiment, storing information in the second portion of the main memory is achieved by storing a copy of the data in the second portion. If a fault in the first portion of the main memory is detected, the information in the second main memory portion is retrieved and stored to a persistent storage device. In another embodiment, storing information in the second portion of the main memory includes calculating a value based on the corresponding contents of other portions of the main memory using an algorithm such as checksum, parity, or ECC, and storing the calculated value in the second portion. In one embodiment, the main memory of at least one of the nodes is connectable to a persistent source of power, such as a battery, such that the main memory contents may be preserved if system power is disabled.
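By way of illustration only, the parity-based embodiment described above may be sketched in software as follows. The sketch is a simplified model under stated assumptions and is not the disclosed implementation: the names Node, xor_blocks, write_with_parity, and recover are hypothetical, each node's main memory is modeled as a byte array, byte-wise XOR parity stands in for whichever algorithm (checksum, parity, or ECC) an embodiment employs, and the NUMA fabric is not modeled.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


class Node:
    """Hypothetical stand-in for the main memory of one node."""

    def __init__(self, size):
        self.memory = bytearray(size)


def write_with_parity(data_nodes, parity_node, node_index, offset, payload):
    """Store payload in one data node, then refresh the parity region.

    The parity written to parity_node is computed from the corresponding
    contents of all data nodes, so it is "indicative of" the stored data.
    """
    end = offset + len(payload)
    data_nodes[node_index].memory[offset:end] = payload
    parity_node.memory[offset:end] = xor_blocks(
        [bytes(node.memory[offset:end]) for node in data_nodes]
    )


def recover(data_nodes, parity_node, failed_index, offset, length):
    """Rebuild the region of a failed data node from survivors plus parity."""
    end = offset + length
    survivors = [
        bytes(node.memory[offset:end])
        for i, node in enumerate(data_nodes)
        if i != failed_index
    ]
    return xor_blocks(survivors + [bytes(parity_node.memory[offset:end])])
```

In this model, a fault in one data node is handled by calling recover with that node's index; the returned bytes could then be written to a persistent storage device. The mirrored embodiment corresponds to the degenerate case of a single data node, in which the parity region is simply a copy of the data.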