1. Field of the Invention
The present invention relates to a computer system using intelligent input-output, and more particularly, to a system and method for providing linearly scalable dynamic memory management in a multiprocessing system.
2. Description of Related Art
A conventional computer system typically includes one or more central processing units (CPUs) capable of executing various sequential sets of instructions, known as threads. Originally, a computer system included a single CPU capable of executing a single thread at a given time. Advances in operating systems have provided a technique for sharing a single CPU among multiple threads, known as multitasking. The development of multiprocessing brought computer systems with multiple CPUs, each executing a different thread at the same time.
There are many variations on the basic theme of multiprocessing. In general, the differences are related to how independently the various processors operate and how the workload is distributed among these processors. In loosely-coupled multiprocessing, the processors execute related threads, but they do so as if they were stand-alone processors. Each processor may have its own memory and may even have its own mass storage. Further, each processor typically runs its own copy of an operating system, and communicates with the other processor or processors through a message-passing scheme, much like devices communicating over a local-area network. Loosely-coupled multiprocessing has been widely used in mainframes and minicomputers, but the software that implements it is very closely tied to the hardware design. For this reason, it has not gained the support of software vendors, and is not widely used in PC servers.
In tightly-coupled multiprocessing, by contrast, the operations of the processors are more closely integrated. They typically share memory, and may even have a shared cache. The processors may not be identical to each other, and may or may not execute similar threads. However, they typically share other system resources such as mass storage and input/output (I/O). Moreover, instead of a separate copy of the operating system for each processor, they typically run a single copy, with the operating system handling the coordination of threads between the processors. The sharing of system resources makes tightly-coupled multiprocessing less expensive, and it is the dominant multiprocessor architecture in network servers.
Hardware architectures for tightly-coupled multiprocessing systems can be further divided into two broad categories. In symmetrical multiprocessor (SMP) systems, system resources such as memory and disk input/output are shared by all the microprocessors in the system. The workload is distributed evenly to available processors so that one does not sit idle while another is loaded with a specific thread. The performance of SMP systems generally increases for all threads as more processor units are added.
An important goal in the design of multiprocessing systems is linear scalability. In a completely linearly scalable system, the performance of the system increases linearly with the addition of each CPU. The performance of the system is measured as the number of instructions that the system as a whole completes in a given time. However, in most multiprocessing systems, as the number of CPUs is increased, the performance gain realized by adding an additional CPU decreases and eventually becomes negligible.
A common problem with multiprocessing occurs when more than one thread attempts to read or write to a common or shared memory. Those skilled in the art will recognize the data corruption that would occur if one thread were to read a set of memory locations while another thread were to write to the same set of memory locations. Common memory locations that are frequently accessed by various threads are the heap data structure and the free list. A heap is a portion of memory that is divided into smaller partitions. Each partition is allocatable on demand to store data for the need of particular threads. Once the data stored in the partition is no longer needed by the thread, the partition is returned to the heap. The heap data structure and the free list keep track of which partitions are allocated to the various threads, and which partitions are unallocated. When a thread is in need of memory, the heap data structure and free list are accessed to assign an unallocated partition of the heap to the thread. When the thread is no longer in need of the partition of memory, the partition of memory is returned to the heap. The heap data structure and free list are updated to reflect that the partition of memory is now unallocated.
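For illustration only, the bookkeeping described above can be sketched in C as follows. This is a minimal single-threaded sketch, not the implementation of any particular system: the partition size and count, the array-based free list, and the names heap_init, partition_alloc, partition_free and heap_free_count are all assumptions introduced here.

```c
#include <assert.h>
#include <stddef.h>

#define PARTITION_SIZE  64  /* bytes per partition (assumed value) */
#define PARTITION_COUNT 8   /* partitions in the heap (assumed value) */

/* The heap: one block of memory divided into equal partitions. */
static unsigned char heap[PARTITION_COUNT][PARTITION_SIZE];

/* Heap data structure: records which partitions are allocated. */
static int allocated[PARTITION_COUNT];

/* Free list: a stack holding the indices of unallocated partitions. */
static int free_list[PARTITION_COUNT];
static int free_count;

/* Mark every partition unallocated and place it on the free list. */
void heap_init(void)
{
    for (int i = 0; i < PARTITION_COUNT; i++) {
        allocated[i] = 0;
        free_list[i] = i;
    }
    free_count = PARTITION_COUNT;
}

/* Assign an unallocated partition to the caller, or NULL if none remain. */
void *partition_alloc(void)
{
    if (free_count == 0)
        return NULL;
    int idx = free_list[--free_count];
    allocated[idx] = 1;
    return heap[idx];
}

/* Return a partition to the heap and record it as unallocated. */
void partition_free(void *p)
{
    ptrdiff_t offset = (unsigned char *)p - &heap[0][0];
    assert(offset >= 0 && offset % PARTITION_SIZE == 0);
    int idx = (int)(offset / PARTITION_SIZE);
    assert(idx < PARTITION_COUNT && allocated[idx]);
    allocated[idx] = 0;
    free_list[free_count++] = idx;
}

/* Number of partitions currently unallocated. */
int heap_free_count(void)
{
    return free_count;
}
```

Because partition_alloc and partition_free both read and modify free_count and free_list, two threads executing them at the same time could corrupt the free list, which is the hazard the paragraph above describes.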
The management of concurrent threads is performed by the operating system of the computer system, which allocates various resources among the various threads. The threads accessing the heap data structure and free list are synchronized by the operating system. In order to access the heap data structure and free list, a thread makes a call into the operating system. The actual access is performed at the operating system level. Consequently, by accessing the heap data structure and free list at the operating system level, the accesses by each thread can be synchronized to prevent more than one thread from accessing the heap data structure and free list at the same time.
The operating system prevents simultaneous access to the heap data structure and free list by using spinlocks and interrupt masks. While accessing the heap data structure and free list through calls to the operating system prevents simultaneous access by the various threads, there are a number of associated drawbacks. The use of spinlocks and interrupt masking requires threads to wait while another thread is accessing the heap data structure or free list. This waiting substantially curtails the benefits of concurrent thread execution. As more CPUs are added, a bottleneck could potentially be created as each thread awaits access to the heap data structure and free list.
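The waiting imposed by a spinlock can be illustrated with the following C sketch, which uses the standard C11 atomic_flag type. It is a simplified illustration under stated assumptions, not the locking code of any actual operating system: interrupt masking is not modeled, and the four-slot free list and the names spin_lock, spin_unlock, locked_alloc and locked_free are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

#define SLOT_COUNT 4  /* size of the illustrative free list (assumed) */

/* One lock guards the whole free list, so only one thread may touch it. */
static atomic_flag list_lock = ATOMIC_FLAG_INIT;

/* Free list of slot numbers, used as a stack. */
static int free_slots[SLOT_COUNT] = {0, 1, 2, 3};
static int free_top = SLOT_COUNT;

/* Busy-wait until the lock is acquired; a waiting CPU does no useful work. */
static void spin_lock(void)
{
    while (atomic_flag_test_and_set_explicit(&list_lock, memory_order_acquire))
        ;  /* spin */
}

/* Release the lock so that one waiting thread may proceed. */
static void spin_unlock(void)
{
    atomic_flag_clear_explicit(&list_lock, memory_order_release);
}

/* Take a slot from the free list; returns -1 if the list is empty. */
int locked_alloc(void)
{
    spin_lock();
    int slot = (free_top > 0) ? free_slots[--free_top] : -1;
    spin_unlock();
    return slot;
}

/* Put a slot back on the free list. */
void locked_free(int slot)
{
    assert(slot >= 0 && slot < SLOT_COUNT);
    spin_lock();
    assert(free_top < SLOT_COUNT);
    free_slots[free_top++] = slot;
    spin_unlock();
}
```

Every call to locked_alloc or locked_free passes through the same lock, so the calls are serialized no matter how many CPUs are present. Each additional CPU adds another potential waiter in spin_lock, which is the bottleneck described above.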
Another problem arises from the transition from the thread to the operating system. Normally, the instructions of a thread are executed in what is known as the application mode. When the thread makes a call to the operating system to access the heap data structure or free list, the access is performed at the operating system level, in what is known as the kernel mode. Changing execution modes causes substantial time delays.
The present invention is directed to a system and method for dynamically managing memory in a computer system by executing an instruction within an application program causing the application program to access a heap data structure and a free list containing the addresses of unallocated regions of memory, determining the address of an appropriately sized region of memory, and allocating the region of memory to the application program.
The present invention is also directed to a method for dynamically deallocating memory in a computer system by causing an application program to place the address of a region of memory in a free list, and modifying an entry in the heap data structure.
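By way of a hedged illustration, allocation and deallocation performed entirely by application code, with no call into the operating system, might look like the following C sketch. It is not taken from the invention and makes several assumptions: the region table layout, the four fixed region sizes, the first-fit search, and the names region_init, region_alloc, region_free and region_size are all hypothetical. The sketch also omits any mechanism for coordinating multiple threads and ignores alignment requirements.

```c
#include <assert.h>
#include <stddef.h>

#define REGION_COUNT 4  /* number of regions (assumed value) */

/* Heap data structure: one entry per region of memory. */
struct region {
    unsigned char *addr;  /* start address of the region */
    size_t size;          /* size of the region in bytes */
    int in_use;           /* nonzero while allocated */
};

static unsigned char pool[16 + 32 + 64 + 128];  /* backing memory */
static struct region heap_table[REGION_COUNT];

/* Free list: addresses of unallocated regions. */
static unsigned char *free_list[REGION_COUNT];
static int free_len;

/* Carve the pool into regions and put every address on the free list. */
void region_init(void)
{
    static const size_t sizes[REGION_COUNT] = {16, 32, 64, 128};
    size_t offset = 0;
    for (int i = 0; i < REGION_COUNT; i++) {
        heap_table[i].addr = pool + offset;
        heap_table[i].size = sizes[i];
        heap_table[i].in_use = 0;
        free_list[i] = heap_table[i].addr;
        offset += sizes[i];
    }
    free_len = REGION_COUNT;
}

/* Find the heap table entry for an address, or NULL if unknown. */
static struct region *lookup(unsigned char *addr)
{
    for (int i = 0; i < REGION_COUNT; i++)
        if (heap_table[i].addr == addr)
            return &heap_table[i];
    return NULL;
}

/* Allocate the first free region of at least `want` bytes, or NULL. */
void *region_alloc(size_t want)
{
    for (int i = 0; i < free_len; i++) {
        struct region *r = lookup(free_list[i]);
        if (r != NULL && r->size >= want) {
            free_list[i] = free_list[--free_len];  /* remove from free list */
            r->in_use = 1;                         /* update heap table entry */
            return r->addr;
        }
    }
    return NULL;
}

/* Deallocate: place the address on the free list and update its entry. */
void region_free(void *p)
{
    struct region *r = lookup(p);
    assert(r != NULL && r->in_use);
    free_list[free_len++] = r->addr;
    r->in_use = 0;
}

/* Size of the region starting at p, or 0 if p is not a region address. */
size_t region_size(void *p)
{
    struct region *r = lookup(p);
    return r ? r->size : 0;
}
```

In this sketch, region_alloc and region_free are ordinary function calls within the application program, so no change of execution mode occurs when they run.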