1. Technical Field
The present invention relates generally to computer systems and in particular to memory allocation in a computer system. Still more particularly, the present invention relates to a method, system, and computer program product for providing improved DMA mapping.
2. Description of the Related Art
Computer systems comprise a limited physical memory resource that is dynamically allocated to executing applications and input/output (IO) devices (or associated adapters) on request. Memory is accessed via a virtual address translated into a real (or physical) address that corresponds to the physical location within the memory. One method of completing these allocations and/or accesses to memory address space is via a direct memory access (DMA) operation issued from an IO adapter.
In many of today's computer systems, the system's physical memory address space is typically greater than the IO address space. With these computer systems, in order for the IO adapter(s) to access the entire system physical memory, some translation mechanism is required. For example, a 32-bit IO address subsystem requires some kind of memory mapping to allow the IO adapter to access system memory addresses that are greater than 4 GB. Currently, most Operating Systems (OSes) set the maximum page size (in memory) to 4 Kbytes (4K), and thus each mapping page is 4 Kbytes. Table I below illustrates an example of an address mapping table, which shows the translation between system memory address and IO DMA (direct memory access) address for a given 4K page base address.
TABLE I
System Memory Address    IO DMA address
9000000E 00120000        F1000000
9000000E 00221000        F1001000
...                      ...
9000000E 01010000        F10AF000
9000000E 21002100        F11B0000
...                      ...
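The page-granular translation that such a table performs can be sketched as follows. This is a minimal illustration only, assuming a plain dictionary keyed by 4K page base; the names `MAPPING_TABLE` and `to_dma_address` are hypothetical and are not part of any OS interface. The dictionary holds three rows taken from Table I.

```python
PAGE_SIZE = 4096  # 4-Kbyte mapping pages, as stated in the text

# System memory page base -> IO DMA page base (three rows taken from Table I).
MAPPING_TABLE = {
    0x9000000E00120000: 0xF1000000,
    0x9000000E00221000: 0xF1001000,
    0x9000000E01010000: 0xF10AF000,
}


def to_dma_address(system_addr):
    """Translate a system memory address to an IO DMA address (hypothetical helper)."""
    offset = system_addr & (PAGE_SIZE - 1)   # byte offset within the 4K page
    page_base = system_addr - offset         # 4K-aligned base used as the table key
    if page_base not in MAPPING_TABLE:
        raise KeyError("page is not mapped for DMA: %#x" % page_base)
    # The offset within the page is unchanged by the translation.
    return MAPPING_TABLE[page_base] + offset
```

Only the page base is translated; the low 12 bits of the address pass through unchanged, which is why one table entry suffices for each 4K page.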
To satisfy new requirements of high performance IO adapters, the data buffer mapping size needs to be greater than 4 Kbytes, particularly to take advantage of Ethernet jumbo frames and large TCP segmentation offload (TSO) sends, for example. To enable support of this larger data buffer mapping size, the developers of the OSes have improved the OS mapping methods to allow the address mapping of more than 4 Kbytes of contiguous IO DMA address space.
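The arithmetic behind this requirement can be sketched as below. The buffer sizes used here (a 9000-byte jumbo frame and a 64-Kbyte TSO send) are typical figures assumed for illustration and do not come from the text; `pages_needed` is a hypothetical helper, and the buffer is assumed to start on a page boundary.

```python
PAGE_SIZE = 4096  # 4-Kbyte mapping pages, as stated in the text


def pages_needed(buffer_bytes):
    """Number of contiguous 4K mapping pages required for a page-aligned buffer."""
    # Round up so that a partially filled last page still counts as a full page.
    return (buffer_bytes + PAGE_SIZE - 1) // PAGE_SIZE


JUMBO_FRAME_BYTES = 9000      # assumed typical jumbo frame size
TSO_SEND_BYTES = 64 * 1024    # assumed typical large TSO send size

jumbo_pages = pages_needed(JUMBO_FRAME_BYTES)
tso_pages = pages_needed(TSO_SEND_BYTES)
```

Under these assumptions a single buffer spans several mapping pages, so a one-page mapping cannot describe it with one contiguous IO DMA address range.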
FIG. 1 is a flow chart illustrating the prior art methods by which the device driver maps a system physical address to an IO DMA address. The illustrated method describes the AIX OS function. However, the presented flow chart is provided as one example and may apply to other OSes as well.
As shown, the process begins at block 102, at which the computer system (or IO adapter) is initialized. During initialization of the IO adapter, the device driver makes a system call to register the size of the IO address space the driver needs for the operation, as shown at block 104. D_MAP_INIT is an example of this system call. Next, at block 106, the device driver calls the memory allocation routine to allocate system memory (a buffer). Then, at block 108, the device driver calls the system mapping routine to map the system memory to an IO DMA address. Examples of this system call are D_MAP_PAGE and D_MAP_LIST.
Once this call is made, the device driver monitors when the IO DMA address is no longer needed, as indicated at decision block 110. If the IO DMA address is still needed, then the adapter maintains the space, as shown at block 116. However, when the IO DMA address is no longer needed, the device driver calls the kernel unmap routines to return the IO DMA address back to the kernel, as shown at block 112. Then, the device driver/OS frees the allocated memory back to the kernel, as provided at block 114. Examples of these system calls that complete the return of the allocated memory back to the kernel are D_UNMAP_PAGE/D_UNMAP_LIST and D_MAP_CLEAR.
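The sequence of blocks 104 through 114 can be sketched with a toy model. This is an illustration under stated assumptions, not the AIX interface: the class `ToyDmaWindow` and its methods are hypothetical stand-ins for the register, map, unmap, and clear steps, and the buffer address is a made-up value standing in for the result of the memory allocation routine.

```python
PAGE_SIZE = 4096  # 4-Kbyte mapping pages, as stated in the text


class ToyDmaWindow:
    """Toy model of the IO DMA address space registered by one driver."""

    def __init__(self, base, num_pages):
        # Block 104: register the size of the IO address space the driver needs.
        self.base = base
        self.free_pages = set(range(num_pages))
        self.mappings = {}  # IO DMA address -> system memory address

    def map_page(self, system_addr):
        # Block 108: map one system memory page to an IO DMA address.
        if not self.free_pages:
            raise MemoryError("no IO DMA page available")
        page = min(self.free_pages)
        self.free_pages.remove(page)
        dma_addr = self.base + page * PAGE_SIZE
        self.mappings[dma_addr] = system_addr
        return dma_addr

    def unmap_page(self, dma_addr):
        # Block 112: return the IO DMA address to the pool.
        del self.mappings[dma_addr]
        self.free_pages.add((dma_addr - self.base) // PAGE_SIZE)


def run_lifecycle():
    window = ToyDmaWindow(base=0xF1000000, num_pages=4)   # block 104
    buffer_addr = 0x9000000E00120000                      # block 106 (stand-in buffer)
    dma_addr = window.map_page(buffer_addr)               # block 108
    mapped_while_in_use = len(window.mappings)            # blocks 110/116: still needed
    window.unmap_page(dma_addr)                           # block 112
    buffer_addr = None                                    # block 114: buffer freed
    return dma_addr, mapped_while_in_use, len(window.free_pages)
```

After the unmap step every page of the window is free again, which mirrors the return of the IO DMA address to the kernel in the flow chart.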
As more and more adapters request memory address space, over time, the IO DMA address space becomes more and more fragmented. When this fragmentation surpasses a threshold point, as multiple portions of the large IO DMA space are assigned to DMA requests, the contiguity of available space decreases, and contiguous space becomes more difficult to find for assigning to new DMA requests.
Additionally, as the level of fragmentation increases, the latency in obtaining an assignment of an IO DMA address (for a contiguous address space) from the OS increases as well. This increased latency may cause measurable delays in processing and thus have a substantial negative impact on the overall system performance. While these delays are very common in the operation of most computer systems today, they are not desirable. Thus, a system that initially performs DMA address allocations at a relatively fast speed eventually loses substantial performance (i.e., incurs increased allocation latency) after a period of operation. These systems thus do not perform as well as they did when they initially started up.
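The effect of fragmentation on both success and search cost can be sketched with a simple first-fit search over a page bitmap. This is a minimal model, assuming a linear scan; the actual allocation strategy of any given OS is not described in the text, and the name `first_fit` and the bitmaps below are hypothetical. The number of slots examined serves as a rough proxy for latency.

```python
def first_fit(free_map, pages_needed):
    """Find the first run of `pages_needed` free pages.

    Returns (start_index, slots_examined); start_index is None if no run exists.
    """
    run_start, run_len = 0, 0
    for i, is_free in enumerate(free_map):
        if is_free:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len == pages_needed:
                return run_start, i + 1
        else:
            run_len = 0  # an allocated page breaks the contiguous run
    return None, len(free_map)


# Freshly initialized space: all 64 pages are free.
fresh = [True] * 64

# After some uptime: the first 48 pages alternate free/allocated, the last 16 are free.
fragmented = [i % 2 == 0 for i in range(48)] + [True] * 16

# Fully scattered: 32 of 64 pages are free, but no two free pages are adjacent.
scattered = [i % 2 == 0 for i in range(64)]

fresh_result = first_fit(fresh, 4)
fragmented_result = first_fit(fragmented, 4)
scattered_result = first_fit(scattered, 4)
```

In this model the fresh space satisfies a four-page request after examining four slots, the fragmented space must scan past all the isolated free pages first, and the scattered space fails the request even though half of its pages are free.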
As described above, conventional DMA mapping of this and other types (e.g., using application programming interfaces (APIs)) has several limitations. Among these limitations are the following: (1) while the driver is able to pre-register the size of the IO DMA address space the driver needs at IPL (initial program load) time, the pre-registration does not guarantee that the address mapping operation will always succeed. That is, the mapping may fail if the address space no longer contains the amount of contiguous address space that is requested by the driver; and (2) the longer the system uptime, the more fragmented the IO DMA address space becomes and the longer the time required (i.e., latency of operation) to obtain a large contiguous address space for an IO DMA mapping. As noted above, this increased latency negatively impacts the overall system performance.