A modern computer system typically uses a memory management unit (MMU) to translate virtual addresses to physical addresses for its central processing unit (CPU). When an application program requests a contiguous range of virtual memory, the MMU can map the range of virtual addresses to scattered physical memory addresses that are available for allocation. Such a system generally also includes devices that need a contiguous range of memory in the device address space for operation. For example, a camera needs a memory buffer for its operation. Some systems use a scatter-gatherer or an input-output MMU (IOMMU) to collect fragmented physical memory blocks into contiguous memory blocks in the device address space. Thus, large regions of memory in the device address space can be allocated without needing to be contiguous in physical memory.
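As an illustration only, the mapping described above can be sketched with a toy single-level page table. The 4 KiB page size, the function names `map_contiguous` and `translate`, and the frame numbers are assumptions made for this sketch and do not describe any particular MMU or IOMMU.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def map_contiguous(virt_base, num_pages, free_frames):
    """Map num_pages contiguous virtual pages onto whatever free frames exist.

    free_frames is a list of physical frame numbers in arbitrary order, so
    the resulting physical placement may be scattered.
    """
    if virt_base % PAGE_SIZE:
        raise ValueError("virt_base must be page aligned")
    if num_pages > len(free_frames):
        raise MemoryError("not enough free physical frames")
    base_page = virt_base // PAGE_SIZE
    return {base_page + i: free_frames[i] for i in range(num_pages)}


def translate(page_table, virt_addr):
    """Translate a virtual address to a physical address via the page table."""
    page, offset = divmod(virt_addr, PAGE_SIZE)
    return page_table[page] * PAGE_SIZE + offset


# Three contiguous virtual pages backed by non-adjacent physical frames.
table = map_contiguous(0x10000, 3, [7, 2, 19, 11])
```

Adjacent virtual addresses such as `0x10FFF` and `0x11000` land in different, non-adjacent physical frames here, which is acceptable to the requester because it only ever sees the virtual (or device) address range.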
Various devices in embedded systems have neither scatter-gatherer nor IOMMU support. As such, these devices require contiguous blocks of physical memory to operate. Examples of such devices include, but are not limited to, cameras and hardware video decoders and encoders. Some of these devices require large, physically contiguous buffers. For instance, a full high-definition (HD) frame is more than two megapixels in size and needs more than 6 megabytes of memory to store. Because of memory fragmentation, such a large chunk of contiguous physical memory is very unlikely to be available after the system has been running for some time.
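The figures above can be checked with a short calculation. The sketch below assumes a 1920x1080 frame at 3 bytes per pixel (24-bit color) and 4 KiB pages; the pixel format, the page size, and the toy occupancy pattern are assumptions chosen for illustration. It also shows how a handful of occupied pages can defeat a contiguous allocation even when most of memory is free.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def frame_bytes(width, height, bytes_per_pixel):
    """Size in bytes of one uncompressed frame."""
    return width * height * bytes_per_pixel


def pages_needed(num_bytes):
    """Number of whole pages required to hold num_bytes (rounded up)."""
    return -(-num_bytes // PAGE_SIZE)


def longest_free_run(occupied, total_frames):
    """Length of the longest run of consecutive free frames."""
    best = run = 0
    for frame in range(total_frames):
        run = 0 if frame in occupied else run + 1
        best = max(best, run)
    return best


pixels = 1920 * 1080                 # just over two megapixels
size = frame_bytes(1920, 1080, 3)    # assumed 24-bit color
need = pages_needed(size)

# Toy fragmentation: 4000 frames, only 3 of them occupied.
occupied = {999, 1999, 2999}
free_total = 4000 - len(occupied)
largest = longest_free_run(occupied, 4000)
can_allocate = largest >= need
```

In this toy layout nearly all frames are free, yet no single free run is long enough to hold the frame buffer, so a physically contiguous allocation would fail.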
Some embedded devices impose additional requirements on the buffers; e.g., they can operate only with buffers allocated in a particular location or memory bank (if the system has more than one memory bank), or with buffers aligned to a particular memory boundary. These requirements add further complexity to the memory allocation algorithm in operating systems.
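A minimal sketch of these two constraints, bank placement and boundary alignment, is given below. The helper names `align_up` and `place_buffer`, the bank addresses, and the 1 MiB alignment are hypothetical values chosen for illustration, and the alignment is assumed to be a power of two.

```python
def align_up(addr, alignment):
    """Round addr up to the next multiple of alignment (a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return (addr + alignment - 1) & ~(alignment - 1)


def place_buffer(bank_start, bank_end, size, alignment):
    """Return the lowest aligned start address inside [bank_start, bank_end)
    at which a buffer of the given size fits, or None if it does not fit."""
    start = align_up(bank_start, alignment)
    return start if start + size <= bank_end else None


# Hypothetical bank and a buffer that must sit on a 1 MiB boundary.
fits = place_buffer(0x80000100, 0x80800000, 6220800, 0x100000)
too_small = place_buffer(0x80000100, 0x80600000, 6220800, 0x100000)
```

The alignment step can consume usable space at the start of a bank, so a buffer that would fit by size alone may still be rejected once the boundary requirement is applied.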
Some operating systems provide memory allocation functions for applications or drivers to acquire contiguous physical memory regions. One example is the Linux® Contiguous Memory Allocator (CMA), which runs in an operating system kernel to allow allocation of contiguous physical memory blocks.
One problem with CMA is scalability. Multiple kernel subsystems or multiple virtualized environments (e.g., virtual machines) may run concurrently in a system. When a physically contiguous memory region is requested through the CMA APIs, CMA starts migrating all occupied memory pages in that region to other free memory space. However, some memory pages may be locked or in use by other processes, and CMA needs to keep track of page state while performing the page migration. Thus, a high degree of page synchronization among different subsystems or virtual machines may be necessary. Because of this complex synchronization, CMA allocation becomes increasingly slow, and its latency increasingly unstable, as the requested memory region size grows. Thus, CMA may sometimes fail to meet the latency requirements of some applications, such as video playback.
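The migration behavior described above can be illustrated with a deliberately simplified model. This is not the Linux CMA implementation: the page states, the function name `try_claim_region`, and the single-threaded migration loop are assumptions for this sketch, and real page locking and cross-subsystem synchronization are not modeled.

```python
FREE, MOVABLE, PINNED = "free", "movable", "pinned"


def try_claim_region(pages, start, length):
    """Try to empty pages[start:start+length] by migrating occupied pages.

    Returns the number of pages migrated, or None if the region cannot be
    claimed (out of range, contains a pinned page, or there is not enough
    free space outside the region). pages is modified in place on success.
    """
    region = pages[start:start + length]
    if len(region) < length or PINNED in region:
        return None
    # Free pages outside the requested region are migration targets.
    targets = [i for i, state in enumerate(pages)
               if state == FREE and not (start <= i < start + length)]
    sources = [start + i for i, state in enumerate(region) if state == MOVABLE]
    if len(sources) > len(targets):
        return None
    for src, dst in zip(sources, targets):
        pages[dst] = MOVABLE  # page contents now live at dst
        pages[src] = FREE     # src is vacated for the contiguous buffer
    return len(sources)


pages = [MOVABLE, FREE, MOVABLE, FREE, FREE, PINNED, FREE, FREE]
moved = try_claim_region(pages, 0, 4)    # two pages must be migrated
blocked = try_claim_region(pages, 3, 3)  # region contains a pinned page
```

Even in this toy model, the work per request is proportional to the number of occupied pages in the region, and a single pinned page makes the whole request fail, which suggests why allocation latency grows and becomes less predictable for larger regions.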