One popular multi-unit parallel processor configuration is the single instruction stream, multiple data stream (SIMD) architecture. In a SIMD system, the same instruction is provided to all active processing units. Each processing unit can have its own set of registers along with some means of receiving unique data. In newer multiprocessor chips, many small processing units (PUs), sometimes called “synergistic processing elements” (SPEs), can be implemented, where each SPE is generally a reduced instruction set computer that uses a simpler set of instructions. An SPE can have a greatly reduced memory allocation. In a multi-processor configuration, each processor can have a relatively small memory allocation, such as only 256 KB of memory.
After the processing unit processes an instruction and produces a result, the result must be stored in this relatively small memory space. This memory will typically be utilized for text, data, stack, and heap operations. A heap can be a collection of dynamically allocated variables, an area used for allocating storage whose lifetime is not related to the execution of the current routine, or an area allocated by system software and system extensions to hold frequently used instructions. A stack can be a data construct that uses data on a last-in, first-out basis.
Memory allocation hardware and software in larger computers are very complex. However, when smaller processing units and memory systems are utilized, a sophisticated memory allocation algorithm cannot be utilized due to the lack of space for such overhead. Yet having such a small memory space creates an even greater requirement for efficient usage of memory and memory allocations.
During operation, when a requestor such as a PU needs to store data or utilize memory, the requestor can request a specific amount of memory, and a requestee, or allocator, can process the request and return an address or a block of addresses to the requestor. A requestee can be a PU; it need not be dedicated hardware and can instead be software that runs on the same hardware (i.e. the same processing unit). The allocator can identify areas of memory that are available or “freed” and return these addresses to the requestor. The requestor can then send the data to memory, storing the data at the address that was allocated by the allocator. This is commonly referred to as dynamic memory allocation: areas of memory are used and then freed, and the allocator tracks the status of memory locations and returns addresses to a processing unit based on which locations in memory are free. Static memory allocation is faster but inflexible because it has fixed limits.
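The request, allocate, and free cycle described above can be sketched in C. This is only a minimal illustration under stated assumptions, not the allocator of any particular system: the names `pool_alloc` and `pool_free`, the 1024-byte pool, the 16-byte granule, and the first-fit search are all hypothetical choices made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_SIZE  1024u  /* hypothetical pool size in bytes */
#define BLOCK_SIZE 16u    /* hypothetical allocation granule in bytes */
#define NUM_BLOCKS (POOL_SIZE / BLOCK_SIZE)

static uint8_t pool[POOL_SIZE];     /* the memory being managed */
static uint8_t in_use[NUM_BLOCKS];  /* status per granule: 0 = free, 1 = allocated */

/* Number of granules needed to hold `bytes` bytes. */
static size_t blocks_for(size_t bytes) {
    return (bytes + BLOCK_SIZE - 1) / BLOCK_SIZE;
}

/* First fit: return the address of the first run of free granules that is
 * large enough for the request, or NULL when no such run exists. */
void *pool_alloc(size_t bytes) {
    size_t need = blocks_for(bytes);
    if (need == 0 || need > NUM_BLOCKS) {
        return NULL;
    }
    size_t run = 0;  /* length of the current run of free granules */
    for (size_t i = 0; i < NUM_BLOCKS; i++) {
        run = in_use[i] ? 0 : run + 1;
        if (run == need) {
            size_t start = i + 1 - need;
            for (size_t j = start; j <= i; j++) {
                in_use[j] = 1;
            }
            return &pool[start * BLOCK_SIZE];
        }
    }
    return NULL;
}

/* Mark the granules behind `ptr` as free again. The caller supplies the size
 * because this sketch stores no per-allocation bookkeeping. */
void pool_free(void *ptr, size_t bytes) {
    size_t offset = (size_t)((uint8_t *)ptr - pool);
    assert(offset % BLOCK_SIZE == 0 && offset < POOL_SIZE);
    size_t start = offset / BLOCK_SIZE;
    size_t need = blocks_for(bytes);
    for (size_t j = start; j < start + need; j++) {
        in_use[j] = 0;
    }
}
```

A requestor would call `pool_alloc(n)`, store its data at the returned address, and later call `pool_free` with the same address and size so that the granules become available to later requests. Requiring the size on release is a simplification made to keep the sketch short.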
Data alignment and memory allocation generally concern the way data is arranged and accessed in computer memory. Data alignment is a fundamental yet difficult issue for all modern computers. Different computer languages handle data storage allocation and data alignment very differently, and some implementations have considerable overhead and are very complex. Often, a memory system will operate or be optimized to operate (i.e. store and retrieve data) on a sixteen-byte basis. This is typically based on the size of the memory bus, register sizes, etc. Dealing with smaller data segments can pose significant problems for an allocator. For example, when only four bytes need to be stored, the system may write only four bytes to an area that has 16 bytes available, leaving the remaining twelve bytes unused. When this occurs and the four bytes are later retrieved, data structure alignment can also create additional inefficiencies.
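The arithmetic behind the four-byte example can be sketched as follows. The helper names `align_up` and `padding_for` are hypothetical, and the sketch assumes the alignment is a power of two, as a sixteen-byte boundary is.

```c
#include <assert.h>
#include <stddef.h>

/* Round `n` up to the next multiple of `alignment`.
 * Assumes `alignment` is a nonzero power of two (16 in the example above). */
size_t align_up(size_t n, size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (n + alignment - 1) & ~(alignment - 1);
}

/* Bytes left unused when an `n`-byte item occupies whole aligned units. */
size_t padding_for(size_t n, size_t alignment) {
    return align_up(n, alignment) - n;
}
```

Under these assumptions, `align_up(4, 16)` is 16 and `padding_for(4, 16)` is 12, so a four-byte item placed in its own sixteen-byte unit uses only a quarter of the space reserved for it.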
When returning an allocation for use, the memory allocator will often use some space in front of the allocated memory or elsewhere in a structure for internal use. This is often referred to as a “header.” A header typically contains information such as the size of the allocation. Because this header is only used internally by the memory allocator, the header is considered “overhead” and reduces the usable free memory that can be allocated.
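To make the cost of a header concrete, the following C sketch places a header in front of each allocation and reports how much memory is consumed in total. It is an illustration only: the names `hdr_alloc`, `hdr_size`, and `heap_bytes_used`, the 256-byte heap, the 16-byte header slot, and the simple bump-pointer policy (no freeing) are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HEAP_BYTES   256u  /* hypothetical heap size in bytes */
#define HEADER_BYTES 16u   /* hypothetical header slot; keeps user data 16-byte aligned */

static _Alignas(16) uint8_t heap[HEAP_BYTES];
static size_t heap_top;  /* offset of the first unused byte */

/* Reserve a header plus `size` bytes (rounded up to 16) and return a pointer
 * to the user data that follows the header, or NULL if the heap is full. */
void *hdr_alloc(size_t size) {
    if (size > HEAP_BYTES - HEADER_BYTES) {
        return NULL;
    }
    size_t payload = (size + 15u) & ~(size_t)15u;
    if (HEADER_BYTES + payload > HEAP_BYTES - heap_top) {
        return NULL;
    }
    /* The header records the requested size for the allocator's own use. */
    memcpy(&heap[heap_top], &size, sizeof size);
    void *user = &heap[heap_top + HEADER_BYTES];
    heap_top += HEADER_BYTES + payload;
    return user;
}

/* Read back the size stored in the header just in front of `user`. */
size_t hdr_size(const void *user) {
    assert(user != NULL);
    size_t size;
    memcpy(&size, (const uint8_t *)user - HEADER_BYTES, sizeof size);
    return size;
}

/* Total bytes consumed so far, including headers and padding. */
size_t heap_bytes_used(void) {
    return heap_top;
}
```

With these assumed sizes, a single four-byte request consumes 32 bytes of the heap: 16 for the header and 16 for the padded data. In a memory as small as the one discussed here, that per-allocation overhead adds up quickly.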