Current embedded systems tend to implement ever more complex processing. By way of example, cell phones must implement complete telecommunication facilities, and onboard video-surveillance devices must implement complete image-processing facilities. In parallel with this increase in the complexity of applications, the number of applications embedded within one and the same apparatus is likewise constantly increasing. This is explained notably by the wish to create ever more versatile objects, for example by combining telecommunication functions, multimedia functions, games or satellite positioning functions in a cell phone of the “smartphone” type or in a “personal digital assistant”. In addition to the need to provide ever higher computational power, it is also necessary to be capable of optimizing the architecture specifically as a function of the current execution environment. This constitutes one of the technical problems that the present invention proposes to solve.
Moreover, many embedded systems tend to offer a wider range of possibilities, leaving users free to use them as they please to execute their own applications. This wider range impairs the effectiveness of static solutions for optimizing system architecture, commonly termed “off-execution” or “off-line” solutions, since the application context cannot be entirely determined in the design phase. Notably, having regard to advances in the capacity of video sensors and fast converters, the type and the volume of data manipulated are difficult to determine. Moreover, in many applications, the processing to be carried out varies as a function of the input data. By way of example, video-surveillance applications typically search for objects in a scene and then, when one or more objects have been detected, move on to a phase of tracking the detected objects, or indeed of analysis. This again constitutes one of the technical problems that the present invention proposes to solve.
Non-static solutions for optimizing system architecture, commonly termed “during execution” or “on-line” solutions, do not need to predict all the scenarios of use. They essentially entail implementing dynamic mechanisms for controlling resources, for example mechanisms for allocating computation resources or storage resources, in such a way that the resources adapt rapidly to the application context. Intensive computation tasks then run alongside control-dominated tasks, with very strong interactions between these tasks, which communicate with one another. However, putting such control mechanisms in place may be expensive in terms of performance. This again constitutes one of the technical problems that the present invention proposes to solve.
Indeed, the tasks communicate notably by way of data buffers. A buffer is an area of the memory to which a single producer task can write and from which several consumer tasks can potentially read. Within the framework of a very complex application, the number of memory buffers necessary for the execution of the application may thus exceed the total storage capacity of the machine. This again constitutes one of the technical problems that the present invention proposes to solve.
A current solution is to allocate and to deallocate, dynamically, the buffers currently being used by the application. The memory allocation is aimed mainly at reserving a memory space of sufficient size to store a given object which is manipulated in a program, and then at freeing said memory space after its use. But the allocator is also in charge of keeping information up to date which specifies which portions of the memory are used and which portions are free, doing so at the cost of a time penalty. The prior art abounds with techniques aimed at optimizing memory occupancy while minimizing the time penalty. Most known techniques are based on the allocation of contiguous memory areas, according to three types of approach: “Sequential Fit”, “Segregated Free Lists” and “Buddy System”.
Allocators of the “Sequential Fit” type are based on a linear list of all the free blocks in memory. Thus, a phase of allocating an object of nb memory pages consists in sequentially scanning this list until a free memory block of nb pages is found. This algorithm, and its various optimizations, is very simple to implement. But it is extremely penalizing since the whole list may potentially have to be scanned, in a sequential manner, before finding an allocatable memory area. This constitutes a major drawback of the “Sequential Fit” approach.
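The sequential scan described above can be sketched as follows. This is a minimal illustration and not a reference implementation: the free list is assumed to be a Python list of hypothetical `(start_page, length)` pairs, and the search accepts the first block of at least `nb` pages (a first-fit variant). The returned step count makes the scanning cost visible.

```python
def sequential_fit(free_blocks, nb):
    """Scan a linear free list for the first block of at least nb pages.

    free_blocks is a list of (start_page, length) pairs (assumed layout).
    Returns (start_page, steps), where steps is the number of list
    entries examined; start_page is None when no block fits.
    """
    for index, (start, length) in enumerate(free_blocks):
        if length >= nb:
            if length == nb:
                # Exact fit: the block leaves the free list entirely.
                free_blocks.pop(index)
            else:
                # Larger block: keep the unused tail in the free list.
                free_blocks[index] = (start + nb, length - nb)
            return start, index + 1
    # Worst case: the whole list was scanned without success.
    return None, len(free_blocks)


free_list = [(0, 2), (5, 1), (9, 3), (20, 8)]
start, steps = sequential_fit(free_list, 4)
missing, missing_steps = sequential_fit(free_list, 16)
```

In this example the only suitable block is the last one, so every entry of the list is examined before the allocation succeeds, and a request that cannot be served also costs a full scan.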
In order to accelerate allocation, allocators of the “Segregated Free Lists” type consider not a single list of all the free blocks in memory, but several lists of free blocks, each list containing only the free blocks of a certain size. For example, one list can contain the free blocks of 10 to 20 pages, another the free blocks of 20 to 40 pages, etc. During the allocation phase, the search for a free area is done only in the list containing the blocks of suitable size. This approach greatly accelerates the search, but it makes it necessary to maintain numerous lists. This constitutes a major drawback of the “Segregated Free Lists” approach.
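By way of illustration, a segregated free list allocator can be sketched as below. The size classes (1 to 9, 10 to 19, 20 to 39, and 40 or more pages) are hypothetical, as are the class and method names. The fallback to larger classes when the matching class holds no suitable block is an assumption of this sketch and is not taken from the description above.

```python
import bisect

# Hypothetical lower bounds of the size classes, in pages:
# 1-9, 10-19, 20-39, and 40 or more.
CLASS_LOWER_BOUNDS = [1, 10, 20, 40]


def size_class(nb):
    """Index of the size class holding blocks of nb pages (nb >= 1)."""
    return bisect.bisect_right(CLASS_LOWER_BOUNDS, nb) - 1


class SegregatedAllocator:
    def __init__(self):
        # One free list per size class: this is the bookkeeping cost.
        self.lists = [[] for _ in CLASS_LOWER_BOUNDS]

    def free(self, start, length):
        """File a free block into the list matching its size."""
        self.lists[size_class(length)].append((start, length))

    def allocate(self, nb):
        """Search the matching class first, then larger ones (assumption)."""
        for cls in range(size_class(nb), len(self.lists)):
            blocks = self.lists[cls]
            for i, (start, length) in enumerate(blocks):
                if length >= nb:
                    blocks.pop(i)
                    if length > nb:
                        # Re-file the remainder into its own size class.
                        self.free(start + nb, length - nb)
                    return start
        return None


alloc = SegregatedAllocator()
alloc.free(0, 5)
alloc.free(100, 15)
alloc.free(200, 30)
first = alloc.allocate(12)   # served from the 10-19 page class
second = alloc.allocate(25)  # served from the 20-39 page class
```

The request for 12 pages never touches the list of small blocks, which is where the speed-up comes from. The remainders of both allocations must be re-filed into another list, which illustrates the maintenance overhead.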
In order to further accelerate allocation, allocators of the “Buddy System” type consider lists containing free blocks whose size is a power of 2. If the size of a requested block is not a power of 2, it is rounded up to the next higher power of 2. This restriction makes it possible to virtually cut the memory space into two sets of half size. Each set is in turn decomposed into two smaller entities until a limit size is reached. This approach decreases the number of lists, but it causes significant fragmentation of the memory. Indeed, rounding up to the next power of 2 gives rise to underutilization of the memory blocks. This constitutes a major drawback of the “Buddy System” approach.
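The rounding and halving steps of a buddy-style allocator, and the resulting waste, can be illustrated with the short sketch below. The function names are hypothetical, and `split_until_fit` assumes that the starting block size is itself a power of 2 and is at least as large as the rounded request.

```python
def next_power_of_two(nb):
    """Smallest power of 2 that is greater than or equal to nb (nb >= 1)."""
    return 1 << (nb - 1).bit_length()


def wasted_pages(nb):
    """Pages reserved but unused because of the rounding."""
    return next_power_of_two(nb) - nb


def split_until_fit(block_size, nb):
    """Block sizes visited while halving a free block down to the request.

    Assumes block_size is a power of 2 and block_size >= rounded request.
    """
    target = next_power_of_two(nb)
    sizes = [block_size]
    while sizes[-1] > target:
        sizes.append(sizes[-1] // 2)
    return sizes


rounded = next_power_of_two(33)   # a 33-page request reserves 64 pages
waste = wasted_pages(33)          # 31 of those pages stay unused
path = split_until_fit(64, 5)     # halving 64 -> 32 -> 16 -> 8
```

A request of 33 pages is the unfavorable case: almost half of the reserved block is never used.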
However, be they implemented in software or be they implemented on specific hardware operators to further accelerate allocation, these three types of solutions for dynamically allocating and deallocating memory always suffer from the constraint of contiguity of the allocated memory areas. Indeed, this contiguity constraint leads in all cases to underutilization of the memory, allocation requests possibly failing not because the available memory space is not sufficient, but because no sufficiently wide contiguous area of memory exists. This again constitutes one of the technical problems that the present invention proposes to solve.
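The effect of the contiguity constraint can be shown on a small page map. The sketch below is illustrative only: the page map is assumed to be a list of booleans (`True` for a used page), and the function names are hypothetical. It compares a contiguous allocation test with a test that only counts free pages.

```python
def longest_free_run(page_used):
    """Length of the longest run of consecutive free pages."""
    best = run = 0
    for used in page_used:
        run = 0 if used else run + 1
        best = max(best, run)
    return best


def can_allocate_contiguous(page_used, nb):
    """A contiguous allocator needs nb free pages in a row."""
    return longest_free_run(page_used) >= nb


def can_allocate_paged(page_used, nb):
    """A non-contiguous allocator only needs nb free pages in total."""
    return page_used.count(False) >= nb


# True marks a used page; five pages are free, but never more than two in a row.
pages = [True, False, False, True, False, False, True, False]
contiguous_ok = can_allocate_contiguous(pages, 4)
paged_ok = can_allocate_paged(pages, 4)
```

Here a request for four pages fails under the contiguity constraint although five pages are free, which is the underutilization described above.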
Various software solutions make it possible to allocate and to deallocate non-contiguous memory areas dynamically. For example, in the article “Page-Based Non-Contiguous Dynamic Memory Allocator” (J. Chen et al.), a hardware allocator uses a data structure of “First-In First-Out” (FIFO) type to store all the free memory pages. At each allocation, it pulls a page out of the FIFO structure. During a deallocation, it pushes the freed page into the FIFO structure. This simple solution allows good reactivity. But it requires a FIFO data structure whose size is directly proportional to the number of memory pages in the system. It may therefore have a high silicon cost. Moreover, this solution does not make it possible to optimize the distribution of the pages in the memory space in order to maximize the use of parallelism of access in the case where the memory space is distributed over several memory banks, more commonly designated “banked memory space”. This constitutes a major drawback. As another example, in the article “SOCDMMU Dynamic Memory Management For Embedded Real-Time Multiprocessor System on a Chip” (M. Shahalan), a module called SOCDMMU uses an array describing the state of all the memory pages, be they empty or full. The search for an empty page is done by the “First-fit” algorithm, which searches for the first page available in the array, this page thereafter being allocated. The data structure making it possible to retrieve the free pages is much less voluminous than a FIFO structure, but the search for free pages may also be lengthy since, in the worst case, the whole of the memory state array may have to be scanned before pinpointing a free page. Moreover, this solution likewise does not make it possible to optimize the distribution of the pages in the memory space in order to best utilize the parallelism of access in the case of banked memories. This again constitutes one of the technical problems that the present invention proposes to solve.
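The two page-based schemes discussed above can be contrasted with the following software sketch. It is only a model of the behavior described in this paragraph, not the design of either cited article: the class names, method names and page counts are hypothetical.

```python
from collections import deque


class FifoPageAllocator:
    """Free pages kept in a FIFO: fast, but one entry per page in the system."""

    def __init__(self, num_pages):
        self.free_pages = deque(range(num_pages))

    def allocate(self):
        # Pull the oldest free page, or None when memory is exhausted.
        return self.free_pages.popleft() if self.free_pages else None

    def free(self, page):
        # Push the freed page back at the tail of the FIFO.
        self.free_pages.append(page)


class StateArrayAllocator:
    """One state flag per page, searched with first-fit."""

    def __init__(self, num_pages):
        self.used = [False] * num_pages

    def allocate(self):
        # Worst case: the whole state array is scanned.
        for page, used in enumerate(self.used):
            if not used:
                self.used[page] = True
                return page
        return None

    def free(self, page):
        self.used[page] = False


fifo = FifoPageAllocator(4)
fifo_pages = [fifo.allocate(), fifo.allocate()]    # pages 0 and 1
fifo.free(0)
fifo_next = [fifo.allocate() for _ in range(3)]    # pages 2, 3, then 0 again

table = StateArrayAllocator(4)
table_pages = [table.allocate() for _ in range(3)]  # pages 0, 1, 2
table.free(1)
table_next = table.allocate()                       # first free flag is page 1
```

Neither allocator takes into account which memory bank a page belongs to, so neither can spread the pages of a buffer over several banks to exploit parallel access.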
Moreover, the management of data buffers manipulated simultaneously by a producer task and one or more consumer tasks is not optimized in current systems. In order to guarantee compliance with the dependencies of the “read-after-write” or “write-after-read” type, current systems assume roughly that the data producer and consumer tasks are explicitly synchronized. Thus, a page is freed only when it has been entirely consumed. This again constitutes one of the technical problems that the present invention proposes to solve.
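The conservative policy described above, in which a page is freed only once it has been entirely consumed, can be modeled as follows. This is a simplified sketch under stated assumptions: one producer writes the page once, each consumer is identified by a hypothetical name, and a read before the write is reported as a “read-after-write” violation.

```python
class SharedPage:
    """A page written by one producer and read by several consumers."""

    def __init__(self, consumers):
        # Consumers that have not yet read the page (assumed known up front).
        self.pending = set(consumers)
        self.written = False

    def write(self):
        """Producer side: mark the page as holding valid data."""
        self.written = True

    def read(self, consumer):
        """Consumer side: returns True when the page may be freed."""
        if not self.written:
            raise RuntimeError("read-after-write dependency violated")
        self.pending.discard(consumer)
        # The page can be released only once every consumer has read it.
        return not self.pending


page = SharedPage(["tracker", "analyzer"])
page.write()
freed_after_first = page.read("tracker")
freed_after_second = page.read("analyzer")
```

The page stays allocated until the slowest consumer has finished, even if the other consumers released it long before.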