1. Field of the Invention
This invention relates to the field of data processing systems. More particularly, this invention relates to the mapping of computer programs to an asymmetric multiprocessing apparatus.
2. Description of the Prior Art
It is known to provide symmetric multiprocessing systems, such as dual-core Intel 80x86-based systems running Linux.
The mapping of portions of the code to be executed to the processors is handled by the operating system and assisted by hardware support. Having identical processors and a single, coherent memory system makes it possible for the operating system to dynamically allocate a part of the program to an idle processor. However, such systems represent a significant hardware and power consumption overhead.
Other systems, such as MIT's RAW processor, provide a number of identical processors, each with a local memory and the ability to read data from any other memory in the system. (IBM's Cell processor has slightly more diversity: it has one control processor and many identical BE engines.)
The uniformity of the hardware in such systems greatly simplifies the task of mapping an application onto this hardware, allowing the programmer to focus on strategic decisions (e.g., what is the most efficient way to implement an application). However, these approaches are not flexible enough to work well with less uniform hardware.
Asymmetric multiprocessing (AMP) systems have much less uniformity, both in processor type and capability and in the memory hierarchy. This lack of uniformity is typically handled by creating multiple separate programs (one per processor) and creating communication protocols to communicate between these programs. In some systems, communication protocols are provided, such as Remote Procedure Calls and data transfer protocols, e.g. Philips' TTL (P van der Wolf et al, Design of embedded microprocessors: An interface-centric approach, In Proceedings of International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS'04), 2004).
These and other mechanisms provide the means to map an application onto a particular AMP system, taking advantage of and coping with any idiosyncrasies of the hardware. However, porting an application to an AMP system, porting a mapped application to a different AMP system, or changing the way that an application is mapped onto the current AMP system is both time-consuming and very error-prone, because the changes needed to map an application to a given system are distributed across the whole system.
Where it is desired to make such applications portable between AMP systems, this is typically achieved by separating configuration information, such as the address range to which a variable is allocated, from the application to make it easy to change the configuration. This requires effort both to make parts of the application configurable to deal with the expected range of system variation and to produce configuration data which must accurately reflect each particular system that the application must run on.
Although it is desirable to detect errors in configuration data, this is hard to do because the configuration data lacks the semantic information required to allow an error to be detected. For example, on some AMP systems it is an error for two variables to be assigned to the same address. It is not an error, however, if the variables are assigned to different memories; if the variables are accessed by different processors which have different address maps, so that the variables may be at different physical memory locations; or if the variables' lifetimes do not overlap. Thus, whilst low-level mechanisms can be used to create portable software, they are time-consuming and error-prone, and the resulting software is configurable in only a few dimensions.
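The address-conflict example above can be made concrete with a minimal sketch. It assumes, purely for illustration, that each configuration entry also records the physical memory a variable resolves to (after any per-processor address map has been applied) and the variable's lifetime. The names Placement, ranges_overlap and conflicts, and all field names, are hypothetical and are not taken from any existing system. Plain configuration data of the kind described above usually holds only the address, which is why such a check cannot normally be performed.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Placement:
    """Hypothetical configuration entry enriched with semantic information."""
    name: str
    memory: str      # physical memory the address resolves to (assumed known)
    start: int       # first address occupied within that memory
    size: int        # size in bytes
    live_from: int   # first program point at which the variable is live
    live_to: int     # last program point at which it is live (inclusive)


def ranges_overlap(a_start, a_end, b_start, b_end):
    """Return True if the inclusive ranges [a_start, a_end] and [b_start, b_end] intersect."""
    return a_start <= b_end and b_start <= a_end


def conflicts(placements):
    """Return pairs of variable names that genuinely clash.

    Two variables clash only if they are in the same physical memory,
    their address ranges overlap, and their lifetimes overlap.
    """
    found = []
    for a, b in combinations(placements, 2):
        if a.memory != b.memory:
            continue  # different memories: the same address is not an error
        if not ranges_overlap(a.start, a.start + a.size - 1,
                              b.start, b.start + b.size - 1):
            continue  # disjoint address ranges
        if not ranges_overlap(a.live_from, a.live_to, b.live_from, b.live_to):
            continue  # lifetimes do not overlap: the storage can be shared
        found.append((a.name, b.name))
    return found


# Four variables, all configured at address 0x1000.
config = [
    Placement("x", "sram0", 0x1000, 4, 0, 10),
    Placement("y", "sram0", 0x1000, 4, 5, 20),      # clashes with x
    Placement("z", "dsp_local", 0x1000, 4, 0, 20),  # different memory
    Placement("w", "sram0", 0x1000, 4, 21, 30),     # live only after x and y
]

result = conflicts(config)
```

With only the address available, all four entries would look equally suspicious. With the extra fields, only the pair x and y is reported.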
Another type of communication model used in AMP systems is the distributed object model, such as Microsoft's COM and the Object Management Group's CORBA, as used, for example, in ST Microelectronics' MultiFlex (Paulin et al, Parallel Programming Models for a Multiprocessor SoC Platform Applied to Networking and Multimedia, IEEE Transactions on VLSI, Vol 14, no 7, pp 667-680, July 2006). These higher level models are typically less error-prone and increase portability, but these advantages come at the price of reduced performance or a need for more hardware support. For example, ST Microelectronics' MultiFlex makes it easy to move a task from one processor to another by requiring Object Request Broker hardware to route messages to whichever processor is executing a task.
Low power, high performance data processing systems increasingly use asymmetric multiprocessing (AMP) and private memories, lack memory coherence and contain fixed function and/or programmable accelerators. Such systems can provide an advantageous combination of high performance with low cost and low power consumption. However, such systems are complex architecturally and there is a great variety of ways in which such systems may be formed. This causes problems for programmers of such systems.
A programmer of such systems may have to port a given application to a variety of systems which differ in architecture in a manner which requires significant alterations in the program and the way in which the program operates. Such programming of asymmetric multiprocessing systems is time consuming, expensive and error-prone.
Furthermore, there is a wide variety of possible design choices in the way in which a given program can be mapped onto an asymmetric multiprocessing system which is to execute that program. The large number of such possibilities and the effort required to produce programs embodying these possibilities mean that only a small proportion of the possible designs tends to be explored. Accordingly, there is a significant likelihood that the way in which a computer program is mapped onto an asymmetric multiprocessing system will be sub-optimal.
Thus, the prior art relies on hardware support to make multiprocessors look more uniform, imposes a significant performance penalty by using software libraries to make multiprocessors look more uniform, or requires programming in a way that is time-consuming, error-prone and provides limited portability.