1. Field of the Invention
The present invention relates to the design of real-time distributed embedded systems, and, in particular, to the process of partitioning an embedded system specification into hardware and software modules using hardware-software co-synthesis.
2. Description of the Related Art
The architecture definition of embedded systems has largely depended on the ingenuity of system architects. However, in addition to requiring a longer architecture definition interval, the resulting architecture is at times either over-designed or fails to meet the specified constraints. Therefore, design automation in the area of hardware-software co-synthesis is of utmost importance from the standpoints of design time and architecture quality. Finding an optimal hardware-software architecture entails selecting processors, application-specific integrated circuits (ASICs), and communication links such that the cost of the architecture is minimized and all real-time constraints are met. Hardware-software co-synthesis involves various steps such as allocation, scheduling, and performance estimation. The allocation step determines the mapping of tasks to processing elements (PEs) and of inter-task communications to communication links. The scheduling step determines the sequencing of tasks mapped to a PE and the sequencing of communications on a link. The performance estimation step estimates the finish time of each task and determines the overall quality of the architecture in terms of its dollar cost, ability to meet its real-time constraints, power consumption, fault tolerance, etc. Both allocation and scheduling are known to be NP-complete. See References (1)-(2). Therefore, optimal co-synthesis is computationally a very hard problem.
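For illustration only, the three steps named above (allocation, scheduling, and performance estimation) can be sketched on a toy task graph. The task names, execution times, PE costs, communication delay, and deadline below are hypothetical values chosen for the example, and the sketch simplifies the problem by charging a fixed delay for any edge that crosses PEs rather than scheduling communications on a link. It is not the procedure of any cited reference.

```python
# Hypothetical toy data (assumptions for illustration only).
EXEC_TIME = {                      # task -> execution time on each PE type
    "t1": {"cpu": 4, "asic": 1},
    "t2": {"cpu": 3, "asic": 2},
    "t3": {"cpu": 2, "asic": 1},
}
EDGES = [("t1", "t3"), ("t2", "t3")]   # t3 consumes the outputs of t1 and t2
COMM_TIME = 1                          # fixed delay when an edge crosses PEs
PE_COST = {"cpu": 10, "asic": 50}      # dollar cost of each PE
DEADLINE = {"t3": 7}                   # real-time constraint on t3


def estimate_finish_times(allocation, order):
    """Performance estimation for a given allocation and schedule.

    allocation: task -> PE (the allocation step's result).
    order: tasks in a precedence-respecting sequence (the scheduling step's
           result); tasks sharing a PE run in this order, non-preemptively.
    """
    pe_free = {}   # PE -> time at which it next becomes idle
    finish = {}    # task -> estimated finish time
    for task in order:
        pe = allocation[task]
        ready = 0
        for src, dst in EDGES:
            if dst == task:
                delay = COMM_TIME if allocation[src] != pe else 0
                ready = max(ready, finish[src] + delay)
        start = max(ready, pe_free.get(pe, 0))
        finish[task] = start + EXEC_TIME[task][pe]
        pe_free[pe] = finish[task]
    return finish


def evaluate(allocation, order):
    """Return (finish times, dollar cost, whether all deadlines are met)."""
    finish = estimate_finish_times(allocation, order)
    cost = sum(PE_COST[pe] for pe in set(allocation.values()))
    feasible = all(finish[t] <= d for t, d in DEADLINE.items())
    return finish, cost, feasible


order = ["t1", "t2", "t3"]
all_cpu = {"t1": "cpu", "t2": "cpu", "t3": "cpu"}
mixed = {"t1": "asic", "t2": "cpu", "t3": "cpu"}
print(evaluate(all_cpu, order))   # cheapest hardware, but t3 misses its deadline
print(evaluate(mixed, order))     # costlier, but the deadline is met
```

In this toy instance the single-CPU allocation is the cheapest but finishes t3 too late, while moving t1 to the ASIC meets the deadline at a higher dollar cost, which is the cost-versus-constraint trade-off that co-synthesis must resolve.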
Over the last several years, researchers have focused primarily on hardware-software co-synthesis of one-CPU-one-ASIC architectures (see References (3)-(9)), where attempts have been made to move operations from hardware to software or vice versa to minimize cost and meet deadlines.
In the area of distributed system co-synthesis, the target architecture can employ multiple CPUs, ASICs, and field-programmable gate arrays (FPGAs). See Reference (10). Two distinct approaches have been used to solve the distributed system co-synthesis problem: optimal and heuristic.
In the optimal domain, the approaches are: 1) mixed integer linear programming (MILP) and 2) exhaustive enumeration. The MILP solution proposed in Reference (11) has the following limitations: 1) it is restricted to one task graph, 2) it does not handle preemptive scheduling, 3) it requires determination of the interconnection topology up front, and 4) because of its time complexity, it is suitable only for small task graphs. A configuration-level hardware-software partitioning algorithm based on an exhaustive enumeration of all possible solutions is presented in Reference (12). Its limitations are: 1) it allows an architecture with at most one CPU, 2) it uses simulation for performance evaluation, which is very time-consuming, and 3) it ignores communication overheads.
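To illustrate why exhaustive approaches are limited to small task graphs, the following minimal sketch enumerates every mapping of a three-task chain onto two PE types and keeps the cheapest mapping that meets a deadline. All numbers and names are hypothetical, and the finish time is computed analytically for brevity; this is not the algorithm of Reference (12), which relies on simulation for performance evaluation.

```python
from itertools import product

# Hypothetical toy data (assumptions for illustration only).
EXEC_TIME = [                      # one entry per task in a serial chain
    {"cpu": 4, "asic": 1},
    {"cpu": 3, "asic": 2},
    {"cpu": 2, "asic": 1},
]
PE_COST = {"cpu": 10, "asic": 50}  # dollar cost of each PE
COMM_TIME = 1                      # delay when consecutive tasks change PE
DEADLINE = 8                       # deadline for the end of the chain


def finish_time(mapping):
    """Finish time of the chain when task i runs on mapping[i]."""
    total = 0
    for i, pe in enumerate(mapping):
        if i > 0 and mapping[i - 1] != pe:
            total += COMM_TIME
        total += EXEC_TIME[i][pe]
    return total


def exhaustive_search():
    """Try every task-to-PE mapping; return (best, number examined)."""
    best = None
    examined = 0
    for mapping in product(PE_COST, repeat=len(EXEC_TIME)):
        examined += 1
        if finish_time(mapping) > DEADLINE:
            continue                       # violates the real-time constraint
        cost = sum(PE_COST[pe] for pe in set(mapping))
        if best is None or cost < best[0]:
            best = (cost, mapping)
    return best, examined


best, examined = exhaustive_search()
print(best, examined)
```

The number of mappings examined is the number of PE types raised to the number of tasks (here 2 to the power 3, i.e., 8), so the search space grows exponentially with the size of the task graph.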
There are two distinct approaches in the heuristic domain: 1) iterative (see References (13)-(16)), where an initial solution is iteratively improved through various moves, and 2) constructive (see References (17)-(19)), where the solution is built step-by-step and the complete solution is not available until the algorithm terminates. The iterative procedure given in References (13)-(15) has the following limitations: 1) it considers only one type of communication link, and 2) it does not allow mapping of each successive copy of a periodic task to different PEs. Another iterative procedure, targeted at low-power systems, is proposed in Reference (16). It uses power dissipation as a cost function for allocation and has the following limitations: 1) it ignores inter-task communication scheduling, and 2) it is not suitable for the multi-rate systems commonly found in multi-media applications. A constructive co-synthesis procedure for fault-tolerant distributed embedded systems is proposed in Reference (17). However, it does not support communication topologies such as bus, local area network (LAN), etc., and its allocation step uses a pessimistic performance evaluation technique which may increase system cost. Also, it assumes that computation and communication can always be done in parallel, which may not be possible. It is also not suitable for multi-rate embedded systems, e.g., multi-media systems. The optimal approaches are only applicable to task graphs consisting of around 10 tasks, and the heuristic approaches cannot tackle hierarchical task graphs or architectures.
Hierarchical hardware-software architectures have been presented previously in Reference (20). There, a parameterized hierarchical architectural template is specified a priori, with ASICs at the lowest layer, general-purpose processors at the next higher layer, and single-board computers above that. Tasks from the task graphs are then manually allocated to one of these layers. However, such a pre-specified architectural template may not lead to the least expensive architecture, as pointed out in Reference (20) itself.
Large embedded systems are generally specified in terms of hierarchical task graphs. Thus, it is important for a co-synthesis algorithm to exploit and tackle such specifications. Also, non-hierarchical architectures for large embedded systems, such as those used in telecom applications, inherently create processing and communication bottlenecks. Such bottlenecks can substantially increase the embedded system cost.