1. Field of the Invention
The present invention relates to the design of real-time distributed embedded systems, and, in particular, to the process of partitioning an embedded system specification into hardware and software modules using hardware-software co-synthesis.
2. Description of the Related Art
Advancements in VLSI, computer-aided design, and packaging areas have resulted in an explosive growth and interest in embedded system development. In real life, many embedded systems perform a multitude of complex tasks. Heterogeneous distributed architectures are very common for telecom and other embedded systems, where several processors and application-specific integrated circuits (ASICs) of different types are interconnected by various types of communication links, and a multitude of tasks are concurrently run on the system. Each of these tasks can be executed on a variety of software and hardware platforms which have different dollar costs. For example: 1) a telecom protocol handling function can be implemented on a general-purpose processor (software) or an ASIC (hardware) 2) an information packet (control or communication data) can be transferred via an I2C link, a 1 Mb/s local area network (LAN), a 680X0 bus, or a 10 Mb/s Ethernet LAN. Each option has varying hardware, delay, and power requirements. Parameters such as hardware, delay, and power are usually estimated by simulation/synthesis or laboratory measurement from previous designs.
The embedded system architecture has generally been defined based on the experience of system architects, and at times, it is either over-designed or fails to meet the requirements. Finding an optimal hardware-software architecture entails selection of processors, ASICs, and communication links such that the cost of the embedded system architecture is minimum and all real-time constraints are met. Hardware-software co-synthesis involves various steps such as allocation, scheduling, and performance estimation. Both allocation and scheduling are known to be NP-complete, where NP stands for non-deterministically polynomial. See References (1)-(2). Therefore, optimal co-synthesis is computationally a very hard problem.
Researchers have primarily focused their interest in last several years on hardware-software co-synthesis of one-CPU-one-ASIC architectures, where attempts have been made to move operations from hardware to software or vice versa to minimize cost and meet deadlines. See References (3)-(9). In the area of distributed system co-synthesis, the target architecture can employ multiple CPUs, ASICs, and field-programmable gate arrays (FPGAs). See Reference (10). Emphasis on distributed embedded system architecture co-synthesis and partitioning is fairly recent. See References (11)-(17). Two distinct approaches have been used to solve the distributed system co-synthesis problem: optimal and heuristic.
In the optimal domain, the approaches are: 1) mixed integer linear programming (MILP) and 2) exhaustive. The MILP solution proposed in Reference (11) has the following limitations: 1) it is restricted to one task graph, 2) it does not handle preemptive scheduling, 3) it requires determination of the interconnection topology up front, and 4) because of time complexity, it is suitable only for small task graphs (having 10 or so tasks). A configuration-level hardware-software partitioning algorithm is presented in Reference (12) based on an exhaustive enumeration of all possible solutions. Its limitations are: 1) it allows an architecture with at most one CPU, 2) simulation is used for performance evaluation which is very time-consuming, and 3) the communication overheads are ignored.
There are two distinct approaches in the heuristic domain: 1) iterative, where an initial solution is iteratively improved through various moves, and 2) constructive, where the solution is built step-by-step and the complete solution is not available until the algorithm terminates. In References (13)-(15), an iterative procedure is given. It considers only one type of communication link and does not allow mapping of each successive copy of a periodic task to a different processing element (PE). Another iterative synthesis technique for low-power systems is presented in Reference (16). Its limitations are: 1) it ignores inter-task communications, and 2) it is restricted to periodic task graphs where deadline is equal to period. A constructive co-synthesis procedure for fault-tolerant distributed embedded systems is proposed in Reference (17). However, it does not support communication topologies such as bus, LAN, etc., and its allocation step uses a pessimistic performance evaluation technique which may increase system cost. Also, it assumes that computation and communication can always be done in parallel, which may not be true. It is also not suitable for multi-rate embedded systems, e.g., multi-media systems.