The invention relates to methods for designing essentially digital devices.
An essentially digital device comprises at least of a memory organization (an amount of memories with their sizes and an interconnection pattern) and registers. Such a memory organization is determined during the design process of said digital device. The operation of an essentially digital system can essentially be described as a set of data access operations or instructions on data structures or variables, being stored in said memories.
In [L.Stok, Data path synthesis, integration, the VLSI journal, Vol.18, pp.1-71, June 1994.] register allocation, starting from a fully scheduled flow graph (thus ordered data access operations or instructions are used as input), are presented. Said allocation techniques are scalar oriented. Many of these techniques construct a scalar conflict or compatibility graph and solve the problem using graph coloring or clique partitioning. This conflict graph is fully determined by the schedule which is fixed before. This means that no effort is spent to come up with an optimal conflict graph and thus the potential optimization by reconsidering the schedule is not exploited. Moreover only register allocation is addressed and not memories.
In the less explored domain of memory allocation and assignment for hardware systems, the current techniques start from a given schedule [L.Ramachandran, D.Gajski, V.Chaiyakul, An algorithm for array variable clustering, Proceedings European Design and Test Conference, pp.262-266, Paris, Mar. 1994.], [P.Lippens, J.van Meerbergen, W.Verhaegh, A.van der Werf, Allocation of multiport memories for hierarchical data streams, Proceedings IEEE International Conference on Computer-Aided Design, pp.728-735, Santa Clara, Nov. 1993.], [O.Sentieys, D. Chillet, J. P.Diguet, J.Philippe, Memory module selection for high-level synthesis, Proceedings IEEE workshop on VLSI signal processing, Monterey Calif., October 1996.] or perform first a bandwidth estimation step [F.Balasa, F.Catthoor, H.DeMan, xe2x80x9cDataflow-driven memory allocation for multi-dimensional processing systems,xe2x80x9d Proceedings IEEE International Conference on Computer Aided Design}, San Jose Calif., Nov. 1994.] which is a kind of crude ordering that does not really optimize the conflict graph either. These techniques have to operate on groups of signals instead of on scalars to keep the complexity acceptable.
In the parallel compiler domain [M.Al-Mouhamed, S.Seiden, A heuristic storage for minimizing access time of arbitrary data patterns, IEEE Transactions on Parallel and Distributed Systems, Vol.8, No.4, pp.441-447, Apr. 1997.] proposes a technique to partition arrays into groups of data that have to be assigned to different memories such that they can be accessed simultaneously for an SIMD architecture. They combine the constraints of a number of given access patterns into a single linear address transformation that calculates for every data element the memory in which it should be stored to minimize the total access time. This technique allows to avoid the allocation of multi-port memories for storing data with self-conflicts, by explicitly splitting arrays into smaller arrays that can be assigned to single port memories. However said method does not exploit all optimization opportunities for instance by rescheduling data access instructions.
[S.Pinter, Register allocation with instruction scheduling: a new approach, ACM SIGPLAN Notices, Vol.28, pp.248-257, Jun. 1993.] optimizes a conflict graph in the context of scalar register allocation by removing weighted edges in a coloring problem prior to scheduling. However, the conflicts in their initial conflict graph are determined by the sequential ordering of the input code. Also this idea was not applied to groups of scalars.
The Improved Force Directed Scheduling (IFDS) [W.Verhaegh, P.Lippens, E.Aarts, J.Korst, J.van Meerbergen, A.van der Werf, Improved force-directed scheduling in high-throughput digital signal processing, IEEE Transactions on CAD and Systems, Vol.14, No.8, Aug. 1995.] shows a method wherein scheduling intervals are gradually reduced until the desired result is obtained. The cost function used to determine which scheduling interval has to be reduced at each iteration only takes the number of parallel data accesses to reduce the required memory bandwidth into account. (1)FDS does not take into account which data is being accessed. Balancing the number of simultaneous data accesses is a local optimization which can be very bad globally. In IFDS all data is treated equally, although in practice some simultaneous data accesses are more expensive in terms of memory cost than other. Also the required number of memories cannot be estimated accurately by looking locally only, as is done in IFDS, because all conflicts have to be considered for this.
One aspect of the invention includes a method of determining optimized scheduling intervals and optimized access conflicts useful for determining an optimized memory organization of an essentially digital device, the method comprising: determining an initial scheduling of the data access instructions for a plurality of disjunct blocks, wherein each of the blocks include part of the data access instructions, and wherein at least one of the blocks is executed a plurality of times that is defined by an iteration count, deriving from the initial scheduling an initial block cycle budget for each block, while the overall cycle budget for performance of the digital device is larger than a predetermined overall cycle budget, repeating the method comprising: (a) for substantially all of the blocks, performing the method comprising: temporarily reducing a block cycle budget for a selected block by a predetermined amount; determining optimized scheduling intervals of the data access instructions such that the performance of the digital device is guaranteed to be within the block cycle budgets, wherein determining the optimized scheduling intervals comprises optimizing access conflicts with respect to an evaluation criterion related to the memory cost of the digital device; computing the overall cycle budget resulting from the optimized scheduling intervals; and (b) reducing the block cycle budget for at least one selected block, the selection of the block being based at least in part upon the memory cost and an overall cycle budget reduction.
Another aspect of the invention includes a method of determining a cost-cycle budget curve for an essentially digital device that is represented by a digital representation describing the functionality of the digital device, the representation comprising data access instructions on basic groups of scalar signals, the data access instructions having scheduling intervals, the representation comprising a plurality of disjunct blocks, each block including part of the data access instructions, at least one block being executed at least a plurality of times that is indicated by an iteration count, the method comprising: generating a cost-cycle budget curve that compares the cost of a memory organization of the digital device versus the cycle budget, wherein the cost-cycle budget curve is incrementally generated.
Another aspect of the invention includes a method of determining an optimized memory organization of an essentially digital device represented by a representation describing the functionality of the digital device, the representation comprising data access instructions on basic groups of scalar signals, the data access instructions having scheduling intervals, the representation comprising a plurality of disjunct blocks, each block including part of the data access instructions, at least one block being executed at least a plurality of times, the optimized memory organization being such that execution of the functionality with the digital device is guaranteed to be within a predetermined overall cycle budget, the method comprising: determining block cycle budgets while optimizing scheduling intervals.
Yet another aspect of the invention includes a method of determining an optimized memory organization of an essentially digital device represented by a representation describing the functionality of the digital device, the representation comprising data access instructions on basic groups of scalar signals, the data access instructions having scheduling intervals, the representation comprising a plurality of disjunct blocks, each block including part of the data access instructions, at least one block being executed at least a plurality of times, the optimized memory organization being such that the performance of the digital device is guaranteed to be within a predetermined overall cycle budget, the method comprising: (a) determining initial block cycle budgets such that no access conflict exists between the data access instructions within each block; (b) temporarily reducing, in an iterative process, a block cycle budget by a predetermined amount, and computing the access conflict cost and overall cycle budget reduction of such reduced block cycle budget; (c) reducing the block cycle budget for at least one selected block; and (d) returning to act (b) if the overall cycle budget for execution of the functionality with the digital device is larger than the predetermined overall cycle budget.
Yet another aspect of the invention includes a method optimizing the scheduling of data instructions, the method comprising: determining a scheduling of data instructions for a plurality of blocks; determining a cycle budget for substantially all of the blocks, wherein the cycle budget is determined based at least in part upon the determined initial scheduling of a block; identifying one of the blocks for a cycle budget reduction; reducing the cycle budget of the identified block; modifying the scheduling of at least one of the blocks; calculating a cumulative cycle budget for the blocks; and repeating the identifying, reducing and modifying, wherein the modifying includes modifying a modified scheduling until the cumulative cycle budget for substantially all of the blocks satisfies a predetermined cycle budget.
Yet another aspect of the invention includes a system for optimizing the scheduling of data instructions, the system comprising: means for determining a scheduling of data instructions for a plurality of blocks; means for determining a cycle budget for substantially all of the blocks, wherein the cycle budget is determined based at least in part upon the determined initial scheduling of a block; means for identifying one of the blocks for a cycle budget reduction; means for reducing the cycle budget of the identified block; and means for modifying the scheduling and cycle budget of at least one of the blocks until a cumulative cycle budget for substantially all of the blocks satisfies a predetermined cycle budget.
Yet another aspect of the invention includes a digital device having an optimized memory organization, wherein the design of the memory organization is generated by the method comprising: determining an initial scheduling of data instructions for a plurality of blocks, wherein the data instructions are to be executed by the digital device; determining a cycle budget for substantially each of the blocks, wherein the cycle budget is determined based at least in part upon the determined initial scheduling of the block; identifying one of the blocks for a cycle budget reduction; reducing the cycle budget of the identified block; modifying the scheduling of at least one of the blocks; repeating the identifying, reducing and modifying acts, wherein the modifying act includes modifying a modified scheduling, until a cumulative cycle budget for substantially all of the blocks satisfies a predetermined cycle budget, and wherein the modified scheduling is used to define the design of a memory organization for the digital device.