Due to the enormous progress made during the last few decades in the area of integrated circuits (ICs), the complexity of systems implemented in silicon has increased drastically. This has led to system-on-a-chip designs, in which all the components previously combined on a board are integrated on a single die. The design of these systems often requires trade-offs between cost factors such as chip area/count, power consumption, design time, and execution speed. The vast majority of recent signal processing systems that have to be implemented in silicon, including multimedia systems, require huge amounts of data to be processed, transferred and temporarily stored. As a result, the largest contributions to the area and power cost factors originate from data storage and transfers. Data-intensive signal processing applications can be subdivided into two classes: most multimedia applications (including video and medical imaging) and network communication protocols (e.g. ATM network protocols). On application specific integrated circuits (ASICs), typically more than half of the silicon area is occupied by storage units and related hardware such as address generation logic. Moreover, most of the power consumption in these systems is directly related to data accesses and transfers, both for custom hardware and for processors. Therefore, there is a need to improve data storage and transfer management so as to reduce chip area/count and power consumption.
Unfortunately, any effective optimization would require relatively aggressive global transformations of the system specifications and, due to the high complexity of modern systems, such transformations are often impossible to perform manually in an acceptable time. Many designers are not even aware that such optimizations might affect the area and power cost considerably. Commercially available CAD tools for system synthesis currently offer little or no support for global system optimizations. They usually support system specification and simulation, but lack support for global design exploration and certainly for automated global optimizations. They include many of the well-known scalar optimization techniques (e.g. register allocation and assignment), but these are not suited to dealing with large amounts of multi-dimensional data. Standard software compilers are likewise limited mainly to local (scalar) optimizations.
For instance, the storage size requirements of multi-dimensional data have received little attention from the compiler community because they were (initially) seen neither as a cost nor as a problem. The storage order of data was originally not even seen as an optimization parameter: it was simply defined by the programming language (e.g. row-major in C or column-major in Fortran), and optimization efforts were concentrated on obtaining the highest possible degree of parallelism. If a bad storage order is chosen, many "holes" may be formed in the memories, i.e. locations that cannot be used at certain moments in time, resulting in increased storage size requirements. Unfortunately, manual optimization of the storage requirements is very tedious and error-prone for real-life applications, because it involves complex bookkeeping. Techniques that help designers make better decisions, or even automate this difficult task, are therefore certainly desirable.
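The effect of the storage order on the required memory size can be illustrated with a small, purely hypothetical example; it is not the method of the present invention. The Python sketch below assumes a two-dimensional array A that is produced row by row, where each element is consumed by the element in the same column of the next row. It rates each storage order with a simple "window" measure, which is itself an assumption of this sketch. If every element is stored at its linear address modulo W, no live value is overwritten as long as W is at least the largest address distance between simultaneously live elements, plus one. The names `live_intervals`, `window_size`, `row_major` and `col_major` are illustrative only.

```python
from itertools import product


def live_intervals(n_rows, n_cols):
    """Liveness of A[i][j] in the hypothetical loop nest
        for i in range(n_rows):
            for j in range(n_cols):
                A[i][j] = f(A[i-1][j])
    executed row by row, one iteration per time step.
    A[i][j] is live during the half-open interval [write, last_read):
    it is written at step i*n_cols + j and last read by iteration (i+1, j),
    so the location it frees can be reused by the write in that same step.
    Elements of the last row are assumed to be kept until execution ends."""
    end = n_rows * n_cols
    intervals = {}
    for i, j in product(range(n_rows), range(n_cols)):
        write = i * n_cols + j
        intervals[(i, j)] = (write, min(write + n_cols, end))
    return intervals


def window_size(intervals, address):
    """Largest distance (+1) between simultaneously live addresses.
    Storing A[i][j] at address(i, j) % window never overwrites a live value:
    two live elements are fewer than `window` apart, so they cannot share a
    residue unless they are the same element."""
    end = max(last for _, last in intervals.values())
    window = 0
    for t in range(end):
        live = [address(i, j) for (i, j), (w, r) in intervals.items() if w <= t < r]
        if live:
            window = max(window, max(live) - min(live) + 1)
    return window


N, M = 8, 4
iv = live_intervals(N, M)


def row_major(i, j):
    return i * M + j  # C-style layout


def col_major(i, j):
    return j * N + i  # Fortran-style layout


print("row-major window:   ", window_size(iv, row_major))  # 4
print("column-major window:", window_size(iv, col_major))  # 25
print("full array size:    ", N * M)                       # 32
```

Under these assumptions, the row-major order needs only one row of storage, whereas the column-major order spreads the live elements of two adjacent rows over almost the entire address range, so most of the allocated locations are holes at any given moment.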
It is an object of the present invention to provide a method and an apparatus for reducing the storage space required for temporary data when executing a program, in particular the storage size of multi-dimensional data arrays, as these have the largest impact on the storage cost.
Another object of the present invention is to find a storage order for each (part of an) array such that the overall required size (number of locations) of the memories is minimal.
Another object of the present invention is to find an optimal layout of the arrays in the memories such that the reuse of memory locations is maximal.