1. Field of the Invention
The present invention generally relates to processing units, and more particularly to an arithmetic and logic device for processing units that utilizes auxiliary computing units to enhance performance and reduce code size.
2. The Prior Arts
Multimedia applications impose a significant toll on conventional processing units. For example, the major function blocks of video compression, such as motion estimation, motion compensation, discrete cosine transform, inverse discrete cosine transform, and variable length coding, all require a large number of data processing instructions, which in turn consume a significant portion of the processing capability of conventional processing units.
Various architectural improvements of central processing units (CPUs) have been proposed to facilitate the processing of multimedia applications. For example, recent commercial CPUs are designed to support various SIMD (single instruction multiple data) instructions, such as the Streaming SIMD Extensions (SSE) of Intel® Pentium CPUs. Similarly, digital signal processors (DSPs) are designed to support multiply-accumulate (MAC) instructions so that more data can be processed in a single instruction cycle.
One type of these architectural improvements is to use auxiliary computing units along the data path so as to reduce code size and the overhead of moving data between the CPU and the register file. For a CPU without the auxiliary computing units, the following code segment:
    struct test_struct { int x; int y; } t;
    t.x += 7;
    t.y += 5;
    t.x += t.y;
would be compiled into the following assembly codes:
    movl  4(%esp), %edx     ; point to t
    movel (%edx), %eax      ; x itself
    movel 4(%edx), %ebx     ; y itself
    add   #7, (%eax)        ; t.x += 7
    add   #5, (%ebx)        ; t.y += 5
    add   (%ebx), (%eax)    ; t.x += t.y
Obviously, at least three “add” instructions are required. However, for a CPU with appropriate auxiliary computing units, the code segment could be translated to the following assembly codes, which require only one “add” instruction:
    movl  4(%esp), %edx     ; point to t
    movel (%edx), %eax      ; x itself
    movel 4(%edx), %ebx     ; y itself
    add   (%eax) ADD #7, (%ebx) ADD #5, (%eax)    ; t.x = (t.x + 7) + (t.y + 5)
For the foregoing single “add” instruction to work, auxiliary computing units must be present to perform the preliminary ADD operations on the operands before they reach the main adder. In this way, both code size and data moving overhead are significantly reduced, and the performance of the processing unit is greatly enhanced.
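The equivalence between the three-instruction sequence and the single fused “add” can be illustrated with a minimal C sketch. This is only a software model under stated assumptions, not a description of any actual hardware: the function names three_adds, aux_add, and fused_add are hypothetical and introduced solely for illustration, and aux_add merely stands in for an auxiliary computing unit that adds an immediate value to an operand on its way to the main adder.

```c
struct test_struct { int x; int y; };

/* Baseline: three separate additions, mirroring the three "add"
 * instructions needed on a CPU without auxiliary computing units. */
int three_adds(struct test_struct *t) {
    t->x += 7;
    t->y += 5;
    t->x += t->y;
    return t->x;
}

/* Hypothetical model of one auxiliary computing unit: it applies a
 * preliminary ADD of an immediate to an operand before the operand
 * reaches the main adder. */
int aux_add(int operand, int immediate) {
    return operand + immediate;
}

/* Fused form: a single main addition whose two operands have each
 * passed through an auxiliary unit, i.e. t.x = (t.x + 7) + (t.y + 5).
 * Assumption: as in the single-"add" listing, only t.x is written
 * back; t.y is left unchanged in memory. */
int fused_add(struct test_struct *t) {
    t->x = aux_add(t->x, 7) + aux_add(t->y, 5);
    return t->x;
}
```

Starting from the same initial values, both functions yield the same final t.x. The modeled fused form differs only in that the intermediate value t.y + 5 is consumed by the main adder and is not stored back to t.y.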