I. Field
The present disclosure generally relates to discrete cosine transforms using digital signal processors. More particularly, the disclosure relates to a system and method of single-stage discrete cosine transforms for VLIW-based digital signal processors.
II. Description of Related Art
Digital signal processors (DSPs) commonly utilize very long instruction word (VLIW) architectures. A VLIW-based DSP can perform multiple operations within a single clock cycle. For example, a VLIW-based DSP can perform multiply-accumulate (MAC), arithmetic logic unit (ALU), and memory load/store operations in a single cycle. The computing power of such DSPs makes it possible to implement DSP-based multimedia systems that offer great flexibility and cost effectiveness.
A discrete cosine transform (DCT) is a mathematical operation that can be performed on a signal to convert the signal from the time domain (or, for images, the spatial domain) to the frequency domain for further processing. The DCT has become a core technology in both still image and video compression standards, including Joint Photographic Experts Group (JPEG) lossy compression, the Moving Picture Experts Group standards MPEG-1, MPEG-2, and MPEG-4, and the like.
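For reference, the one-dimensional transform underlying these standards is commonly the DCT-II, X[k] = c(k) * sum over n of x[n] * cos(pi * (2n + 1) * k / (2N)). The minimal sketch below evaluates that defining sum directly, assuming orthonormal scaling (c(0) = sqrt(1/N), otherwise c(k) = sqrt(2/N)). The function name `dct_ii` is illustrative only, and the O(N^2) loop is meant to show the definition, not an efficient or DSP-specific implementation.

```python
import math


def dct_ii(x):
    """Direct O(N^2) orthonormal DCT-II of a 1-D sequence (illustrative sketch)."""
    n_points = len(x)
    out = []
    for k in range(n_points):
        # Assumed orthonormal scaling factor c(k).
        scale = math.sqrt(1.0 / n_points) if k == 0 else math.sqrt(2.0 / n_points)
        acc = 0.0
        for n, sample in enumerate(x):
            # One multiplication by a cosine basis value per input sample.
            acc += sample * math.cos(math.pi * (2 * n + 1) * k / (2 * n_points))
        out.append(scale * acc)
    return out


if __name__ == "__main__":
    # A constant 8-sample block: all energy lands in the DC coefficient X[0].
    print(dct_ii([1.0] * 8))
```

Each of the N outputs costs N multiplications here, so an 8-point transform needs 64 kernel multiplications; this cost is what the fast algorithms discussed below try to reduce.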
Advances in technology have resulted in smaller and more powerful personal computing devices, many of which provide image and/or video capabilities. For example, there currently exist a variety of portable personal computing devices, including wireless computing devices, such as portable wireless telephones, personal digital assistants (PDAs), and paging devices that are small, lightweight, and easily carried by users. Many such portable personal computing devices include a digital still camera, a digital video camera, a digital recorder, an audio file player, or any combination thereof. Additionally, a portable personal computing device can include a web interface that can be used to access the Internet. Because these image, video, and audio functions require substantial signal processing, many portable personal computing devices include DSPs.
To improve system performance and to save DSP cycles for use in other processes, various fast algorithms have been proposed to compute the DCT more efficiently by exploiting the symmetry properties of the DCT. Conventionally, such fast algorithms have focused on reducing the number of multiplications under the assumption that multiplication takes longer than addition in the core processor. However, this assumption no longer holds for modern DSP architectures with single-cycle multiplication instructions. Furthermore, existing DCT algorithms often include multiple stages, and because each stage depends on the results of the preceding stage, these data dependencies prevent full use of the DSP's parallel capabilities.
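To illustrate the kind of symmetry that fast DCT algorithms exploit, the sketch below folds an even-length input into sums and differences of mirrored samples before multiplying. Because cos(pi * (2(N-1-n) + 1) * k / (2N)) equals (-1)^k * cos(pi * (2n + 1) * k / (2N)), even-indexed outputs depend only on the sums and odd-indexed outputs only on the differences. This is a generic textbook decomposition shown under the assumption of orthonormal DCT-II scaling, not the single-stage method of the present disclosure; the function name `dct_ii_folded` and the returned multiplication counter are illustrative only.

```python
import math


def dct_ii_folded(x):
    """Orthonormal DCT-II using the even/odd symmetry of the cosine kernel.

    Stage 1 (butterflies): fold x into sums and differences of mirrored samples.
    Stage 2 (products): each output then needs only N/2 kernel multiplications.
    Returns (coefficients, kernel_multiplications). Assumes len(x) is even.
    """
    n_points = len(x)
    if n_points % 2:
        raise ValueError("input length must be even")
    half = n_points // 2

    # Stage 1: must finish before stage 2 can start (inter-stage data dependency).
    sums = [x[n] + x[n_points - 1 - n] for n in range(half)]
    diffs = [x[n] - x[n_points - 1 - n] for n in range(half)]

    out = []
    mults = 0  # counts kernel multiplications only, not the final scaling
    for k in range(n_points):
        scale = math.sqrt(1.0 / n_points) if k == 0 else math.sqrt(2.0 / n_points)
        # Even k uses the folded sums; odd k uses the folded differences.
        folded = sums if k % 2 == 0 else diffs
        acc = 0.0
        for n in range(half):
            acc += folded[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * n_points))
            mults += 1
        out.append(scale * acc)
    return out, mults
```

For N = 8 the folded form needs 32 kernel multiplications rather than the 64 of the direct sum, but no product can be formed until the butterfly stage has completed. That ordering constraint is the kind of inter-stage dependency the paragraph above identifies as limiting parallelism on a VLIW DSP.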
Accordingly, it would be advantageous to provide an improved DCT algorithm for use in a VLIW-based DSP.