1. Field
Example embodiments may relate to image processors, for example, to a single-instruction multiple-data (SIMD) processor used in processing images, a memory structure, and/or a method for storing data in the memory structure.
2. Description of Related Art
With advances in digital signal processing, storage media, and transmission modes, technical services once limited to voice information have evolved into multimedia solutions. For example, in the development of terminal apparatuses and multimedia service areas such as digital televisions, internet protocol televisions (IPTVs), and video-on-demand (VOD), compression/restoration techniques, e.g., H.264, have been utilized for higher-quality images or pictures. However, because a higher-quality image compression/restoration technique may require more operations than a normal mode, such a technique may increase the demand for a programmable hardware structure as a video coder-decoder (for example, a video codec) solution that supports multiple image compression standards in accordance with the convergence of functions and services.
To meet the demand for the programmable hardware structure, various platforms based on application-specific integrated circuits (ASICs) or digital signal processors (DSPs) were developed for image compression and/or restoration. However, these hardware structures may have problems of compatibility and/or increased costs. In order to overcome such problems, single-instruction multiple-data (SIMD) processors that offer higher efficiency and/or a lower cost were developed. A conventional SIMD processor may receive a single instruction in common for a plurality of SIMD arrays, through processing elements (PEs) provided in each SIMD array, and/or may execute the received instruction in parallel. Each PE may include an operation circuit such as an arithmetic and logic unit (ALU).
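The single-instruction, multiple-data behavior described above may be illustrated with a minimal sketch. The names simd_execute and saturating_add_u8 are hypothetical and do not appear in the related art; the sketch assumes four PEs and models their parallel operation with an ordinary loop.

```python
from typing import Callable, List


def simd_execute(instruction: Callable[[int, int], int],
                 lanes_a: List[int], lanes_b: List[int]) -> List[int]:
    """Apply one instruction to every lane; each list position stands in for one PE."""
    if len(lanes_a) != len(lanes_b):
        raise ValueError("operand vectors must have the same number of lanes")
    # In hardware all PEs would operate at the same time; this loop only models that.
    return [instruction(a, b) for a, b in zip(lanes_a, lanes_b)]


def saturating_add_u8(a: int, b: int) -> int:
    """Hypothetical ALU operation of one PE: 8-bit unsigned add clamped at 255."""
    return min(a + b, 255)


# One instruction (saturating add) is issued once and applied to four pixel values.
pixels = [10, 200, 250, 0]
offsets = [5, 100, 10, 0]
result = simd_execute(saturating_add_u8, pixels, offsets)
print(result)  # [15, 255, 255, 0]
```

The same instruction is issued once, and only the data differs from lane to lane, which is why such a structure may suit repetitive pixel operations.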
As the conventional SIMD processor processes a plurality of data with a single instruction, the conventional SIMD processor may be able to more easily enhance data processing capability. For example, the conventional SIMD processor may be suitable for a repetitive routine with strong parallelism in operation, and the conventional SIMD processor is widely employed in processing multimedia, graphics data, or voice communications. There are various kinds of conventional SIMD processors, e.g., MMX, SSE, MAX-1, MAX-2, and AltiVec. To efficiently utilize a parallel processing architecture typified by SIMD, such as an arrayed processing structure or a very long instruction word (VLIW) structure, data may need to be supplied in an arrangement that allows the available operation units to remain enabled during operation. Accordingly, a conventional SIMD processor may internally include a permutation unit as a circuit for arranging data within the conventional SIMD processor. FIG. 1 shows a general structure of a prior art SIMD processor 10.
Referring to FIG. 1, the prior art SIMD processor 10 may include a vector arithmetic and logic unit (vector ALU) 20, a permutation unit 30, a vector register 40, and/or a memory 90. While FIG. 1 depicts a single vector ALU 20, processing parallel data at a relatively higher speed (e.g., fast parallel data processing) may require a plurality of the vector ALUs 20 coupled with each other in parallel. Responding to an input of an instruction, the prior art SIMD processor 10 may load data from the memory 90 and store the loaded data in the vector register 40. Data stored in the vector register 40 may be rearranged through the permutation unit 30 and provided to the vector ALU 20. The vector ALU 20 may process the rearranged data provided from the permutation unit 30. The permutation unit 30 may be used with a crossbar or butterfly network to rearrange data for global multimedia operations.
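The data path described with reference to FIG. 1 (memory 90, vector register 40, permutation unit 30, vector ALU 20) may be modeled with a minimal sketch. All function names, the four-lane width, and the lane-reversal pattern are assumptions chosen for illustration and are not taken from FIG. 1.

```python
from typing import List


def load_vector(memory: List[int], base: int, width: int) -> List[int]:
    """Model of the load: copy `width` consecutive words from memory into a vector register."""
    return memory[base:base + width]


def permute(register: List[int], pattern: List[int]) -> List[int]:
    """Model of the permutation unit: output lane i takes register[pattern[i]]."""
    return [register[src] for src in pattern]


def vector_alu_add(a: List[int], b: List[int]) -> List[int]:
    """Model of the vector ALU: lane-wise addition."""
    return [x + y for x, y in zip(a, b)]


memory = list(range(100, 108))            # stand-in for the memory
vreg = load_vector(memory, 0, 4)          # [100, 101, 102, 103]
rearranged = permute(vreg, [3, 2, 1, 0])  # hypothetical pattern: reverse the lanes
total = vector_alu_add(vreg, rearranged)
print(total)  # [203, 203, 203, 203]
```

In this model every operand passes through the permutation step before reaching the ALU, so any delay in that step adds directly to the time of each vector operation.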
A technique known as the ‘conflict free’ technique may arrange data so that a memory bank organized as a simple linear array can be accessed in various parallel modes. Data rearrangement or reassignment by the conflict free technique is carried out through the permutation unit 30. The permutation unit 30 may execute operations of multiplexing, shifting, and/or rotation through complex permutation networks having plural stages. Therefore, data rearrangement may take a relatively long amount of time. Accordingly, a higher cost for accessing the memory 90 and/or a longer data rearrangement time through the permutation unit 30 may reduce the operational efficiency of the prior art SIMD processor 10.
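One way to picture a conflict free arrangement is a skewed bank mapping, sketched below. The related art does not specify a particular mapping; the skewing rule (row + column) modulo the number of banks, the four-bank size, and all names here are assumptions used only for illustration.

```python
from typing import Callable, List, Tuple

N_BANKS = 4  # assumed number of memory banks (equal to the number of lanes)


def linear_bank(row: int, col: int) -> int:
    """Plain linear layout: element (row, col) of an N_BANKS-wide block lands in bank `col`."""
    return (row * N_BANKS + col) % N_BANKS


def skewed_bank(row: int, col: int) -> int:
    """Assumed skewed layout: each row is rotated by its row index."""
    return (row + col) % N_BANKS


def is_conflict_free(access: List[Tuple[int, int]],
                     mapping: Callable[[int, int], int]) -> bool:
    """A parallel access is conflict free when every element sits in a different bank."""
    banks = [mapping(r, c) for r, c in access]
    return len(set(banks)) == len(banks)


row_access = [(1, c) for c in range(N_BANKS)]  # one row read in parallel
col_access = [(r, 2) for r in range(N_BANKS)]  # one column read in parallel

print(is_conflict_free(row_access, linear_bank))  # True
print(is_conflict_free(col_access, linear_bank))  # False: all four hit bank 2
print(is_conflict_free(row_access, skewed_bank))  # True
print(is_conflict_free(col_access, skewed_bank))  # True
```

Under this assumed mapping, the elements of a row or column come out of the banks in rotated order, so they still need to be rotated back into lane order, which is the kind of rearrangement the text assigns to the permutation unit 30.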