Data read-ahead (prefetching), with respect to a multi-core processor, is a scheme in which data required by a process is read in advance and stored in a cache memory so that the process can operate at high speed. Such systems have been widely used (see Patent Document 1 and Patent Document 2).
For instance, the data read-ahead system disclosed in Patent Document 1 is well known as a system for performing data read-ahead in motion compensation (see FIG. 15), which is part of the process of decoding a bitstream encoded by means of inter-frame prediction, a technique used for compressing moving images.
FIG. 15 is a block diagram showing a method of decoding a bitstream which is compressed by means of inter-frame prediction. In FIG. 15, a bitstream supplied from the outside is first input to a variable-length code decoder 1001. The variable-length code decoder 1001 performs predetermined variable-length decoding on the input bitstream according to the information stored in the bitstream, and supplies the obtained information regarding a coding mode, a quantization parameter, a quantized orthogonal transform coefficient and the like to an inverse quantization unit 1002.
The variable-length code decoder 1001 also supplies information of a reference picture and a motion vector to a motion compensation unit 1004. The inverse quantization unit 1002 performs predetermined inverse quantization on the supplied quantized orthogonal transform coefficient, and supplies the resulting orthogonal transform coefficient to an inverse orthogonal transducer 1003. The inverse orthogonal transducer 1003 performs predetermined inverse orthogonal transformation on the orthogonal transform coefficient, and supplies the resulting differential image information to an adder 1006.
On the other hand, the motion compensation unit 1004 performs predetermined motion compensation using a reference picture stored in a frame memory 1005 according to the supplied information of the reference picture and the motion vector, and supplies the resulting predicted image information to the adder 1006. The adder 1006 adds the differential image supplied from the inverse orthogonal transducer 1003 and the predicted image supplied from the motion compensation unit 1004, and supplies the resulting decoded image information to the frame memory 1005. The frame memory 1005 stores a predetermined number of the decoded images supplied from the adder 1006, supplies them to the motion compensation unit 1004, and also outputs the decoded images to the outside at a predetermined timing.
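The data flow of FIG. 15 can be summarized in a minimal sketch. The code below is an illustrative stand-in, not a real codec: the function names, the per-macroblock bitstream layout, and the arithmetic (a fixed quantization step, an identity inverse transform) are all assumptions made for clarity.

```python
# Minimal sketch of the FIG. 15 decoding data flow.
# All function bodies are illustrative stand-ins, not a real codec.

def variable_length_decode(bitstream):
    # Yields (quantized coefficients, motion info) per macroblock.
    for qcoef, motion in bitstream:
        yield qcoef, motion

def inverse_quantize(qcoef, qp=2):
    # Stand-in for the inverse quantization unit 1002.
    return [c * qp for c in qcoef]

def inverse_transform(coef):
    # Stand-in for the inverse orthogonal transform: identity here.
    return list(coef)

def motion_compensate(frame_memory, motion):
    # Stand-in for unit 1004: fetch a reference picture and offset it.
    ref_index, offset = motion
    return [p + offset for p in frame_memory[ref_index]]

def decode_frame(bitstream, frame_memory):
    decoded = []
    for qcoef, motion in variable_length_decode(bitstream):
        diff = inverse_transform(inverse_quantize(qcoef))
        pred = motion_compensate(frame_memory, motion)
        # The adder 1006 sums the differential and predicted images.
        decoded.extend(d + p for d, p in zip(diff, pred))
    frame_memory.append(decoded)  # adder output returns to frame memory
    return decoded
```

The point of the sketch is the dependency structure: motion compensation reads the frame memory, and the adder writes back into it, which is exactly where cache misses arise.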
In general, since a frame memory which stores decoded images is extremely large, the frame memory is seldom accommodated entirely within a cache memory in a system having strict resource restrictions, such as an embedded system.
As such, when a decoded image is written into the frame memory, or when a reference picture is referred to, cache misses occur frequently, which prevents high-speed decoding. In order to solve this problem, Patent Document 1 discloses reading the corresponding data before the frame memory is accessed and storing that data in the cache memory, to thereby increase the decoding speed.
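The pattern described above can be sketched in software. The following is a toy model under stated assumptions: a plain dict stands in for the hardware cache, and the line size and function names are hypothetical, not taken from Patent Document 1.

```python
# Toy model of read-ahead before a frame-memory access: the needed
# region is copied into a small, fast "cache" (a dict standing in for
# hardware cache lines) so the later access does not miss.

CACHE_LINE = 4  # hypothetical line size, in elements

def prefetch(main_memory, cache, addr):
    # Load the whole cache line containing addr, if not already present.
    line = addr - addr % CACHE_LINE
    if line not in cache:
        cache[line] = main_memory[line:line + CACHE_LINE]

def read(main_memory, cache, addr):
    line = addr - addr % CACHE_LINE
    if line in cache:                   # cache hit: fast path
        return cache[line][addr - line]
    prefetch(main_memory, cache, addr)  # miss: fill on demand (slow)
    return cache[line][addr - line]
```

In use, a decoder would call `prefetch` for the addresses a motion vector is about to touch, so the subsequent `read` calls in motion compensation take the fast path.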
A data read-ahead system described in Patent Document 2 is characterized in that it includes a dedicated thread for performing data read-ahead, a device which analyzes a source code and inserts a process of activating the data read-ahead thread at the optimum position, and a unit which measures the execution priority of the program and the cache utilization. The system attempts to perform the optimum read-ahead operation by analyzing the data flow at compile time, inserting a process of generating a data read-ahead thread at an appropriate position, and measuring the execution priority of the program and the cache utilization during execution, to thereby determine whether or not to perform data read-ahead.
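The dedicated-thread approach can be illustrated with a short sketch. This is not Patent Document 2's mechanism itself; the names, the shared dict used as a cache, and the synchronization are illustrative assumptions, and a real system would overlap the read-ahead with useful work rather than wait for it.

```python
import threading

# Sketch of a dedicated read-ahead thread: a helper thread moves
# upcoming blocks into a shared "cache" dict while (ideally) the main
# thread runs on another core. Here the main thread simply waits for
# the helper, which keeps the example deterministic.

def readahead_worker(source, cache, indices, done):
    for i in indices:
        cache[i] = source[i]   # simulate moving data closer to the CPU
    done.set()

def process_with_readahead(source):
    cache, done = {}, threading.Event()
    t = threading.Thread(target=readahead_worker,
                         args=(source, cache, range(len(source)), done))
    t.start()                  # would run on a spare core, ideally
    done.wait()                # a real system overlaps this with work
    result = sum(cache[i] for i in range(len(source)))
    t.join()
    return result
```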
In the data read-ahead method of Patent Document 1, a relatively small amount of data is read ahead each time the corresponding data is required.
In contrast, in the data read-ahead method of Patent Document 2, data read-ahead is performed by a dedicated read-ahead thread, independently of the main thread. This difference is particularly significant in a multi-core processor system.
That is, since a multi-core processor system (reference numeral 100 in FIG. 14) can execute a plurality of threads in parallel and can perform data read-ahead with a data read-ahead thread without disturbing the execution flow of the main thread, the method using a data read-ahead thread is capable of performing data read-ahead more effectively. It is particularly effective when there is an idle processor at the time data read-ahead is desired, because the read-ahead process, which is not strictly required by the main computation, can be performed by a spare processor independently of the main thread.
As is obvious from the above description, in the case of decoding a compressed moving image, an idle processor arises when the process is divided by function and performed in parallel. Referring to FIG. 15, the variable-length code decoding performed by the variable-length code decoder 1001 must be performed sequentially. In contrast, the processes performed by the inverse quantization unit 1002, the inverse orthogonal transducer 1003 and the motion compensation unit 1004 do not depend on one another on a macroblock basis, and can therefore be divided by function and performed in parallel.
Therefore, after the variable-length code decoder 1001 has completed decoding one frame, the processes for that frame by the inverse quantization unit 1002, the inverse orthogonal transducer 1003 and the motion compensation unit 1004 can be performed in parallel with the screen split into regions; meanwhile, while the variable-length code decoder 1001 decodes the next frame, the processors other than the one running it are in an idle state. If the frame memory area for storing the decoded image is to be read ahead and secured in the cache memory, the data read-ahead thread can be executed by such an idle processor.
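The idle-processor scenario can be sketched with a thread pool: while one worker performs the sequential variable-length decoding of the next frame, an otherwise idle worker touches the frame-memory area that the later stages will write. The function names and stand-in bodies below are illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

# Sketch: overlap sequential VLC decoding of the next frame with a
# read-ahead pass over the frame-memory buffer on a spare worker.

def vlc_decode_frame(frame_id):
    # Stand-in for the sequential variable-length decoding step.
    return [frame_id] * 4

def warm_frame_memory(buffer):
    # "Read ahead" the buffer by touching every element, so that it
    # would be resident in cache when the decode stages write it.
    for i in range(len(buffer)):
        buffer[i] = buffer[i]
    return len(buffer)

def decode_next_frame(frame_id, frame_buffer):
    with ThreadPoolExecutor(max_workers=2) as pool:
        vlc = pool.submit(vlc_decode_frame, frame_id)
        warm = pool.submit(warm_frame_memory, frame_buffer)
        return vlc.result(), warm.result()
```

In Python the touch loop has no real cache effect because of the interpreter; the sketch only shows the scheduling structure, in which the read-ahead work costs the main thread nothing.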
FIG. 14 shows a hardware configuration of a multi-core processor system 100 in the conventional example described above. In FIG. 14, the multi-core processor system 100 includes n processors 111, 112, 113 . . . , a memory controller 120 shared by the processors 111, 112, 113 . . . , a cache memory 130 whose storing operation is controlled by the memory controller 120, and a main memory 140 which backs the cache memory 130.
Reference numerals 151, 152, 153 . . . indicate buses for connecting the memory controller 120 and the respective processors 111, 112, 113 . . . . Further, reference numerals 160 and 170 indicate buses for connecting the memory controller 120 and the cache memory 130, and connecting the cache memory 130 and the main memory 140, respectively. The main memory 140 is also connected to the memory controller 120 with a bus not shown.
The main memory 140 is a large-capacity storage device, although access to it is slow. The cache memory 130 is a storage device having a small capacity but capable of high-speed access, which temporarily stores a part of the commands and data of the main memory 140. The memory controller 120 controls memory accesses between each of the processors and the cache memory 130, and between the cache memory 130 and the main memory 140. Each of the processors 111, 112, 113 . . . is an arithmetic device which executes commands stored in the cache memory 130 or in the main memory 140.
If a program is written to be executed using a plurality of threads, the threads can be executed in parallel by different processors even though they belong to the same program. Further, the respective threads may share data via the cache memory 130 or the main memory 140.
    Patent Document 1: Japanese Patent Laid-Open Publication No. 2006-41898
    Patent Document 2: Japanese Patent Laid-Open Publication No. 2005-78264