1. Field of the Invention
The present invention relates to a technique for controlling an external memory in a data driven type information processing apparatus, and more particularly, to an external memory control device that permits efficient data access to an external memory and draws out, to the fullest extent, the parallel processing capability of a data driven type information processing apparatus when operations are executed, and to a data driven type information processing apparatus including the same.
2. Description of the Background Art
In recent years, there has been an increasing demand for improved processor performance in a variety of fields requiring high-speed processing of a large amount of data, such as multimedia processing and high-definition image processing. With current LSI (Large Scale Integrated circuit) manufacturing techniques, however, there is a limit to how far device speed can be increased. Thus, parallel processing has attracted attention, and research and development on parallel processing has been vigorously pursued.
Among the computer architectures suitable for parallel processing, data driven type architectures have attracted the most attention. In a data driven type processing method, processing proceeds in parallel in accordance with a rule that “processing is executed once input data necessary for the relevant processing have all become available and the resources including operating devices necessary for the processing have been allocated”.
FIG. 1 is a block diagram showing a schematic configuration of a conventional data driven type information processing system. This data driven type information processing system includes a data driven type information processing apparatus (hereinafter, “data driven information processor”) 101 and an external memory 102 which stores data processed by data driven information processor 101 and others.
Data driven information processor 101 is provided with input ports IA and IB connected with data transmission paths 103 and 104, respectively, output ports OA and OB connected with data transmission paths 105 and 106, respectively, and an external memory port TM connected with an access control line 107.
Data driven information processor 101 receives packets PA_IO, which will be described later, from input ports IA and IB via data transmission paths 103 and 104 in time series. Data driven information processor 101 prestores prescribed processing contents as programs within, and processes the input packets PA_IO in accordance with the program contents.
External memory 102, in receipt of an access request signal from external memory port TM of data driven information processor 101 via access control line 107, performs an access according to the access request and acknowledges the access request to data driven information processor 101.
After completion of processing of input packet PA_IO, data driven information processor 101 outputs the packet PA_IO including processed contents via output port OA and data transmission path 105, or via output port OB and data transmission path 106.
FIGS. 2A–2C illustrate packets used in data driven information processor 101. FIG. 2A shows a basic configuration of the input/output packet PA_IO of data driven information processor 101. Input/output packet PA_IO includes a field 140 storing a processor number PE (Processor Element), a field 142 storing a core number CI, a field 143 storing a node number ND, a field 144 storing a color CL, and a field 145 storing data D.
Processor number PE is an indicator which identifies, in a system in which a plurality of data driven information processors are connected, the data driven information processor in which the relevant packet PA_IO is to be processed. Core number CI is an indicator which identifies a processing core within data driven information processor 101, which will be described later.
Node number ND is used as an address for accessing contents stored in a constant memory and a program memory, which will be described later. Color CL is an identifier for uniquely recognizing each packet being input to data driven information processor 101 in time series. Color CL is used for calculating or saving memory addresses in a built-in memory control unit and an external memory control unit, which will be described later. Data D is data to be processed by data driven information processor 101.
FIG. 2B shows a basic configuration of a packet PA_RT which is generated inside data driven information processor 101. Packet PA_RT, generated in an input/output control unit as will be described later, is identical to packet PA_IO except that it has a field 141 storing an instruction code OP replacing the field 140 storing the PE number of packet PA_IO. Instruction code OP determines the kind of operation to be performed on data D.
FIG. 2C shows a basic configuration of a packet PA_FC which is generated inside data driven information processor 101. Packet PA_FC, generated in a firing control unit as will be described later, is identical to packet PA_RT except that it has a field 146 storing left data LD and a field 147 storing right data RD in place of the field 145 storing data D in packet PA_RT.
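The three packet formats of FIGS. 2A–2C can be summarized in a minimal sketch such as the following. The class names, the Python field names and the field types are assumptions introduced only for illustration; the description above specifies only the field numbers and their contents, not widths or encodings.

```python
from dataclasses import dataclass, fields


@dataclass
class PacketIO:
    """Input/output packet PA_IO (FIG. 2A)."""
    pe: int  # field 140: processor number PE
    ci: int  # field 142: core number CI
    nd: int  # field 143: node number ND
    cl: int  # field 144: color CL
    d: int   # field 145: data D


@dataclass
class PacketRT:
    """Internal packet PA_RT (FIG. 2B): field 141 replaces field 140."""
    op: str  # field 141: instruction code OP (a string here is an assumption)
    ci: int  # field 142
    nd: int  # field 143
    cl: int  # field 144
    d: int   # field 145


@dataclass
class PacketFC:
    """Internal packet PA_FC (FIG. 2C): fields 146 and 147 replace field 145."""
    op: str  # field 141
    ci: int  # field 142
    nd: int  # field 143
    cl: int  # field 144
    ld: int  # field 146: left data LD
    rd: int  # field 147: right data RD


# Field layout of each packet type, in declaration order.
FIELD_NAMES = {
    cls.__name__: [f.name for f in fields(cls)]
    for cls in (PacketIO, PacketRT, PacketFC)
}
```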
FIG. 3 is a block diagram showing schematic configurations of data driven information processor 101 and external memory 102. Data driven information processor 101 includes: a plurality of processing cores 110 and 111 connected to external memory 102 via external memory port TM; processing cores 112 and 113 not connected to external memory 102; a main router 114; and an input/output control unit 115. External memory 102 includes SDRAMs (Synchronous Dynamic Random Access Memories) 116 and 117 connected to processing cores 110 and 111, respectively, via external memory port TM.
When a packet PA_IO is applied via data transmission path 103 or 104 to the data driven information processor designated by processor number PE, input/output control unit 115 inputs the packet PA_IO via input port IA or IB, and generates a packet PA_RT from the relevant packet PA_IO. Specifically, input/output control unit 115 discards the processor number PE of packet PA_IO, acquires an instruction code OP and a new node number ND based on the node number ND of packet PA_IO, and stores them in the fields 141 and 143, respectively, of packet PA_RT. Input/output control unit 115 then sends the generated packet PA_RT to main router 114. Core number CI, color CL and data D remain unchanged in input/output control unit 115.
Main router 114 selects a next destination of packet PA_RT referring to core number CI thereof, and sends the packet PA_RT to the destination. The field 142 for storage of core number CI stores a number indicating any of processing cores 110–113 and input/output control unit 115, so that the destination can be determined by referring to core number CI.
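The conversion from PA_IO to PA_RT and the subsequent routing by core number CI might be sketched as follows. The lookup table `entry_table`, the dictionary representation of packets and the numeric encoding of core number CI in `DESTINATIONS` are all hypothetical; the description does not state how input/output control unit 115 obtains OP and the new ND, nor which CI value designates which unit.

```python
# Hypothetical encoding of core number CI; the actual values are not given.
DESTINATIONS = {
    0: "processing core 110",
    1: "processing core 111",
    2: "processing core 112",
    3: "processing core 113",
    4: "input/output control unit 115",
}


def io_to_rt(pa_io, entry_table):
    """Generate PA_RT from PA_IO as input/output control unit 115 does.

    PE is discarded; OP and a new ND are acquired based on the incoming ND
    (here via an assumed table); CI, CL and D pass through unchanged.
    """
    op, new_nd = entry_table[pa_io["nd"]]
    return {
        "op": op,
        "ci": pa_io["ci"],
        "nd": new_nd,
        "cl": pa_io["cl"],
        "d": pa_io["d"],
    }


def route(pa_rt, destinations):
    """Select the next destination of PA_RT by referring to core number CI."""
    return destinations[pa_rt["ci"]]
```

For instance, a packet with CI equal to 2 would be delivered to processing core 112 under the assumed encoding.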
When main router 114 receives a packet PA_RT generated by any of processing cores 110–113 and storing processed data, it sends the packet PA_RT to input/output control unit 115. Input/output control unit 115, in receipt of packet PA_RT, decodes instruction code OP and node number ND of packet PA_RT, and determines whether the next instruction is to be executed within the relevant data driven information processor 101 or in an external data driven information processor.
When determining that it should be executed by an external data driven information processor, input/output control unit 115 generates a packet PA_IO, as shown in FIG. 2A, with the PE number of the external data driven information processor stored in the field 140 for storage of PE number, and sends out the packet PA_IO via output port OA or OB.
On the other hand, if it determines that the next instruction should be executed in the relevant data driven information processor 101, input/output control unit 115 sends the packet PA_RT again to main router 114, with the contents of all the fields thereof left unchanged.
FIG. 4 is a block diagram showing a schematic configuration of processing cores 110, 111. Processing cores 110 and 111 each include: an external memory control unit 121 connected to external memory port TM; a firing control unit 122 which receives packet PA_RT from main router 114; a branch unit 123; a built-in memory control unit 124; a built-in memory 125; a merge unit 126; an operating unit 127; and a program storing unit 128 which sends the processed packet PA_RT to main router 114.
Firing control unit 122 includes a constant memory 131 storing a constant necessary for performing data driven type processing, and a queuing memory 132 used for queuing of packets. Program storing unit 128 includes a program memory 133 storing a program necessary for performing the data driven type processing.
Firing control unit 122, in receipt of packet PA_RT from main router 114, detects constant data or a packet PA_RT making a pair with the relevant packet PA_RT. The packet PA_RT to be paired is detected by match of both node number ND and color CL thereof. If the matching packet PA_RT is not detected, firing control unit 122 temporarily stores packet PA_RT in queuing memory 132 for queuing. Detection of the packet PA_RT to be paired is called “firing”.
Firing control unit 122, upon detection of paired packets PA_RT, stores data D having been stored in field 145 of one packet PA_RT in a packet PA_FC as shown in FIG. 2C in its field 146 for storage of left data LD, and stores data D having been stored in field 145 of the other packet PA_RT in the packet PA_FC in its field 147 for storage of right data RD. Firing control unit 122 sends packet PA_FC thus generated to branch unit 123. The other packet PA_RT is erased at this time.
If the data to be operated together is not a packet PA_RT but constant data, firing control unit 122 reads constant data from constant memory 131, and stores it in one of the fields 146 and 147 of packet PA_FC shown in FIG. 2C, and stores data D of packet PA_RT in the other of the fields 146 and 147. It sends packet PA_FC thus generated to branch unit 123.
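The matching performed by firing control unit 122 can be illustrated by the following minimal sketch. It rests on several assumptions not stated above: packets are plain dictionaries, queuing memory 132 is modeled as a dictionary keyed by (ND, CL), an instruction is treated as taking a constant whenever constant memory 131 holds an entry at its node number ND, the earlier-arriving packet supplies left data LD, and a constant is placed in right data RD.

```python
def fire(pa_rt, queuing_memory, constant_memory):
    """Return a PA_FC if the packet fires, or None if it has to be queued."""
    base = {k: pa_rt[k] for k in ("op", "ci", "nd", "cl")}

    # Constant operand: read from constant memory, addressed by node number ND.
    if pa_rt["nd"] in constant_memory:
        return {**base, "ld": pa_rt["d"], "rd": constant_memory[pa_rt["nd"]]}

    # A partner is a queued packet whose node number ND and color CL both match.
    key = (pa_rt["nd"], pa_rt["cl"])
    if key in queuing_memory:
        partner = queuing_memory.pop(key)  # the other packet is erased
        return {**base, "ld": partner["d"], "rd": pa_rt["d"]}

    # No partner yet: store the packet and wait for it.
    queuing_memory[key] = pa_rt
    return None
```

A packet with the same node number but a different color does not fire, which is what allows packets of different colors to proceed independently of one another.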
Branch unit 123, in receipt of packet PA_FC from firing control unit 122, decodes instruction code OP of packet PA_FC and selects either built-in memory control unit 124 or external memory control unit 121 as the destination of the packet PA_FC. Branch unit 123 sends the received packet PA_FC, with the contents of all the fields 141–144, 146 and 147 left unchanged, to built-in memory control unit 124 or external memory control unit 121.
Built-in memory control unit 124, in receipt of packet PA_FC from branch unit 123, decodes instruction code OP of packet PA_FC, and executes prescribed processing in accordance with the decoded result. For example, if the instruction code OP is a built-in memory access instruction, it accesses built-in memory 125, and changes the content of the field 146 for storage of left data LD or the field 147 for storage of right data RD of packet PA_FC in accordance with the access result. It then sends the resultant packet PA_FC to merge unit 126. If instruction code OP is not the built-in memory access instruction, built-in memory control unit 124 sends the received packet PA_FC to merge unit 126 without alteration, i.e., with the contents of all the fields 141–144, 146 and 147 left unchanged.
External memory control unit 121, in receipt of packet PA_FC from branch unit 123, decodes instruction code OP of packet PA_FC, and performs prescribed processing in accordance with the decoded result. For example, if instruction code OP is an external memory access instruction, it accesses SDRAM 116 or 117 and changes the content of field 146 for storage of left data LD or field 147 for storage of right data RD of packet PA_FC in accordance with the access result. It then sends the resultant packet PA_FC to merge unit 126. If instruction code OP is not the external memory access instruction, external memory control unit 121 sends the received packet PA_FC, with the contents of all the fields 141–144, 146 and 147 left unchanged, to merge unit 126.
Merge unit 126 sends the received packet PA_FC without alteration to operating unit 127, with the contents of all the fields 141–144, 146 and 147 of the packet PA_FC left unchanged.
Operating unit 127 decodes instruction code OP of packet PA_FC received from merge unit 126, and performs prescribed processing in accordance with the decoded result. For example, if instruction code OP is an operation instruction with respect to a content of packet PA_FC, operating unit 127 performs a prescribed operation in accordance with the relevant instruction code OP using the content (mainly, left data LD, right data RD) of packet PA_FC. It then generates packet PA_RT as shown in FIG. 2B by storing the operation result in its field 145 for storage of data D, and sends the generated packet PA_RT to program storing unit 128. Basically, instruction code OP, node number ND and color CL remain unchanged.
Program memory 133 within program storing unit 128 stores a plurality of instruction codes OP to be executed subsequently and node numbers ND corresponding thereto. Program storing unit 128, in receipt of packet PA_RT from operating unit 127, reads instruction code OP to be executed next and its corresponding node number ND from program memory 133 in accordance with the addressing by node number ND of packet PA_RT, and stores the read instruction code OP and node number ND to field 141 for storage of instruction code OP and field 143 for storage of node number ND, respectively, of packet PA_RT. Program storing unit 128 then sends the generated packet PA_RT to main router 114. The contents of the field 144 for storage of color CL and the field 145 for storage of data D remain unchanged.
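The operation step in operating unit 127 and the instruction fetch in program storing unit 128 might look like the following sketch. The instruction names in `OPERATIONS`, the dictionary representation of packets and the layout of `program_memory` (node number ND mapped to a pair of next OP and next ND) are assumptions made for illustration only.

```python
import operator

# Hypothetical instruction set; the actual instruction codes are not listed.
OPERATIONS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def operate(pa_fc):
    """Operating unit 127: combine LD and RD according to OP, producing PA_RT.

    OP, CI, ND and CL are carried over; the result is stored as data D.
    """
    result = OPERATIONS[pa_fc["op"]](pa_fc["ld"], pa_fc["rd"])
    return {
        "op": pa_fc["op"],
        "ci": pa_fc["ci"],
        "nd": pa_fc["nd"],
        "cl": pa_fc["cl"],
        "d": result,
    }


def fetch_next(pa_rt, program_memory):
    """Program storing unit 128: read the next OP and ND, addressed by ND.

    Color CL and data D are left unchanged.
    """
    next_op, next_nd = program_memory[pa_rt["nd"]]
    return {**pa_rt, "op": next_op, "nd": next_nd}
```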
FIG. 5 is a block diagram showing a schematic configuration of processing cores 112, 113. The configuration of each of processing cores 112 and 113 shown in FIG. 5 differs from that of processing cores 110 and 111 in FIG. 4 in that external memory control unit 121, branch unit 123 and merge unit 126 are omitted. Packet PA_FC from firing control unit 122 is sent to built-in memory control unit 124 without exception. That is, an external memory access instruction is not executed in processing core 112, 113. Otherwise, the configuration of processing core 112, 113 is identical to that of processing core 110, 111 in FIG. 4, and thus, detailed description thereof is not repeated.
As such, the processing in accordance with the data flow program prestored in program memory 133 proceeds while packets PA_RT and PA_FC go around or circulate through data driven information processor 101. In data driven information processor 101, packets PA_RT and PA_FC are transferred asynchronously by handshaking.
Pipeline processing, and hence, parallel processing is accomplished with packets PA_RT and PA_FC going around data driven information processor 101 in accordance with the data flow program stored in program memory 133. Accordingly, in data driven information processor 101, parallelism of processing in units of packets and a flow rate of a packet circulating therein constitute an important measure of processing performance of the data driven information processor 101.
In recent years, the data driven information processors having the characteristics described above have been applied to image processing, audio processing, network protocol processing and others requiring high-speed processing of a large amount of data as well as high-speed data transfer. The image processing, audio processing and network protocol processing are common in the sense that they deal with a huge amount of data.
To store such a large amount of data to be processed, the external memory 102 connected to data driven information processor 101 is utilized, since it is difficult to store all the data to be processed in an internal memory of the processor 101. External memory 102 is used as an image frame memory in the image processing, used as a temporary memory of audio data for expressing reverberation effects in the audio processing, and used as a payload memory in the network protocol processing.
Although external memory 102 has large capacity compared to the internal memory of data driven information processor 101, its data access rate is low, creating a bottleneck that limits the distinctive parallel processing capability of data driven information processor 101. Thus, to eliminate the bottleneck, a cache memory having a high access rate is provided within data driven information processor 101 so that data driven information processor 101 can access external memory 102 via the cache memory.
As described above, there is a rule in the data driven type processing method that “processing is performed once input data necessary for the relevant processing have all become available and the resources including operating devices necessary for the processing have been allocated”. Thus, the distinctive parallel processing capability of data driven information processor 101 will be enjoyed to the fullest extent if the necessary input data always reside in the cache memory.
FIG. 6 is a block diagram showing a schematic configuration of an external memory control unit 121 in a conventional data driven information processor 101. The external memory control unit 121 includes a cache memory unit 134 and an external memory interface 135. Cache memory unit 134 and external memory interface 135 operate in synchronization with each other.
External memory interface 135 is connected with an external memory port TM, and has access to an external main memory 137. External main memory 137 is formed of SDRAM 116, 117 and others. Cache memory unit 134 has a cache memory 136 therein, which has small capacity compared to external main memory 137.
Cache memory 136 stores a copy of a portion of data stored in external main memory 137. Cache memory unit 134 keeps track of a data address in external main memory 137 corresponding to the copy held in a respective data region of cache memory 136.
The data stored in external main memory 137 are referred to and updated indirectly via cache memory 136 in accordance with a content of instruction code OP of packet PA_FC flowing in external memory control unit 121. That is, cache memory 136 functions like a peep hole through which external main memory 137 is observed. The data stored in external main memory 137 cannot be referred to or updated directly by packet PA_FC.
If the content of instruction code OP of packet PA_FC indicates update of data stored in external main memory 137, only the copied data in cache memory 136 is updated, and the data stored in external main memory 137 is not updated. This causes mismatch between the data stored in external main memory 137 and the data stored in cache memory 136.
However, cache memory unit 134 checks for each piece of data whether the data stored in cache memory 136 and the data stored in external main memory 137 match (this state is called “clean”) or mismatch (this state is called “dirty”). External memory control unit 121 performs prescribed operations based on this management information, and maintains so-called data coherency, so that over the long term the indirect data updating operations have the same effect as updating the data stored in external main memory 137 directly.
If the instruction code OP of packet PA_FC is an access instruction to external main memory 137, external memory control unit 121 refers to the management information held in cache memory unit 134 and determines whether desired data exists in cache memory 136. The case where the desired data exists in cache memory 136 is called a cache “hit”, while the case where the desired data is not in cache memory 136 is called a cache “miss hit”.
In the case of cache “hit”, external memory control unit 121 executes an access (reference/update) to cache memory 136. In particular, if the access to cache memory 136 is for update, cache memory unit 134 changes the management information of the relevant data to “dirty”.
On the other hand, in the case of cache “miss hit”, external memory control unit 121 copies desired data from external main memory 137 to cache memory 136. This operation is called “upload”. For the upload, it is necessary to eliminate a piece of data from within cache memory 136 to secure a data region for uploading.
Which data region to select is determined by cache memory unit 134. Although there are a variety of selecting methods, it is common to select the piece of data that has not been accessed for the longest time. Once the data to be eliminated is determined, cache memory unit 134 refers to the management information of the relevant data, and determines whether the data is “clean” or “dirty”. If the data is “clean”, external memory control unit 121 uploads the desired data to the relevant region of cache memory 136. The processing is thus completed.
If the data is “dirty”, external memory control unit 121 writes the data to be eliminated from cache memory 136 back into its original region in external main memory 137. This operation is called “storeback”. After completion of the storeback, external memory control unit 121 performs upload. At the time of cache “miss hit”, external memory control unit 121 executes an access (reference/update) to cache memory 136 as in the case of cache “hit” after completion of the upload.
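The hit, upload and storeback sequence described above can be illustrated with the following minimal sketch. It is not the conventional circuit itself: the class name, the one-address-per-entry granularity, the `log` list and the use of a least-recently-used victim (the common selection method mentioned above) are assumptions, and a cache that is not yet full is assumed to have a free region so that no data need be eliminated.

```python
from collections import OrderedDict


class CacheMemoryUnit:
    """Toy model of cache memory unit 134 in front of external main memory 137."""

    def __init__(self, capacity, main_memory):
        self.capacity = capacity
        self.main = main_memory     # dict: address -> data (external main memory)
        self.lines = OrderedDict()  # address -> [data, dirty], oldest access first
        self.log = []               # sequence of events, for inspection only

    def _ensure_present(self, addr):
        if addr in self.lines:
            self.lines.move_to_end(addr)  # mark as most recently accessed
            self.log.append("hit")
            return
        self.log.append("miss")
        if len(self.lines) >= self.capacity:
            # Eliminate the data that has not been accessed for the longest time.
            victim, (data, dirty) = self.lines.popitem(last=False)
            if dirty:
                self.main[victim] = data  # storeback
                self.log.append("storeback")
        self.lines[addr] = [self.main[addr], False]  # upload, initially clean
        self.log.append("upload")

    def read(self, addr):
        self._ensure_present(addr)
        return self.lines[addr][0]

    def write(self, addr, value):
        self._ensure_present(addr)
        self.lines[addr] = [value, True]  # only the copy is updated: dirty
```

In this model a write leaves external main memory stale until the dirty entry is eventually eliminated and stored back, which is the long-term coherency behavior described above.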
With the conventional external memory control unit 121 as described above, however, if packet PA_FC indicates an access instruction to external main memory 137 and cache memory unit 134 determines a cache “miss hit”, then packet PA_FC must wait in external memory control unit 121 from the time when the determination of cache “miss hit” is made until the time when an access to cache memory 136 is started.
After the determination of cache “miss hit”, packet PA_FC should be queued in the worst case for a total period of time of: time for searching data to be eliminated for the upload to cache memory 136; time for determining “clean/dirty” of the data being eliminated; and, when the data is “dirty”, time for storing the data being eliminated back to external main memory 137 and time for uploading the desired data from external main memory 137 to cache memory 136. Among them, the storeback and upload are performed with respect to external main memory 137 whose access rate is low, making the queuing time of packet PA_FC extremely long.
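The queuing time enumerated above is simply the sum of its component times, as the following sketch shows. The function name, the parameter names and all cycle counts in the example are invented for illustration; the description gives no actual timing figures.

```python
def miss_queuing_time(t_search, t_check, t_storeback, t_upload, dirty):
    """Time a packet PA_FC waits after a cache "miss hit" is determined.

    t_search:    time for searching the data to be eliminated
    t_check:     time for determining clean/dirty of that data
    t_storeback: time for storing the eliminated data back (only if dirty)
    t_upload:    time for uploading the desired data
    """
    wait = t_search + t_check + t_upload
    if dirty:
        wait += t_storeback
    return wait


# Hypothetical cycle counts: the two accesses to the slow external main
# memory (storeback and upload) dominate the total.
worst_case = miss_queuing_time(1, 1, 20, 20, dirty=True)
clean_case = miss_queuing_time(1, 1, 20, 20, dirty=False)
```

With these made-up figures the worst case is 42 cycles against 22 cycles for a clean victim, and in both cases almost all of the wait is spent on external main memory.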
Such a long queuing time of packet PA_FC within external memory control unit 121 delays the operation in operating unit 127, and hence delays the start time of processing of the subsequent instructions. Further, although data driven information processor 101 is characterized by the parallel processing by pipeline processing, when another packet PA_FC arrives behind the packet PA_FC being queued in external memory control unit 121 due to the cache “miss hit”, processing of that other packet PA_FC must also wait. This degrades the overall parallel processing capability of data driven information processor 101.
The basic characteristics of data accesses in the image processing, audio processing, network protocol processing and others are that they are conducted regularly, and that data to be accessed next is predictable. In the image processing, data are basically accessed sequentially along the scanning line direction of the image frame. In the audio processing, data are basically accessed sequentially along the time axis direction of the audio data. In the network protocol processing, data in a network frame, particularly the payload data, are basically accessed in ascending order of offset value from the header of the frame.
In the conventional data driven information processor as described above, even if the data accesses are performed regularly, cache “miss hit” would steadily occur upon access to external main memory 137, thereby inevitably degrading the parallel processing capability of data driven information processor 101.
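Why regular, sequential accesses still miss steadily can be seen in a small simulation. This is a sketch under assumptions: the cache is modeled with least-recently-used replacement, and the parameters `capacity_lines` and `words_per_line` are hypothetical, since the description does not give the size of a data region in cache memory 136.

```python
from collections import OrderedDict


def count_misses(addresses, capacity_lines, words_per_line):
    """Count cache "miss hits" for a sequence of word addresses."""
    cache = OrderedDict()  # line number -> True, oldest access first
    misses = 0
    for addr in addresses:
        line = addr // words_per_line
        if line in cache:
            cache.move_to_end(line)  # hit: refresh recency
            continue
        misses += 1  # miss: the line has to be uploaded first
        if len(cache) >= capacity_lines:
            cache.popitem(last=False)  # eliminate the least recently used line
        cache[line] = True
    return misses


# A sequential scan, e.g. along one scanning line of an image frame.
scan = range(64)
misses_one_word_lines = count_misses(scan, capacity_lines=4, words_per_line=1)
misses_eight_word_lines = count_misses(scan, capacity_lines=4, words_per_line=8)
```

Because a sequential scan never revisits data, every newly reached line is a miss regardless of cache capacity: 64 misses with one-word lines and 8 with eight-word lines in this model, each of which stalls the pipeline for an upload even though the next address was entirely predictable.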