1. Field of the Invention
The present invention relates to an output first-in first-out (or FIFO) data transfer control device for controlling transfer of arithmetic results by a geometric arithmetic core included in a geometric arithmetic processor for performing 3D graphic processing to an output FIFO and to outside the geometric arithmetic processor.
2. Description of the Prior Art
Referring now to FIG. 21, there is illustrated a block diagram showing the structure of a geometric arithmetic processor including a prior art output FIFO data transfer control device. In the figure, reference numeral 10 denotes a geometric arithmetic core (or geometric arithmetic engine) for performing 3D graphic processing, 20 denotes an AGP port that is an interface for connecting a host CPU (not shown) disposed outside the geometric arithmetic processor 100 with the geometric arithmetic processor 100, 5 denotes an RC port or output control unit that is an interface to a secondary bus 6, 30 denotes a PCI bridge between the AGP port 20 and the output control unit 5, and 40 denotes an output FIFO (or OFIFO) data transfer control unit for controlling data transfer from each processing unit included in the geometric arithmetic core 10 to the output control unit 5 and data transfer from the output control unit 5 to the secondary bus 6.
FIG. 22 is a block diagram showing the structures of the prior art geometric arithmetic core 10, the OFIFO data transfer control unit 40, and the output control unit 5 shown in FIG. 21. In FIG. 22, reference numeral 11 denotes an integer processing unit or IPU, 111 denotes a data output register (or DRIA) for storing data on an arithmetic result from an integer ALU of the IPU 11, 112 denotes a data output register (or DRIS) for use with a shifter used for performing integer arithmetic operations, 114 denotes a tristate buffer, 12a to 12d denote first to fourth floating-point arithmetic units (or FPU0 to FPU3), 121a denotes a data output register (or DRFA) for storing data on an arithmetic result from a floating-point ALU of the FPU0 12a, 122a denotes a data output register (or DRFM) for storing data on an arithmetic result from a floating-point multiplier of the FPU0 12a, and 124a denotes a tristate buffer.
Needless to say, each of the remaining floating-point processing units FPU1 12b to FPU3 12d includes a DRFA, a DRFM, and a tristate buffer.
Reference numeral 421 denotes a transfer mode setting section for setting a transfer mode identifying which one or more of the IPU 11 and the plurality of floating-point processing units FPU0 12a to FPU3 12d are to transfer data to the output control unit 5, 431 denotes a Full flag checking section for receiving a Full flag from the output control unit 5 and for determining whether the OFIFO data transfer control unit 40 can transfer data to the output control unit 5, 441 denotes an O-bus data input section for writing data furnished onto an O-bus 3 into an address register thereof if the data is an address, and for writing the data into a data register thereof otherwise, 451 denotes a WCR control section for controlling a word counter or WCR showing the size of each burst upon the data transfer to the output control unit 5, 461 denotes a data output section for performing and controlling the data transfer to the output control unit 5, 51 denotes an output FIFO (or OFIFO) section included in the output control unit 5, and 511 denotes an address storage section for storing a starting address of data stored in a corresponding OFIFO 512. The output control unit 5 can include eight OFIFO sections 51. The data output section 461 can generate and furnish, together with the data, a Valid flag indicating whether or not the value of the data register within the O-bus data input section 441 is valid, an address flag indicating whether or not the data temporarily stored in the O-bus data input section 441 is an address, and a final flag indicating whether or not the data is the last one of each burst, to the output control unit 5 including the eight OFIFO sections 51. The final flag is also a kickoff signal for triggering the output control unit 5 to transfer the data to a rendering LSI (not shown) by way of the secondary bus 6.
Next, a description will be made as to the operation of the prior art output FIFO data transfer control device with reference to FIG. 23. FIG. 23 is a timing chart showing the operation of the prior art output FIFO data transfer control device. Assume that the following data transfer instructions to an OFIFO 512 are issued sequentially:
(1) data transfer instruction (A); destination code ofifo0: from IPU to OFIFO (data 1)
(2) data transfer instruction (B); destination code ofifo7: from FPU0, FPU1, and FPU2 to OFIFO (data 2, 3, and 4)
(3) data transfer instruction (C); destination code ofifo3: from FPU0 and FPU1 to OFIFO (data 5 and 6)
(4) data transfer instruction (D); destination code ofifo0: from IPU to OFIFO (data 7)
(5) data transfer instruction (E); destination code ofifof: from FPU0, FPU1, FPU2, and FPU3 to OFIFO (data 8, 9, 10, and 11)
(6) data transfer instruction (F); destination code ofifo3: from FPU0 and FPU1 to OFIFO (data 12 and 13)
Each of the plurality of floating-point processing units FPU0 12a to FPU3 12d can operate according to SIMD (single instruction stream, multiple data stream) instructions and process a plurality of data when one instruction is issued. Each of the data transfer instructions (A) to (F) shown above can be issued by one microcode. For example, the data transfer instruction (B) directs FPU0 12a, FPU1 12b, and FPU2 12c to simultaneously perform arithmetic operations and to furnish the arithmetic results (i.e., data 2, 3, and 4) to one or more OFIFOs 512 within the output control unit 5 in the order of FPU0, FPU1, and FPU2.
Every time a microcode is executed and a data transfer instruction such as one of the data transfer instructions (A) to (F) mentioned above is issued, either the IPU 11 or at least one of the plurality of floating-point processing units FPU0 12a to FPU3 12d associated with the data transfer instruction furnishes IPUouse or FPUouse, respectively, to the OFIFO data transfer control unit 40. As shown in FIG. 23, the data transfer instruction (A) is executed first, so that IPUouse is asserted (becomes 1) and is then negated. The data transfer instruction (B) is executed next, so that FPUouse is asserted (becomes 1) and is then negated, whereupon a hold signal is asserted low. Because the plurality of data produced according to an SIMD instruction are sent out on the single O-bus 3, when arithmetic instructions for the same processing unit are issued sequentially, as in the case where the data transfer instruction (C) is executed immediately after the data transfer instruction (B), executing the next data transfer instruction before all data associated with the previous data transfer instruction have been read out can overwrite the data stored in the corresponding data output registers, such as the DRFA and DRFM of each floating-point processing unit, with new arithmetic results. To avoid the overwriting, it is necessary to assert the hold signal so as to cause the geometric arithmetic core 10 to enter the wait state in which it stops instruction pipeline processing, as shown in FIG. 23.
If the hold signal is asserted after either the IPU 11 or at least one of the plurality of floating-point processing units FPU0 12a to FPU3 12d associated with an issued data transfer instruction furnishes IPUouse or FPUouse to the OFIFO data transfer control unit 40, the OFIFO data transfer control unit 40 furnishes a read enable signal to sequentially read all data associated with the data transfer instruction from all corresponding processing units. The prior art output FIFO data transfer control device thus needs much time for data transfer to the OFIFO, because it frequently has to cause the geometric arithmetic core 10 to enter the wait state, in which it stops instruction pipeline processing, so that old arithmetic results associated with the immediately preceding data transfer instruction are not overwritten with new arithmetic results produced by the execution of the current data transfer instruction.
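The prior art sequence described above can be illustrated with a short sketch. It is a deliberately simplified model, not a reproduction of the timing chart of FIG. 23: it assumes that each data transfer instruction is issued in one cycle and that the core is then held for one cycle per word while the read enable signal visits each participating unit in turn. The names `INSTRUCTIONS` and `prior_art_timeline` are hypothetical.

```python
# Data transfer instructions (A) to (F) as listed in the text.
INSTRUCTIONS = [
    ("A", ["IPU"]),
    ("B", ["FPU0", "FPU1", "FPU2"]),
    ("C", ["FPU0", "FPU1"]),
    ("D", ["IPU"]),
    ("E", ["FPU0", "FPU1", "FPU2", "FPU3"]),
    ("F", ["FPU0", "FPU1"]),
]


def prior_art_timeline(instructions):
    """Return one event per cycle under the simplified model.

    Assumption: one issue cycle per instruction, then one hold cycle per
    word, because every word must leave its data output register over the
    single O-bus before the next instruction may overwrite it.
    """
    timeline = []
    data_no = 1
    for name, units in instructions:
        timeline.append(("issue", name, None, None))
        for unit in units:
            # Hold is asserted; the read enable selects one unit per cycle.
            timeline.append(("hold", name, unit, data_no))
            data_no += 1
    return timeline


timeline = prior_art_timeline(INSTRUCTIONS)
hold_cycles = sum(1 for event in timeline if event[0] == "hold")
```

Under these assumptions the six instructions cost six issue cycles plus thirteen hold cycles, one for each of data 1 to 13.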
A problem with the prior art output FIFO data transfer control device constructed as above is therefore that, although the IPU and the plurality of floating-point processing units FPU0 to FPU3 included in the geometric arithmetic core can simultaneously perform arithmetic operations, the geometric arithmetic core frequently needs to enter the wait state, in which it stops instruction pipeline processing, until data from each processing unit have been sent out onto the O-bus, thus increasing the time required for data transfer to the OFIFO and decreasing the processing capabilities of the geometric arithmetic core.
The present invention is made to overcome the above problem. It is therefore an object of the present invention to provide an output FIFO data transfer control device capable of continuously executing instructions while transferring data on arithmetic results to an OFIFO without having to cause a geometric arithmetic core to enter the wait state in which it stops instruction pipeline processing.
In accordance with one aspect of the present invention, there is provided an output FIFO data transfer control device comprising: a plurality of intermediate buffers respectively disposed in a plurality of processing units included in an arithmetic core that operates based on an instruction pipeline, each of the plurality of intermediate buffers storing data on an arithmetic result produced by each of the plurality of processing units; an output control unit including one or more output FIFOs each of which receives data furnished by each of the plurality of processing units and temporarily stores the data therein, the output control unit furnishing data stored in the output FIFOs to outside the output FIFO data transfer control device in response to a predetermined signal applied thereto; a write/read pointer generating unit for, when an instruction of data transfer from at least one of the plurality of processing units to the output FIFOs is issued upon execution of a microcode, generating a write pointer identifying a specific location where data on an arithmetic result produced by at least one of the plurality of processing units associated with the instruction is to be stored in the intermediate buffer of at least one of the plurality of processing units, and for generating a read pointer identifying a specific location where data, which is written into the intermediate buffer according to the write pointer, is to be read out of the intermediate buffer of at least one of the plurality of processing units; a transfer mode setting unit for setting a transfer mode identifying which at least one of the plurality of processing units is to transfer data on an arithmetic result upon the execution of the microcode, and for sequentially furnishing a read enable signal to at least one of the plurality of processing units so as to read out the data from the intermediate buffer of at least one of the plurality of processing units; at least one bus on which the data is sent out in response to the read enable signal by at least one of the plurality of processing units; a data input unit for receiving the data sent out on the bus and for writing the data into a register thereof; and a data output unit for furnishing the data written into the register of the data input unit to the output FIFOs of the output control unit.
In accordance with a preferred embodiment of the present invention, the write/read pointer generating unit can cause the arithmetic core to stop instruction pipeline processing to inhibit overwriting of old data with new data when the write/read pointer generating unit determines that the intermediate buffer of each of the plurality of processing units is full. Preferably, the write/read pointer generating unit determines whether or not the intermediate buffer of each of the plurality of processing units is full, according to a relationship between the write pointer and the read pointer.
In accordance with another preferred embodiment of the present invention, the intermediate buffer of each of the plurality of processing units has a size of 8 words.
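As a minimal sketch of how a full condition might be derived from the relationship between the write pointer and the read pointer, the following models one 8-word intermediate buffer as a ring buffer. The text does not specify the pointer encoding; the extra wrap bit used here (pointers counting modulo twice the depth) is an assumption chosen so that full and empty can be told apart, and the class name `IntermediateBuffer` is hypothetical.

```python
class IntermediateBuffer:
    """Hypothetical model of one processing unit's intermediate buffer."""

    def __init__(self, depth=8):
        self.depth = depth
        self.words = [None] * depth
        # Assumption: pointers count modulo 2 * depth, so one extra wrap bit
        # distinguishes a full buffer from an empty one.
        self.wp = 0
        self.rp = 0

    def is_empty(self):
        return self.wp == self.rp

    def is_full(self):
        # Full when the write pointer is exactly one buffer length ahead.
        return (self.wp - self.rp) % (2 * self.depth) == self.depth

    def write(self, word):
        """Store a word; return False if the core would have to be held."""
        if self.is_full():
            return False
        self.words[self.wp % self.depth] = word
        self.wp = (self.wp + 1) % (2 * self.depth)
        return True

    def read(self):
        """Return the oldest unread word and advance the read pointer."""
        if self.is_empty():
            raise IndexError("intermediate buffer is empty")
        word = self.words[self.rp % self.depth]
        self.rp = (self.rp + 1) % (2 * self.depth)
        return word
```

In this sketch a rejected `write` stands in for stopping instruction pipeline processing: new results are refused only when all eight words are still unread, so old data is never overwritten.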
In accordance with another preferred embodiment of the present invention, the arithmetic core includes one integer processing unit and a plurality of floating-point processing units. The write/read pointer generating unit can generate a set of write and read pointers for use with the intermediate buffer of the integer processing unit, and generate another set of write and read pointers for use with the intermediate buffers of the plurality of floating-point processing units.
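The split into two pointer sets can be sketched as follows. This is an illustration only: the class `PointerSet`, the helper functions, and the default depth of 8 (taken from the 8-word preferred embodiment) are assumptions, and the sketch shows only that all floating-point processing units share one set while the integer processing unit has its own.

```python
class PointerSet:
    """Hypothetical write/read pointer pair for a group of buffers."""

    def __init__(self, depth=8):
        self.depth = depth
        self.write = 0
        self.read = 0

    def advance_write(self):
        """Return the slot to write and move the write pointer on."""
        slot = self.write
        self.write = (self.write + 1) % self.depth
        return slot

    def advance_read(self):
        """Return the slot to read and move the read pointer on."""
        slot = self.read
        self.read = (self.read + 1) % self.depth
        return slot


def make_pointer_sets():
    # One set for the integer processing unit, one shared by all FPUs.
    return {"IPU": PointerSet(), "FPU": PointerSet()}


def pointer_set_for(pointer_sets, unit):
    """Select the pointer set that governs the named unit's buffer."""
    return pointer_sets["IPU"] if unit == "IPU" else pointer_sets["FPU"]
```

Because the floating-point processing units execute the same SIMD instruction, a single shared write pointer places their results in the same slot of each buffer, which is one plausible reason for grouping them.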
In accordance with another preferred embodiment of the present invention, the output FIFO data transfer control device includes a bus for use with the integer processing unit and another bus for use with the plurality of floating-point processing units.
Preferably, the transfer mode setting unit can set a transfer mode according to a multiple-bit signal furnished by the arithmetic core upon the execution of the microcode, the multiple-bit signal identifying which at least one of the plurality of processing units is to transfer data on an arithmetic result.
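A transfer mode decoded from a multiple-bit signal might look like the sketch below. The bit assignment is purely an assumption for illustration (bits 0 to 3 for FPU0 to FPU3, bit 4 for the IPU); the text states only that the signal identifies which processing units are to transfer data. The names `UNIT_BITS` and `read_enable_sequence` are hypothetical.

```python
# Assumed bit order of the multiple-bit signal, least significant bit first.
UNIT_BITS = ["FPU0", "FPU1", "FPU2", "FPU3", "IPU"]


def read_enable_sequence(mode):
    """Return the units to be read, in the order read enables are furnished."""
    if not 0 <= mode < (1 << len(UNIT_BITS)):
        raise ValueError("transfer mode out of range")
    return [name for bit, name in enumerate(UNIT_BITS) if mode & (1 << bit)]
```

For example, `read_enable_sequence(0b00111)` selects FPU0, FPU1, and FPU2 in that order, so the read enable signal would be furnished to each of them in turn.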
In accordance with another preferred embodiment of the present invention, the output FIFO data transfer control device further comprises a DMA/posting transfer switching unit for switching between DMA transfer and posting transfer when transferring the data from the output FIFOs to outside the output FIFO data transfer control device.
In accordance with another preferred embodiment of the present invention, the output FIFO data transfer control device further comprises a word-counter register control unit including a word counter showing the size of each burst of data transfer to the output FIFOs, and a register whose one bit is assigned to a final flag indicating that data to be transferred is the last data of each burst. The DMA/posting transfer switching unit can switch between DMA transfer and posting transfer by determining whether the final flag is furnished to the output control unit including the output FIFOs according to a value set to the register of the word-counter register control unit upon the execution of the microcode. The output control unit can transfer data stored in the output FIFOs to outside the output FIFO data transfer control device in response to the final flag.
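The interplay of the word counter, the final-flag register bit, and the output FIFO can be sketched as below. This is a simplified model under stated assumptions: the word counter is loaded with the burst size and decremented per word, and the register bit (`final_bit`) simply gates whether the final flag accompanies the last word. Which setting corresponds to DMA transfer and which to posting transfer is not stated in the text, so the sketch does not name them. `OutputFifo` and `transfer_burst` are hypothetical names.

```python
class OutputFifo:
    """Hypothetical model of one output FIFO in the output control unit."""

    def __init__(self):
        self.pending = []

    def push(self, word, final_flag):
        """Store a word; on the final flag, release everything stored so far."""
        self.pending.append(word)
        if final_flag:
            burst, self.pending = self.pending, []
            return burst
        return None


def transfer_burst(words, final_bit, ofifo):
    """Send one burst to the OFIFO; return the data it released, if any."""
    wcr = len(words)  # assumption: word counter loaded with the burst size
    released = None
    for word in words:
        wcr -= 1
        is_last = wcr == 0
        # The final flag is furnished only when the register bit allows it.
        result = ofifo.push(word, is_last and bool(final_bit))
        if result is not None:
            released = result
    return released
```

With `final_bit` cleared, words accumulate in the OFIFO and nothing is sent outward; a later burst with `final_bit` set releases all of the stored data in response to the final flag.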
In accordance with another preferred embodiment of the present invention, the output FIFO data transfer control device further comprises a full checking unit for determining whether or not each of the output FIFOs is full, at a timing determined by predetermined information indicating whether the interface used for data transfer from the output FIFOs to outside the output FIFO data transfer control device is an AGP bus or a PCI bus.
Further objects and advantages of the present invention will be apparent from the following description of the preferred embodiments of the invention as illustrated in the accompanying drawings.