The present invention relates to a multiprocessor system that distributes sequentially received data among a plurality of parallel processors, and particularly to a multiprocessor system usable for graphic processing for generating and displaying computer graphics images on a display. More particularly, the present invention relates to a multiprocessor system usable for texture mapping so as to paste a pattern or texture (e.g. marble, bark, aluminum, etc.) on a surface of a three-dimensional object.
Applicable fields of computers have been expanded along with technical innovations in recent years. One example of such fields is “computer graphics” for creating and processing graphics and images by a computer. Recently “three-dimensional graphics” that generates and displays a two-dimensional image of a three-dimensional object has come to be highlighted as computer display capability has been enhanced and graphic processing has been advanced. The three-dimensional graphics mentioned here represents by a mathematical model an optical phenomenon occurring, for example, when a three-dimensional object is illuminated by a light source, and generates an image by applying shading or gradation to the surface of the object based on the mathematical model, thereby displaying the object as a three-dimensional-looking image on a screen. Such a three-dimensional graphics technique has been used more and more in CAD/CAM in scientific, engineering, manufacturing and other application fields, and in various software development fields.
Generally, three-dimensional graphics processing includes two processes: “modeling” and “rendering”. “Modeling” is a process for inputting to a computer and editing data such as shape, color, and surface characteristics of a three-dimensional object (e.g. an aircraft, a building, a cat, etc.) to be rendered on a screen.
In other words, modeling is a process for fetching data associated with the object into a computer in a format usable for the subsequent rendering. There are several methods for the modeling, such as CSG (Constructive Solid Geometry), polygon, Bezier, metaball, etc.
“Rendering” is a process for generating an image according to what the object looks like when it is seen from a particular position. Specifically, it is a process for coloring and shading the surface of a three-dimensional object on the basis of three-dimensional data (e.g. a position of a light source relative to the object, highlight, shade, and color) created by a modeler. The rendering process is subdivided into the respective operations “coordinate conversion”, “hidden surface removal”, “shading”, and “measure for giving reality”. The “coordinate conversion” converts each coordinate value used to define a model to a coordinate value on a screen as seen from a viewpoint position. The “hidden surface removal” determines portions of the model which are either visible or hidden from a current viewpoint. A typical example thereof is the Z Buffer method. The “shading” determines color and brightness of each portion of the object to be seen under the lighting, and applies the determined color to a corresponding pixel on the screen. The “measure for giving reality” is usually carried out after rendering. This measure is required because (1) each graphics processing step up to the rendering is based on the assumption that a surface of an object is a completely smooth curved surface which can be represented by ideal planes or mathematical expressions or that colors on the surface are constant for each plane, and (2) an image obtained by the steps of coordinate conversion → hidden surface removal → shading is inorganic and far from the real object. One example of the “measure for giving reality” is mapping, that is, a process for pasting pre-created pattern data on a surface and/or plane of an object.
Mapping is important for realistically representing the material characteristics of an object. One example thereof is texture mapping, in which texture means a pattern or image, without thickness, that represents the feel of the material of an object surface (or the design of the surface). Texture mapping is done by preparing a texture of each material (e.g. marble, bark, aluminum, or the like) as a bitmap in advance, and pasting the texture on a relatively smooth plane or curved surface of the object immediately after the rendering process is ended. With such texture mapping, an object having a monotonous surface can be made to look like a real one having a complex surface. For example, in a flight simulator, an image of a scene photographed beforehand is texture-mapped on the background to generate a virtual reality picture quickly. It is also possible to make a simple solid model look like metal or stone.
Texture mapping requires access to an enormous amount of data and involves a great deal of computation processing. This is because the amount of texture data (i.e. two-dimensional array data representing an image such as patterns to be pasted, background, etc.) is enormous. Consequently, real time operations for the texture mapping must unavoidably be done in parallel by providing a plurality of pipelines; it is almost impossible to cope with such texture mapping by a single unit. Parallel processing for the texture mapping is done, for example, by dividing a screen into a plurality of areas and distributing processing operations for each of the divided areas among parallel processors.
FIG. 6 shows a schematic block diagram of a hardware configuration of a multiprocessor system 100. In FIG. 6, a multiprocessor system 100 comprises a dispatching processor 10; a plurality of parallel processors 30-1 through 30-4 (four processors in FIG. 6); first-in first-out (FIFO) buffers 50-1 through 50-4, one for each of the parallel processors 30-1 through 30-4; and a merging processor 40. The dispatching processor 10 is a computing unit for distributing data sets (a unit of data to be distributed is referred to as a “data set” in this specification) received sequentially to each of the parallel processors 30-1 through 30-4 according to the attribute of data, etc. The FIFOs 50-1 through 50-4 are disposed before the corresponding parallel processors 30-1 through 30-4, respectively, and enabled to store distributed data sets temporarily and send the data sets sequentially to the parallel processors 30-1 through 30-4 as they complete data processing for the preceding data sets. The merging processor 40 is a computing unit for integrating data sets distributed by the dispatching processor 10 and processed by the parallel processors 30-1 through 30-4 again and outputting the integrated data sets.
In the multiprocessor system, it is desirable that a load (i.e. an amount of data processed per unit time) is evenly imposed on each of the parallel processors. If loads are evenly distributed, it is possible to perform efficient parallel processing and the overall performance of the system would be improved in proportion to the number of pipelines (i.e. parallel processors). For example, the system 100 shown in FIG. 6 is considered to be well-balanced in design when the total processing speed of all the parallel processors is equal to the speed of data input to the system 100. Conversely, if the loads are unevenly distributed, it would be impossible to benefit from the distributed processing.
In the multiprocessor system 100 shown in FIG. 6, the amount of data (load) distributed to each of the parallel processors 30-1 through 30-4 may become uneven relatively often. For example, this occurs when data sets are unevenly distributed among the parallel processors as shown in Cases (a), (b), and (c) in FIG. 7. Also, even if the load imposed on each of the parallel processors 30-1 through 30-4 is even or uniform on average in a long time period, the load may often become uneven at a certain point of time depending on the order of data sets arranged. Such data sets distributed unevenly, when they are accumulated, may cause the FIFO of a particular pipeline to overflow with outstanding data sets. As a matter of course, the dispatching processor 10 cannot distribute succeeding data sets to the filled FIFO. Consequently, the flow of data sets in the whole system is stopped due to some busy pipelines. Thus, the system cannot benefit from the distributed processing.
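The stall described above can be illustrated with a toy simulation. The following is only a sketch under assumed timing, not a model of the actual hardware: one data set arrives per tick, each parallel processor is assumed to need as many ticks per data set as there are pipelines (the balanced design mentioned earlier), and the function name `count_stalls` is hypothetical.

```python
def count_stalls(targets, pipelines, depth):
    """Count ticks in which the dispatcher is blocked by a full FIFO.

    targets   -- target pipeline index of each data set, in arrival order
    pipelines -- number of parallel processors
    depth     -- number of stages in each per-pipeline FIFO
    Assumption: each processor needs `pipelines` ticks per data set.
    """
    service = pipelines
    waiting = [0] * pipelines    # data sets held in each FIFO
    remaining = [0] * pipelines  # ticks left on the data set being processed
    stalls = 0
    i = 0
    while i < len(targets):
        # Each idle processor takes the next data set from its own FIFO.
        for p in range(pipelines):
            if remaining[p] == 0 and waiting[p] > 0:
                waiting[p] -= 1
                remaining[p] = service
            if remaining[p] > 0:
                remaining[p] -= 1
        # The dispatcher can only advance if the target FIFO has room.
        target = targets[i]
        if waiting[target] < depth:
            waiting[target] += 1
            i += 1
        else:
            stalls += 1
    return stalls


even = [0, 1, 2, 3] * 4   # evenly distributed data sets
skewed = [0, 0, 0, 0]     # a momentary burst aimed at one pipeline
print(count_stalls(even, 4, 1), count_stalls(skewed, 4, 1), count_stalls(skewed, 4, 3))
```

In this toy model the evenly distributed sequence never blocks the dispatcher, while the burst aimed at a single pipeline blocks it until the FIFO is made deeper, even though the other three processors sit idle.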
If even distribution of loads is assured over a long time period, it would be possible to compensate for a momentary uneven distribution of the loads with a design that increases the number of stages of each FIFO 50-1 through 50-4 infinitely (or large enough to be assumed as infinite). For example, in order to accommodate unevenness of loads shown in Case (a) in FIG. 7, the FIFO will have to be provided with at least three stages or storage positions. Similarly, in order to accommodate unevenness of loads shown in Cases (b) and (c) in FIG. 7, the FIFOs will have to be provided with at least five and eight stages, respectively. Increasing the number of stages in an FIFO would be an easy way of accommodating unevenly distributed data.
However, even when one or more parallel processors are busier than the others and their FIFOs overflow, the FIFOs of the other parallel processors do not necessarily overflow. Consequently, increasing the number of stages in an FIFO would result in increasing the number of unused or surplus stages. For example, when data sets are distributed to a multiprocessor system provided with FIFOs, each having eight stages, in a manner shown in Case (c) in FIG. 7, the number of used stages is fourteen while the number of unused stages is eighteen. The reason why a large number of stages are provided for an FIFO is mainly in preparation for possible uneven distribution of the data sets, rather than expecting that all the storage positions are always used. Mounting storage elements having low usage efficiency would be wasteful in circuit design and manufacturing.
Increasing the number of FIFO stages will also increase the gate size of the circuit because the size of FIFO is proportional to the product of data bit width, the number of stages, and the number of pipelines (= bit width (W) × number of stages (D) × number of pipelines (N)). A texture mapping LSI may be implemented by, for example, the ASIC technique. An FIFO having a huge number of stages (or bits) occupies a large chip area which becomes an obstacle to circuit design and causes the manufacturing cost to be increased accordingly.
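The size relation can be checked with a few lines of arithmetic. In the sketch below, the bit width of 64 is a hypothetical figure chosen only for illustration; the four pipelines and the stage counts of three, five, and eight are taken from the discussion of FIG. 6 and FIG. 7, and both function names are assumptions.

```python
def fifo_bits(width_bits, stages, pipelines):
    """Total FIFO storage in bits, W x D x N."""
    return width_bits * stages * pipelines


def unused_stages(stages, pipelines, used):
    """Stages left empty when `used` stages hold data sets."""
    return stages * pipelines - used


W = 64  # hypothetical data bit width, not given in the text
N = 4   # pipelines, as in FIG. 6

for D in (3, 5, 8):  # stages needed for Cases (a), (b), (c)
    print(f"D={D}: {fifo_bits(W, D, N)} bits")

# Case (c): eight stages per FIFO, fourteen stages in use.
print("unused stages:", unused_stages(8, N, 14))
```

Because storage grows linearly in D for every pipeline at once, the depth needed by the single busiest FIFO is paid for N times.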
In summary, the solution that increases the number of stages of an FIFO (or expands a data buffer) has a tradeoff, that is, a throughput of the entire system would be improved while some disadvantages would come out in the circuit design.
In order to solve the above problems, Japanese Published Unexamined Patent Application No. 9-185593 (U.S. Pat. No. 5,724,602) discloses a multiprocessor system 100 shown in FIG. 8 which includes:
(a) a dispatching processor 10 for distributing received data sets to each pipeline;
(b) a plurality of parallel processors 30 (30-1 through 30-4) provided for the respective pipelines to process data sets distributed thereto;
(c) a data FIFO 22 for temporarily storing one or more data sets received sequentially from the dispatching processor 10;
(d) a plurality of pointer FIFOs 60 (60-1 through 60-4), each of which is disposed at a pipeline, before a corresponding parallel processor 30 so as to temporarily store storage positions in the data FIFO 22 where data sets distributed to the corresponding parallel processor 30 are stored;
(e) a priority encoder 24 for determining the storage positions of data sets in the data FIFO 22 and writing the determined storage positions in a pointer FIFO 60 of the corresponding pipeline;
(f) a plurality of multiplexers 21 (21-1 through 21-4), each of which is disposed at a pipeline, between a corresponding parallel processor 30 and pointer FIFO 60 and enabled to pass a data set read from a storage position in the data FIFO 22 according to the output of the pointer FIFO 60 to the parallel processor 30; and
(g) a merging processor 40 for integrating data sets processed by each of the parallel processors 30.
The multiprocessor system 100 disclosed in the above-described Japanese Published Unexamined Patent Application No. 9-185593 enables the data FIFO 22 used to input and output data sets to be shared by all the pipelines, instead of providing a data FIFO 22 for each of the pipelines. Each pipeline is provided with a pointer FIFO 60 used to input and output the storage positions of data sets in the shared data FIFO 22. When a parallel processor 30 processes data, it first takes out a storage position from its own pointer FIFO 60, and then reads a data set from the storage position in the data FIFO 22. The pointer FIFO 60 provided for each pipeline may have a bit width enough to be able to identify the storage position in the data FIFO 22, and its size is thus smaller than that required to store data sets per se. Only one data FIFO 22 is required for the system 100 since it is shared by all the pipelines. According to the multiprocessor system 100 disclosed in Japanese Published Unexamined Patent Application No. 9-185593, therefore, a load imbalance can be eliminated without expanding the size of the data buffer unnecessarily.
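The shared data FIFO arrangement just described can be sketched in software as follows. This is an illustrative model only: the class and method names are hypothetical, the priority encoder is approximated by choosing the lowest-numbered free storage position, and the stall behaviour when a pointer FIFO is full is an assumption.

```python
from collections import deque


class SharedDataFifoSystem:
    """Toy model: one shared data FIFO plus one pointer FIFO per pipeline."""

    def __init__(self, data_slots, pipelines, pointer_stages):
        self.data = [None] * data_slots       # shared data FIFO 22
        self.in_use = [False] * data_slots    # occupancy of each storage position
        self.pointer_fifos = [deque() for _ in range(pipelines)]  # pointer FIFOs 60
        self.pointer_stages = pointer_stages  # fixed depth of every pointer FIFO

    def dispatch(self, pipeline, data_set):
        """Store a data set; return False if the dispatcher would have to wait."""
        pointers = self.pointer_fifos[pipeline]
        if len(pointers) >= self.pointer_stages or all(self.in_use):
            return False
        slot = self.in_use.index(False)  # stand-in for the priority encoder 24
        self.data[slot] = data_set
        self.in_use[slot] = True
        pointers.append(slot)            # only the storage position is queued
        return True

    def fetch(self, pipeline):
        """Take the next storage position, then read the data set it names."""
        pointers = self.pointer_fifos[pipeline]
        if not pointers:
            return None
        slot = pointers.popleft()
        self.in_use[slot] = False
        return self.data[slot]


system = SharedDataFifoSystem(data_slots=4, pipelines=2, pointer_stages=2)
system.dispatch(0, "a")
system.dispatch(1, "b")
system.dispatch(0, "c")
print(system.dispatch(0, "d"))  # pointer FIFO of pipeline 0 is already full
```

In the usage example the last dispatch is refused even though one data storage position is still free, because the pointer FIFO depth was fixed in advance.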
Furthermore, due to the sharing, the multiprocessor system 100 disclosed in Japanese Published Unexamined Patent Application No. 9-185593 has no need to implement FIFOs having low usage efficiency so that its design and manufacturing costs are much reduced. Viewing from a different point, the multiprocessor system 100 disclosed in Japanese Published Unexamined Patent Application No. 9-185593 is faster than others having the same gate size.
However, the multiprocessor system 100 disclosed in Japanese Published Unexamined Patent Application No. 9-185593 still requires a pointer FIFO 60 for each parallel processor 30. It is also required to analyze from data characteristics how many stages the pointer FIFO 60 should have for each parallel processor 30. If the number of stages is not optimum, the performance of the system 100 is affected adversely.
In view of the above, it is an object of the present invention to provide a data processing system and a multiprocessor system that can have the optimal number of FIFO stages dynamically so that there is no need to analyze from data characteristics the number of FIFO stages so as to improve the system performance.
In order to achieve the above object, the data processing system of the present invention for distributing data sets received sequentially to a plurality of pipelines includes a data buffer having a plurality of storage positions for temporarily storing the data sets by defining target pipelines to which the data sets are to be distributed, a next pointer having a plurality of storage positions for temporarily storing second information relating to a storage position for a subsequent data set in said data buffer, and a read pointer for temporarily storing first information relating to a storage position for a preceding data set in said data buffer and for storing said second information after said preceding data set is read out from said data buffer. In the data processing system of the present invention, both of the data buffer and the next pointer are shared by the pipelines. The preceding and subsequent data sets can be read out sequentially from the data buffer according to the first and second information stored in the next pointer.
Furthermore, in the data processing system of the present invention, the read pointer is provided for each pipeline, and consists of a single storage position. The first information is updated to the second information after the preceding data set is read. Thus, the data sets can be read out from the data buffer and processed according to such information updated sequentially.
According to the present invention, a data processing system is provided which comprises a first buffer having N storage positions for storing data sets, and a second buffer having M storage positions each of which is associated with a storage position in said first buffer, wherein, when a preceding data set is stored in the n-th storage position of said first buffer and a subsequent data set is stored in the (n+a)-th storage position of said first buffer, the value (n+a) is stored in the n-th storage position of said second buffer as storage position information for said subsequent data set.
In this data processing system of the present invention, it is possible to specify that a data set to be read next is stored in the (n+a)-th storage position of the first buffer by reading the storage position information stored in the n-th storage position of the second buffer when a data set stored in the n-th storage position of the first buffer is read. The data processing system further comprises a read pointer for storing the storage position information read out from the second buffer, and a predetermined data set is read out from a storage position of the first buffer specified by the storage position information stored in the read pointer. In this case, the value (n+a) is stored in the read pointer and the subsequent data set is read out from the (n+a)-th storage position in the first buffer.
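The relationship between the first buffer, the second buffer, and the read pointer can be sketched as below. This is a minimal software analogy, not the claimed circuit: the class name `ChainedBuffer`, the buffer size of eight, and the values n = 2 and a = 3 are all chosen only for illustration.

```python
class ChainedBuffer:
    """Toy model of a first buffer, a second buffer, and one read pointer."""

    def __init__(self, positions):
        self.first = [None] * positions   # data sets
        self.second = [None] * positions  # position of the subsequent data set
        self.read_pointer = None          # single storage position

    def read(self):
        """Read the data set named by the read pointer, then follow the link."""
        position = self.read_pointer
        data_set = self.first[position]
        # The value stored in the second buffer at the same position
        # becomes the new content of the read pointer.
        self.read_pointer = self.second[position]
        return position, data_set


n, a = 2, 3  # illustrative values
buf = ChainedBuffer(positions=8)
buf.first[n] = "preceding data set"
buf.first[n + a] = "subsequent data set"
buf.second[n] = n + a  # the n-th position of the second buffer holds (n+a)
buf.read_pointer = n

print(buf.read())  # reads position n
print(buf.read())  # reads position n+a
```

The read pointer never needs more than one storage position, because the order of the data sets is carried by the second buffer rather than by a queue of pointers.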
In the above data processing system, the number M of storage positions in the second buffer should be equal to or greater than the number N of storage positions in the first buffer. However, it is preferable that M is equal to N so that no wasteful storage position is provided.
The present invention further provides a data processing system for distributing data sets received sequentially to a plurality of pipelines which comprises a data buffer having a plurality of storage positions for temporarily storing said data sets by defining target pipelines to which said data sets are to be distributed, respectively, and a pointer having a plurality of storage positions corresponding to the storage positions of said data buffer, respectively, wherein, when a data set is stored in said data buffer, information relating to an empty storage position is stored in a storage position of said pointer corresponding to a storage position in which said data set is to be stored.
In this data processing system, the empty storage position information stored in the pointer indicates a storage position in the data buffer in which a data set is to be stored next. In order to store information on this storage position, a write pointer is provided for each pipeline. It is possible to store a data set received next in a predetermined storage position in the data buffer according to the information stored in the write pointer, by storing the information relating to the empty storage position in the write pointer.
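One possible reading of this write-side behaviour, combined with a per-pipeline read pointer, is sketched below. Several details are assumptions not stated in the text: each pipeline reserves one empty storage position in advance, the lowest-numbered empty position is always chosen, a pipeline is treated as having nothing to read when its read pointer equals its write pointer, and all class and method names are hypothetical.

```python
class SharedBuffer:
    """Toy model: shared data buffer and pointer, per-pipeline write/read pointers."""

    def __init__(self, positions, pipelines):
        if positions <= pipelines:
            raise ValueError("need more storage positions than pipelines")
        self.data = [None] * positions     # shared data buffer
        self.pointer = [None] * positions  # shared pointer, one entry per position
        self.busy = [False] * positions    # occupied or reserved positions
        self.write_pointer = []
        self.read_pointer = []
        for _ in range(pipelines):
            slot = self._claim_empty()     # assumed: reserve one position up front
            self.write_pointer.append(slot)
            self.read_pointer.append(slot)

    def _claim_empty(self):
        """Return and reserve the lowest empty position, or None if none is left."""
        for slot, busy in enumerate(self.busy):
            if not busy:
                self.busy[slot] = True
                return slot
        return None

    def store(self, pipeline, data_set):
        """Store a data set and record where the pipeline's next one will go."""
        empty = self._claim_empty()
        if empty is None:
            return False                       # buffer full, dispatcher must wait
        slot = self.write_pointer[pipeline]
        self.data[slot] = data_set
        self.pointer[slot] = empty             # empty position stored alongside the data set
        self.write_pointer[pipeline] = empty   # next data set goes there
        return True

    def load(self, pipeline):
        """Read the pipeline's next data set and release its storage position."""
        slot = self.read_pointer[pipeline]
        if slot == self.write_pointer[pipeline]:
            return None                        # nothing stored for this pipeline
        data_set = self.data[slot]
        self.read_pointer[pipeline] = self.pointer[slot]
        self.data[slot] = None
        self.busy[slot] = False
        return data_set


buf = SharedBuffer(positions=6, pipelines=2)
for item in ("a", "b", "c"):
    buf.store(0, item)   # pipeline 0 momentarily takes most of the buffer
buf.store(1, "x")
print(buf.load(0), buf.load(0), buf.load(1))
```

In this sketch no per-pipeline depth is fixed in advance: a busy pipeline may occupy as many storage positions as are free, and the positions return to the common pool as soon as they are read.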
The data processing system of the present invention may be used as a multiprocessor system for distributing sequentially received data sets to a plurality of pipelines, which comprises a dispatching processor for distributing the received data sets to the pipelines, a plurality of parallel processors, each of which is disposed at each of the pipelines and enabled to process a data set distributed thereto, a data buffer having a plurality of storage positions for temporarily storing one or more data sets output sequentially from said dispatching processor, a next pointer for storing information on a first storage position for a first data set in said data buffer and information on a second storage position for a second data set in said data buffer, said second data set being to be processed by a parallel processor after said parallel processor processes said first data set, a read pointer disposed before a parallel processor for each pipeline for sequentially storing said first storage position information and said second storage position information, a priority encoder for determining storage positions for said first and second data sets in said data buffer, and a plurality of multiplexers each of which is disposed between a parallel processor and said read pointer for each pipeline to read said first and second data sets sequentially from their storage positions in said data buffer according to said first and second storage position information stored in said read pointer, and pass said first and second data sets to the parallel processor.
In the multiprocessor system of the present invention, the next pointer has the same number of storage positions as that of the data buffer, and the storage positions of the next pointer are respectively associated with the storage positions of the data buffer. More specifically, the second storage position information is stored in a storage position of the next pointer associated with a storage position of the data buffer in which the first data set is stored. Then, after the first data set is read out from the data buffer, the second storage position information is read out from the next pointer, and a predetermined data set is read out from the data buffer according to this second storage position information.
The multiprocessor system of the present invention may further comprise a write pointer for temporarily storing information on a storage position in the data buffer in which a data set to be distributed to a parallel processor is stored. A data set received is stored in a predetermined storage position of the data buffer according to the information stored in this write pointer.
The multiprocessor system of the present invention may further comprise a merging processor for merging the data sets processed by the parallel processors.
Other objects, features, and advantages of the present invention will be recognized from the following detailed description of the preferred embodiments with reference to the accompanying drawings.