The present invention relates to computer architecture and, in particular, to computer architectures dedicated to the printing or other display of graphical images.
An important aspect of the design of any computer architecture is the instruction set, or the way in which instructions are formatted or encoded. Each instruction normally consists of an opcode (that is, an instruction as to what is to be done to some data) and one or more operand(s) (that is, the data itself or the address at which the relevant data can be located). Generally the operand(s) occupy a fixed length since this provides the advantage that the length of the instruction is fixed with a large resulting simplification of the associated hardware.
In the environment of the present invention it is necessary to perform calculations on large streams of data of variable length. This differs from prior art arrangements, which typically operate on fixed quantities of data. There is therefore a requirement in the present invention that the length of the stream of data be specified in some way.
The present invention is based on the realisation that a conventional instruction set requires the processor to (1) fetch the instruction, (2) decode the instruction, (3) fetch the operand, (4) carry out the calculation, and (5) store the result. There is an appreciable amount of time spent on steps (1) and (2) because they are inherently slow and must be performed for each instruction.
However, if these two steps could be combined into a single operation which was performed only once for a relatively long stream of data, then the average of this overhead for each calculation would be substantially reduced since the overhead would be amortized over a large number of calculations. This would reduce the average time for all calculations and result in faster overall operation.
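The amortization argument above can be illustrated with a rough cost model. The sketch below is not taken from the specification: the function name and the cycle counts are hypothetical, chosen only to show how the average cost per data item falls when a single fetch and decode is shared across a long stream.

```python
def average_cycles_per_item(fetch_decode_cycles, cycles_per_item, stream_length):
    """Average cost per data item when one instruction processes a whole stream.

    The fetch/decode overhead is paid once per instruction and is spread
    over every item in the stream.
    """
    if stream_length < 1:
        raise ValueError("stream_length must be at least 1")
    total = fetch_decode_cycles + cycles_per_item * stream_length
    return total / stream_length


# Hypothetical figures, for illustration only.
FETCH_DECODE = 10   # assumed cycles for steps (1) and (2)
PER_ITEM = 2        # assumed cycles for steps (3) to (5) per data item

# One item per instruction: the overhead is paid for every calculation.
conventional = average_cycles_per_item(FETCH_DECODE, PER_ITEM, 1)

# One instruction for a 1000-item stream: the overhead is paid once.
streamed = average_cycles_per_item(FETCH_DECODE, PER_ITEM, 1000)
```

Under these assumed figures the conventional case costs 12 cycles per item, while the streamed case approaches the bare per-item cost of 2 cycles as the stream grows.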
In accordance with the present invention there is disclosed an image processor for executing a computer instruction set comprising a plurality of instructions each of which has an instruction opcode and at least one operand, wherein said opcode corresponds to a type of calculation to be performed on the operand(s), each operand is data to be processed in said calculation or specifies the address of said data, the result of said calculation represents processed image data, and each instruction includes a length field containing data specifying the number of items of data to be processed or, if said number exceeds the size of said length field, a predetermined location of a previously allocated storage area at which said number is stored, whereby said processor for each instruction processes the corresponding said number of data items to thereby facilitate processing of variable length streams of data.
Preferably the length of all said instructions is both fixed and equal, and the time required for the stream data processing exceeds the time required to fetch and decode each of the instructions.
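The length field described above might be modelled as follows. This is a minimal sketch under stated assumptions, not the encoding used by the co-processor: the 16-bit field width, the reserved `INDIRECT` value that signals "the count is held in the storage area", and the `LENGTH_SLOT` location are all hypothetical, and the specification does not say how the two cases are distinguished.

```python
LENGTH_FIELD_BITS = 16                     # hypothetical width of the length field
INDIRECT = (1 << LENGTH_FIELD_BITS) - 1    # hypothetical reserved value: count is in storage
LENGTH_SLOT = 0                            # hypothetical predetermined storage location


def encode_length(count, storage):
    """Return the length-field value for `count` items.

    Counts that fit are stored directly in the field. Larger counts are
    written to the previously allocated storage area and the field holds
    the reserved INDIRECT value instead.
    """
    if count < 0:
        raise ValueError("count must be non-negative")
    if count < INDIRECT:
        return count
    storage[LENGTH_SLOT] = count
    return INDIRECT


def decode_length(field, storage):
    """Recover the item count from the field, following the indirection if set."""
    if field == INDIRECT:
        return storage[LENGTH_SLOT]
    return field


def execute(operation, field, data, storage):
    """Apply `operation` to as many items of `data` as the length field specifies."""
    count = decode_length(field, storage)
    return [operation(item) for item in data[:count]]


storage_area = {}                                   # stands in for the allocated storage area
short_field = encode_length(100, storage_area)      # fits in the field
long_field = encode_length(100_000, storage_area)   # too large, so stored indirectly
doubled = execute(lambda x: 2 * x, 3, [1, 2, 3, 4, 5], storage_area)
```

Here `short_field` carries the count itself, `long_field` carries only the reserved value, and `execute` processes exactly the number of items the instruction specifies, in this case the first three.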
In the following detailed description, the reader's attention is directed, in particular, to FIGS. 10 and 11 and their associated description without intending to detract from the disclosure of the remainder of the description.
1.0 Brief Description of the Drawings
2.0 List of Tables
3.0 Description of the Preferred and Other Embodiments
3.1 General Arrangement of Plural Stream Architecture
3.2 Host/Co-processor Queuing
3.3 Register Description of Co-processor
3.4 Format of Plural Streams
3.5 Determine Current Active Stream
3.6 Fetch Instruction of Current Active Stream
3.7 Decode and Execute Instruction
3.8 Update Registers of Instruction Controller
3.9 Semantics of the Register Access Semaphore
3.10 Instruction Controller
3.11 Description of a Module's Local Register File
3.12 Register Read/Write Handling
3.13 Memory Area Read/Write Handling
3.14 CBus Structure
3.15 Co-processor Data Types and Data Manipulation
3.16 Data Normalization Circuit
3.17 Image Processing Operations of Accelerator Card
3.17.1 Compositing
3.17.2 Color Space Conversion Instructions
a. Single Output General Color Space (SOGCS) Conversion Mode
b. Multiple Output General Color Space Mode
3.17.3 JPEG Coding/Decoding
a. Encoding
b. Decoding
3.17.4 Table Indexing
3.17.5 Data Coding Instructions
3.17.6 A Fast DCT Apparatus
3.17.7 Huffman Decoder
3.17.8 Image Transformation Instructions
3.17.9 Convolution Instructions
3.17.10 Matrix Multiplication
3.17.11 Halftoning
3.17.12 Hierarchical Image Format Decompression
3.17.13 Memory Copy Instructions
a. General purpose data movement instructions
b. Local DMA instructions
3.17.14 Flow Control Instructions
3.18 Modules of the Accelerator Card
3.18.1 Pixel Organizer
3.18.2 MUV Buffer
3.18.3 Result Organizer
3.18.4 Operand Organizers B and C
3.18.5 Main Data Path Unit
3.18.6 Data Cache Controller and Cache
a. Normal Cache Mode
b. The Single Output General Color Space Conversion Mode
c. Multiple Output General Color Space Conversion Mode
d. JPEG Encoding Mode
e. Slow JPEG Decoding Mode
f. Matrix Multiplication Mode
g. Disabled Mode
h. Invalidate Mode
3.18.7 Input Interface Switch
3.18.8 Local Memory Controller
3.18.9 Miscellaneous Module
3.18.10 External Interface Controller
3.18.11 Peripheral Interface Controller
Notwithstanding any other forms which may fall within the scope of the present invention, preferred forms of the invention will now be described, by way of example only, with reference to the accompanying drawings:
FIG. 1 illustrates the operation of a raster image co-processor within a host computer environment;
FIG. 2 illustrates the raster image co-processor of FIG. 1 in further detail;
FIG. 3 illustrates the memory map of the raster image co-processor;
FIG. 4 shows the relationship between a CPU, instruction queue, instruction operands and results in shared memory, and a co-processor;
FIG. 5 shows the relationship between an instruction generator, memory manager, queue manager and co-processor;
FIG. 6 shows the operation of the graphics co-processor reading instructions for execution from the pending instruction queue and placing them on the completed instruction queue;
FIG. 7 shows a fixed length circular buffer implementation of the instruction queue, indicating the need to wait when the buffer fills;
FIG. 8 illustrates two instruction execution streams as utilized by the co-processor;
FIG. 9 illustrates an instruction execution flow chart;
FIG. 10 illustrates the standard instruction word format utilized by the co-processor;
FIG. 11 illustrates the instruction word fields of a standard instruction;
FIG. 12 illustrates the data word fields of a standard instruction;
FIG. 13 illustrates schematically the instruction controller of FIG. 2;
FIG. 14 illustrates the execution controller of FIG. 13 in more detail;
FIG. 15 illustrates a state transition diagram of the instruction controller;
FIG. 16 illustrates the instruction decoder of FIG. 13;
FIG. 17 illustrates the instruction sequencer of FIG. 16 in more detail;
FIG. 18 illustrates a transition diagram for the ID sequencer of FIG. 16;
FIG. 19 illustrates schematically the prefetch buffer controller of FIG. 13 in more detail;
FIG. 20, comprised of FIGS. 20A and 20B, illustrates the standard form of register storage and module interaction as utilized in the co-processor;
FIG. 21 illustrates the format of control bus transactions as utilized in the co-processor;
FIG. 22 illustrates the data flow through a portion of the co-processor;
FIG. 23 illustrates an example of data reformatting as utilized in the co-processor;
FIG. 24 illustrates an example of data reformatting as utilized in the co-processor;
FIG. 25 illustrates an example of data reformatting as utilized in the co-processor;
FIG. 26 illustrates an example of data reformatting as utilized in the co-processor;
FIG. 27 illustrates an example of data reformatting as utilized in the co-processor;
FIG. 28 illustrates an example of data reformatting as utilized in the co-processor;
FIG. 29 illustrates an example of data reformatting as utilized in the co-processor;
FIGS. 30 and 31 illustrate the format conversions carried out by the co-processor;
FIG. 32 illustrates the process of input data transformation as carried out in the co-processor;
FIG. 33 illustrates a further data transformation as carried out by the co-processor;
FIG. 34 illustrates a further data transformation as carried out by the co-processor;
FIG. 35 illustrates a further data transformation as carried out by the co-processor;
FIG. 36 illustrates a further data transformation as carried out by the co-processor;
FIG. 37 illustrates a further data transformation as carried out by the co-processor;
FIG. 38 illustrates a further data transformation as carried out by the co-processor;
FIG. 39 illustrates a further data transformation as carried out by the co-processor;
FIG. 40 illustrates a further data transformation as carried out by the co-processor;
FIG. 41 illustrates a further data transformation as carried out by the co-processor;
FIG. 42 illustrates various internal to output data transformations carried out by the co-processor;
FIG. 43 illustrates a further example of data transformation carried out by the co-processor;
FIG. 44 illustrates a further example of data transformation carried out by the co-processor;
FIG. 45 illustrates a further example of data transformation carried out by the co-processor;
FIG. 46 illustrates a further example of data transformation carried out by the co-processor;
FIG. 47 illustrates a further example of data transformation carried out by the co-processor;
FIG. 48 illustrates various fields utilized by internal registers to determine what data transformations should be carried out;
FIG. 49 depicts a block diagram of a graphics subsystem that uses data normalization;
FIG. 50 illustrates a circuit diagram of a data normalization apparatus;
FIG. 51 illustrates the pixel processing carried out for compositing operations;
FIG. 52 illustrates the instruction word format for compositing operations;
FIG. 53 illustrates the data word format for compositing operations;
FIG. 54 illustrates the instruction word format for tiling operations;
FIG. 55 illustrates the operation of a tiling instruction on an image;
FIG. 56 illustrates the process of utilization of interval and fractional tables to re-map color gamuts;
FIG. 57 illustrates the form of storage of interval and fractional tables within the MUV buffer of the co-processor;
FIG. 58 illustrates the process of color conversion utilising interpolation as carried out in the co-processor;
FIG. 59 illustrates the refinements to the rest of the color conversion process at gamut edges as carried out by the co-processor;
FIG. 60 illustrates the process of color space conversion for one output color as implemented in the co-processor;
FIG. 61 illustrates the memory storage within a cache of the co-processor when utilising single color output color space conversion;
FIG. 62 illustrates the methodology utilized for multiple color space conversion;
FIG. 63 illustrates the process of address re-mapping for the cache when utilized during the process of multiple color space conversion;
FIG. 64 illustrates the instruction word format for color space conversion instructions;
FIG. 65 illustrates a method of multiple color conversion;
FIGS. 66 and 67 illustrate the formation of MCU's during the process of JPEG conversion as carried out in the co-processor;
FIG. 68 illustrates the structure of the JPEG coder of the co-processor;
FIG. 69 illustrates the quantizer portion of FIG. 68 in more detail;
FIG. 70 illustrates the Huffman coder of FIG. 68 in more detail;
FIGS. 71 and 72 illustrate the Huffman coder and decoder in more detail;
FIG. 73 illustrates the process of cutting and limiting of JPEG data as utilized in the co-processor;
FIG. 74 illustrates the process of cutting and limiting of JPEG data as utilized in the co-processor;
FIG. 75 illustrates the process of cutting and limiting of JPEG data as utilized in the co-processor;
FIG. 76 illustrates the instruction word format for JPEG instructions;
FIG. 77 shows a block diagram of a typical discrete cosine transform apparatus (prior art);
FIG. 78 illustrates an arithmetic data path of a prior art DCT apparatus;
FIG. 79 shows a block diagram of a DCT apparatus utilized in the co-processor;
FIG. 80 depicts a block diagram of the arithmetic circuit of FIG. 79 in more detail;
FIG. 81 illustrates an arithmetic data path of the DCT apparatus of FIG. 79;
FIG. 82 presents a representational stream of Huffman-encoded data units interleaved with not encoded bit fields, both byte aligned and not, as in JPEG format;
FIG. 83, comprised of FIGS. 83A and 83B, illustrates the overall architecture of a Huffman decoder of JPEG data of FIG. 84 in more detail;
FIG. 84 illustrates the overall architecture of the Huffman decoder of JPEG data;
FIG. 85 illustrates data processing in the stripper block which removes byte aligned not encoded bit fields from the input data. Examples of the coding of tags corresponding to the data outputted by the stripper are also shown;
FIG. 86, comprised of FIGS. 86A and 86B shows the organization and the data flow in the data preshifter;
FIG. 87, comprised of FIGS. 87A and 87B shows control logic for the decoder of FIG. 81;
FIG. 88, comprised of FIGS. 88A and 88B shows the organization and the data flow in the marker preshifter;
FIG. 89 shows a block diagram of a combinatorial unit decoding Huffman encoded values in JPEG context;
FIG. 90 illustrates the concept of a padding zone and a block diagram of the decoder of padding bits;
FIG. 91 shows an example of a format of data outputted by the decoder, the format being used in the co-processor;
FIG. 92 illustrates methodology utilized in image transformation instructions;
FIG. 93 illustrates the instruction word format for image transformation instructions;
FIGS. 94 and 95 illustrate the format of an image transformation kernel as utilized in the co-processor;
FIG. 96 illustrates the process of utilising an index table for image transformations as utilized in the co-processor;
FIG. 97 illustrates the data field format for instructions utilising transformations and convolutions;
FIG. 98 illustrates the process of interpretation of the bp field of instruction words;
FIG. 99 illustrates the process of convolution as utilized in the co-processor;
FIG. 100 illustrates the instruction word format for convolution instructions as utilized in the co-processor;
FIG. 101 illustrates the instruction word format for matrix multiplication as utilized in the co-processor;
FIG. 102 illustrates the process utilized for hierarchical image manipulation as utilized in the co-processor;
FIG. 103 illustrates the process utilized for hierarchical image manipulation as utilized in the co-processor;
FIG. 104 illustrates the process utilized for hierarchical image manipulation as utilized in the co-processor;
FIG. 105 illustrates the process utilized for hierarchical image manipulation as utilized in the co-processor;
FIG. 106 illustrates the instruction word coding for hierarchical image instructions;
FIG. 107 illustrates the instruction word coding for flow control instructions as illustrated in the co-processor;
FIG. 108 illustrates the pixel organizer in more detail;
FIG. 109 illustrates the operand fetch unit of the pixel organizer in more detail;
FIG. 110 illustrates a storage format as utilized by the co-processor;
FIG. 111 illustrates a storage format as utilized by the co-processor;
FIG. 112 illustrates a storage format as utilized by the co-processor;
FIG. 113 illustrates a storage format as utilized by the co-processor;
FIG. 114 illustrates a storage format as utilized by the co-processor;
FIG. 115 illustrates the MUV address generator of the pixel organizer of the co-processor in more detail;
FIG. 116 is a block diagram of a multiple value (MUV) buffer utilized in the co-processor;
FIG. 117 illustrates a structure of the encoder of FIG. 116;
FIG. 118 illustrates a structure of the decoder of FIG. 116;
FIG. 119 illustrates a structure of an address generator of FIG. 116 for generating read addresses when in JPEG mode (pixel decomposition);
FIG. 120 illustrates a structure of an address generator of FIG. 116 for generating read addresses when in JPEG mode (pixel reconstruction);
FIG. 121 illustrates an organization of memory modules comprising the storage device of FIG. 116;
FIG. 122 illustrates a structure of a circuit that multiplexes read addresses to memory modules;
FIG. 123 illustrates a representation of how lookup table entries are stored in the buffer operating in a single lookup table mode;
FIG. 124 illustrates a representation of how lookup table entries are stored in the buffer operating in a multiple lookup table mode;
FIG. 125 illustrates a representation of how pixels are stored in the buffer operating in JPEG mode (pixel decomposition);
FIG. 126 illustrates a representation of how single color data blocks are retrieved from the buffer operating in JPEG mode (pixel reconstruction);
FIG. 127 illustrates the structure of the result organizer of the co-processor in more detail;
FIG. 128 illustrates the structure of the operand organizers of the co-processor in more detail;
FIG. 129 is a block diagram of a computer architecture for the main data path unit utilized in the co-processor;
FIG. 130 is a block diagram of an input interface for accepting, storing and rearranging input data objects for further processing;
FIG. 131 is a block diagram of an image data processor for performing arithmetic operations on incoming data objects;
FIG. 132 is a block diagram of a color channel processor for performing arithmetic operations on one channel of the incoming data objects;
FIG. 133 is a block diagram of a multifunction block in a color channel processor;
FIG. 134 illustrates a block diagram for compositing operations;
FIG. 135 shows an inverse transform of the scanline;
FIG. 136 shows a block diagram of the steps required to calculate the value for a destination pixel;
FIG. 137 illustrates a block diagram of the image transformation engine;
FIG. 138 illustrates the two formats of kernel descriptions;
FIG. 139 shows the definition and interpretation of a bp field;
FIG. 140 shows a block diagram of multiplier-adders that perform matrix multiplication;
FIG. 141 illustrates the control, address and data flow of the cache and cache controller of the co-processor;
FIG. 142 illustrates the memory organization of the cache;
FIG. 143 illustrates the address format for the cache controller of the co-processor;
FIG. 144 comprised of FIGS. 144A and 144B is a block diagram of a multifunction block in a color channel processor;
FIG. 145 illustrates the input interface switch of the co-processor in more detail;
FIG. 146 illustrates a four-port dynamic local memory controller of the co-processor showing the main address and data paths;
FIG. 147 illustrates a state machine diagram for the controller of FIG. 146;
FIG. 148 is a pseudo code listing detailing the function of the arbitrator of FIG. 146;
FIG. 149 depicts the structure of the requester priority bits and the terminology used in FIG. 146;
FIG. 150 illustrates the external interface controller of the co-processor in more detail;
FIG. 151 illustrates the process of virtual to/from physical address mapping as utilized by the co-processor;
FIG. 152 illustrates the process of virtual to/from physical address mapping as utilized by the co-processor;
FIG. 153 illustrates the process of virtual to/from physical address mapping as utilized by the co-processor;
FIG. 154 illustrates the process of virtual to/from physical address mapping as utilized by the co-processor;
FIG. 155 comprised of FIGS. 155A and 155B illustrates the IBus receiver unit of FIG. 150 in more detail;
FIG. 156 comprised of FIGS. 156A and 156B illustrates the RBus receiver unit of FIG. 2 in more detail;
FIG. 157 comprised of FIGS. 157A and 157B illustrates the memory management unit of FIG. 150 in more detail;
FIG. 158 illustrates the peripheral interface controller of FIG. 2 in more detail.