a. Rendition processes of various kinds are widely practiced. Their purpose is as follows.
Typical current-day input images to be printed are sometimes represented by data for three additive primary color lights, typically red, green and blue; sometimes instead for three or four subtractive primary colorants, typically cyan, magenta, yellow and black; and sometimes for other color spaces, including for example the Dillinger hue-plus-gray or hnk system originating at the Hewlett Packard Company. In all such cases the data are most typically stated for multiple, typically 256, visual levels.
Such high-bandwidth input is impossible to reproduce dot for dot, or pixel for pixel, with a low-bandwidth printing machine. By this I mean any machine that operates in binary color--or even in multilevel color, but with only relatively few levels (e.g., a maximum of a few dots) per pixel.
A process that deals with this problem is called "rendition". In general, rendition forces the average color of at least several printed pixels to approximate the originally desired input colors averaged over a roughly corresponding group of positions.
Rendition includes so-called "dithering" methods, generally used for graphics such as broad fields of solid color, and "error diffusion"--which as mentioned above ordinarily provides better results for photograph-like images.
b. Error diffusion is a familiar conventional rendition process generally attributed to Floyd and Steinberg, but modified by many other workers for application to color images. In printing or copying machines of the type under consideration, heretofore the universal way of implementing Floyd-Steinberg error diffusion has been to process one row 10 (FIG. 1) at a time, starting at one side of the image pixel-data array 21 and moving 22 along that row--dealing with each generalized pixel 11 in turn--to the other side.
After processing of each row 10 is complete, the next row below is taken up in a like manner. Accordingly the overall processing propagates row-wise downward 23.
In this procedure, as is well known to those skilled in the art, each pixel 11 starts out with its initial color values established in a data file or otherwise, as for instance by acquisition in real time from a scanner. (The term "values" here merely means numbers, as distinguished from the "value" parameter used in some color spaces.) For example each pixel may have three, four or five associated values, one for each color or colorimetric parameter employed by the particular system that is in use.
To these values in each pixel 11 will be added--unless the pixel is in the top row of the image--so-called "errors" for the same colors respectively, generated by certain previously processed nearby pixels 12-15 (FIG. 2). Most usually the number of such earlier-processed contributors is four, but some systems operate using more, or fewer--even one.
These contributions from the nearby pixels 12-15 are summed into separate memories for the various colors, for the pixel 11 of interest, during (in systems of which I am aware) the times when the system is processing those earlier pixels. It is particularly important to note that, while an earlier pixel 12 is being processed, the system reaches forward in the image data array to place errors for the various colors into memory cells for a pixel 11 that will be processed later; and similarly for the other earlier pixels 13-15.
For each pixel 11 of interest, error values are thus received from several (four, in the drawings) specific previously processed pixels 12-15 in turn, as those particular pixels are taken up for processing. This leads to an accumulated error sum, for each color, which is completed only during processing of the last of those earlier pixels to be processed--i.e., the pixel 15 which is immediately upstream (to the left, in the drawing) in the processing sequence, along the same row 10.
Now when it is time for the pixel 11 of interest to be processed, this accumulated error sum is added into the original input image values (typically much larger), and then the grand total is evaluated. If the sum exceeds one or more of certain programmed thresholds, the system prints a dot (or in some cases plural dots, or no dot) of a particular color in the current pixel. That dot has a specific color shade, and corresponding value (again, simply a number).
The difference between the accumulated input sum and the color value of the actual printed dot is denominated the "whole error". This "whole error" is next divided among neighboring pixels 16-19 (FIG. 3) which have not yet been processed. Analogous distributions, as will be understood, were the origins of the errors inherited by each pixel 11 from previously processed pixels 12-15 as mentioned above.
The number of destination pixels 16-19, into which error is distributed, typically equals the number of previously mentioned pixels 12-15 from which error is distributed (i.e., again, most usually four). For each particular one of the target pixels 16-19, the "whole error" is multiplied by a respective fraction and then truncated to the next smaller integer, and the result is added into that target pixel.
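By way of a concrete sketch of this multiply-and-truncate step, the following assumes the classic Floyd-Steinberg fractions 7/16, 3/16, 5/16 and 1/16 purely for illustration--actual weight sets vary from system to system:

```python
import math

# Illustrative weights for the four destination pixels, keyed by the
# nomenclature used in this document; the classic Floyd-Steinberg
# fractions are assumed here as an example only.
WEIGHTS = {"next": 7/16, "left": 3/16, "center": 5/16, "right": 1/16}

def split_whole_error(whole_error):
    """Multiply the whole error by each fraction and truncate each
    product to the next smaller integer, as described above."""
    return {name: math.floor(whole_error * w) for name, w in WEIGHTS.items()}

shares = split_whole_error(100)
# The truncated shares need not sum exactly to the whole error:
# truncation discards a fraction of a level at each destination.
```

For a whole error of 100 the truncated shares are 43, 18, 31 and 6, summing to 98 rather than 100--illustrating why the choice of fractions (discussed next) need only add up to roughly unity.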
In such a process, error values propagate through the data array in a manner reminiscent of fluid diffusion through a permeable medium--whence the term "error diffusion".
Fractions are commonly selected that add up to roughly unity, so that the error flow neither blows up nor damps out. By setting thresholds and maximum-print levels, the designer or user can program the output to have--for example--two, three or four levels.
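The foregoing steps can be drawn together in a minimal single-channel sketch. This is a hypothetical illustration only: it assumes classic Floyd-Steinberg weights and evenly spaced output levels with thresholds midway between them, and it omits the per-color processing, randomization and hardware considerations discussed elsewhere in this document:

```python
import math

def error_diffuse(image, levels=2, max_val=255):
    """Minimal row-wise error diffusion on a single channel.

    `image` is a list of rows of integers in 0..max_val.  Output levels
    are spaced evenly, with thresholds falling midway between them.
    Classic Floyd-Steinberg weights are assumed, for illustration only.
    """
    rows, cols = len(image), len(image[0])
    err = [[0] * cols for _ in range(rows)]   # accumulated error per pixel
    out = [[0] * cols for _ in range(rows)]
    step = max_val / (levels - 1)
    for y in range(rows):
        for x in range(cols):                 # longitudinal, left to right
            total = image[y][x] + err[y][x]
            level = min(levels - 1, max(0, round(total / step)))
            printed = level * step
            whole_error = total - printed
            # Distribute into the next, left, center and right pixels.
            for dx, dy, w in ((1, 0, 7/16), (-1, 1, 3/16),
                              (0, 1, 5/16), (1, 1, 1/16)):
                nx, ny = x + dx, y + dy
                if 0 <= nx < cols and 0 <= ny < rows:
                    err[ny][nx] += math.floor(whole_error * w)
            out[y][x] = level
    return out
```

Setting `levels` to two, three or four corresponds to programming the output for that many levels, as described above; a broad field of mid-gray input then prints roughly the matching fraction of dots.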
In at least some prior software, and for purposes of the present document including the claims, a particular pixel 11 being processed is called the "current pixel". By the phrase "being processed" I mean at least the operation of distributing error out to other pixels. The four neighbors into which error is distributed are called the "next pixel" 16, "left pixel" 17, "center pixel" 18 and the "right pixel" 19.
The first-mentioned of these, the "next pixel", is traditionally "next" in two different senses: it is immediately adjacent to the current pixel, and it is also the next to be processed--after the current pixel. (As will be seen, I retain this historical nomenclature even though in accordance with my invention the meaning in one of these senses is not applicable.)
The second of these meanings is related very closely to the order of processing, namely along the row 10 longitudinally and--from the earliest convention--in the direction 22 left to right. As the shading (FIGS. 2 and 3) suggests, processing actually progresses along a path through a series of "next pixels", each "current pixel" being the final contributor to its "next pixel".
As soon as that final contribution to a "next pixel" is made, that "next pixel" has received all the error contributions it will ever receive--so a natural next step is to process that "next pixel" as the succeeding "current pixel". In other words, from an opposite perspective: to implement a strategy of longitudinal processing along each row 10 in turn, in a natural manner, the final error distribution into each pixel is advantageously made at a time when that pixel is a "next pixel". This process, of distributing into the next pixel last, leads directly into the print decision for that pixel and then the distribution of remaining error from that pixel.
When that next pixel thus becomes the current pixel, it similarly distributes error into the new left, center and right pixels 17-19 (FIG. 3) and into the new next pixel 16. The shading suggesting the path 22 of processing is accordingly extended by one pixel rightward along the row 10, in a succession of natural steps.
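The geometric relationship between the receiving pixels 16-19 and the earlier contributing pixels 12-15 can be summarized in a small sketch. The coordinates here are hypothetical conveniences, with dx positive in the processing direction 22 and dy positive downward:

```python
# Offsets (dx, dy) from the current pixel to the four pixels into which
# it distributes error, using the nomenclature defined above.
DISTRIBUTE = {"next": (1, 0), "left": (-1, 1),
              "center": (0, 1), "right": (1, 1)}

# The earlier-processed contributors 12-15 are exactly the mirror image:
# each lies where the current pixel served as its next, left, center or
# right pixel, respectively.
RECEIVE = {name: (-dx, -dy) for name, (dx, dy) in DISTRIBUTE.items()}
```

Thus the pixel 15 immediately upstream in the same row, at offset (-1, 0), is the contributor for which the current pixel was the "next pixel"--which is why its contribution completes the accumulated error sum.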
Row-wise longitudinal processing certainly qualifies as conventional wisdom, for it arises in an extremely pervasive fashion: it is the manner in which data are generated by a row scanner (by far the best known and most common type) and the manner in which data are queued for display (from the earliest video), and also the manner in which data are most commonly stored and transferred.
For two reasons, however, left-to-right processing is not at all exclusive. First, it is arbitrary: an image pixel grid (though of course not the image information itself) is perfectly symmetrical, and the data are therefore amenable to perusal and processing in either direction. Second, even within an established processing algorithm or apparatus it is known to analyze the data along a processing path that is left-to-right sometimes and right-to-left at other times.
Indeed alternation between the two directions, as for instance in a serpentine path that reverses direction in every other row, reduces directional-pattern artifacts in certain types of error diffusion. Such bidirectional processing is therefore popular for error diffusion in some schools of thought.
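A serpentine path of this kind might be generated as in the following hypothetical sketch. Note that on the reversed rows the senses of the "left" and "right" pixels reverse along with the direction of processing, consistent with the direction-relative definitions of those terms:

```python
def serpentine_order(rows, cols):
    """Yield pixel coordinates (y, x) along a serpentine path:
    left to right on even rows, right to left on odd rows, the
    direction of processing reversing in every other row."""
    for y in range(rows):
        xs = range(cols) if y % 2 == 0 else range(cols - 1, -1, -1)
        for x in xs:
            yield (y, x)
```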
Therefore for purposes of this document it will be understood that the "next pixel" is not always to the right of the current pixel. Rather it is in the same row, and adjacent, and in a certain sense along a direction of processing. (That sense, however, as will be seen later does not necessarily mean along a row which is being processed.)
Analogously the designations "right pixel" and "left pixel" are not to be limited to their literal meanings. Rather they are hereby defined as pixels disposed relative to the current pixel downward and, respectively, in and opposite to a direction of processing (where once again the "direction of processing" does not necessarily mean along a row which is being processed).
Because error diffusion is inherently progressive, and therefore to a necessarily large extent directional, the occurrence of directionality artifacts has always posed problems. A major line of development in this field is the introduction of techniques for breaking up or concealing directionality.
In addition to the use of serpentine paths an important technique is introduction of randomness into the processing. Random sequencing, random weights, random spatial distribution of the error, and numerous other methods are extensively discussed in the patent and other literature. It is important that any new overall strategy for implementing error diffusion be compatible with at least some such randomization techniques.
Understanding of these and many other sophisticated enhancements of error diffusion is beyond the scope of this document, as those theoretical analyses--while in general compatible with my invention--are not at all necessary to understanding or practice of my invention. For further details on those advanced theories I therefore refer the reader to the literature, an excellent starting point being Ulichney, Digital Halftoning (MIT 1987, 1993).
c. Conventional software implementations of error diffusion, as mentioned above, call for processing one row 10 at a time, from one side of the image to the other. Such processing is commonly performed by a printer driver (software) in a host computer.
In this environment the number of memory cells, both short- and long-term memory--and also the amount of hardware available for the desired computations--is ordinarily ample for the tasks at hand. We need not discuss such considerations as conserving memory of various types, or the cost of computing-hardware elements needed to perform the work.
The absolute speed of error-diffusion processing in this environment of course varies with the performance of the host computer. Considered instead in terms of relative speed or efficiency, this type of image rendition by the general-purpose host CPU is very slow--in comparison with, for example, dither methods of rendition.
Consumer demand for higher throughput in economical printers, copiers etc. continues to rise, and the marketplace for these machines is intensely competitive. Therefore it is very undesirable to be limited by the speed of a host computer.
This is a particularly serious matter for manufacturers and vendors of printers and copiers, since it interferes with their ability to advertise a particular throughput without having to hedge as to the unknown speed of the host. Therefore it is extremely desirable to increase overall printing or copying throughput in inexpensive image-related devices, and also to isolate this performance to a certain extent from that of a computer to be obtained and connected by the customer.
In thus increasing throughput, however, it is crucial to refrain from running up the price of the printer or copier unduly, since this would degrade the overall competitive position of the product in the market. These considerations are particularly aggravated by the fact that four-pixel error-diffusion itself, as already indicated, is notoriously slow.
Its slowness is due partly to performing a significant amount of arithmetic for each pixel--but this is accomplished rather quickly using internal registers of the system central processor. Of equal or greater interest for present purposes is the time required for reading information from relatively long-term memory, and also for storing the results of calculation back into that relatively long-term memory.
I refer to time taken up directly by the electronic processes involved in copying or shifting each of the several error-data values into several pixels other than the one being processed. There are three main types of transactions of the sort here under discussion: reading image data from a source device (e.g., a scanner) or, more commonly, from some sort of input storage; reading error data from earlier-processed pixels 12-15; and writing (summing) error data into later-to-be-processed pixels 16-19. Image processing in general, whether by software or hardware, tends to require ample data storage and correspondingly large numbers of storage transactions.
d. Substitution of hardware/firmware--It is known, in general, to replace a software module with dedicated hardware, fabricated with instructions for data processing. Indeed this has been done for error diffusion--first in so-called "firmware", usually a read-only memory (ROM) integrated circuit feeding the instructions to a microprocessor; and later in an application-specific integrated circuit (ASIC) that not only has the instructions but also obeys them. Possibly such hardware implementations might instead be carried out in, merely by way of example, a field-programmable gate array (FPGA), or a programmable logic array (PLA).
Any of such devices when manufactured with a suitable program can typically achieve higher speed than corresponding or analogous software running in a general-purpose computer. Such a speed improvement may be accomplished in part simply through design of the circuit particularly for a given task--but speed may be enhanced also through various specific strategies.
One such known strategy, for example, is direct memory access (DMA), which moves data to and from external, intermediate storage devices such as dynamic random-access memory (DRAM) directly on command. In this type of data access it is not necessary to wait for a processor to receive and process instructions from an applications program to make a data transfer--which instructions sometimes in turn may entail still other steps required by an operating system (such as testing the propriety of the data transfer, or determining the best way of accomplishing it).
Another strategy available in custom hardware is performance of many tasks (including to an extent certain types of data transfer) simultaneously. Thus, implementing error diffusion in hardware starts out with some weighty advantages.
Modernly it is known to provide relatively large dedicated hardware/firmware modules that control numerous functions in a printer, copier or multifunction machine. This too may be deemed an advantage, since it facilitates sharing the cost of some hardware and this may confer a price advantage on the error-diffusion part of the overall module.
A straightforward hardware implementation of error-diffusion sequencing will now be analyzed in detail, particularly to facilitate later comparison with the methods of my invention. Here the focus, unlike that in analyzing software implementation, necessarily includes careful attention to the use of all the hardware resources.
In single-row longitudinal processing, a typical sequence may be as follows, given a representative series of three rows of pixels diagrammed thus--
______________________________________
        1  2  3  4  5  6  7  8
        a  b  C  d  e  f  g  h
        m  n  o  p  q  r  s  t
______________________________________
We take up the sequence as the processing reaches current pixel C. (Another conceptualization of the flow of error data into DRAM for later analysis, and into flip-flops or other storage for use in the actual printing process, appears in FIG. 4.)
Original image data for that pixel C were initially placed in DRAM cells for that pixel, and later read from DRAM into internal memory--not in the part of the hardware that performs error-diffusion calculations, but rather in one of its earlier blocks which performs preliminary processes such as color correction or adjustment, filtering or the like. Pixels are handed down from all such earlier blocks sequentially via internal registers to the error-diffusion modules.
Error from pixels 2, 3 and 4 has subsequently been summed into registers (for the several colors) for that same pixel C. Error was first distributed into pixel C at a time when pixel 2 was the current pixel and pixel C was the right pixel. Additional error later was similarly distributed into pixel C from pixel 3 (when C was the center pixel), and then still later from pixel 4 (when C was the left pixel).
At that last-mentioned point the sum of these three distributions into the pixel-C registers was stored into DRAM cells temporarily assigned to that pixel, releasing the working registers for other use. At this point the initial image value too, in purest principle, could have been summed into the DRAM cells--but this step, in consideration of the greater number of bits reserved for each image value and therefore the register sizes that must be carried along thereafter, is advantageously deferred until pixel-C processing time.
The memory transactions for all such distributions will be made more clear momentarily. They occurred during processing of "current pixels" in the row above pixel C, typically two or three thousand pixels earlier than the processing of pixel C.
Moreover, much more recently, error from pixel b has just been stored (or possibly just retained, as will be seen) in new temporary working registers for pixel C. That occurred when pixel C was the next pixel--during processing of pixel b as the current pixel.
Now to open processing of C as the current pixel, the sum of its previously accumulated error contributions is fetched from its DRAM cells and summed into the just-mentioned temporary pixel-C working registers. Also preferably added in at this point, though not necessarily into the selfsame registers, is the initial image value for the pixel.
That value is most typically handed down at this moment from an upstream processing block in the ASIC. The resulting new sum is compared with the system threshold or thresholds that have been worked out, for the number of levels to be printed, generally as in FIG. 5.
The printing decision is thereby made and stored or implemented as in FIGS. 4 and 5--leaving for pixel C, after subtraction of the print value, a residual or error value. That pixel-C "whole error" is multiplied by the planned weights, and its resulting fractional parts stored into respective working registers for the right pixel p, and also center pixel o (where it is summed into a previously stored right-pixel error component from pixel b).
The fractional part for left pixel n, however, is added to contributions into that pixel from pixels a and b (kept in temporary pixel-n working registers for the purpose), and the sum transferred to DRAM cells for pixel n. The pixel-n registers just mentioned were first opened when pixel n was the right pixel, for pixel a. The right-pixel error from pixel a, stored in the pixel-n registers, was then incremented, when pixel n was the center pixel, by the amount of the center-pixel error from pixel b.
DRAM storage, rather than registers, is used for the now-accumulated error for pixel n when it is the left pixel, because this is the final contribution into pixel n from the current row. This pixel will not be taken up again until the system is working in the row "mnopqrst", at the time when pixel n is the next pixel for then-current pixel m.
If instead the left-pixel error were kept in registers for pixel n, the same treatment would be appropriate for pixels o, p, q etc.--i.e., for virtually all the pixels in each row. This would require provision of an undesirably large number of registers, equal (for each color) to the number of pixels in a row.
The rest of the errors in the pixel-C registers are the distributions for the next pixel d, which error fraction may be transferred to pixel-d registers. (Alternatively it is possible to roll over the registers used in processing each current pixel--i.e., in general effect, merely to rename the pixel-C registers as the pixel-d registers. As will be understood, such a simplified description omits many timing and housekeeping considerations.) Also summed into them are the image data and previously accumulated errors retrieved from DRAM for pixel d.
The immediately foregoing discussion of the mechanisms for storage of error from successive current pixels should make clear that error flows first into working registers, for the right and center pixels, and then (as a consolidated transfer of the cumulation) into the left-pixel DRAM cells as those pixels in turn progressively become the left pixel. This analysis thus shows how the previously discussed errors were summed into the pixel-C DRAM cells, too.
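The register-and-DRAM flow just traced can be sketched, for one color and the row below the current row, as follows. Weights of 1/16, 5/16 and 3/16 for the right, center and left pixels are assumed purely for illustration, and the next-pixel flow within the current row is omitted:

```python
import math

def distribute_row(whole_errors):
    """Sketch of the working-register/DRAM flow described above, for
    one color.  whole_errors[c] is the whole error computed at current
    pixel c; the returned list is what lands in DRAM for each pixel of
    the row below.  Weights of 1/16 (right), 5/16 (center) and 3/16
    (left) are illustrative only."""
    n = len(whole_errors)
    regs = {}                    # working registers, keyed by column
    dram = [0] * n               # DRAM cells for the row below
    for c, e in enumerate(whole_errors):
        regs[c + 1] = regs.get(c + 1, 0) + math.floor(e * 1 / 16)  # right
        regs[c] = regs.get(c, 0) + math.floor(e * 5 / 16)          # center
        if c - 1 >= 0:
            # The left-pixel distribution is the final contribution from
            # this row, so consolidate the register into DRAM, freeing it.
            dram[c - 1] = regs.pop(c - 1) + math.floor(e * 3 / 16)
    # Registers still open at the row end are edge cases, left here to
    # housekeeping logic.
    return dram
```

Each working register is thus opened when its pixel is the right pixel, incremented when it is the center pixel, and consolidated into DRAM when it is the left pixel--so only a handful of registers per color are live at any moment, regardless of row length.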
(The right- and center-pixel working registers, along with the registers for the several colors of the current pixel, amount to a group of just three working memories per color, for the cluster of pixels in FIG. 4. These preferably are, or can be, in effect rolled over or recycled into corresponding functions for all the other current pixels in the current row and the row just below--and eventually the entire image.)
In current-day systems with which I am involved, each DRAM so-called "fetch" or "store" typically transfers sixteen bits, and data for each color most commonly require only eight bits. Therefore each transaction for a pixel in a four-color system takes two (not one, or four) fetches or stores. The number of fetches or stores thus depends upon the number of colors or other color parameters in the system, and also on whether the sixteen-bit custom just described is being followed in the particular hardware involved.

To avoid the resulting confusion, and to generalize the discussions in this document, in passages that require specifying or comparing how many DRAM transactions occur in prior-art systems or in my invention, I shall refer to DRAM retrievals and deposits--which I hereby define for purposes of this document to mean respectively the fetches and stores required to deal with whatever number of colors is present in one generalized pixel.

This is not standard terminology, but it is adopted in this document because the utility and basic sense of my invention relate most fundamentally to transfer of information per pixel--and do not depend on the system bit or byte architecture, or the number of bits per color, or indeed even the color space in use, i.e., the number of colors or color parameters being carried along in the calculations. Further, I define a DRAM transaction as a retrieval or a deposit.
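Under these definitions the relation between bus fetches and per-pixel retrievals reduces to a small worked calculation. The function and parameter names below are conveniences of this sketch, not established terminology:

```python
import math

def fetches_per_transaction(num_colors, bits_per_color=8, bus_bits=16):
    """Bus fetches (or stores) needed for one DRAM retrieval (or
    deposit), i.e. to move one generalized pixel's error data, for
    all colors, across a bus of the stated width."""
    return math.ceil(num_colors * bits_per_color / bus_bits)
```

For a four-color system at eight bits per color over a sixteen-bit bus, each retrieval or deposit thus costs two fetches or stores, matching the figure stated above; the count changes with the color space and the bus architecture, which is precisely why the per-pixel terminology is the more fundamental one.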
In summary therefore the system in performing processing of one current pixel:
(1) performs one retrieval from DRAM--of the errors previously summed due to the roles of that same pixel as right, center and left pixel respectively;

(2) performs one deposit into DRAM--of the left-pixel error with previously summed errors for the same (left) pixel in its roles as right and center pixel;

(3) typically receives and adds in the handed-down original image data; and

(4) maintains open storage registers for at least the current, next, right, and center pixels.
Thus two DRAM transactions, in different directions and at slightly different moments (e.g., four total fetches or stores, in a sixteen-bit CMYK system), are required in processing each pixel as a current pixel.
A straightforward implementation such as just described has certain undesirable characteristics or limitations. Recognition of these undesirable features has required extremely extensive experience and analysis; I therefore consider this recognition to be a part of the inventive contribution which is the subject of this document. Accordingly I shall reserve for the "SUMMARY OF THE DISCLOSURE" section, which follows, introduction of these limitations.
For purposes of definiteness, in this document operation such as just described is called "processing one row at a time" even though some of the error from the row containing the current pixel is distributed into the row below. This operation is also called "processing longitudinally along rows"--even though some of the error flow is generally transverse to the rows.
Directions of error flow are determined by the basic error-diffusion protocol itself; the present invention for the most part retains that protocol and the resulting error-flow directions, and as will be seen focuses on the sequenced selection of "current pixels". Thus the terminologies "one row at a time" and "longitudinally" are hereby defined specifically with reference to the successive selection of "current pixels".
e. Other kinds of processes--In various types of image processing other than error diffusion, strategies for working with several rows at a time are known. Examples are dithering (another kind of image rendition, as mentioned earlier) and filtering.
Each such strategy, however, requires very detailed and intricate implementation tactics which necessarily are extremely specific to the particular process involved. Heretofore to the best of my knowledge it has not been suggested to devise or apply any such plural-row strategy or tactics to error diffusion.
f. Conclusion--As suggested by the foregoing, speed of error-diffusion rendition has continued to impede achievement of excellent color printing of near-continuous-tone or other photograph-like images--in an economical printer or copier, at high throughput. Thus important aspects of the technology used in the field of the invention remain amenable to useful refinement.