1. Field of the Invention
The present invention relates to a method and apparatus for optimizing pagemaps in memory for printing, and more particularly relates to a novel method of organizing the bit map presentation of the pagemap in memory of a RISC architecture processor utilized in conjunction with a high speed printer.
2. Description of the Related Art
The example system considered in this application is an IBM (International Business Machines Corporation) RISC (Reduced Instruction Set Computer) System/6000 processor controlling a print engine such as employed in the IBM Model 3900D Fanfold Duplex Printer. This is a high throughput (300 pages/minute) system printer. However, it should be understood that the novel method of organizing the bit map presentation of the pagemap in memory is applicable, with obvious modifications, to other RISC architecture processors being utilized for either printing or other applications requiring high throughput.
Many high speed printers use a Printer Control Unit with custom hardware and logic for building a pagemap. For specialized jobs, such hardware is traditionally very fast at merging (OR'ing) data into the pagemap, an operation called BitBLT, or Bit BLock Transfer. Because of the turn-around time in the design and implementation of custom hardware, it would be preferable to build the pagemap solely by use of the processing power of the RISC processor. Design changes and differing functional results could then be implemented and tested in a far shorter time than with more traditional, custom hardware implementations. However, even with the power of a RISC architecture processor, performance can degrade seriously in special situations such as Advanced Function printing control. (The Advanced Function Presentation interface, which is employed when writing applications for the above-identified printer, is fully described in "AFP Application Programming Interface Programming Guide and Reference", available under that title as publication S544-3872-00 from International Business Machines Corporation.)
The problem is exacerbated when the application has to access large amounts of memory in a random or pseudo-random manner. In printing, for example, a pagemap containing a binary bit image (bit map) of the page requires access to large amounts of memory. Assuming a resolution of 300 dpi (drops or dots per inch; about 118 dots/cm), a pagemap representing an 8.5 by 11 inch printed page (21.6 by about 28 cm), and 1 drop = 1 dot = 1 bit, then 8.5 times 300 times 11 times 300 = 8,415,000 bits, which divided by 8 is 1,051,875 bytes, or slightly more than 1 megabyte (1 MB) of memory. Moreover, if the resolution doubles, i.e., increases to 600 dpi, memory usage quadruples, so that a pagemap consumes over 4 MB of memory.
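The arithmetic above can be checked with a short sketch. The function name pagemap_bytes and its parameters are illustrative assumptions and do not appear in the referenced system; the sketch assumes 1 bit per dot, as in the example.

```python
def pagemap_bytes(width_in, height_in, dpi, bits_per_dot=1):
    """Bytes needed for a bit map of a page of the given size and resolution."""
    width_dots = round(width_in * dpi)
    height_dots = round(height_in * dpi)
    total_bits = width_dots * height_dots * bits_per_dot
    return (total_bits + 7) // 8  # round up to whole bytes


for dpi in (300, 600):
    size = pagemap_bytes(8.5, 11, dpi)
    print(f"{dpi} dpi: {size} bytes ({size / 2**20:.2f} MB)")
```

Doubling the resolution doubles the dot count in both directions, which is why the byte count quadruples.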
With the RISC System/6000 processor architecture (which along with memory will be discussed more fully hereinafter), whenever data is not available in a "layer" of memory, additional cycles of processor time are expended until the necessary or called for data is fetched for processing. For example, if the processor calls for data that is in the cache, one cycle is expended; if the data is not in the cache and must be retrieved from memory, then anywhere from 10 to 100 cycles of processor time may be expended. If the information must be retrieved from disk to memory, to cache, to processor, then tens of thousands of cycles may be expended. Moreover, if this situation occurs often, thrashing occurs and the situation gets worse. (Thrashing can occur either between cache and memory or between memory and disk. In either case, the processor is unable to do useful work because it must wait until the data is loaded from a lower level of storage.) Many applications easily avoid this problem because of the nature of the data they manipulate. For example, the matrices used during matrix multiplication can be extremely large. Transforming one of the matrices from row major order to column major order allows the program to be written so that the memory is accessed sequentially. Sequential access is the fastest way to access memory using present day RISC processors.
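As a minimal sketch of the matrix example, the second operand can be transposed once so that both operands are then read along their rows. The function names are illustrative assumptions. Python lists do not guarantee contiguous storage, so the sketch shows only the access pattern and not the resulting speedup.

```python
def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul_transposed(a, b):
    """Multiply a by b, reading both operands row by row.

    b is transposed once up front, so the inner loop walks a row of a
    and a row of the transposed b, i.e. two sequential runs of elements.
    """
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row_a, row_bt)) for row_bt in bt]
            for row_a in a]


print(matmul_transposed([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```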
So how should the pagemap be represented in memory? As a first way, a two dimensional array in row major order could be employed, which is the best organization for processors without data cache or virtual memory. This approach works well for processing data that runs horizontal to the pagemap; e.g., horizontal rows of text. However, if the rows of text happen to run vertical to the pagemap, there isn't enough data cache for more than an inch of text because the cache "lines" run perpendicular to the row of text. (Cache lines, which will be discussed later, run horizontally relative to a pagemap in row-major order.) Moreover, lines of text that extend from top to bottom of the page would suffer a data cache miss for every single word load of the pagemap. In this kind of situation, each data cache miss costs about 18 to 20 cycles of processor time to effect a reload. This kind of performance degradation is substantial. (In certain models of the RISC System/6000, a data cache miss takes about 8 to 9 cycles if the cache line selected for replacement is "clean"; i.e., the cache line doesn't need to be written back to memory. It takes about 18 to 20 cycles if the cache line is "dirty". Since data is being accessed down the page, most cache lines are going to be "dirty"; i.e., contain data that needs to be written to memory before new data is loaded.)
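The difference between walking across and walking down a row-major pagemap can be illustrated by counting the distinct cache lines each walk touches. This is a sketch under stated assumptions: the 64-byte line size, the 104-word row width, and all names are hypothetical values chosen for illustration and are not taken from any particular RISC System/6000 model.

```python
WORD_BYTES = 4     # 32-bit words
LINE_BYTES = 64    # assumed cache line size (hypothetical)
WIDTH_WORDS = 104  # assumed row width: 3300 bits / 32 bits per word, rounded up


def row_major_addr(row, col_word, width_words=WIDTH_WORDS):
    """Byte offset of a word in a row-major pagemap."""
    return (row * width_words + col_word) * WORD_BYTES


def lines_touched(addresses, line_bytes=LINE_BYTES):
    """Number of distinct cache lines covered by the given byte offsets."""
    return len({addr // line_bytes for addr in addresses})


n = 104  # visit the same number of words in each direction
across = [row_major_addr(0, c) for c in range(n)]  # along one scan line
down = [row_major_addr(r, 0) for r in range(n)]    # down one word column

print(lines_touched(across))  # 7: about 16 consecutive words share each line
print(lines_touched(down))    # 104: every word lies in a different line
```

Because the row stride (416 bytes here) exceeds the assumed line size, each step down the page lands in a new cache line, which corresponds to the miss on every word load described above.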
The next best approach is to organize the pagemap as a two-dimensional array of words in column major order. With this approach, the words of memory are addressed sequentially down the page, rather than across the page. (This is the preferred organization for processors with data cache, but with no virtual memory.) As may be imagined, this approach works well for processing lines of text that are vertical to the pagemap, and with a sufficiently large number of Translation Lookaside Buffers (TLB's), this approach would also work well for processing horizontal data.
(While TLB's will be discussed later in the section of this specification where memory organization is discussed, suffice it to say at this time that the TLB entries point to the page numbers of recently-referenced virtual memory pages which have been loaded into real memory.)
Unfortunately, there are only enough TLB's to address a small fraction of the pagemap. As a result, when processing data that lies horizontal to the pagemap, a TLB miss will occur each time a new column of words is accessed. Every TLB miss can cost up to 100 cycles of lost processor time. Assuming that the width of the pagemap is about 11 inches at 300 dpi, or 3300 pels or bits, then 3300 bits divided by 32 bits per word gives, rounding up, 104 columns of data, with each column of data being addressed by a different TLB. If the processor has only 64 TLB's (common for many RISC System/6000 models), then a TLB miss occurs every time for the next column because the TLB's are replaced on an approximation of a Least Recently Used (LRU) basis. The problem is worse for wider pagemaps and higher resolutions.
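The TLB behavior described above can be approximated with a small simulation. This is a sketch under stated assumptions, not a model of the actual hardware: it assumes 4 KB virtual memory pages, a page height of 2550 rows (8.5 inches at 300 dpi), and a fully associative TLB with true LRU replacement, whereas the text states only that replacement approximates LRU. All names are illustrative.

```python
from collections import OrderedDict

WORD_BYTES = 4
PAGE_BYTES = 4096   # assumed virtual memory page size
WIDTH_WORDS = 104   # 3300 bits / 32 bits per word, rounded up
HEIGHT_ROWS = 2550  # assumed: 8.5 inches at 300 dpi


def column_major_addr(row, col_word, height_rows=HEIGHT_ROWS):
    """Byte offset of a word when words run sequentially down the page."""
    return (col_word * height_rows + row) * WORD_BYTES


def tlb_misses(addresses, entries, page_bytes=PAGE_BYTES):
    """Count misses in a fully associative, true-LRU TLB model."""
    tlb = OrderedDict()  # keys are page numbers, ordered oldest to newest
    misses = 0
    for addr in addresses:
        page = addr // page_bytes
        if page in tlb:
            tlb.move_to_end(page)        # mark as most recently used
        else:
            misses += 1
            tlb[page] = True
            if len(tlb) > entries:
                tlb.popitem(last=False)  # evict the least recently used page
    return misses


# Ten horizontal scan lines, each visiting all 104 word columns in turn.
horizontal = [column_major_addr(r, c)
              for r in range(10) for c in range(WIDTH_WORDS)]

print(len(horizontal), tlb_misses(horizontal, entries=64))
print(len(horizontal), tlb_misses(horizontal, entries=128))
```

In this model, with 64 entries every access misses, because 104 distinct pages are cycled through before any page is revisited. With a hypothetical 128 entries, only the first touch of each page misses.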
The problem in the processor is caused by the limited amount of processor data cache and the limited number of Translation Lookaside Buffers (TLB's) available. For example, most models of the RISC System/6000 have 8 to 64 KB (kilobytes) of L1 cache memory (the highest level of cache memory), and only 64 to 128 TLB entries, capable of addressing respectively 256 to 512 KB of virtual memory simultaneously. Thus, it is not possible to address an entire pagemap (1 to 4 MB of memory) simultaneously with only 64 to 128 TLB entries.
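The reach figures above follow from multiplying the number of TLB entries by the page size. The sketch below assumes 4 KB pages, which is the size implied by 64 entries covering 256 KB; the function names are illustrative assumptions.

```python
PAGE_BYTES = 4096  # assumed page size, implied by 64 entries covering 256 KB


def tlb_reach_bytes(entries, page_bytes=PAGE_BYTES):
    """Virtual memory addressable at once through the given TLB entries."""
    return entries * page_bytes


def fraction_covered(entries, pagemap_bytes, page_bytes=PAGE_BYTES):
    """Share of a pagemap that the TLB can map simultaneously (at most 1.0)."""
    return min(1.0, tlb_reach_bytes(entries, page_bytes) / pagemap_bytes)


for entries in (64, 128):
    reach_kb = tlb_reach_bytes(entries) // 1024
    print(entries, "entries:", reach_kb, "KB reach,",
          f"{fraction_covered(entries, 1_051_875):.0%} of a 300 dpi pagemap")
```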
Among the prior art methods of printing pagemaps comprising bit images, U.S. Pat. No. 5,084,831, issued Jan. 28, 1992 to Morikawa et al, describes a printing method in which a full bit map mode is employed for storing a page of data, and a secondary mode for a smaller quantity of data is illustrated. The strip map (secondary) mode or the page mode is selected in accordance with the size of the image data. In the strip map mode, the output of band buffers is alternately fed to the print engine; while one is printing, the other is loading, until the entire page is fully printed. (See col. 8 and FIG. 11.) There is no teaching of bit map data structure, nor of how to account for data caching problems in a very high speed printer.
In U.S. Pat. No. 5,163,123, issued on Nov. 10, 1992, there is shown a printer having a bit map memory for storing dot image data corresponding to image data of one page sent from an external apparatus. In this scheme, a buffer memory is employed intermediate the bit map memory and the image forming means, such as a printhead. The writing means takes the data from the bit map memory one scan at a time for reading into the buffer memory, and outputs it to the printhead at the same rate in response to a start signal from a signal generator. Since the buffer memory (which could be equated to cache memory even though it is DRAM, as set forth in the text of the patent) is output one scan at a time, there is no recognition of the problems outlined above and resolved by the present invention, including data cache misses, nor of a special arrangement of the pagemap to help avoid those problems when data is fed to the print engine.
In U.S. Pat. No. 4,825,386, issued on Apr. 25, 1989 to Bogacki, there is disclosed a horizontal line processor of data to be printed out dot sequentially. Here, the horizontal line processor loads print instruction commands into selected locations in a full page bit map memory. However, there is nothing indicating the desired structure of the bit map memory to alleviate the problems alluded to heretofore.