1. Field of the Invention
This invention relates generally to methods and apparatus for printer performance tuning and, in particular, to methods and apparatus for printer performance tuning based on statistical analysis of page contents.
2. Description of Related Art
Page Description Language (PDL) interpreters and renderers are widely used in modern printers, in applications ranging from desktop publishing to print-shop production. For example, PostScript (PS) is an object-based PDL designed primarily for printing documents on laser printers, but it has also been adapted to produce images on other types of printers. In a printer that uses PostScript, the PostScript code is run by an interpreter, also known as a raster image processor (“RIP”), to render an image. Examples of PostScript interpreters and renderers include Adobe's Configurable PostScript Interpreter (CPSI), Aladdin®'s Ghostscript, etc. Other examples of PDLs include HP®'s Printer Control Language (PCL), Adobe®'s Portable Document Format (PDF), Microsoft®'s XML Paper Specification (XPS), etc.
PDL interpreters and renderers have many settings that may affect a printer's performance. Some of these settings may be ideal for certain print jobs while being less than ideal for others. As a result, choosing a single set of tuning parameters for all jobs involves compromising among different objectives and guessing what a typical job will contain (e.g., text, images, or vector graphics).
While printer performance may be tuned somewhat “manually” by adjusting the interpreter's settings on the fly to find the best performance for a particular job, this is impractical in many cases. For example, “manual” tuning may be achieved by first rendering some pages using the interpreter's default settings and collecting data about the content and performance of those pages, then making some adjustments and printing more pages to see whether performance has improved or degraded. This may work if the print job is very large and consists of very similar pages, so that tuning that works well for the initial pages of the job can be expected to work well for the remaining pages as well. Unfortunately, this is often not the case, and the manual cycle of printing, measuring, adjusting, printing, measuring, readjusting . . . is labor-intensive and time-consuming before the best performance settings can be found. Moreover, “manual” tuning is a “static” process that applies the same result to all pages of all print jobs. It is not dynamic, and it is virtually impossible to find a “one-setting-fits-all” solution.
Instead of the “manual” process described above, it would be preferable for printer performance tuning to be a dynamic process based on a “self-learning” capability of a PDL interpreter, in which the interpreter learns and remembers which settings work best for each page based on the page's content, and applies this knowledge while each page is being processed rather than after some set of pages has already been rendered.
The concept of self-learning and adjusting based on statistical data has been applied in other fields using Bayes' theorem. For example, in email spam filtering, “Bayesian spam filtering,” which uses Bayesian analysis, has become a popular mechanism for distinguishing illegitimate (“spam”) emails from legitimate emails.
Bayesian spam filtering is based on the statistical observation that particular words have particular probabilities of occurring in spam emails and in legitimate emails. For example, the word “refinance” may occur frequently in spam emails but rarely in other emails. An email filter does not know these probabilities in advance and must first be trained so that it can build up this knowledge. To train the filter, a user usually first needs to indicate manually whether each new email is spam or not. For each word in each training email, the filter adjusts the probabilities, stored in its database, that the word will appear in spam emails or in legitimate (“ham”) emails. For example, a Bayesian spam filter will typically learn a very high spam probability for the word “refinance” but a very low spam probability for words seen only in legitimate emails, such as the names of friends and family members.
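The training step described above can be illustrated with a minimal Python sketch. It assumes each email has already been reduced to a set of words and labeled by the user; the function `train`, the toy training set, and the add-one (Laplace) smoothing are illustrative assumptions rather than features of any particular filter.

```python
from collections import Counter


def train(labeled_emails):
    """Count, per word, how many spam and ham emails contain it, and return
    the likelihoods P(word | spam) and P(word | ham) as two dicts.
    Hypothetical helper; add-one smoothing is an assumption."""
    spam_counts, ham_counts = Counter(), Counter()
    n_spam = n_ham = 0
    for words, is_spam in labeled_emails:
        unique = set(words)  # count each word at most once per email
        if is_spam:
            spam_counts.update(unique)
            n_spam += 1
        else:
            ham_counts.update(unique)
            n_ham += 1
    vocabulary = set(spam_counts) | set(ham_counts)
    # Add-one smoothing keeps a word never seen in one category from getting probability 0.
    p_word_spam = {w: (spam_counts[w] + 1) / (n_spam + 2) for w in vocabulary}
    p_word_ham = {w: (ham_counts[w] + 1) / (n_ham + 2) for w in vocabulary}
    return p_word_spam, p_word_ham


# Hypothetical user-labeled training emails: (set of words, is_spam).
TRAINING_SET = [
    ({"refinance", "your", "mortgage", "now"}, True),
    ({"refinance", "today", "low", "rates"}, True),
    ({"dinner", "with", "alice", "tonight"}, False),
    ({"alice", "sent", "the", "photos"}, False),
]

p_spam, p_ham = train(TRAINING_SET)
print(round(p_spam["refinance"], 2), round(p_ham["refinance"], 2))  # 0.75 0.25
print(round(p_spam["alice"], 2), round(p_ham["alice"], 2))  # 0.25 0.75
```

As in the text, “refinance” ends up far more likely under spam, while a name seen only in legitimate mail ends up far more likely under ham.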
After training, the word probabilities (also known as “likelihood functions”) are used to compute the probability that an email containing a particular set of words belongs to either the spam category or the legitimate email category. Each word in the email (or only the most interesting words) contributes to the email's spam probability. This contribution, the probability that the email is spam given that it contains the word, is called the “posterior probability” and is computed using Bayes' theorem. The contributions are then combined over all words in the email into an overall spam probability, and if that probability exceeds a certain threshold (e.g., 95%), the filter marks the email as spam.
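A minimal sketch of this classification step follows, assuming equal prior probabilities of spam and legitimate email and treating words as independent (the “naive Bayes” assumption). Under those assumptions, Bayes' theorem gives each word's posterior as p = P(w|spam) / (P(w|spam) + P(w|ham)), and the per-word posteriors p_i combine as prod(p_i) / (prod(p_i) + prod(1 - p_i)). The likelihood table, the function names, and the clamping bound `eps` are hypothetical.

```python
import math

# Hypothetical trained likelihoods: word -> (P(word | spam), P(word | ham)).
LIKELIHOODS = {
    "refinance": (0.9, 0.05),
    "mortgage": (0.6, 0.1),
    "alice": (0.02, 0.3),
    "photos": (0.1, 0.2),
}


def word_posterior(p_w_spam, p_w_ham):
    """Bayes' theorem with equal priors: P(spam | word)."""
    return p_w_spam / (p_w_spam + p_w_ham)


def spam_probability(words, likelihoods, eps=1e-6):
    """Combine per-word posteriors p_i as prod(p_i) / (prod(p_i) + prod(1 - p_i)),
    assuming word independence and equal priors. Sums of logs avoid underflow."""
    log_s = log_h = 0.0
    for w in words:
        if w not in likelihoods:
            continue  # words never seen in training are ignored
        p = word_posterior(*likelihoods[w])
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        log_s += math.log(p)
        log_h += math.log(1.0 - p)
    d = log_s - log_h  # result equals 1 / (1 + exp(-d)); computed stably below
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)


def is_spam(words, likelihoods, threshold=0.95):
    """Mark the email as spam when its combined probability exceeds the threshold."""
    return spam_probability(words, likelihoods) > threshold


print(round(spam_probability({"refinance", "mortgage", "now"}, LIKELIHOODS), 4))  # 0.9908
print(is_spam({"alice", "photos"}, LIKELIHOODS))  # False
```

Restricting the loop to the few words whose posteriors lie farthest from 0.5 would correspond to the “most interesting words” variant mentioned above.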
The initial training can usually be refined when wrong judgments by the software (false positives or false negatives) are identified, that is, when the user manually moves an email from the spam category to the legitimate email category or vice versa. This allows the filter software to dynamically improve its accuracy and adapt to new developments in spam emails.
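In terms of stored statistics, such a correction amounts to moving one email's contribution from one category to the other. The sketch below assumes the filter stores, per category, the number of emails learned and how many of them contain each word, and that the misfiled email had already been learned under its wrong label; the class `WordCounts` and its methods are hypothetical. The word likelihoods would then be recomputed from the updated counts.

```python
from collections import Counter


class WordCounts:
    """Hypothetical per-category counts kept in a Bayesian filter's database.
    Key True = spam, False = legitimate ("ham")."""

    def __init__(self):
        self.docs = {True: 0, False: 0}  # number of emails learned per category
        self.words = {True: Counter(), False: Counter()}  # emails containing each word

    def learn(self, words, is_spam):
        self.docs[is_spam] += 1
        self.words[is_spam].update(set(words))

    def unlearn(self, words, is_spam):
        self.docs[is_spam] -= 1
        self.words[is_spam].subtract(set(words))

    def correct(self, words, was_marked_spam):
        """The user moved a misclassified email: undo the old label, learn the new one."""
        self.unlearn(words, was_marked_spam)
        self.learn(words, not was_marked_spam)


counts = WordCounts()
counts.learn({"refinance", "offer"}, True)
counts.learn({"offer", "newsletter"}, True)  # wrongly filed as spam
counts.learn({"alice", "photos"}, False)
counts.correct({"offer", "newsletter"}, was_marked_spam=True)  # false positive fixed
print(counts.docs)  # {True: 1, False: 2}
print(counts.words[True]["newsletter"], counts.words[False]["newsletter"])  # 0 1
```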
It would be preferable to have a performance tuning method in which the printer learns and remembers which settings work best for each page based on the page content, in which the process is applied while each page is being processed rather than after some set of pages has already been rendered, and in which the tuning data can be saved from one job and applied to future jobs.
It would also be preferable to have an improved printer performance tuning method that uses Bayesian analysis to provide dynamic performance tuning based on statistical analysis of page contents.