Color jobs in production printing environments often require job-specific settings in order to obtain a satisfactory rendition. When a job is repeated, it is highly desirable that the same settings be replicated so that the customer sees consistency from job to job. Color control settings are a common example of such job-specific settings. Repeat jobs are common in production printing environments because print job submissions often share a significant number of color-critical elements with a previously completed print job. Such repeat jobs range from explicit requests for additional re-prints and identical re-submissions of the same job, to the submission of a fresh job that incorporates minor modifications over a previous submission, or even a new job that shares some crucial elements with a previous job. An example is a logo image used for corporate identity that is contained in several jobs.
For instance, a customer ordering calendars that utilize a different personalized image for each month can request different versions of the calendar in which a few of the images are changed from version to version. Finally, a customer can also re-submit a complete job for which they have already received prints from a previous order, without explicitly indicating a connection with the previous order. It is highly desirable that consistency in job-specific settings, such as those relating to color rendition, is maintained across such repeat jobs.
One prior art approach utilizes a test pattern, which is printed with the original job and then reprinted with the reprint job. The measurements from these test patterns are then utilized to derive a transformation that ensures a consistent reproduction of the reprint job. The identification of a job as a reprint job is assumed to be accomplished by other means such as a repeat order of an archived job at the print shop. In all of these scenarios, it is desirable that the repeat job be automatically identified as such so that care may be taken to ensure that the prints for the new submission are consistent with those from the original order.
A print server can be utilized for storing print jobs printed by a printer. This print server enables each print job stored therein to be reprinted by the printer in response to a user's particular reprinting request. The print server utilized for such prior art applications requires a high-capacity storage device for storing print jobs of high-volume image data, such as, for example, photographic images. In such cases, the number of print jobs stored in the print server may be restricted. A technique used for the identification of repeat jobs must provide a meaningful way to compare jobs in scenarios where they share a significant number of critical elements (e.g., color, images, graphics), but are not necessarily identical. Additionally, the scheme should be efficient in terms of memory and computation to facilitate scalability to large databases. Hash functions, as described next, present an apposite solution.
A hash function may be described as a map from a “large” to a “small” set. In practice, a hash function is designed to map arbitrary digital inputs to a fixed length output binary string. The key idea behind hashing is that not all possible versions of the digital inputs can be encountered in practice and therefore the hash function can be designed such that, with high probability, the fixed length output binary strings are distinct for distinct inputs. Hash functions are widely used in compilers, databases, and cryptography.
In order to appreciate the use of hash functions, it is helpful to refer to a general mathematical model. For example, the variable X can be utilized to denote a set of inputs, and for any x in X, the function h(x) can represent an output binary hash value. For an n-bit hash value with a binary string of length n, the output binary hash value can be expressed in the form of equation (1) as follows:

h(x) ∈ {0,1}^n, ∀ x ∈ X  (1)

where 2^n < |X|.
In equation (1) above, |X| represents the number of elements in the set of inputs X. Note that the number of possible hash realizations, 2^n, is much smaller than the cardinality of the set of inputs X. The target application hence guides the construction of hash functions and their properties.
In particular, let X represent the set of all character strings with a maximum length. Let h(x) be defined as shown in equation (2):

h(x) = f(x) mod M, ∀ x ∈ X  (2)

where f(x) is the sum of the ASCII codes corresponding to each character in the string x, and M is a prime number.
The hash can thus be simply computed as the remainder obtained upon division of a positive integer (the sum of ASCII codes) by a prime number. Hence, in this case the valid hash values are 0, 1, 2, . . . , M−1.
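The string hash of equation (2) can be sketched as follows; the function name `ascii_sum_hash` and the choice of the prime M = 101 are illustrative assumptions rather than part of the prior art description:

```python
def ascii_sum_hash(s: str, M: int = 101) -> int:
    """Hash a character string per equation (2): sum of ASCII codes modulo a prime M."""
    f = sum(ord(c) for c in s)  # f(x): sum of ASCII codes of the characters in x
    return f % M                # valid hash values are 0, 1, 2, ..., M-1

print(ascii_sum_hash("Smith"))  # some value in the range 0..100
# Because the sum is order-independent, anagrams collide under this simple hash:
assert ascii_sum_hash("listen") == ascii_sum_hash("silent")
```

The anagram collision illustrates why such a simple hash, while fast, falls short of the collision-resistance property discussed below.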
FIG. 1 illustrates a prior art representation 100 of a hash function in querying employee records. Consider the example depicted in FIG. 1, where a hash value is utilized to search employee records in a database by using the employees' names as query data. In FIG. 1, a group of keys 10, indexes 20, and key-value pairs 30 are depicted. The keys 10 are associated with name values 12, 14, 16, and the indexes 20 include indexes 18, 21, 22. Index 18 includes index values 24, 26 and index 21 includes index values 28 and 31. Similarly, index 22 includes index values 32 and 34. The key-value pairs 30 (e.g., records) include records 36, 38, and 40.
Thus, a table of hash values or indexes 20 can be maintained, where the indexes 20 can be utilized to fetch the employee information using the keys 10, and the key-value pairs 30 can be recorded. When a new "employee name" is queried, the hash function of that name is computed and used as an index 20. Given n-bit hashes, and appropriate data structures to store them, binary search can facilitate a search that is O(log n) in most cases.
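The employee-record lookup of FIG. 1 can be sketched as a hash table in which each index points to a bucket of key-value pairs; the bucket count of 31 and the record fields here are hypothetical choices for illustration:

```python
def name_hash(name: str, M: int = 31) -> int:
    """Map an employee name to one of M bucket indexes (sum of ASCII codes mod M)."""
    return sum(ord(c) for c in name) % M

# The table of indexes: each bucket chains (key, record) pairs to handle collisions.
table = [[] for _ in range(31)]

def insert(name: str, record: dict) -> None:
    table[name_hash(name)].append((name, record))

def lookup(name: str):
    """Compute the index from the queried name and scan only that bucket."""
    for key, record in table[name_hash(name)]:
        if key == name:
            return record
    return None

insert("John Smith", {"id": 1234, "dept": "Press"})
insert("Lisa Smith", {"id": 5678, "dept": "Prepress"})
print(lookup("Lisa Smith"))  # returns the record stored for that key
```

Chaining within a bucket is one common way to handle the collisions that a many-to-one hash inevitably produces; only the small bucket, rather than the whole table, needs to be scanned per query.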
The cardinality of X is much larger than 2^n, and the size of each x in X is large enough that comparing x and x′ directly may be prohibitively slow. The two properties that such a hash function is desired to satisfy are uniform distribution and collision resistance. The hash function "uniformly" distributes the data across the entire set of possible hash values, as illustrated, for example, by equation (3) below:
Pr(h(x) = v) ≈ 1/2^n, ∀ v ∈ {0,1}^n  (3)
The probability space in equation (3) is given by all possible realizations of the hash function over the set X. Collision resistance means that it should be difficult, ideally computationally infeasible, to find or generate distinct inputs x, x′ such that h(x) = h(x′). In addition to the aforementioned mathematical requirements, which are crucial for scalability across large data sets, the most significant practical requirement is for the hash computation to be extremely fast. The aforementioned example illustrates the virtues of hashing in retrieving large digital objects. Print jobs may be viewed as composed of a multitude of digital objects, and hence a scheme based on object-level hashes can present a viable solution to enable their search and retrieval.
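One common way to obtain an n-bit hash that approximates the uniform-distribution property of equation (3) is to truncate a cryptographic hash; the sketch below, with an assumed n = 16, uses SHA-256 for illustration and is not the only suitable construction:

```python
import hashlib

def nbit_hash(data: bytes, n: int = 16) -> int:
    """Truncate SHA-256 to the top n bits, yielding a hash value in {0, ..., 2^n - 1}."""
    digest = hashlib.sha256(data).digest()          # 256-bit digest
    return int.from_bytes(digest, "big") >> (256 - n)

# Empirically, distinct inputs spread out across the 2^n possible outputs:
values = {nbit_hash(f"object-{i}".encode()) for i in range(1000)}
print(len(values))  # close to 1000, since collisions among 2^16 buckets are rare
```

With 1000 inputs and 2^16 possible values, only a handful of collisions are expected under a uniform distribution, so the set of distinct hash values stays near 1000.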
Based on the foregoing, it is believed that a need exists for an improved method and system that achieves consistency across repeat jobs without requiring archival of the complete jobs, while ensuring highly reliable identification of repeat jobs. Additionally, a need exists for a methodology that enables a time- and memory-efficient solution to the problem of identifying repeat print jobs utilizing object-level hash tables.