Whenever information is electronically encoded as original, or clean, data, and then transferred from a data source to a data destination, noise may be introduced by the transfer process, resulting in alteration of the original, clean data and reception of the data by the data destination as noisy data. For example, when information is electronically encoded as a sequence of binary bits and sent through a communications network, such as a local Ethernet, to a destination node, there is a small probability that any given bit within the original, or clean, sequence of binary bits is corrupted during transfer through the Ethernet, resulting in a “0” bit in the clean data being altered to a “1” bit in the noisy data received at the destination node, or a “1” bit in the clean data being altered to a “0” bit in the noisy data received at the destination node. Although electronic communications media are classic examples of noisy channels, almost any type of data transfer or storage may result in data corruption, and therefore may be modeled as a noisy channel. For example, there is a small probability, associated with each bit of a block of binary data, that the bit will be altered when the block of data is stored on, and then retrieved from, a hard disk, or even when the block of data is transferred from local cache memory to global random-access memory within a computer system. In general, redundant data, including checksums and cyclic redundancy codes, are embedded into data encodings to allow corrupted data to be detected and repaired. However, the amount of redundant data needed, and the accompanying costs and inefficiencies associated with redundant data, grow as the acceptable level of undetectable and/or unrepairable data corruption decreases.
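The noisy-channel model and the role of embedded redundant data described above can be illustrated with a minimal Python sketch. It assumes a binary symmetric channel in which each bit is flipped independently with probability p, and it uses the CRC-32 from Python's standard zlib module as the redundant data. The function names flip_bits, encode, and is_intact are hypothetical names chosen for illustration. The sketch shows detection only, not repair.

```python
import random
import zlib


def flip_bits(data: bytes, p: float, rng: random.Random) -> bytes:
    """Model a binary symmetric channel: flip each bit independently with probability p."""
    noisy = bytearray(data)
    for i in range(len(noisy)):
        for bit in range(8):
            if rng.random() < p:
                noisy[i] ^= 1 << bit  # a 0 becomes a 1, or a 1 becomes a 0
    return bytes(noisy)


def encode(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 (the redundant data) to the clean payload."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def is_intact(frame: bytes) -> bool:
    """Recompute the CRC-32 over the payload and compare it with the received CRC."""
    payload, received_crc = frame[:-4], frame[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == received_crc


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs behave the same way
    clean = encode(b"original, clean data")
    noisy = flip_bits(clean, p=0.01, rng=rng)
    print("corrupted:", noisy != clean, "| CRC check passes:", is_intact(noisy))
```

A 32-bit check of this kind detects every single-bit error in a frame but cannot, on its own, repair one. Repair requires additional redundancy, which is the cost trade-off noted above.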
In many cases, data corruption may occur prior to a point in a process at which redundant information can be embedded into a data signal to facilitate error detection and correction. As one example, a scanner that optically scans a printed document to produce a digital, electronic encoding of an image of the document can be viewed as a noisy channel in which discrepancies between the digitally encoded image of the document and the original document may arise. Such discrepancies may be introduced by a variety of optical and electronic components within the scanner that focus an optical image of the document onto a light-detecting component that transforms the detected optical image into an electronically encoded image of the document. When the digitally encoded image of the document is displayed or printed, different types of noise may be perceived as graininess, irregularities along the edges of text characters or objects within graphical images, uneven shading or coloration, random speckling, or other such visually distinguishable differences between the printed or displayed version of the digitally encoded data and the original document.
Denoising techniques can be applied to a noisy, digitally encoded image in order to produce a denoised, digitally encoded image that more accurately represents the original document that was scanned to produce the noisy, digitally encoded image. Denoising techniques may also be applied to data received over channels that are too noisy for recovery of the original data using the redundant data incorporated within the data to facilitate error correction. A wide variety of additional applications of denoising techniques have been identified and are well known. Recently, a discrete universal denoiser method (“DUDE”) has been developed for denoising the noisy output signal of a discrete, memoryless data-transmission channel without relying on knowledge of, or assumptions concerning, the statistical properties of the original, or clean, signal input to the discrete, memoryless channel. Even more recently, the DUDE method has been extended for denoising continuous-tone images, such as scanned documents or images. The extended DUDE method is referred to as the “DUDE-CTI method,” or simply as the “DUDE-CTI.” The DUDE-CTI method is intended for use in a variety of image and data scanning, processing, and transfer applications. The DUDE-CTI method has shown promising results for certain types of noisy channels. An efficient DUDE-CTI depends on collections of symbol-occurrence statistics for each of a large number of different pixel contexts observed within an image. Because of the large number of possible contexts, an expedient approach is to coalesce individual contexts into groups, or classes, of contexts, and then to collect statistics on a context-class basis, rather than for individual contexts. However, such methods depend on efficient and effective assignment of contexts to context classes.
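The idea of coalescing contexts into context classes and collecting symbol-occurrence statistics per class can be sketched in Python as follows. This is an illustrative sketch only, not the DUDE-CTI's actual class-assignment method. It assumes an 8-bit grayscale image stored as a list of rows, a context consisting of the four nearest neighbors of a pixel, and a class assignment obtained by coarsely quantizing each neighbor value. The names context_class and collect_class_statistics and the bin_width parameter are hypothetical.

```python
from collections import Counter, defaultdict


def context_class(neighbors, bin_width=64):
    """Map a context (tuple of neighbor values) to a class by coarse quantization.

    Many distinct contexts share one class, so statistics accumulate faster
    than they would for individual contexts.
    """
    return tuple(value // bin_width for value in neighbors)


def collect_class_statistics(image, bin_width=64):
    """Count occurrences of each center-pixel symbol within each context class."""
    stats = defaultdict(Counter)
    rows, cols = len(image), len(image[0])
    # Skip border pixels, which lack a full four-neighbor context.
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            neighbors = (
                image[r - 1][c],  # above
                image[r][c - 1],  # left
                image[r][c + 1],  # right
                image[r + 1][c],  # below
            )
            stats[context_class(neighbors, bin_width)][image[r][c]] += 1
    return stats


if __name__ == "__main__":
    noisy_image = [
        [10, 20, 30],
        [40, 200, 50],
        [60, 70, 80],
    ]
    for cls, counts in collect_class_statistics(noisy_image).items():
        print(cls, dict(counts))
```

With four neighbors of 256 levels each there are 256^4 possible contexts, whereas a bin width of 64 leaves only 4^4 = 256 classes. The coarser the quantization, the more observations fall into each class, at the cost of grouping contexts that may behave differently.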
Information-theory researchers, denoising-method developers, and manufacturers and users of a variety of data acquisition, storage, processing, and transfer devices that employ denoisers continue to seek efficient and effective context-modeling methods and context-modeling components for generating context classes and symbol-prediction classes, assigning contexts to context classes and symbol-prediction classes, gathering statistics related to symbol occurrence within context classes and symbol-prediction errors related to symbol-prediction classes, and for noisy-symbol prediction.