The present invention relates to a document analysis method and, more particularly, to a document analysis method to detect BW/color areas.
Moreover, the invention relates to a scanning device to acquire documents.
Finally, the invention relates to a method for acquiring a document based on the analysis of the content of the document itself.
As is well known in the technical field of image processing, during its life an image is processed by a plurality of electronic devices that create, acquire, display, store, read and write the image itself.
The image data processing device, and the corresponding processing method, deal with an image acquired by means of an image acquisition device, for example a scanner.
The image data so obtained are usually organized into a raster of pixels, each pixel providing an elementary piece of image information.
In other words, images are, at the most basic level, arrays of digital values, where a value is a collection of numbers describing the attributes of a pixel in the image. For example, in bitmaps, the above-mentioned values are single binary digits.
Often, these numbers are fixed-point representations of a range of real numbers; for example, the integers 0 through 255 are often used to represent the numbers from 0.0 to 1.0. Often too, these numbers represent the intensity at a point of the image (gray scale) or the intensity of one color component at that point.
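The fixed-point mapping just described can be illustrated with a minimal Python sketch. The function names `to_unit` and `to_level` and the `bits` parameter are hypothetical and are introduced only for illustration; the sketch assumes a linear mapping between the integer levels and the interval from 0.0 to 1.0.

```python
def to_unit(level, bits=8):
    """Map an integer level in [0, 2**bits - 1] to a real number in [0.0, 1.0]."""
    max_level = (1 << bits) - 1          # 255 for 8-bit samples
    if not 0 <= level <= max_level:
        raise ValueError("level out of range for the given bit depth")
    return level / max_level


def to_level(value, bits=8):
    """Map a real number to the nearest integer level, clamping to [0.0, 1.0]."""
    max_level = (1 << bits) - 1
    clamped = min(1.0, max(0.0, value))  # values outside the range are clamped
    return int(round(clamped * max_level))
```

With 8-bit samples, `to_unit(255)` gives 1.0 and `to_level(0.0)` gives 0, so black and white sit at the two ends of the integer range.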
An important distinction has to be made in the images to be processed between achromatic and colored images.
In fact, achromatic light has only one attribute, which is the quantity of light. This attribute can be discussed in the physical sense of energy, in which case the terms intensity and luminance are used, or in the psychological sense of perceived intensity, in which case the term brightness is used.
It is useful to associate a scale with different intensity levels, for instance defining 0 as black and 1 as white; intensity levels between 0 and 1 represent different levels of grays.
The visual sensations caused by colored light are much richer than those caused by achromatic light. Discussion of color perception usually involves three quantities, known as hue, saturation and lightness.
1. Hue distinguishes among colors such as red, green, purple and yellow.
2. Saturation refers to how far a color is from a gray of equal intensity. Red is highly saturated; pink is relatively unsaturated; royal blue is highly saturated; sky blue is relatively unsaturated. Pastel colors are relatively unsaturated; unsaturated colors include more white light than do the vivid, saturated colors.
3. Lightness embodies the achromatic notion of perceived intensity of a reflecting object.
A fourth term, brightness, is used instead of lightness to refer to the perceived intensity of a self-luminous object (i.e. an object emitting rather than reflecting light), such as a light bulb, the sun or a CRT.
The above-mentioned features of colors seem to be subjective: they depend on human observers' judgment. In reality, the branch of physics known as colorimetry provides an objective and quantitative way of specifying colors, which can be correlated to the above perceptual classification.
A color can be represented by means of its dominant wavelength, which corresponds to the perceptual notion of hue; excitation purity corresponds to the saturation of the color; luminance is the amount or intensity of light. The excitation purity of a colored light is the proportion of pure light of the dominant wavelength and of white light needed to define the color.
A completely pure color is 100% saturated and thus contains no white light, whereas mixtures of a pure color and white light have saturations somewhere between 0 and 100%. White light, and hence gray, is 0% saturated and contains no color of any dominant wavelength.
Furthermore, light is fundamentally electromagnetic energy in the 400-700 nm wavelength part of the spectrum, which is perceived as the colors from violet through indigo, blue, green, yellow and orange to red. The amount of energy present at each wavelength is represented by a spectral energy distribution P(λ), as shown in FIG. 1.
The visual effect of any spectral distribution can be described by means of three values, i.e. the dominant wavelength, the excitation purity, and the luminance. FIG. 2 shows the spectral distribution of FIG. 1, illustrating these three values. In particular, it should be noted that at the dominant wavelength there is a spike of energy of level e2. White light, i.e. the uniform distribution of energy at level e1, is also present.
The excitation purity depends on the relation between e1 and e2: when e1=e2, excitation purity is 0%; when e1=0, excitation purity is 100%.
Luminance, which is proportional to the integral of the area under such curve, depends on both e1 and e2.
A color model is a specification of a 3D color coordinate system and a visible subset in the coordinate system within which all colors in a particular range lie. For instance, the RGB (red, green, blue) color model is the unit cube subset of a 3D Cartesian coordinate system, as shown in FIG. 3.
More specifically, three hardware-oriented color models are RGB, used with color CRT monitors; YIQ, i.e. the broadcast TV color system, which is a re-coding of RGB for transmission efficiency and for downward compatibility with black and white television; and CMY (cyan, magenta, yellow), used for some color-printing devices. Unfortunately, none of these models is particularly easy to use, because they do not relate directly to the intuitive color notions of hue, saturation, and brightness. Therefore, another class of models has been developed with ease of use as a goal, such as the HSV (hue, saturation, value) model, sometimes called HSB (hue, saturation, brightness), and the HLS (hue, lightness, saturation) and HVC (hue, value, chroma) models.
Each model also comes with a means of converting to some other specification.
As stated above, the RGB color model used in color CRT monitors and color raster graphics employs a Cartesian coordinate system. The RGB primaries are additive primaries; that is, the individual contributions of each primary are added together to yield the result. The main diagonal of the cube, with equal amounts of each primary, represents the gray levels: black is (0,0,0); white is (1,1,1).
Following this gray line implies changing the three Cartesian values R, G and B at the same time, as shown with a dash-dotted line in FIG. 4A; this increases the computational load of the image processing steps that require the identification of gray regions.
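To make the cost of working along the gray diagonal concrete, the following Python sketch tests whether an RGB pixel lies near that diagonal. It is only an illustration: the function name `is_gray_rgb` and the tolerance `tol` are hypothetical, and components are assumed to be normalized to the range 0 to 1.

```python
def is_gray_rgb(r, g, b, tol=0.05):
    """Return True when (r, g, b) lies close to the main diagonal of the RGB cube.

    On the diagonal R = G = B, so all three components must be compared
    with one another; no single coordinate identifies a gray on its own.
    `tol` is a hypothetical tolerance on the spread of the components.
    """
    return max(r, g, b) - min(r, g, b) <= tol
```

Every pixel requires a comparison involving all three channels, which is the coupling between R, G and B referred to above.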
The RGB model is hardware-oriented. By contrast, the HSV (as well as the HSB or HLC) model is user-oriented, being based on the intuitive appeal of the artist's tint, shade, and tone. The coordinate system is cylindrical, as shown in FIG. 4B.
The HSV model (like the HLC model) is easy to use. The grays all have S=0 and they can be removed from an image data raster by means of a cylindrical filter in proximity of the V axis, as shown in FIG. 5; moreover, the maximally saturated hues are at S=1, L=0.5.
The HLS color model is a reduced model obtained from the HSV cylindrical model, as shown in FIG. 6; the reduction of the color space is due to the fact that some colors cannot be saturated. Such a space subset is defined as a hexcone or six-sided pyramid, as shown in FIG. 7. The top of the hexcone corresponds to V=1, which contains the relatively bright colors. The colors of the V=1 plane are not all of the same perceived brightness, however.
Hue, or H, is measured by the angle around the vertical axis, with red at 0°, green at 120° and so on (see FIG. 7). Complementary colors in the HSV hexcone are 180° opposite one another. The value of S is a ratio ranging from 0 on the center line (V axis) to 1 on the triangular sides of the hexcone.
The hexcone is one unit high in V, with the apex at the origin. The point at the apex is black and has a V coordinate of 0. At this point, the values of H and S are irrelevant. The point S=0, V=1 is white. Intermediate values of V for S=0 (on the center line) are the grays. The simplicity of using the HSV or an equivalent color space in order to obtain the gray regions is therefore immediately apparent.
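The single-coordinate test for grays described above can be sketched in Python with the standard `colorsys` module. The function name `is_gray_hsv` and the radius `s_max` of the cylindrical filter are hypothetical values chosen for illustration; RGB components are assumed to be normalized to the range 0 to 1.

```python
import colorsys


def is_gray_hsv(r, g, b, s_max=0.1):
    """Return True when the pixel falls inside a cylinder of radius s_max
    around the V axis of HSV space, i.e. when its saturation is low.

    s_max is a hypothetical filter radius, not a value taken from the text.
    """
    _hue, saturation, _value = colorsys.rgb_to_hsv(r, g, b)
    return saturation <= s_max
```

Only the S coordinate is inspected, regardless of H and V. At the black apex `colorsys.rgb_to_hsv` returns S=0, so black is also reported as gray.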
Adding a white pigment corresponds to decreasing S (without changing V). Shades are created by keeping S=1 and decreasing V. Tones are created by decreasing both S and V. Of course, changing H corresponds to selecting the pure pigment with which to start. Thus, H, S, and V correspond to concepts from the perceptive color system.
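The tint, shade and tone operations described above can be reproduced with the standard `colorsys.hsv_to_rgb` function, in which H, S and V all range from 0 to 1 (so H=0 is red). The helper names `tint`, `shade` and `tone` are hypothetical and serve only as a sketch.

```python
import colorsys


def tint(hue, saturation):
    """Add white pigment: decrease S while keeping V = 1."""
    return colorsys.hsv_to_rgb(hue, saturation, 1.0)


def shade(hue, value):
    """Add black pigment: keep S = 1 while decreasing V."""
    return colorsys.hsv_to_rgb(hue, 1.0, value)


def tone(hue, saturation, value):
    """Add both white and black pigment: decrease both S and V."""
    return colorsys.hsv_to_rgb(hue, saturation, value)
```

Starting from pure red (H=0), `tint(0.0, 0.5)` yields the pink (1.0, 0.5, 0.5) and `shade(0.0, 0.5)` yields the dark red (0.5, 0.0, 0.0).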
The top of the HSV hexcone corresponds to the projection seen by looking along the principal diagonal of the RGB color cube from white toward black, as shown in FIG. 8.
FIG. 9 shows the HLS color model, which is defined in a double-hexcone subset of the cylindrical space. Hue is the angle around the vertical axis of the double hexcone, with red at 0°. The colors occur around the perimeter: red, yellow, green, cyan, blue and magenta. The HLS space can be considered as a deformation of HSV space, in which white is pulled upward to form the upper hexcone from the V=1 plane. As with the single-hexcone model, the complement of any hue is located 180° farther around the double hexcone, and saturation is measured radially from the vertical axis, from 0 on the axis to 1 on the surface. Lightness ranges from 0 for black (at the lower tip of the double hexcone) to 1 for white (at the upper tip).
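The properties of the HLS double hexcone listed above can be checked with the standard `colorsys` module, which expresses hue as a fraction of a full turn (0.5 corresponds to 180°) and returns components in the order H, L, S. The helper name `complement` is hypothetical; the sketch assumes RGB components normalized to the range 0 to 1.

```python
import colorsys


def complement(r, g, b):
    """Return the RGB color whose hue lies 180 degrees farther around
    the double hexcone, keeping lightness and saturation unchanged."""
    hue, lightness, saturation = colorsys.rgb_to_hls(r, g, b)
    opposite_hue = (hue + 0.5) % 1.0     # half a turn = 180 degrees
    return colorsys.hls_to_rgb(opposite_hue, lightness, saturation)


# Black and white sit at the two tips; pure red sits at L = 0.5, S = 1.
black_hls = colorsys.rgb_to_hls(0.0, 0.0, 0.0)
white_hls = colorsys.rgb_to_hls(1.0, 1.0, 1.0)
red_hls = colorsys.rgb_to_hls(1.0, 0.0, 0.0)
```

Applied to pure red, `complement` returns cyan up to floating-point rounding, in line with the order of colors around the perimeter.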
Many hardware and software packages are currently available in the technical field of electronic image processing which provide image data processing methods and corresponding devices. However, it should be noted that only a few, if any, operate in both the personal computer/workstation field and the embedded devices field.
In fact, the embedded devices have a plurality of needs which turn into tight limitations for the image processing devices themselves. Particularly, the image processing in an embedded environment seeks:
to reduce the size of the image data in order to limit the memory area employed by the image data processing devices;
to increase the amount of any text portion comprised in a document that is OCR'able, i.e. that can be acquired and understood by means of Optical Character Recognition (OCR);
to get as final result of the image data processing device an image viewable and printable, which is close to the original acquired image.
Known document analysis methods that have tried to meet the above requirements have the problem of being computationally very heavy, and are not suited to embedded applications, where the constraints on processing power and memory are stringent.
So, even if these solutions may perform an acceptable analysis of the document, they are not applicable in an embedded environment.
The main purpose of the known document analysis methods is the extraction of features and the classification of text and images in the analyzed documents. Examples of analysis used in this technical field are known from the publication "Document Image Analysis" by L. O'Gorman and R. Kasturi, IEEE Computer Society Press, which is a collection of the most relevant papers regarding document analysis.
All the known approaches deal with the recognition of different types of areas on a page. The areas are normally classified into regions of text, photo and line art. The page is then divided into these different areas (normally in a mutually exclusive way) and each is treated in a different way. In other terms, the known document analysis methods deal with understanding the "type" of information that is on the page.
These solutions tend to sub-divide the page into mutually exclusive regions that contain different types of information.
Other known devices deal with decomposed documents, i.e. documents translated into a plurality of elementary image information called pixels. Such devices provide a treatment of the decomposed document as a whole, or at least are able to reconstruct the information they need only by reprocessing the input document format.
An illustrative and non-limiting example is a BW fax machine. If such a device can deal only with BW data and the document contains a mixture of sparse color and BW data, the fax machine image processing device must be able to reconstruct a single BW page from the pieces of the decomposed original document.
A known way to comply with the embedded environment requirements leads to peripheral devices that support only the specified features of a particular product; that is how cost and performance are satisfied.
However, none of the known solutions deals with the problem of maintaining the original appearance of the document, and therefore no emphasis is placed on the recognition of the color itself on the document and on what can be done once this color content is known.
One object of the present invention is that of providing a dual path distinction method for two different layers, i.e. the BW and color layers, identifying the features used to classify a certain group of pixels of a raster image as colorful or not.
The reason for doing this can be explained as follows. In a document such as a magazine article, there are areas of color, for example photographs, colored text and highlighted areas, which include bright colors that a user would like to retain as colors. There are also areas, typically background areas that are either very light or very dark, which, even if one could argue that they have a color content, can equally well be represented with only two colors, i.e. black and white.
Moreover, the color information content of a background area, even if not negligible, could be of no interest with respect to the BW content. This is the case of the so-called "business text": the information content of the image data is superimposed on a color background content which can be ignored without losing any useful information.
After the separation between these areas, the data in each area could be processed differently: color data could be compressed in a lossy fashion, whereas the BW data could be binarized, and the user would not see a big difference in the quality of the document.
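Purely as an illustration of the separation described above, and not as the claimed method, the following Python sketch splits a small raster into a color layer and a binarized BW layer on a per-pixel basis. The function name `split_layers` and the thresholds `s_min`, `v_min` and `bw_threshold` are hypothetical assumptions; the classification features actually used are those defined in the claims.

```python
import colorsys


def split_layers(raster, s_min=0.25, v_min=0.2, bw_threshold=0.5):
    """Split a raster of (r, g, b) tuples in [0, 1] into two layers.

    A pixel is treated as colorful when it is both saturated (S > s_min)
    and not too dark (V > v_min); all thresholds are hypothetical.
    Colorful pixels keep their RGB value in the color layer; every other
    pixel is binarized to 0 (black) or 1 (white) in the BW layer.
    A position absent from a layer is marked with None.
    """
    color_layer, bw_layer = [], []
    for row in raster:
        color_row, bw_row = [], []
        for r, g, b in row:
            _hue, s, v = colorsys.rgb_to_hsv(r, g, b)
            if s > s_min and v > v_min:
                color_row.append((r, g, b))   # retained as color data
                bw_row.append(None)
            else:
                color_row.append(None)
                bw_row.append(1 if v >= bw_threshold else 0)
        color_layer.append(color_row)
        bw_layer.append(bw_row)
    return color_layer, bw_layer
```

With these assumed thresholds, a bright red pixel goes to the color layer, while a very light cream background and a very dark reddish background both end up in the BW layer, as white and black respectively. Each layer could then be handed to a different compression path.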
The solution idea behind this invention is that of providing a dual path distinction method which could create a BW and a color layer starting from a single input data sheet.
According to this solution idea, the invention relates to a document analysis method using BW/color areas detection as defined in the enclosed claim 1.
Moreover, the invention relates to a scanning device, as defined in the enclosed claim 9.
Finally, the invention relates to a method for acquiring a document based on the analysis of the content of the document itself, as defined in the enclosed claim 15.