1. Field of Invention
The invention relates to an image processing method and, in particular, to a method that separates photos and text in a document image using their color and intensity properties.
2. Related Art
In the digital era, document image analysis technology is widely used in processing digital images, in applications such as license plate recognition, fingerprint identification, military satellite imagery, scanners, printers, character recognition, and digital data processing. The quality and accuracy of document image analysis directly affect the results of subsequent processing and determine how much time and storage space can be saved. They can even affect the processing capability of the whole system. Therefore, every document image analysis technique aims to identify the contents of a document image correctly and rapidly.
Most current document image analysis focuses on text separation. Since most users choose a source color or a grey level for displaying text, the color settings in text documents are relatively simple. Research and development in text separation therefore started earlier and are more mature. Current research in text separation still focuses on using local grey-level statistics, together with the property that objects have concentrated grey-level energy, to analyze a document image. For example, methods for identifying text in a black-and-white (BW) image remove the redundant information of each character to find representative, stable parameters for text identification. To single out Chinese or English text from a document image containing both, the user does not need to select the text piece by piece. Current research in document image analysis even covers the identification of italic text, improving both the speed and the accuracy of identification.
With the advance of digitized information, most document images now include text and photos in pure or mixed colors. As described above, the techniques in the prior art often focus only on grey-level document images; they are therefore not suitable for document images with text and photos in pure or mixed colors, and their output is of little use for subsequent processing. For example, when a color printer prints a document image with text in a source color and photos in mixed colors, the conventional document image analysis techniques are not very sensitive to the edges of source-color text data (e.g., black) and thus treat the source-color text as mixed-color photo data. Consequently, the printer has to use inks of the CMY colors even when printing a source-color text document. This not only wastes color ink; mixing the inks also slows down printing. Moreover, the printed text is merely a mixture of three color inks, which is close to the source color but not exactly the source color. One therefore obtains a printed document image with color distortion, which is unacceptable for research results that use different colors to represent numerical data.
In summary, the conventional methods that use the property of concentrated grey-level energy to process document images are not sensitive to pure-color and mixed-color data and are thus not suitable for identifying them.
It is therefore desirable to provide a photo/text separation method that can separate source-color and mixed-color data. Such a method would not only save processing time but also reduce the waste of color inks.