The present invention relates to a bitmap image processing technology, and more particularly, to a system and method thereof for text character identification.
A bitmap image is a map or organized array of pixels of bit-based information, often mapped at hundreds of pixels per inch. For example, an image that is 3 inches by 2 inches at a resolution of 300 pixels per inch measures 900 pixels by 600 pixels, for 540,000 pixels in total.
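The arithmetic in the example above can be checked with a short sketch. The function name `pixel_dimensions` and its parameters are illustrative only and do not come from the described system.

```python
def pixel_dimensions(width_in, height_in, ppi):
    """Return (width_px, height_px, total_px) for an image of the given
    physical size in inches, sampled at `ppi` pixels per inch."""
    width_px = round(width_in * ppi)
    height_px = round(height_in * ppi)
    return width_px, height_px, width_px * height_px


# The 3 inch by 2 inch image at 300 pixels per inch from the text.
print(pixel_dimensions(3, 2, 300))  # (900, 600, 540000)
```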
A bitmap image is one of three types: black and white, grayscale, or RGB. A black and white bitmap image contains at least one bit of information for each pixel. A bit has only two possible values, 0 or 1, “yes” or “no”, and, for a black and white bitmap image, black or white. Using binary counting with 8 bits per pixel, a grayscale bitmap (black to white) has a color depth of 256. Each pixel is represented as one of 256 different gray values, including black and white. An RGB bitmap image has a color depth of about 16.77 million colors (256×256×256 = 16,777,216). Each pixel in the bitmap array carries a value for each of the three RGB colors: one of 256 values for red, one of 256 values for green, and one of 256 values for blue.
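The color depths quoted above follow directly from the number of bits stored per pixel. The following is a minimal sketch of that calculation; the function name `color_depth` is an assumption made for illustration.

```python
def color_depth(bits_per_channel, channels=1):
    """Number of distinct pixel values representable with the given
    bits per channel and number of channels."""
    return (2 ** bits_per_channel) ** channels


print(color_depth(1))      # black and white: 2 values
print(color_depth(8))      # grayscale: 256 values
print(color_depth(8, 3))   # RGB: 256 * 256 * 256 = 16777216 values
```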
Image processing systems and methods for reflective scanners, photocopiers, facsimile machines, and digital cameras have been used in a variety of processes for digitizing original documents into machine-readable versions (i.e., bitmap images). A bitmap image usually comprises text, graphics, or other content. Extracting text from a bitmap image in which the text is integrated with graphics is useful for optical character recognition (OCR), information retrieval (IR), or printing.
Two types of text extraction algorithms have been developed: a bottom-up approach and a top-down approach. In the bottom-up approach, text regions are constructed by an agglomeration (region growing) process that merges pixels into regions when those pixels are both adjacent to the regions and similar in some property (most simply, intensity). Each pixel in the bitmap image receives a label from the region growing process; pixels have the same label if and only if they belong to the same region. In the top-down approach, text regions are constructed by recursively segmenting all or a portion of a bitmap image into smaller divisions, classifying some of the divisions as text regions or graphic regions, and re-segmenting the remaining divisions until sufficient text and graphics are classified.
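The bottom-up labeling described above can be illustrated with a minimal region growing sketch. It is not taken from any of the cited methods. It assumes a grayscale image stored as a list of rows, 4-connected adjacency, and a similarity test that compares each candidate pixel with the intensity of the region's seed pixel. The names `grow_regions` and `tolerance` are hypothetical.

```python
from collections import deque


def grow_regions(image, tolerance=10):
    """Label regions of adjacent, similar-intensity pixels.

    image: list of rows of integer intensities (assumed grayscale).
    Returns a grid of the same shape holding an integer label per pixel;
    two pixels share a label if and only if they were merged into the
    same region.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    labels = [[-1] * width for _ in range(height)]  # -1 means unlabeled
    next_label = 0

    for y0 in range(height):
        for x0 in range(width):
            if labels[y0][x0] != -1:
                continue
            # Start a new region from the first unlabeled pixel (the seed).
            seed = image[y0][x0]
            labels[y0][x0] = next_label
            queue = deque([(y0, x0)])
            while queue:
                y, x = queue.popleft()
                # Visit the 4-connected neighbors of the current pixel.
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and labels[ny][nx] == -1
                            and abs(image[ny][nx] - seed) <= tolerance):
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
            next_label += 1
    return labels


image = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 200, 200, 255],
]
for row in grow_regions(image):
    print(row)
# [0, 0, 1, 1]
# [0, 0, 1, 1]
# [0, 2, 2, 1]
```

Comparing against the seed intensity keeps the sketch short; a practical implementation might instead compare against a running region mean or use color distance, which changes how gradual intensity ramps are grouped.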
Conventional extraction methods have several limitations, most notably reduced accuracy. The reduction in accuracy is often caused by characteristics of the original document, such as variations in text size or text color, background images, language, or oblique (skewed) images. For example, U.S. Pat. No. 6,519,362, entitled “METHOD OF EXTRACTING TEXT PRESENT IN A COLOR IMAGE,” discloses a method of extracting text from a color image by performing one to five conversion methods to maximize the contrast between any text in the image and the rest of the image. That method fails to extract text correctly when the color of the text is close to that of the background. U.S. Pat. No. 6,574,375, entitled “METHOD OF DETECTING INVERTED TEXT IMAGES ON A DIGITAL SCANNING DEVICE,” discloses a method of extracting text based on predetermined parameters at 300 dpi resolution. The predetermined parameters are automatically adjusted according to the resolution of the image to provide improved extraction accuracy. Some of the predetermined parameters, however, may be independent of the resolution; for example, a lower-resolution image does not necessarily contain smaller text than a higher-resolution image, and thus the adjustment may produce an incorrect result. In view of these limitations, a need exists for a system and method of text character identification with improved accuracy.