1. Field of the Invention
The present invention generally relates to an image processing method for processing image data obtained from a ruled document and, more particularly, to a method for deleting ruled lines contained in image data obtained by scanning a document, and to a character recognition method that uses the deletion method to accurately recognize characters surrounded by or adjacent to the ruled lines.
2. Description of the Related Art
Many documents or preformatted sheets are provided with ruled lines or frame lines. To read characters within an area defined by the ruled lines or frame lines, it is common to distinguish the characters from the ruled lines or frame lines by giving the ruled lines or frame lines a color different from that of the characters, as suggested in Japanese Laid-Open Patent Application No. 56-9877. In this method, image information obtained by optically scanning a document having ruled lines or frame lines is separated according to color. To use this method, the ruled lines or frame lines must be printed in a predetermined specific color chosen with regard to the spectral characteristics of the scanner to be used. Thus, printing the ruled lines or frame lines in that specific color adds to the cost of the blank document. Additionally, it is inconvenient for a user to have to write characters within the area defined by the ruled lines or frame lines with a writing tool of a specific color different from that of the ruled lines or frame lines.
On the other hand, in recent years it has become common to print ruled lines or frame lines on ordinary paper using a word processor. Japanese Laid-Open Patent Applications No. 61-196382 and No. 2-7183 suggest methods for separating characters from such ruled lines or frame lines. In these methods, a histogram of high-intensity pixels in the image obtained from the document is produced, and the ruled lines or frame lines are extracted from the peaks of the histogram so that they can be deleted before the characters in the area defined by the ruled lines or frame lines are recognized.
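The histogram approach described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the procedure of either cited application: it assumes a binarized image in which 1 marks a black (high-intensity) pixel, treats any row or column whose black-pixel count reaches a hypothetical fraction `min_fraction` of the image width or height as a histogram peak, and erases the rows and columns found this way. The names `projection_peaks` and `delete_rules` are hypothetical.

```python
import numpy as np


def projection_peaks(binary, axis, min_fraction=0.8):
    """Return inclusive (start, end) bands of rows (axis=1) or columns (axis=0)
    whose black-pixel count reaches min_fraction of the full line length.

    binary: 2-D array with 1 for black (high-intensity) pixels, 0 for white.
    min_fraction: hypothetical peak threshold, not taken from the cited art.
    """
    counts = binary.sum(axis=axis)   # projection histogram
    length = binary.shape[axis]      # pixels along each projected row/column
    is_peak = counts >= min_fraction * length

    bands, start = [], None
    for i, flag in enumerate(is_peak):
        if flag and start is None:
            start = i                       # a peak band begins
        elif not flag and start is not None:
            bands.append((start, i - 1))    # the band ended on the previous index
            start = None
    if start is not None:
        bands.append((start, len(is_peak) - 1))
    return bands


def delete_rules(binary, min_fraction=0.8):
    """Erase every row band and column band detected as a ruled line.

    Whole rows/columns are cleared, so any character pixels lying on a
    detected line are erased as well.
    """
    cleaned = binary.copy()
    for top, bottom in projection_peaks(binary, axis=1, min_fraction=min_fraction):
        cleaned[top:bottom + 1, :] = 0      # horizontal ruled line
    for left, right in projection_peaks(binary, axis=0, min_fraction=min_fraction):
        cleaned[:, left:right + 1] = 0      # vertical ruled line
    return cleaned
```

Because a peak requires a line to stay within a few rows or columns, a line drawn at even a slight slope spreads its pixels over many rows and may produce no peak at all: a 30-pixel line that drops one row every six columns contributes only six black pixels to each of five rows.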
However, the method suggested in Japanese Laid-Open Patent Application No. 61-196382 has a problem in that, because the positions of the ruled lines or frame lines are detected from the histogram of high-intensity pixels (black pixels), the area defined by the ruled lines or frame lines cannot be accurately detected when the document is inclined with respect to the scanning direction. Additionally, the method suggested in Japanese Laid-Open Patent Application No. 2-7183 has a problem in that the area of the ruled lines or frame lines of an unknown document cannot be accurately detected, since the areas divided by the ruled lines or frame lines cannot be accurately recognized. This limitation arises because that method detects the extents of the ruled lines or frame lines from projection patterns obtained by dividing the document area along a direction perpendicular to the direction of the lines in the document.
When character recognition is performed on a document image that includes ruled lines or frame lines, the characters should in most cases be recognized area by area, each area (hereinafter referred to as a ruled area) being enclosed by ruled lines or frame lines. To achieve this, the ruled lines or frame lines must be detected before character recognition is performed.
A common method for recognizing a ruled area extracts the vertical and horizontal ruled lines as rectangular areas and recognizes the ruled area from the coordinate values of the inner sides of those rectangular areas. However, this method has a problem in that the ruled area is recognized as an area smaller than the actual ruled area when the scanned image or the document is inclined, as described below.
This problem will now be described in more detail with reference to FIG. 1. FIG. 1 shows the ruled area defined by horizontal ruled lines 300 and 301 and vertical ruled lines 303 and 304. If the ruled area is inclined as shown in FIG. 1, the width of each of the rectangular areas 300a, 301a, 303a and 304a corresponding to the respective ruled lines 300, 301, 303 and 304 increases with the degree of inclination. In FIG. 1, the ruled area recognized by using the coordinate values of the inner sides of the rectangular areas 300a, 301a, 303a and 304a corresponds to the rectangular area defined by points Aa, Ba, Ca and Da. Since the actual ruled area corresponds to the rectangular area defined by points A, B, C and D, the recognized ruled area is smaller than the actual ruled area. Thus, in this method, parts of characters adjacent to the ruled lines may fall within the rectangular areas of the ruled lines rather than within the recognized ruled area, resulting in inaccurate recognition of the characters.
Japanese Laid-Open Patent Application No. 3-172984 suggests a different method. In this method, as in the above-mentioned method, rectangular areas corresponding to the vertical and horizontal ruled lines are extracted to define the ruled area. However, the ruled area is recognized based on the outer sides of the rectangular areas, that is, as the rectangular area defined by points Ab, Bb, Cb and Db shown in FIG. 1. This rectangular area is larger than the actual ruled area defined by points A, B, C and D. Since characters are extracted by using the rectangular area defined by points Ab, Bb, Cb and Db, fewer characters are recognized inaccurately.
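The geometry of FIG. 1 can be checked numerically with a short Python sketch. It assumes, for simplicity, ruled lines of zero thickness forming a width × height rectangle rotated by an angle θ, in image coordinates where y increases downward; each line's rectangular area is then the axis-aligned bounding box of its segment. The inner sides of those boxes give the smaller estimate (the kind of area bounded by Aa, Ba, Ca and Da) and the outer sides the larger one (the kind bounded by Ab, Bb, Cb and Db). The function names are hypothetical.

```python
import math


def segment_box(p, q):
    """Axis-aligned bounding box (left, top, right, bottom) of segment p-q."""
    (x1, y1), (x2, y2) = p, q
    return min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2)


def ruled_area_estimates(width, height, theta_deg):
    """Return (inner, outer) axis-aligned estimates of a ruled area whose
    actual shape is a width x height rectangle rotated by theta_deg.

    Assumes zero-thickness ruled lines and y increasing downward.
    """
    t = math.radians(theta_deg)
    top_left = (0.0, 0.0)
    top_right = (width * math.cos(t), width * math.sin(t))
    bottom_left = (-height * math.sin(t), height * math.cos(t))
    bottom_right = (top_right[0] + bottom_left[0], top_right[1] + bottom_left[1])

    top = segment_box(top_left, top_right)            # upper horizontal line
    bottom = segment_box(bottom_left, bottom_right)   # lower horizontal line
    left = segment_box(top_left, bottom_left)         # left vertical line
    right = segment_box(top_right, bottom_right)      # right vertical line

    # Inner sides: right edge of the left box, bottom edge of the top box, etc.
    inner = (left[2], top[3], right[0], bottom[1])
    # Outer sides: left edge of the left box, top edge of the top box, etc.
    outer = (left[0], top[1], right[2], bottom[3])
    return inner, outer


def box_area(box):
    """Area of (left, top, right, bottom); an empty box counts as zero."""
    x0, y0, x1, y1 = box
    return max(0.0, x1 - x0) * max(0.0, y1 - y0)
```

Under these assumptions and a small positive θ, the inner estimate has area wh − (w² + h²) sin θ cos θ and the outer estimate wh + (w² + h²) sin θ cos θ, so both errors grow with the inclination. For w = 100, h = 40 and θ = 2°, the two areas come out at roughly 3595 and 4405, against an actual area of 4000.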
However, in the method suggested in Japanese Laid-Open Patent Application No. 3-172984, when the inclination of the document is large, the black runs corresponding to characters contacting or adjacent to a ruled line may be combined with the black runs corresponding to that ruled line. As a result, the characters contacting or adjacent to the ruled line may not be accurately recognized. Additionally, since the assumed ruled area is larger than the actual ruled area, characters outside the actual ruled area may be erroneously recognized as characters within the ruled area.
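The combining of black runs can be illustrated on a single scan line. The Python sketch below is an assumption-laden illustration, not the procedure of Japanese Laid-Open Patent Application No. 3-172984: it extracts the black runs of one row, joins runs separated by at most a hypothetical gap tolerance `max_gap` (as a method might do to reconnect fragments of an inclined ruled line), and erases any joined run at least `min_len` pixels long as a ruled line. All names and parameters are hypothetical.

```python
def black_runs(row):
    """Return inclusive (start, end) pairs for each maximal run of black (1) pixels."""
    runs, start = [], None
    for i, px in enumerate(row):
        if px and start is None:
            start = i
        elif not px and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(row) - 1))
    return runs


def merge_close_runs(runs, max_gap):
    """Join consecutive runs separated by at most max_gap white pixels.

    max_gap is a hypothetical tolerance for reconnecting line fragments.
    """
    merged = []
    for start, end in runs:
        if merged and start - merged[-1][1] - 1 <= max_gap:
            merged[-1] = (merged[-1][0], end)   # absorb into the previous run
        else:
            merged.append((start, end))
    return merged


def delete_long_runs(row, min_len, max_gap):
    """Return a copy of row with every merged run of length >= min_len cleared."""
    cleaned = list(row)
    for start, end in merge_close_runs(black_runs(row), max_gap):
        if end - start + 1 >= min_len:          # treated as a ruled line
            for i in range(start, end + 1):
                cleaned[i] = 0
    return cleaned
```

For a row holding a 20-pixel ruled-line run, one white pixel, and a 2-pixel character stroke, `max_gap=0` keeps the stroke, whereas `max_gap=1` merges the stroke into the line run so that it is erased together with the line.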