1. Field of the Invention
The present invention relates to a page segmentor for dividing a document page into areas each having a single feature, the division being performed as preprocessing for pattern recognition.
2. Description of the Prior Art
Conventional page segmentors are known in which a page is divided into areas, each containing character strings written in one direction: either laterally (i.e., across the page) or vertically (i.e., down the page, as in a Japanese document).
In such a page segmentor, a page is divided into predetermined areas, and the character strings within each area are scanned. During scanning, an accumulated value of information representing the number of character portions and an accumulated value of information representing the number of noncharacter portions are calculated. The direction of the character string is discriminated according to the presence or absence of a position where the accumulated value changes. The division may then be corrected, or the divided areas may be merged, to obtain areas consisting only of character strings of one direction.
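By way of illustration only, the accumulation and discrimination described above may be sketched as follows. This is a minimal sketch and not the method of any particular conventional segmentor: it assumes that an area is given as a binary image (1 for a character portion, 0 for a noncharacter portion), and the names `profiles`, `count_gaps`, and `guess_direction`, as well as the rule of comparing gap counts, are hypothetical choices made for this example.

```python
from typing import List, Tuple

# Assumed representation: 1 = character portion, 0 = noncharacter portion.
Image = List[List[int]]


def profiles(area: Image) -> Tuple[List[int], List[int]]:
    """Accumulate the number of character pixels along each row and column."""
    rows = [sum(row) for row in area]
    cols = [sum(col) for col in zip(*area)]
    return rows, cols


def count_gaps(profile: List[int]) -> int:
    """Count blank stretches lying between two stretches that contain ink.

    Each such stretch is a position where the accumulated value changes
    from nonzero to zero and back again.
    """
    runs = 0
    prev_ink = False
    for value in profile:
        ink = value > 0
        if ink and not prev_ink:
            runs += 1  # a new stretch of scan lines containing ink begins
        prev_ink = ink
    return max(runs - 1, 0)


def guess_direction(area: Image) -> str:
    """Hypothetical rule: the profile with more gaps reveals the direction."""
    rows, cols = profiles(area)
    row_gaps = count_gaps(rows)  # blank rows separate laterally written lines
    col_gaps = count_gaps(cols)  # blank columns separate vertically written lines
    if row_gaps > col_gaps:
        return "lateral"
    if col_gaps > row_gaps:
        return "vertical"
    return "undetermined"


# Two lateral text lines separated by one blank row.
two_lines = [
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
]
print(guess_direction(two_lines))  # prints "lateral"
```

Under this assumed rule, an area containing separated vertical bars, such as a bar graph, yields the same result as vertically written character strings, since both produce blank columns between inked columns.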
In a conventional page segmentor of this type, pages including figures and tables cannot be classified because it is difficult to discriminate, e.g., a bar graph from a vertically written character string.
When a character string is inclined with respect to the standard direction, either the scanning direction is corrected to follow the inclination of the character string, or the inclination direction is calculated from the accumulated values. Either approach adds processing, so page segmentation requires a long period of time.
In addition, the character strings are scanned time-serially, i.e., one after another in sequence, which further lengthens the page segmentation time.