1. Field of the Invention
The present invention relates to an image processing apparatus and method, which apply predetermined image processing to an optically scanned document image.
2. Description of the Related Art
Conventionally, a form recognition technique that recognizes an image of a form (hereinafter also referred to as a query form) input via a scanner or the like, and automatically classifies the query form with similar forms, is widely used. In such a form recognition technique, image data of the query form scanned by, for example, a scanner is divided into a plurality of areas having attributes, and feature amounts are extracted from the respective areas. The areas often have different attributes. Form format data of the query form is then generated based on the extracted feature amounts. Similarities between the form format data of the query form and those of forms registered in advance are calculated, and the registered form with the highest similarity is determined as the form recognition result. Various techniques have been developed for calculating the similarities of form format data.
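The general flow described above (area division, feature amount extraction, generation of form format data, and selection of the registered form with the highest similarity) can be illustrated by the following minimal sketch. It is not taken from any of the cited references. The `Area` type, the attribute names, the choice of features (area count and total area per attribute), and the use of cosine similarity are all assumptions made purely for illustration, and the area division itself is assumed to have been performed already.

```python
from dataclasses import dataclass
from math import sqrt

# Hypothetical attribute set; a real system may distinguish many more.
ATTRIBUTES = ("text", "table", "picture")


@dataclass(frozen=True)
class Area:
    """One area produced by area division (coordinates normalized to 0..1)."""
    attribute: str
    x: float
    y: float
    width: float
    height: float


def form_format_data(areas):
    """Build a feature vector: for each attribute, the area count and total area."""
    features = []
    for attribute in ATTRIBUTES:
        matching = [a for a in areas if a.attribute == attribute]
        features.append(float(len(matching)))
        features.append(sum(a.width * a.height for a in matching))
    return features


def similarity(a, b):
    """Cosine similarity between two feature vectors (an assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recognize(query_areas, registered):
    """Return the name of the registered form most similar to the query form."""
    query = form_format_data(query_areas)
    return max(
        registered,
        key=lambda name: similarity(query, form_format_data(registered[name])),
    )
```

In this sketch, `registered` is a dictionary that maps a form name to the list of areas obtained from that registered form, and `recognize` returns the name whose form format data is most similar to that of the query form.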
Japanese Patent Laid-Open No. 2000-285187 discloses a form recognition apparatus which executes form recognition based on form format data to which an interest level of a table block, obtained by image feature amount extraction means, is added. According to this reference, the similarity output as the form recognition result can assume a value close to a human's subjective impression.
Japanese Patent Laid-Open No. 2001-109842 discloses an optical character scanning apparatus which uses an identification color area on a form for form identification. According to this reference, since the identification color area on a form is used for form identification in place of a form ID made up of characters that are meaningless to a user, easy-to-read, color-coded forms are provided to the user, and scanning processing can be executed simultaneously.
Furthermore, Japanese Patent Laid-Open No. 2000-285190 discloses a form identification apparatus which classifies, in advance, registered forms having a plurality of features, such as recognition results of color information, ruled line information, and the like, according to their common features, and identifies a form by narrowing down the search range accordingly. According to this reference, a large number of forms of many types, each having a plurality of features, can be identified.
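The general idea of narrowing down a search range by classifying registered forms in advance can be sketched as follows. This is an illustrative assumption only and does not reproduce the method of the cited reference. The feature keys `color` and `ruled_lines`, and the functions `build_index` and `candidates`, are hypothetical names introduced for this example.

```python
from collections import defaultdict


def build_index(registered):
    """Group registered form names by a key made of their common features.

    `registered` maps a form name to a dict with the hypothetical keys
    "color" (e.g. "red") and "ruled_lines" (True or False).
    """
    index = defaultdict(list)
    for name, features in registered.items():
        index[(features["color"], features["ruled_lines"])].append(name)
    return index


def candidates(index, query_features):
    """Return only the registered forms that share the query form's common features."""
    key = (query_features["color"], query_features["ruled_lines"])
    return index.get(key, [])
```

Only the forms returned by `candidates` would then need to be compared in detail with the query form, so the cost of the detailed comparison depends on the size of one group rather than on the total number of registered forms.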
However, for management purposes, stamps such as “urgent” and “approved” may be put on, or seals may be attached to, paper forms. When such a form is scanned by a scanner or the like, the part bearing the stamp or seal is also subjected to area division, and form format data including an unwanted element is generated. Area division is an elemental technology that is widely used not only in form identification processing but also in general image processing such as character recognition processing and tilt correction processing. It is therefore desirable to remove unwanted elements such as stamps and seals before various kinds of image processing are executed. However, the aforementioned references do not address this issue.