Field of the Invention
The present invention relates in general to classification of images using Optical Character Recognition (OCR) technologies.
Description of the Related Art
A wide variety of document types is in use today. Depending on their type and intended use, documents may be text-only, may combine text with pictures such as photographs, illustrations, or other graphical content, or may contain only graphical content. There is a need to automatically classify documents according to their types. Before paper-based documents are classified, they are usually processed to generate digital representations (images). For example, paper-based documents may be scanned or photographed to generate electronic images. The images are subsequently classified. Such classification is an essential part of processing a stream of documents that includes various document types.
In general, a variety of methodologies may be used for classifying the obtained images. Image classification can be based, for example, on an analysis of geometrical structures within the digital images. Some classification methodologies utilize mathematical morphological analysis. Certain features, such as graphical features, may be identified within an image, and on that basis an assumption may be made as to whether the document contains textual content or some other type of content. In some embodiments, if it is determined that the document image contains text, this text is subsequently recognized to obtain the digital content of the document as an ASCII code representation of the text.
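By way of illustration only, the following Python sketch shows one way a morphological analysis of the kind mentioned above might flag an image as likely containing text. It is not taken from any particular existing method. The function names, the horizontal structuring element, and the thresholds (`radius`, `min_aspect`, and the 0.5 majority cutoff) are assumptions chosen for the example. A binary image is dilated horizontally so that neighboring glyphs merge into line-shaped regions, and the image is treated as text-like when most of the merged regions are wide and short.

```python
from collections import deque


def dilate_horizontal(img, radius):
    """Binary dilation with a 1 x (2*radius + 1) horizontal structuring element."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            lo, hi = max(0, x - radius), min(w, x + radius + 1)
            if any(img[y][lo:hi]):
                out[y][x] = 1
    return out


def component_boxes(img):
    """Return bounding boxes (x0, y0, x1, y1) of 4-connected foreground regions."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if img[y][x] and not seen[y][x]:
                seen[y][x] = True
                queue = deque([(x, y)])
                x0, y0, x1, y1 = x, y, x, y
                while queue:
                    cx, cy = queue.popleft()
                    x0, y0 = min(x0, cx), min(y0, cy)
                    x1, y1 = max(x1, cx), max(y1, cy)
                    for nx, ny in ((cx + 1, cy), (cx - 1, cy),
                                   (cx, cy + 1), (cx, cy - 1)):
                        if (0 <= nx < w and 0 <= ny < h
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
                boxes.append((x0, y0, x1, y1))
    return boxes


def looks_like_text(img, radius=2, min_aspect=4.0):
    """Assumed heuristic: most merged regions are wide and short, like text lines."""
    boxes = component_boxes(dilate_horizontal(img, radius))
    if not boxes:
        return False
    line_like = sum(
        1 for x0, y0, x1, y1 in boxes
        if (x1 - x0 + 1) / (y1 - y0 + 1) >= min_aspect
    )
    return line_like / len(boxes) > 0.5
```

A practical classifier would combine such a cue with other features. The sketch only shows how a geometric property of the image can support an assumption about its content before any text recognition is attempted.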
Existing methods of document classification are often unreliable and inefficient. A continuing need exists for the advancement of classification methodologies that result in more efficient and reproducible classification of images across a wide variety of document types.