This invention relates generally to classifying documents and more particularly to using a plurality of classifier engines to classify documents.
The growth of computer technology has provided users with many capabilities for creating electronic documents. Electronic documents can also be obtained from image acquisition devices such as scanners or digital cameras, or read into memory from a data storage device (e.g., in the form of a file). Modern computers enable users to electronically obtain or create vast numbers of documents varying in size, subject matter, and format. These documents may be located on a personal computer, network, or other storage medium.
With the large number of electronic documents accessible on computers, particularly through networks such as the Internet, classifying these documents enables users to locate related documents more easily. Document classification is an important step in a variety of document processing tasks such as archiving, indexing, re-purposing, data extraction, and other automated document understanding operations. In classification, a document is assigned to one or more sets or classes of documents with which it has commonality, usually as a consequence of shared topics, concepts, ideas, and subject areas.
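As an illustration only, assigning a document to one or more classes on the basis of shared topics might be sketched as follows. This is a minimal sketch and not the method of any particular classifier engine: the function `classify`, the `CLASS_TERMS` vocabulary, and the `min_shared` threshold are all hypothetical names and values chosen for the example.

```python
from typing import Dict, List, Set


def classify(text: str, class_terms: Dict[str, Set[str]], min_shared: int = 2) -> List[str]:
    """Return every class that shares at least `min_shared` topic terms with the document."""
    # Naive tokenization: lowercase and split on whitespace (an assumption for brevity).
    words = set(text.lower().split())
    return sorted(
        name
        for name, terms in class_terms.items()
        if len(words & terms) >= min_shared
    )


# Hypothetical topic vocabularies, one set of terms per class.
CLASS_TERMS = {
    "finance": {"invoice", "payment", "tax", "account"},
    "legal": {"contract", "clause", "liability", "party"},
}

doc = "Payment terms: the invoice is due per the contract clause on liability"
print(classify(doc, CLASS_TERMS))  # ['finance', 'legal']
```

The example document shares terms with both vocabularies, so it is assigned to both classes, which reflects the "one or more classes" aspect of classification described above.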
A variety of document classification engines or algorithms have been developed in recent years. Performance varies from engine to engine, but is generally limited because computers have historically been poor at performing heuristic tasks. Common commercial document classifier engines each use a single technology and approach to solve the classification problem. Expert users can tune these classifier engines to obtain better results, but doing so requires significant classifier training with high-quality example documents.
Accordingly, it would be desirable to have a document classifier that overcomes these drawbacks.