As more business is conducted electronically, documents are increasingly being converted into electronic form. For example, paper documents may be scanned by a document scanner to produce an electronic document that includes digital images of the paper documents. Electronic documents are beneficial because they require less physical space than paper documents. Furthermore, electronic documents can be easily backed up to prevent accidental loss.
However, as the volume of electronic documents increases, it becomes more difficult to organize the documents. Manually organizing the documents is burdensome and inefficient. One solution to the problem is to perform optical character recognition (OCR) on the electronic documents to extract the text they contain. The extracted text may then be analyzed to determine and/or classify the content of the electronic documents. For example, the content may be classified by topic (e.g., an electronic document that includes information about George Washington's birthplace may be classified under the topic of “George Washington”). Unfortunately, OCR techniques are computationally expensive.
Thus, it is highly desirable to classify electronic documents without the burden of manual organization or the computational expense of OCR.