1. Field of the Invention
This invention relates to optical character recognition. More particularly, this invention relates to adaptive optical character recognition for books and other documents written in multiple fonts and languages.
2. Description of the Related Art
Optical Character Recognition (OCR) has become a widely used tool in modern document processing. Typical commercial OCR engines are designed to recognize a wide variety of text images, ranging from letters and business forms to scientific papers. Large digitization projects, such as the digitization of library collections, are typically carried out at libraries and archive centers. These organizations scan books, newspapers and other documents, subject them to OCR, and create an electronic representation of the content. Hence, OCR quality is increasingly important. Unfortunately, libraries and archive centers must either tolerate low-quality data or make large investments in manually correcting OCR results.