Various projects, such as the Google Books Library Project, are underway to create a comprehensive, searchable, electronic card catalog of all documents (e.g., books) in all languages, one that helps users discover new books and helps publishers discover new readers. Optical character recognition (OCR) of printed books or manuscripts can be used to convert books from print format into electronic format. Once the text of a book has been scanned and recognized through the OCR operation, it can be easier for people to find relevant books through web searches. The correct OCR parameter configuration can be important in achieving an efficient and accurate OCR process. For example, an English OCR package can deal very well with English, and may be able to cope passably with some other Roman-alphabet languages such as French or German. However, it will not be very helpful in recognizing a document written in Chinese. In a large-scale OCR operation, such as the Google Books Library Project, it can be less than desirable to rely on human intervention to select the appropriate OCR parameter configuration.