In the modern era of digitization, optical character recognition (OCR) has emerged as a convenient solution for converting various images, such as scanned documents and photos of documents, into electronic documents. Typically, OCR engines are trained to recognize specific languages. These engines use the trained features of a selected language to recognize text.
In certain scenarios, images may comprise multi-lingual text content. In such scenarios, the OCR engines take more time to process the images, and the generated electronic documents have reduced accuracy. For example, an OCR engine trained on English may be utilized to generate an electronic document from a scanned document that comprises multi-lingual text content, such as English and Russian. In this example, the OCR engine may recognize the English text content with higher accuracy than the Russian text content in the scanned document. One solution to such a problem is to use different OCR engines, each trained for a different language, on a single multi-lingual document. In another solution, an individual, such as a subject matter expert, may manually go through each paragraph of the multi-lingual document and then translate the document. However, such solutions may be unrealistic and impractical when thousands of multi-lingual documents are to be processed. Therefore, an automatic and robust OCR processing technique for images with more than one language is required.
Further limitations and disadvantages of conventional and traditional approaches will become apparent to those skilled in the art through a comparison of the described systems with some aspects of the present disclosure, as set forth in the remainder of the present application and with reference to the drawings.