With the popularity of smartphones and other portable devices, optical character recognition (OCR) is increasingly widely used. OCR can reduce or replace cumbersome manual text input. A user only needs to capture an image containing text, and the text in the image is recognized automatically using OCR technology for subsequent processing (for example, search or translation).
Conventional OCR technologies fall into two categories. In the first category, a text line is segmented into a plurality of candidate text areas. Each candidate text area is then recognized using a trained single-character recognition engine (for example, a convolutional neural network), which produces a plurality of candidate characters. Finally, the text line is decoded based on a language model and information such as the text recognition confidence level, to obtain an output. The second category includes techniques that have become popular in recent years. These techniques avoid the text segmentation module used in the first category and instead use a recurrent neural network (RNN) to obtain a character string output directly from a line-level image. The techniques in the second category are the more advanced of the two.
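The decoding step of the first category can be illustrated with a minimal sketch. It assumes that segmentation and single-character recognition have already produced, for each candidate text area, a list of (character, confidence) pairs, and that the language model is a simple bigram model. The function names, the toy bigram table, and the `lm_weight` parameter are hypothetical and are used only for illustration; they are not taken from any particular OCR system.

```python
import math


def decode_text_line(candidates, bigram_logprob, lm_weight=1.0):
    """Pick the best character sequence by dynamic programming (Viterbi).

    candidates: one list per candidate text area, each holding
        (character, confidence) pairs with 0 < confidence <= 1.
    bigram_logprob: function (previous_char, char) -> log-probability.
    lm_weight: hypothetical weight balancing the language model
        against the recognition confidence.
    """
    if not candidates:
        return ""
    # best[ch] = (score, text) of the best partial result ending in ch
    best = {ch: (math.log(conf), ch) for ch, conf in candidates[0]}
    for position in candidates[1:]:
        new_best = {}
        for ch, conf in position:
            # Choose the predecessor that maximizes score + language model term
            prev_ch, (prev_score, prev_text) = max(
                best.items(),
                key=lambda item: item[1][0]
                + lm_weight * bigram_logprob(item[0], ch),
            )
            score = (
                prev_score
                + lm_weight * bigram_logprob(prev_ch, ch)
                + math.log(conf)
            )
            new_best[ch] = (score, prev_text + ch)
        best = new_best
    return max(best.values(), key=lambda value: value[0])[1]


# Toy bigram model (assumption): a few likely pairs, a low default otherwise
_TOY_BIGRAMS = {("c", "a"): -0.5, ("a", "t"): -0.5}


def toy_bigram_logprob(prev_ch, ch):
    return _TOY_BIGRAMS.get((prev_ch, ch), -3.0)


# The recognizer slightly prefers "o" in the middle area, but the
# language model favors the sequence "c", "a", "t".
example_candidates = [
    [("c", 0.9)],
    [("o", 0.55), ("a", 0.45)],
    [("t", 0.9)],
]
decoded = decode_text_line(example_candidates, toy_bigram_logprob)
```

In this toy example the recognition confidence alone would select "o" for the second area, while combining it with the language model yields a different decoded line, which is the purpose of the decoding step described above. Techniques of the second category replace this entire pipeline with a single sequence model and are not sketched here.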
However, in real-world scenarios, horizontal text lines are far more common than vertically-oriented text lines. Existing text line images, which are predominantly horizontal, cannot be directly used to train a vertically-oriented text recognition model. As a result, a large number of vertically-oriented text images must be collected to ensure the training performance of the recognition model, which consumes considerable manpower and material resources.