Images of documents captured by end users with mobile devices are typically oriented “right side up.” This is not always the case, however: documents that are wider than they are tall, such as checks, may be rotated 90° to better fit the frame. Additionally, when a mobile device is held nearly flat, the device might only add an attribute to the image that indicates its orientation, without changing the contents of the image. In either case, when the image is received by a system that needs to extract the text, the image may need to be rotated before optical character recognition (OCR) can succeed. Likewise, when document images are captured with a scanning device (e.g., a scanner or a multi-function printer), the output from the scanning device matches the orientation of the input. This means that if a paper document was placed on the scanning device upside down, the captured image will be upside down as well.
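The orientation attribute described above can be illustrated with a minimal sketch. The text does not name a specific attribute; the sketch assumes it behaves like the EXIF Orientation tag, restricted to the four pure-rotation values (1, 3, 6, 8). The image is modeled as a plain list of pixel rows, and the names `rotate_cw`, `CW_TURNS_FOR_TAG`, and `apply_orientation` are hypothetical.

```python
# Illustrative sketch only: an image is modeled as a list of pixel rows, and
# the orientation attribute is ASSUMED to follow the EXIF Orientation values
# for pure rotations. All names here are hypothetical.

def rotate_cw(pixels):
    """Rotate a row-major pixel grid 90 degrees clockwise."""
    return [list(row) for row in zip(*pixels[::-1])]


# Number of 90-degree clockwise turns needed to display the stored pixels
# upright, keyed by orientation value (1 = upright, 6 = needs 90 degrees
# clockwise, 3 = needs 180 degrees, 8 = needs 270 degrees clockwise).
CW_TURNS_FOR_TAG = {1: 0, 6: 1, 3: 2, 8: 3}


def apply_orientation(pixels, tag):
    """Return a copy of the pixel grid rotated so the content is upright."""
    # Unknown or mirrored orientation values are left unchanged in this sketch.
    turns = CW_TURNS_FOR_TAG.get(tag, 0)
    result = [list(row) for row in pixels]
    for _ in range(turns):
        result = rotate_cw(result)
    return result
```

A receiving system that ignores the attribute would pass the stored pixels to OCR as-is, which is why the attribute has to be applied (or the rotation detected some other way) before text extraction.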
OCR software today can correct the rotation of an image. However, rotation correction is often slow, as it may require running OCR on each possible orientation to determine which orientation returns the highest-quality results. Additionally, although it is often helpful to show a mobile user the document in its proper orientation, OCR-based rotation correction is generally available only on a server, because the memory and performance limitations of mobile devices are too constraining.
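The try-every-orientation approach described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the behavior of any particular OCR product: the image is a plain list of pixel rows, only the four 90° orientations are tested, and `score_fn` is a hypothetical stand-in for a full OCR pass that returns a quality score.

```python
# Illustrative sketch only: `score_fn` is a hypothetical callable standing in
# for a full OCR pass that returns a quality score (higher is better).

def rotate_cw(pixels):
    """Rotate a row-major pixel grid 90 degrees clockwise."""
    return [list(row) for row in zip(*pixels[::-1])]


def best_orientation(pixels, score_fn):
    """Score all four 90-degree orientations and keep the best one.

    Returns (degrees_clockwise, rotated_pixels) for the highest-scoring
    candidate. score_fn is called once per orientation, which is why this
    approach costs roughly four OCR passes per image.
    """
    candidate = [list(row) for row in pixels]
    best_degrees, best_pixels = 0, candidate
    best_score = score_fn(candidate)
    for degrees in (90, 180, 270):
        candidate = rotate_cw(candidate)
        score = score_fn(candidate)
        if score > best_score:
            best_degrees, best_pixels, best_score = degrees, candidate, score
    return best_degrees, best_pixels
```

Because each call to `score_fn` represents a complete recognition pass, the cost grows with the number of orientations tested, which is consistent with the slowness noted above.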
Another approach to correcting orientation is to use machine learning to detect the rotation of the document. With neural networks, no OCR is necessary: the system takes the entire document as input and returns the orientation directly. Unfortunately, as with OCR-based rotation correction, the memory and performance requirements of high-quality neural networks limit their use on mobile devices.