The present disclosure generally relates to a device, system, method, and computer program in the field of document image processing. Mobile devices (e.g., smartphones, mobile phones, laptops, tablet computers, notebooks, personal digital assistants, etc.) are becoming increasingly available worldwide. Moreover, mobile devices are becoming more portable and more powerful. They are always at hand, and their capabilities are comparable with those of a PC. As a result, mobile electronic devices have become indispensable assistants in business, education, communication, travel, and everyday life. Because most mobile electronic devices have embedded photo and/or video cameras, they are often used for capturing images. In the mobile market, there are many software applications that process only images that have already been captured, and most of them have several deficiencies. Cameras in mobile devices have an electronic viewfinder (EVF), in which the image captured by the lens is projected electronically onto a display screen so that the user sees a preview of a future image or video frame. The preview on the display helps the user direct the camera at the scene to be photographed. In the present disclosure, the scene of interest is one or more documents.
The process of capturing an image with a mobile device often requires a user's participation, for example, by requiring the user to manually adjust settings to accommodate prevailing light conditions or to stabilize the mobile device to avoid blurring or defocusing of the text or object to be captured. It is also important to place an object (for example, a document) exactly within the limits of the viewfinder. When a user photographs a document in a hurry, the bounds of the photographed document can be distorted (e.g., excessively cropped). Often, however, the user does not have a chance to quickly select the right settings on the device in order to capture an ideal photograph of a document. As a result, the user may need to capture several photographs or images of a document in order to be able to select the best shot, the one with the fewest defects or distortions. This is time consuming and requires a lot of user effort, especially when the user needs to capture a large number of text documents in a limited amount of time. When there are many documents of different sizes to be captured, additional time is needed to sort them before capturing the desired documents. Different types of documents exist, for example, business cards, financial checks, bills, and printed forms. Placing each document from a collection of documents in front of the camera of a device and capturing an acceptable text image of each one can be a troublesome task.