One use of optical character recognition (OCR) is recognizing text present in a video source, such as a video recording or a live video captured by a user with a video camera, a mobile device equipped with video capturing capabilities, etc. The text in the video source usually appears on an object being captured, such as a page of text, a street sign, a restaurant menu, etc. Once the presence of text is detected, the detected text can be recognized and/or used for video compression or for any other later purpose. The recognized text can further be converted into audio, using one of a variety of existing text-to-speech technologies, for people who have difficulty reading the text. The recognized text can also be translated into a different language immediately upon capture, so that the user can read or listen to (using a text-to-speech technology) a translation of the text while the camera is pointed at the text in the foreign language.
Video processing can be implemented by simply applying any existing document layout analysis technology to each sequential individual frame, or to each frame in a selection of video frames, in the video source. However, this approach is too slow and/or too wasteful in terms of processing resources and power consumption, because it fails to utilize the redundancy present in most realistic video capture scenarios, in which successive frames typically show largely the same content. Accordingly, there is a need to improve the effectiveness and efficiency of the processing of video sources. To illustrate the present disclosure, the example video frames are further used for optical character recognition (OCR).
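The redundancy noted above can be illustrated with a minimal sketch. It is not the method of the present disclosure; it only shows, under stated assumptions, why analyzing every frame independently repeats work. The frame representation (rows of grayscale pixel values), the names `mean_abs_diff` and `analyze_with_reuse`, the `analyze` callback standing in for any layout analysis step, and the threshold value are all hypothetical and chosen for illustration.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple

# Assumption: a frame is a grid of grayscale pixel values (0-255).
Frame = Sequence[Sequence[int]]


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average absolute per-pixel difference between two equally sized frames."""
    total = 0
    count = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    return total / count if count else 0.0


def analyze_with_reuse(
    frames: Sequence[Frame],
    analyze: Callable[[Frame], Any],
    threshold: float = 5.0,  # hypothetical tuning value
) -> Tuple[List[Any], int]:
    """Run `analyze` only when a frame differs enough from the last analyzed one.

    Returns one result per frame plus the number of times `analyze` was called.
    """
    results: List[Any] = []
    calls = 0
    reference: Optional[Frame] = None  # last frame that was actually analyzed
    cached: Any = None                 # result obtained for that frame
    for frame in frames:
        if reference is None or mean_abs_diff(reference, frame) > threshold:
            cached = analyze(frame)
            reference = frame
            calls += 1
        results.append(cached)  # near-duplicate frames reuse the cached result
    return results, calls
```

With a naive per-frame approach, `analyze` would run once for every frame; in this sketch it runs only when the picture changes noticeably, which hints at how much work a steady camera shot can make unnecessary. A real system would need a more robust change measure than a raw pixel difference.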