The present invention relates to an apparatus and concomitant method for digital image processing. More specifically, the present invention provides text recognition in an image sequence of scene imagery, e.g., three-dimensional (3D) scenes of the real world.
Video and scene imagery are increasingly important sources of information. The proliferation and availability of devices such as digital still cameras and digital video cameras are clear evidence of this trend.
Aside from the general scenery, e.g., people and the surrounding landscape, captured imagery often contains text information (broadly including letters, numbers, punctuation and symbols). Although the captured text information is easily recognizable by a human viewer, this important text information is often not detected and deciphered by the portable image capturing device and therefore is not immediately utilized by the operator of the portable image capturing device.
However, it has been noted that recognizing text that appears in real-world scenery is potentially useful for characterizing the contents of video imagery, i.e., gaining insights about the imagery. In fact, the ability to accurately deduce text information within real-world scenery will enable the creation of new applications that gather, process, and disseminate information about the contents of captured imagery.
Additionally, the volume of collected multimedia data is expanding at a tremendous rate. Data collection is often performed without real time processing to deduce the text information within the captured data. For example, captured imagery can be stored in a portable device, but no processing is performed to detect and extract text information within the captured imagery. Thus, benefits associated with real time text detection and extraction are not realized in portable imagery capturing devices.
Therefore, a need exists in the art for an apparatus and method to portably detect and extract text information from captured imagery, thereby allowing new implementations for the gathering, processing, and dissemination of information relating to the contents of captured imagery.
The present invention is an apparatus and a concomitant method for portably detecting and recognizing text information in captured imagery. In one embodiment, the present invention is a portable device that is capable of capturing imagery and is also capable of detecting and extracting text information from the captured imagery. The portable device contains an image capturing sensor, a text detection module, an OCR module, and means for presenting the output to the user or other devices. Additional modules may be necessary for different embodiments as described below.
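The arrangement of modules described above can be illustrated with a minimal sketch. It assumes that each module is supplied as a plain callable; the class name, field names, and type aliases are hypothetical and are not taken from the specification, which does not define a software interface.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical types, chosen only for illustration.
Image = List[List[int]]            # grayscale pixels, row-major
Box = Tuple[int, int, int, int]    # (left, top, right, bottom); right and bottom are exclusive


@dataclass
class PortableTextReader:
    """Wires together the four stages named in the summary."""
    capture: Callable[[], Image]                # image capturing sensor
    detect_text: Callable[[Image], List[Box]]   # text detection module
    recognize: Callable[[Image], str]           # OCR module
    present: Callable[[str], None]              # output to the user or another device

    def run_once(self) -> List[str]:
        """Capture one image, recognize each detected text region, and present it."""
        image = self.capture()
        results = []
        for left, top, right, bottom in self.detect_text(image):
            # Crop the detected region so the OCR stage sees only the text area.
            region = [row[left:right] for row in image[top:bottom]]
            text = self.recognize(region)
            self.present(text)
            results.append(text)
        return results
```

In such a sketch, the additional modules of the individual embodiments (e.g., a language translator or a speech synthesizer) could be inserted between the `recognize` and `present` stages without changing the other stages.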
In a first embodiment, the present device is deployed as a portable language translator. For example, a user travelling in a foreign country can capture imagery containing text (e.g., by taking a picture of a restaurant menu). The text within the captured imagery is detected and translated into the native language of the user. A pertinent language translator can be loaded into the portable device.
In a second embodiment, the present device is deployed as a portable assistant to an individual who is visually impaired or who needs reading assistance. For example, a user shopping in a store can capture imagery containing text (e.g., by taking a picture of the label of a product). Another example is a child taking a picture of a page in a book. The text within the captured imagery is detected and audibly broadcast to the user via a speaker.
In a third embodiment, the present device is deployed as a portable notebook. For example, a user in an educational environment can capture imagery containing text (e.g., by taking a picture of a white board, view graph or screen). The text within the captured imagery is detected and stored in a format that can be retrieved later for text processing, e.g., in a word processor format.
In a fourth embodiment, the present device is deployed as a portable auxiliary information accessor. For example, a user in a business environment can capture imagery containing text (e.g., by taking a picture of a billboard or a business card bearing an Internet or web address). The text within the captured imagery is detected and the Internet address is accessed to acquire additional information.
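As an illustration of the fourth embodiment, the following minimal sketch shows one way an Internet address might be located in text that has already been recognized. The pattern, the function name, and the default `http://` scheme are assumptions made for illustration only; the specification does not prescribe how the address is identified, and a practical device would also need to tolerate OCR errors.

```python
import re
from typing import List

# Hypothetical pattern: an address that starts with a scheme or with "www."
_URL_PATTERN = re.compile(r"(?:https?://|www\.)[^\s<>\"']+", re.IGNORECASE)


def extract_web_addresses(recognized_text: str) -> List[str]:
    """Return candidate web addresses found in OCR output, in order of appearance."""
    addresses = []
    for match in _URL_PATTERN.findall(recognized_text):
        # Drop sentence punctuation that OCR output often leaves attached.
        candidate = match.rstrip(".,;:!?)")
        # Assume plain HTTP when the printed address omits the scheme.
        if not candidate.lower().startswith("http"):
            candidate = "http://" + candidate
        addresses.append(candidate)
    return addresses
```

The returned addresses could then be passed to whatever network access facility the portable device provides.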
In a fifth embodiment, the present device is deployed as a portable navigation assistant. For example, the portable unit is deployed in a vehicle for automatic reading of road signs and speed limit signs. The text within the captured imagery is detected and is provided to the computer in the vehicle for assisting the vehicle's navigation system or as a warning indicator to the driver on an instrument panel.
In a sixth embodiment, the present device is deployed as a portable law enforcement assistant. For example, the portable unit is deployed in a police vehicle or in a hand-held device for reading license plates, vehicle identification numbers (VINs) or driver licenses and registrations. The text within the captured imagery is detected and is used to provide information to a law enforcement officer as to the status of a vehicle or a driver.
In a seventh embodiment, the present device is deployed as a portable inventory assistant. For example, a user in a store or a warehouse can capture imagery containing text (e.g., by taking a picture of a product on a shelf or high up on a scaffold). In another example, the odometer reading of a returned rental car could be automatically captured. The text within the captured imagery is detected and is used for inventory control.