Video and scene imagery are increasingly important sources of information. The proliferation and availability of devices such as digital still cameras and digital video cameras are clear evidence of this trend.
Aside from general scenery, e.g., people and the surrounding landscape, captured imagery often contains text information (broadly including letters, numbers, punctuation, and symbols). Although this text information is easily recognized by a human viewer, it is often not detected and deciphered by the portable image capturing device and therefore is not immediately utilized by the operator of the device.
However, it has been noted that recognizing text that appears in real-world scenery is potentially useful for characterizing the contents of video imagery, i.e., gaining insights about the imagery. In fact, the ability to accurately deduce text information within real-world scenery will enable the creation of new applications that gather, process, and disseminate information about the contents of captured imagery.
Additionally, the volume of collected multimedia data is expanding at a tremendous rate. Data collection is often performed without real-time processing to deduce the text information within the captured data. For example, captured imagery can be stored in a portable device without any processing being performed to detect and extract text information within the captured imagery. Thus, the benefits associated with real-time text detection and extraction are not realized in portable image capturing devices.
Therefore, a need exists in the art for an apparatus and method to portably detect and extract text information from captured imagery, thereby allowing new implementations for the gathering, processing, and dissemination of information relating to the contents of captured imagery.