Digital still cameras and digital video cameras permit imagery to be stored and displayed for human viewing. However, the captured digital imagery contains information that, if automatically extracted, could be used for other purposes. Information about real-world scenery imaged by the camera, such as text (e.g., words and symbols) appearing in the scene, could then be processed and/or disseminated by new computing-based applications.
For example, automatic characterization of automobiles (e.g., for security purposes) could be enhanced by providing a means of recognizing symbols, such as a manufacturer's insignia or logo, in a captured image of an automobile (e.g., in addition to providing automatic recognition of the license plate number). That is, by recognizing symbols in the image, more information about the automobile in question can be obtained (e.g., the description “a white Toyota Corolla with the license plate number XYZ123” provides more information than the description “a white car with the license plate number XYZ123”). Similarly, symbol recognition could enhance characterization of documents such as business cards or stationery, which may include corporate logos.
Research in text and symbol recognition for both printed documents and other sources of imagery has generally assumed that the text and symbols lie in a plane that is oriented roughly perpendicular to the optical axis of the camera, and that the text is roughly parallel to the horizontal edge of the imaging frame. However, text and symbols printed on objects such as street signs, license plates, business cards and billboards appearing in captured video imagery often: (a) lie in planes that are oriented at oblique angles to the optical axis; (b) are rotated within those planes; and/or (c) are scaled. Such text and symbols may therefore be recognized poorly by conventional optical character recognition (OCR) or symbol recognition methods. Accordingly, a need exists in the art for an apparatus and method that take advantage of 3-D scene geometry to detect distortions including: (a) the perspective orientations of the planes on which text and symbols are printed; (b) the rotation angles of the text and symbols in those planes; and (c) a scale or zoom factor, thereby improving text and symbol recognition. For the purposes of the present invention, the term “rectification” refers to the removal of the effects of some or all of the distortion factors discussed above from an image, or from a sequence of images, of a three-dimensional scene.
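By way of illustration only, rectification of a planar region can be sketched as follows. For a planar surface, perspective orientation, in-plane rotation and scale can all be expressed together by a single 3x3 planar homography, so one simple way to undo them is to estimate that homography from four corner correspondences and apply it. This sketch is not taken from the description above and is not necessarily the method of the invention; the function names, the corner coordinates and the 200 x 100 target rectangle are all hypothetical, and the corner locations are assumed to be already known.

```python
import numpy as np


def estimate_homography(src, dst):
    """Estimate H (3x3) such that dst ~ H @ src, using the direct linear transform.

    Assumes at least four point pairs in general position (no three collinear).
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two linear constraints on the 9 entries of H.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows))
    h = vt[-1].reshape(3, 3)
    # Normalize so that H[2, 2] == 1 (assumed nonzero for this illustration).
    return h / h[2, 2]


def apply_homography(h, pts):
    """Map 2-D points through H, including the perspective divide."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homog @ h.T
    return mapped[:, :2] / mapped[:, 2:3]


# Hypothetical corners of a sign as it appears in the captured image
# (obliquely viewed, slightly rotated, arbitrary scale).
observed = [(120, 80), (310, 60), (330, 200), (100, 180)]
# Hypothetical fronto-parallel target: an upright 200 x 100 rectangle.
target = [(0, 0), (200, 0), (200, 100), (0, 100)]

H = estimate_homography(observed, target)
rectified = apply_homography(H, observed)
```

In this sketch the same matrix `H` could be used to resample every pixel inside the observed quadrilateral, so that the text or symbol is presented upright and at a fixed size to a conventional recognizer.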