This invention relates to optical character recognition generally, and more particularly to using shape suppression to identify areas in images that include particular shapes.
Computer technology is continually advancing, providing computers with continually increasing capabilities. General-purpose multimedia personal computers (PCs) have now become commonplace, providing a broad range of functionality to their users, including the ability to manipulate visual images. Further advances are being made in more specialized computing devices, such as set-top boxes (STBs) that operate in conjunction with a traditional television and make specialized computer-type functionality available to the users (such as accessing the Internet).
Many such general-purpose and specialized computing devices allow for the display of visual images with text. Given the additional abilities of such devices in comparison to conventional televisions, many situations arise where it would be beneficial to be able to identify the text within a particular visual image. For example, a video image may include a Uniform Resource Locator (URL) identifying a particular web page that can be accessed via the Internet. If the URL text could be identified, then the text could be input to a web browser and the corresponding web page accessed without requiring manual input of the URL by the user.
Identification of text or characters is typically referred to as Optical Character Recognition (OCR). Various techniques are known for performing OCR. However, many OCR techniques require, or their accuracy can be improved by, identifying specific areas within a visual image that contain text prior to application of the OCR technique (that is, only the specific areas that might contain text are input to the OCR process). The accuracy of current techniques for identifying such specific areas is poor, often due to the nature of the underlying video images. Text can be “on top” of a wide range of different backgrounds and textures of the underlying video image. Distinguishing such background from text can be very difficult.
The invention described below addresses these disadvantages, providing text frame detection in video images using shape suppression.
The use of shape suppression to identify areas of images that include particular shapes is described herein. Such shapes can be, for example, letters, numbers, punctuation marks, or other symbols in any of a wide variety of languages.
According to one embodiment, a set of shape characteristics that identify the vertical edges of a set of shapes (e.g., English letters and numbers) is maintained. The vertical edges in the image are analyzed and compared to the set of shape characteristics using a Vector Quantization (VQ)-based shape classifier. Areas in which these edges match any of the shape characteristics are identified as potential areas of the image that include one or more of the set of shapes.
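As an illustration of how a VQ-based shape classifier of this kind might operate, the following minimal sketch compares an edge feature vector against a codebook of shape characteristics and accepts the edge when its nearest codeword is close enough. The feature representation, the codebook contents, and the `max_distance` threshold are all assumptions made for the example; the text above does not specify them.

```python
import math


def nearest_codeword(feature, codebook):
    """Return (index, distance) of the codeword closest to `feature`.

    `feature` and each codeword are equal-length sequences of numbers.
    Euclidean distance is an assumption; other metrics could be used.
    """
    best_index, best_dist = -1, math.inf
    for i, codeword in enumerate(codebook):
        dist = math.dist(feature, codeword)
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index, best_dist


def matches_shape(feature, codebook, max_distance):
    """True if the edge feature lies within `max_distance` of any codeword.

    `max_distance` is a hypothetical tuning parameter.
    """
    _, dist = nearest_codeword(feature, codebook)
    return dist <= max_distance


# Hypothetical two-entry codebook of 2-D edge features.
codebook = [(0.0, 0.0), (10.0, 10.0)]
index, distance = nearest_codeword((9.0, 9.0), codebook)
```

In this sketch an edge whose feature falls near a codeword is treated as belonging to one of the maintained shapes, and an edge far from every codeword is not.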
According to another embodiment, a vertical differential filter is applied to a received image to generate a horizontal edge map, and a non-maxima suppression filter is applied to the horizontal edge map to generate a thinned horizontal edge map. Similarly, a horizontal differential filter is applied to the received image to generate a vertical edge map, and a non-maxima suppression filter is applied to the vertical edge map to generate a thinned vertical edge map. A segmentation process then determines a set of areas that are candidates for including particular shapes (e.g., text) based on the density of edges in the areas of the vertical edge map. The portions of the horizontal edge map corresponding to the candidate areas are then analyzed to determine whether there are a sufficient number of horizontal edges in verification windows of the candidate areas. If there are a sufficient number of horizontal edges in the verification window of a candidate area, then that candidate area is output to a shape suppression filter. The vertical edges in the candidate areas are then compared to a set of shape characteristics by using a VQ-based shape classifier. For each vertical edge, if it is classified as a shape, then the edge is kept; otherwise the edge is removed. Based on the remaining edges, the probable areas are then selected and output as shape (e.g., text) areas.
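The edge-map and thinning steps can be sketched as follows for the vertical-edge case. This is an illustrative sketch only: the central-difference kernel, the tie-breaking rule in the suppression step, and the `threshold` parameter are assumptions, and the image is taken to be a grayscale list of rows rather than any particular video format.

```python
def vertical_edge_map(image):
    """Horizontal differential filter: absolute central difference along each row.

    Strong responses mark vertical edges. Border columns are left at 0.
    """
    height, width = len(image), len(image[0])
    edges = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(1, width - 1):
            edges[y][x] = abs(image[y][x + 1] - image[y][x - 1])
    return edges


def thin_vertical_edges(edges, threshold):
    """Non-maxima suppression along each row.

    A pixel is kept only if its response reaches `threshold` (a hypothetical
    parameter), exceeds its left neighbour, and is at least as large as its
    right neighbour. The asymmetric comparison keeps one pixel on a plateau.
    """
    height, width = len(edges), len(edges[0])
    thinned = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(1, width - 1):
            value = edges[y][x]
            if (value >= threshold
                    and value > edges[y][x - 1]
                    and value >= edges[y][x + 1]):
                thinned[y][x] = value
    return thinned


# A step from dark to bright in every row: one vertical edge.
image = [[0, 0, 0, 100, 100, 100] for _ in range(3)]
edge_map = vertical_edge_map(image)
thinned_map = thin_vertical_edges(edge_map, threshold=50)
```

The horizontal edge map would be obtained the same way with the roles of rows and columns exchanged, that is, by differencing along columns and suppressing non-maxima vertically.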