Handheld devices such as a cell phone 108 (FIG. 1A) include a digital camera that a person 110, holding the device in their hands, can use to capture an image of a real world scene 100, such as image 107, shown displayed on a screen 106 of the cell phone 108 in FIG. 1A. Image 107 is also referred to as a handheld camera captured image, a natural image, or a real world image, to distinguish it from an image formed by an optical scanner from a document that is printed on paper (e.g. scanned by a flatbed scanner of a photocopier).
Recognition of text in handheld camera captured image 107 (FIG. 1A) may be based on regions (also called "blobs") whose boundaries differ significantly from surrounding pixels in one or more properties, such as intensity and/or color. Some prior art methods first identify a pixel at a local minimum or maximum (also called an "extremum") of a property (such as intensity) in the image (as per act 112 in FIG. 1B), and then identify pixels that are located around the identified extremum pixel and whose values of the property lie within a predetermined range, so as to identify a region (as per act 113 in FIG. 1B) known in the prior art as a maximally stable extremal region, or MSER.
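The growing of a region around an extremum pixel (acts 112 and 113) can be illustrated with a small Python sketch. It is a simplified, single-seed illustration under stated assumptions, not the method of Matas et al. or of Nister et al.: the image is assumed to be a list of lists of integer intensities, the seed pixel is assumed to already be a local minimum found in act 112, regions use 4-connectivity, and the names `grow_region`, `most_stable_region`, and the parameter `delta` are hypothetical. For each threshold t, the extremal region is the set of pixels connected to the seed whose intensity is at most t; following the stability idea of Matas et al., the threshold at which the region's area changes least, relative to its size, between t - delta and t + delta is treated as "maximally stable".

```python
from collections import deque


def grow_region(img, seed, t):
    """Return the set of pixels 4-connected to `seed` whose intensity is <= t."""
    rows, cols = len(img), len(img[0])
    r0, c0 = seed
    if img[r0][c0] > t:
        return set()
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in region and img[nr][nc] <= t):
                region.add((nr, nc))
                queue.append((nr, nc))
    return region


def most_stable_region(img, seed, delta=1):
    """Grow the extremal region around a (presumed) local-minimum seed at every
    threshold and return (threshold, region) where the relative area change
    (A[t + delta] - A[t - delta]) / A[t] is smallest, or None if no threshold fits."""
    lo = img[seed[0]][seed[1]]
    hi = max(max(row) for row in img)
    # Area of the extremal region containing the seed, for every threshold.
    areas = {t: len(grow_region(img, seed, t)) for t in range(lo, hi + 1)}
    best_t, best_q = None, float("inf")
    for t in range(lo + delta, hi - delta + 1):
        q = (areas[t + delta] - areas[t - delta]) / areas[t]
        if q < best_q:
            best_t, best_q = t, q
    if best_t is None:
        return None
    return best_t, grow_region(img, seed, best_t)
```

On a 5x5 image containing a 3x3 dark square (intensity 10) on a bright background (intensity 200), the region around the center pixel stays at 9 pixels for every threshold from 10 to 199, so the sketch returns that square. Regrowing the region from scratch at every threshold is deliberately naive and much slower than the union-find or linear-time approaches described in the literature.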
MSERs are regions that are geometrically contiguous (one can go from any pixel in the region to any other pixel in the region by traversing neighbors), and that are invariant both to monotonic transformations of the property values and to affine transformations (transformations that preserve straight lines and ratios of distances between points on the straight lines). In the prior art, boundaries of MSERs may be used as connected components (see act 114 in FIG. 1B) to identify candidates for recognition as text. Connected components may be subjected to one or more geometric tests to identify a rectangular portion 103 (FIG. 1A) in such a region, which is then sliced or segmented into a number of blocks, each block being a candidate to be recognized as a character of text. Such a candidate block may be recognized using optical character recognition (OCR) methods.
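The path from a region to candidate character blocks can be sketched in Python as follows. This is an illustration under assumptions, not the prior art implementation: the region is assumed to be given as a binary mask (list of lists of 0/1), components use 4-connectivity, the geometric test (minimum height, maximum width-to-height ratio) and its thresholds are hypothetical, and slicing at columns that contain no foreground pixels is just one common segmentation heuristic chosen for this example. The functions `connected_components`, `bounding_box`, `passes_geometric_test`, and `slice_into_blocks` are hypothetical names.

```python
from collections import deque


def connected_components(mask):
    """Group truthy pixels of a binary mask into 4-connected components."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            component, queue = [], deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(component)
    return components


def bounding_box(pixels):
    """Return (top, left, bottom, right) of a list of (row, col) pixels."""
    rows = [y for y, _ in pixels]
    cols = [x for _, x in pixels]
    return min(rows), min(cols), max(rows), max(cols)


def passes_geometric_test(component, min_height=2, max_aspect=2.0):
    """Hypothetical test: reject specks and boxes much wider than they are tall."""
    top, left, bottom, right = bounding_box(component)
    height, width = bottom - top + 1, right - left + 1
    return height >= min_height and width / height <= max_aspect


def slice_into_blocks(mask, box):
    """Split a rectangle into blocks at columns that contain no foreground pixels."""
    top, left, bottom, right = box
    blocks, start = [], None
    for x in range(left, right + 1):
        has_ink = any(mask[y][x] for y in range(top, bottom + 1))
        if has_ink and start is None:
            start = x                      # a new block begins
        elif not has_ink and start is not None:
            blocks.append((top, start, bottom, x - 1))
            start = None                   # an empty column ends the block
    if start is not None:
        blocks.append((top, start, bottom, right))
    return blocks
```

For example, a mask holding a box-shaped glyph, a vertical bar, and a one-pixel speck yields three components. The geometric test drops the speck, the bounding box of the two remaining components serves as the rectangular portion, and slicing at the empty column between them produces one block per glyph.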
One such method is described in, for example, an article entitled "Robust Text Detection In Natural Images With Edge-Enhanced Maximally Stable Extremal Regions" by Chen et al., believed to be published in IEEE International Conference on Image Processing (ICIP), September 2011, that is incorporated by reference herein in its entirety as background. MSERs are believed to have been first described by Matas et al., e.g. in an article entitled "Robust Wide Baseline Stereo from Maximally Stable Extremal Regions", Proc. of British Machine Vision Conference, 2002, pages 384-393, that is incorporated by reference herein in its entirety. The method described by Matas et al. is known to be computationally expensive because of the time taken to identify MSERs in an image. This time can be reduced by use of a method of the type described by Nister, et al., "Linear Time Maximally Stable Extremal Regions", ECCV, 2008, Part II, LNCS 5303, pp 183-196, published by Springer-Verlag Berlin Heidelberg, that is also incorporated by reference herein in its entirety.
The current inventors note that prior art methods of the type described by Chen et al., by Matas et al., or by Nister et al. identify hundreds, and sometimes thousands, of MSERs in an image 107 (FIG. 1A) that includes details of natural features, such as leaves of trees, plants, shrubs, and bushes. For example, numerous MSERs may be generated from one version of the image (also called the MSER+ image) by applying a method of the type described above to natural image 107. Another image (also called the MSER− image) may be similarly generated by applying the just-described method after inverting the intensity values of the pixels in image 107, to obtain numerous additional MSERs.
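The reason the inverted image yields additional MSERs can be seen in a short Python sketch: inverting intensities turns bright extrema into dark ones, so a detector that only looks for dark extrema finds different seed pixels in each version of the image. The sketch assumes 8-bit intensities (maximum value 255) and treats a pixel that is strictly darker than all of its in-bounds 4-neighbours as a seed; the names `invert` and `strict_local_minima` are hypothetical, and only seed detection is shown, not full MSER extraction.

```python
def invert(img, max_value=255):
    """Invert intensities so bright pixels become dark and vice versa."""
    return [[max_value - v for v in row] for row in img]


def strict_local_minima(img):
    """Return pixels strictly darker than all of their in-bounds 4-neighbours."""
    rows, cols = len(img), len(img[0])
    minima = []
    for r in range(rows):
        for c in range(cols):
            neighbours = [img[nr][nc]
                          for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                          if 0 <= nr < rows and 0 <= nc < cols]
            if all(img[r][c] < v for v in neighbours):
                minima.append((r, c))
    return minima
```

On an image with a mid-gray background, one dark pixel, and one bright pixel, the original image yields only the dark pixel as a seed, while the inverted image yields only the (originally) bright pixel, which is why running the same procedure on both versions roughly doubles the number of candidate regions to process.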
OCR methods of the prior art originate in the field of document processing, in which the document image contains a series of lines of text oriented parallel to one another (e.g. 20 lines of text on a page). Such OCR methods extract a vector (called a "feature vector") from the binary values in each block, and this vector is then compared with a library of reference vectors generated ahead of time (based on training images of letters of an alphabet to be recognized). Next, the letter of the alphabet represented by the reference vector in the library that most closely matches the vector of the block is identified as recognized, to conclude OCR ("document" OCR).
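The matching step of such document OCR can be sketched as a nearest-neighbour lookup. This is a deliberately minimal illustration, not any particular OCR engine: the feature vector is assumed to be simply the flattened binary pixels of the block, closeness is assumed to be Euclidean distance, and the 3x3 reference templates for "I", "L", and "O" are hypothetical stand-ins for a library built from training images. Practical OCR systems use larger blocks and richer features.

```python
import math

# Hypothetical 3x3 binary templates standing in for trained reference images.
TEMPLATES = {
    "I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
    "O": [[1, 1, 1], [1, 0, 1], [1, 1, 1]],
}


def feature_vector(block):
    """Flatten a binary block (rows of 0/1) into a feature vector."""
    return [float(v) for row in block for v in row]


# Library of reference vectors, generated ahead of time from the templates.
LIBRARY = {letter: feature_vector(t) for letter, t in TEMPLATES.items()}


def recognize(block, library=LIBRARY):
    """Return the letter whose reference vector is closest to the block's vector."""
    vec = feature_vector(block)
    best_letter, best_dist = None, math.inf
    for letter, ref in library.items():
        d = math.dist(vec, ref)  # Euclidean distance (Python 3.8+)
        if d < best_dist:
            best_letter, best_dist = letter, d
    return best_letter
```

A noisy "L" missing its bottom-right pixel, `[[1, 0, 0], [1, 0, 0], [1, 1, 0]]`, is at distance 1 from the "L" template, 2 from "O", and about 2.24 from "I", so it is recognized as "L".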
The current inventors believe that MSER processing of the type described above, to detect a connected component for use in OCR, requires memory and processing power that is not normally available in today's handheld devices, such as a smart phone. Hence, there appears to be a need for methods and apparatuses to speed up MSER processing, of the type described below.