Mobile devices such as a cell phone 108 (FIG. 1A) include a camera that can be used by a person 110 to capture an image (also called “natural image” or “real world image”) of a scene 100 in the real world (as per act 111 in FIG. 1B), such as image 107 (shown displayed on a screen 106 of cell phone 108 in FIG. 1A). Natural image 107 may be uploaded to a computer for recognition of text therein, based on regions (also called “blobs”) whose boundaries differ significantly from surrounding pixels in one or more properties, such as intensity and/or color. Some prior art methods first identify a pixel that is a local minimum or maximum (also called “extrema”) of a property (such as intensity) in the image (as per act 112 in FIG. 1B), and then identify pixels that are located around the identified extrema pixel and have values of the property within a predetermined range, so as to identify a region (as per act 113 in FIG. 1B), known in the prior art as a maximally stable extremal region or MSER.
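For illustration only, acts 112 and 113 may be sketched in Python as shown below. This is a simplified sketch and not the method of any cited reference: it assumes a grayscale image held as a list of rows, 4-connected neighbors, and a strict local minimum as the extremum, and it omits the stability test that distinguishes an MSER from an ordinary grown region. The names `neighbors`, `find_local_minimum`, `grow_region` and `value_range` are hypothetical and chosen for this example.

```python
from collections import deque


def neighbors(image, r, c):
    """Yield the 4-connected neighbors of pixel (r, c) that lie inside the image."""
    rows, cols = len(image), len(image[0])
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < rows and 0 <= nc < cols:
            yield nr, nc


def find_local_minimum(image):
    """Act 112 (simplified): return the first pixel, in raster order, whose
    intensity is strictly lower than that of every 4-connected neighbor."""
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            if all(value < image[nr][nc] for nr, nc in neighbors(image, r, c)):
                return r, c
    return None


def grow_region(image, seed, value_range):
    """Act 113 (simplified): collect pixels reachable from the seed through
    neighboring pixels whose intensity is within value_range of the seed."""
    seed_value = image[seed[0]][seed[1]]
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in neighbors(image, r, c):
            if (nr, nc) in region:
                continue
            if abs(image[nr][nc] - seed_value) <= value_range:
                region.add((nr, nc))
                queue.append((nr, nc))
    return region


# Hypothetical 4x5 intensity image: a dark blob on a bright background,
# plus one isolated dark pixel at (3, 4) that does not touch the blob.
image = [
    [200, 200, 200, 200, 200],
    [200,  12,  15, 200, 200],
    [200,  10,  14, 200, 200],
    [200, 200, 200, 200,  11],
]
seed = find_local_minimum(image)
region = grow_region(image, seed, value_range=10)
```

In this sketch the isolated dark pixel is left out of the region even though its intensity is within the range, because it cannot be reached from the seed by traversing neighboring pixels.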
MSERs are regions that are geometrically contiguous (one can go from any pixel to any other pixel in such a region by traversing neighboring pixels), that are stable under monotonic transformation of property values, and that are invariant to affine transformations (transformations that preserve straight lines and ratios of distances between points on the straight lines). In prior art methods known to the current inventors, MSER detection evaluates intensities of all pixels in such a region (e.g. to ensure that the pixels contact one another, so that the region is contiguous).
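The definition of an affine transformation given above can be checked with a short worked example in Python. The sketch below is illustrative only: the matrix, offset and points are arbitrary values assumed for this example, and the helper names `affine`, `collinear` and `squared_ratio` are hypothetical. Exact integer and fraction arithmetic is used so that the comparison does not depend on floating-point rounding.

```python
from fractions import Fraction


def affine(point, matrix, offset):
    """Apply x' = a*x + b*y + tx, y' = c*x + d*y + ty to a 2-D point."""
    x, y = point
    (a, b), (c, d) = matrix
    tx, ty = offset
    return (a * x + b * y + tx, c * x + d * y + ty)


def collinear(p, q, r):
    """Three points lie on one straight line when this cross product is zero."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0]) == 0


def squared_ratio(p, q, r):
    """Return (|pq| / |qr|) squared as an exact fraction."""
    pq = (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2
    qr = (r[0] - q[0]) ** 2 + (r[1] - q[1]) ** 2
    return Fraction(pq, qr)


# Assumed example values: an invertible matrix (determinant 7) and a shift.
matrix = ((2, 1), (-1, 3))
offset = (3, -1)

# Three points on the straight line y = 2x.
points = [(0, 0), (1, 2), (3, 6)]
mapped = [affine(p, matrix, offset) for p in points]

line_preserved = collinear(*points) and collinear(*mapped)
ratio_preserved = squared_ratio(*points) == squared_ratio(*mapped)
```

With these values both `line_preserved` and `ratio_preserved` evaluate to true: the mapped points still lie on one straight line, and the ratio of the distances between them is unchanged.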
After MSERs are identified, boundaries of MSERs may be used in the prior art as connected components (see act 114 in FIG. 1B), to identify candidates for recognition as text. The text candidates may then be subject to optical character recognition (OCR) in the normal manner. One such method is described in, for example, an article entitled “Robust Text Detection In Natural Images With Edge-Enhanced Maximally Stable Extremal Regions” by Chen et al., believed to be published in IEEE International Conference on Image Processing (ICIP), September 2011, which is incorporated by reference herein in its entirety as background.
MSERs are believed to have been first described by Matas et al., e.g. in an article entitled “Robust Wide Baseline Stereo from Maximally Stable Extremal Regions”, Proc. of British Machine Vision Conference, 2002, pages 384-393, which is incorporated by reference herein in its entirety. The method described by Matas et al. is known to be computationally expensive, and a lot of time is normally taken to identify MSERs in an image. The time taken to identify MSERs in an image can be reduced by use of a method of the type described by Nister et al., “Linear Time Maximally Stable Extremal Regions”, ECCV, 2008, Part II, LNCS 5303, pp 183-196, published by Springer-Verlag Berlin Heidelberg, which is also incorporated by reference herein in its entirety.
The current inventors note that prior art methods of the type described by Chen et al. or by Matas et al. or by Nister et al. identify hundreds of MSERs in an image. Such methods sometimes identify thousands of MSERs in an image 107 that includes details of natural features, such as leaves of a tree or leaves of plants, shrubs, and bushes.
Identifying such large numbers of MSERs on today's computers, using methods of the type described above, is accurate but takes a significant amount of time, depending on the amount of detail in portions of the image that contain natural features. The current inventors find such methods impractical for use in recognition of text by handheld devices, such as smart phones, due to inherent limitations of such devices in computation power and memory, relative to computers. Hence, there appears to be a need for methods and apparatuses of the type described below.