Text data detection is of primary importance in many applications, such as image indexing. Indexing an image generally comprises four stages:
- detecting text data in the image, which generally includes a sub-stage of classifying the image's pixels;
- retrieving the detected text data;
- recognizing the characters in the retrieved text data; and
- indexing the image as a function of the recognized text data.
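The four stages can be sketched as a simple pipeline. This is a minimal illustration under stated assumptions: the stage functions `detect`, `extract` and `recognize` are hypothetical placeholders supplied by the caller, and treating the index as the sorted set of lower-cased recognized words is an assumption, not something the text specifies.

```python
from typing import Callable, List, Sequence, Tuple

Image = Sequence[Sequence[int]]   # grayscale image stored as rows of pixel values
Box = Tuple[int, int, int, int]   # (top, left, bottom, right); bottom and right exclusive


def index_image(
    image: Image,
    detect: Callable[[Image], List[Box]],      # hypothetical stage 1: text detection
    extract: Callable[[Image, Box], Image],    # hypothetical stage 2: text retrieval
    recognize: Callable[[Image], str],         # hypothetical stage 3: OCR
) -> List[str]:
    """Run the four indexing stages on one image and return its index terms."""
    boxes = detect(image)
    crops = [extract(image, box) for box in boxes]
    texts = [recognize(crop) for crop in crops]
    # Stage 4 (assumed): index the image by the distinct words recognized in it.
    return sorted({word.lower() for text in texts for word in text.split()})


def crop(image: Image, box: Box) -> Image:
    """A simple extraction stage: cut the box out of the image."""
    top, left, bottom, right = box
    return [list(row[left:right]) for row in image[top:bottom]]
```

Passing the stages as callables keeps each one replaceable, so a different detector or OCR engine can be substituted without changing the pipeline itself.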
The character recognition stage is generally carried out by an optical character recognition (OCR) system.
The text data contained in images are of two types, namely “scene text” and “artificial text”. Artificial text is text that was added to the original image, for instance a subtitle. Scene text is text already present in the original image, for instance on an advertisement or a T-shirt. Scene text is more difficult to detect than artificial text because its characteristics, such as the orientation, color and font size of the text, are more complex.
The effectiveness of text data detection algorithms depends on the characteristics of the text to be detected.
The first text data detection algorithms performed character segmentation and then grouped the characters to detect words and lines. The purpose of segmentation is to divide an original image into several distinct zones.
Some text data detection algorithms classify the image data line by line into “text” and “non-text” pixels. For example, the pixels of each line are grouped by color to determine which pixels belong to the same “text” zone. Histograms of the uniformly colored line segments are then computed and compared with each other in order to form rectangles containing the image text.
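The line-by-line color grouping can be illustrated on grayscale lines. The sketch below rests on hypothetical choices: the tolerance `tol`, the bin width, and the use of histogram intersection to compare lines are assumptions, since the text does not state how the uniform segments are formed or how their histograms are compared.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Run = Tuple[int, int, int]  # (start, end_exclusive, mean gray level)


def uniform_runs(line: Sequence[int], tol: int = 10) -> List[Run]:
    """Split one image line into runs whose pixels stay within `tol`
    gray levels of the run's first pixel (an assumed homogeneity test)."""
    runs: List[Run] = []
    start = 0
    for x in range(1, len(line) + 1):
        if x == len(line) or abs(line[x] - line[start]) > tol:
            segment = line[start:x]
            runs.append((start, x, sum(segment) // len(segment)))
            start = x
    return runs


def run_histogram(runs: List[Run], bin_width: int = 32) -> Counter:
    """Histogram of run colors, each run weighted by its length in pixels."""
    hist: Counter = Counter()
    for start, end, gray in runs:
        hist[gray // bin_width] += end - start
    return hist


def histogram_similarity(h1: Counter, h2: Counter) -> float:
    """Histogram intersection normalized to [0, 1]; 1.0 means identical."""
    total = max(sum(h1.values()), sum(h2.values()))
    if total == 0:
        return 1.0
    return sum(min(h1[k], h2[k]) for k in h1) / total
```

Lines whose histograms score as similar are candidates for being grouped into the same text rectangle.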
In a variation, the luminance gradients along each line are evaluated, and each line is divided into segments that exhibit a similar gray level. Adjacent lines are then merged according to a statistical similarity criterion.
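A minimal sketch of this variation, assuming a simple threshold on the difference between neighbouring pixels as the gradient test and, as the statistical similarity criterion, closeness of each line's luminance mean and standard deviation to those of the previous line. The thresholds `grad_thresh`, `mean_tol` and `std_tol` are hypothetical; the text does not specify the criterion.

```python
import statistics
from typing import List, Sequence, Tuple


def gradient_segments(line: Sequence[int], grad_thresh: int = 20) -> List[List[int]]:
    """Cut a line wherever the luminance gradient between neighbouring
    pixels exceeds `grad_thresh`, leaving segments of similar gray level."""
    segments: List[List[int]] = []
    start = 0
    for x in range(1, len(line) + 1):
        if x == len(line) or abs(line[x] - line[x - 1]) > grad_thresh:
            segments.append(list(line[start:x]))
            start = x
    return segments


def merge_similar_lines(
    image: Sequence[Sequence[int]], mean_tol: float = 10.0, std_tol: float = 10.0
) -> List[Tuple[int, int]]:
    """Group consecutive lines into bands (first_row, last_row) whenever a line's
    luminance mean and standard deviation are within the tolerances of the
    previous line's (an assumed similarity criterion)."""
    bands: List[Tuple[int, int]] = []
    prev = None
    for y, line in enumerate(image):
        stats = (statistics.fmean(line), statistics.pstdev(line))
        if (prev is not None
                and abs(stats[0] - prev[0]) <= mean_tol
                and abs(stats[1] - prev[1]) <= std_tol):
            bands[-1] = (bands[-1][0], y)   # extend the current band
        else:
            bands.append((y, y))            # start a new band
        prev = stats
    return bands
```

Comparing each line only with its predecessor keeps the merge a single pass; a stricter variant could compare against the statistics of the whole band instead.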
In other variations, the connected components of the image are processed hierarchically in order to determine the text zones.
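Connected-component labeling is typically the first step of such methods. The sketch below labels 4-connected foreground pixels in a binary mask and returns their bounding boxes, then performs one hypothetical level of grouping: components that overlap vertically and are separated by at most `max_gap` columns are merged. The actual hierarchy and merging criteria used by these algorithms are not specified in the text.

```python
from collections import deque
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def connected_components(mask: Sequence[Sequence[int]]) -> List[Box]:
    """Bounding boxes of the 4-connected foreground regions of a binary mask."""
    h = len(mask)
    w = len(mask[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    boxes: List[Box] = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            top, left, bottom, right = y, x, y, x
            while queue:  # breadth-first flood fill of one component
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def group_boxes(boxes: List[Box], max_gap: int = 2) -> List[Box]:
    """One hypothetical hierarchy level: scanning left to right, merge a box into
    an existing group if they overlap vertically and the gap is <= max_gap columns."""
    groups: List[Box] = []
    for t, l, b, r in sorted(boxes, key=lambda bx: bx[1]):
        for i, (gt, gl, gb, gr) in enumerate(groups):
            if t <= gb and b >= gt and l - gr - 1 <= max_gap:
                groups[i] = (min(t, gt), min(l, gl), max(b, gb), max(r, gr))
                break
        else:
            groups.append((t, l, b, r))
    return groups
```

Applying `group_boxes` repeatedly with growing gaps would give a coarse character, word, line hierarchy, under the same assumptions.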
Text data detection algorithms based on segmentation are effective on high-resolution images taken from newspapers, but they perform much worse on low-resolution images, in which the characters touch each other and the font sizes are small.
Other text data detection algorithms are based on contour detection and texture analysis. Some of them exploit the high contrast of the strokes forming text characters: they search for vertical contours and group them into rectangles. These algorithms require a stage that enhances the image extracted from the rectangles before the character recognition stage can begin. For instance, after the contours have been grouped, the image is binarized by evaluating the gradients accumulated at each image pixel.
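The idea of accumulating vertical-contour gradients and then binarizing can be sketched as follows. The choices here are assumptions for illustration: the central difference |I(y, x+1) − I(y, x−1)| as the vertical-edge detector, a horizontal summation window of half-width `half_width`, and a fixed `threshold`. Grouping the resulting pixels into rectangles is not shown.

```python
from typing import List, Sequence

Image = Sequence[Sequence[int]]


def vertical_edge_strength(image: Image) -> List[List[int]]:
    """Horizontal central difference; large where a vertical stroke edge
    crosses the line. Border columns are set to 0."""
    w = len(image[0]) if image else 0
    return [[abs(row[x + 1] - row[x - 1]) if 0 < x < w - 1 else 0 for x in range(w)]
            for row in image]


def accumulated_gradient(grad: Sequence[Sequence[int]], half_width: int = 2) -> List[List[int]]:
    """Sum each pixel's gradient with its horizontal neighbours within half_width,
    so that dense runs of strokes (typical of text) accumulate large values."""
    out: List[List[int]] = []
    for row in grad:
        w = len(row)
        out.append([sum(row[max(0, x - half_width):min(w, x + half_width + 1)])
                    for x in range(w)])
    return out


def binarize(acc: Sequence[Sequence[int]], threshold: int) -> List[List[int]]:
    """1 where the accumulated gradient reaches the threshold, else 0."""
    return [[1 if v >= threshold else 0 for v in row] for row in acc]
```

Accumulating over a window spreads the response of thin strokes across the gaps between them, which is why text regions come out as solid blocks after thresholding.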
Other text data detection algorithms are based on a learning process and sometimes use Haar wavelets to extract text features at each pixel of an image. A fixed-size window is moved over the image, and the wavelet coefficients are fed into a neural network of the multilayer perceptron (MLP) type in order to classify each pixel as “text” or “non-text”.
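The wavelet feature extraction can be sketched with a single level of the 2-D Haar transform using the averaging convention. Summarizing each window by the mean absolute value of its three detail bands, and the window `size` and `step`, are assumptions; the MLP that would classify these feature vectors is not shown.

```python
from typing import Dict, List, Sequence, Tuple

Image = Sequence[Sequence[int]]


def haar_level1(block: Image) -> Tuple[List[float], List[float], List[float], List[float]]:
    """One level of the 2-D Haar transform over 2x2 cells [a b; c d].
    Returns (approximation, horizontal-edge, vertical-edge, diagonal) coefficients."""
    h, w = len(block), len(block[0])
    ll, dh, dv, dd = [], [], [], []
    for y in range(0, h - 1, 2):
        for x in range(0, w - 1, 2):
            a, b = block[y][x], block[y][x + 1]
            c, d = block[y + 1][x], block[y + 1][x + 1]
            ll.append((a + b + c + d) / 4)
            dh.append((a + b - c - d) / 4)  # top minus bottom: responds to horizontal edges
            dv.append((a - b + c - d) / 4)  # left minus right: responds to vertical edges
            dd.append((a - b - c + d) / 4)  # diagonal detail
    return ll, dh, dv, dd


def window_features(image: Image, top: int, left: int, size: int = 8) -> List[float]:
    """Mean absolute detail coefficient of each band for one size x size window."""
    window = [row[left:left + size] for row in image[top:top + size]]
    _, dh, dv, dd = haar_level1(window)
    return [sum(abs(v) for v in band) / len(band) for band in (dh, dv, dd)]


def sliding_features(image: Image, size: int = 8, step: int = 4) -> Dict[Tuple[int, int], List[float]]:
    """Feature vector for every window position; a trained classifier
    (not shown) would label each position as text or non-text."""
    h, w = len(image), len(image[0])
    return {(y, x): window_features(image, y, x, size)
            for y in range(0, h - size + 1, step)
            for x in range(0, w - size + 1, step)}
```

Text windows tend to have high energy in several detail bands at once, which is what makes these coefficients useful inputs for a classifier.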
In a variation, the neural network classifies each image pixel as a function of features such as contour density and histogram variance.
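The two features named above can be computed per window as sketched below. The definitions are assumptions, since the text does not define them: contour density is taken as the fraction of pixels whose difference from the right or lower neighbour exceeds `thresh`, and histogram variance as the variance of the gray-level distribution described by the window's 256-bin histogram. Pixel values are assumed to be integers in 0..255.

```python
from typing import Sequence

Window = Sequence[Sequence[int]]


def edge_density(window: Window, thresh: int = 30) -> float:
    """Fraction of pixels lying on a contour, using forward differences."""
    h, w = len(window), len(window[0])
    edges = 0
    for y in range(h):
        for x in range(w):
            gx = abs(window[y][x + 1] - window[y][x]) if x + 1 < w else 0
            gy = abs(window[y + 1][x] - window[y][x]) if y + 1 < h else 0
            if max(gx, gy) > thresh:
                edges += 1
    return edges / (h * w)


def histogram_variance(window: Window, levels: int = 256) -> float:
    """Variance of the gray levels, computed from the window's histogram."""
    hist = [0] * levels
    for row in window:
        for v in row:
            hist[v] += 1
    n = sum(hist)
    mean = sum(v * c for v, c in enumerate(hist)) / n
    return sum((v - mean) ** 2 * c for v, c in enumerate(hist)) / n
```

A text window typically shows both a high contour density and a high variance, because it mixes character strokes with background.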
In general, pixels are classified as a function of their text characteristics by neural networks or by support vector machines (SVMs). These classifiers use pixels that have already been classified to learn how to classify subsequent pixels; in this manner they “learn” to classify.
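To illustrate learning from already-classified pixels, the sketch below trains a single-layer perceptron on labeled feature vectors. This is a deliberately simpler stand-in, neither an MLP nor an SVM; the feature values, learning rate `lr` and number of `epochs` are hypothetical.

```python
from typing import List, Sequence, Tuple


def train_perceptron(samples: Sequence[Sequence[float]], labels: Sequence[int],
                     epochs: int = 20, lr: float = 1.0) -> Tuple[List[float], float]:
    """Perceptron training. labels: +1 for text, -1 for non-text.
    Weights change only when a labeled sample is misclassified."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # wrong side of (or on) the decision boundary
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def classify(w: Sequence[float], b: float, x: Sequence[float]) -> str:
    """Label a new feature vector with the learned linear boundary."""
    return "text" if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else "non-text"
```

With hypothetical (contour density, normalized variance) features such as `[0.6, 0.8]` for text and `[0.05, 0.1]` for background, the training loop converges after a few epochs, and `classify` then labels unseen pixels.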
Other, more recent text data detection algorithms operate directly on compressed images or compressed video sequences, for instance those compressed in the MPEG-1 or MPEG-2 formats.
Text and non-text pixels cannot be discriminated solely on the basis of the local text characteristics of an image's pixels. The main characteristics of a text derive from its geometry: the presence of a baseline and the possibility of isolating the text by bounding a zone around it. A text's baseline refers to the alignment of the text's characters. Moreover, texture-based text characteristics become increasingly unreliable as the text being processed becomes shorter.
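The two geometric cues can be sketched from character bounding boxes. Under assumed parameters, the baseline is estimated as the median bottom row of the boxes; the group is accepted as text when at least `min_fraction` of the bottoms lie within `tol` rows of it, which leaves room for descenders such as “g” or “p”; and the bounding zone is the union of the boxes. Both parameters are hypothetical.

```python
import statistics
from typing import Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def baseline_and_zone(char_boxes: Sequence[Box], tol: int = 2,
                      min_fraction: float = 0.7) -> Optional[Tuple[int, Box]]:
    """Return (baseline_row, bounding_zone) if enough character bottoms are
    aligned, otherwise None."""
    if not char_boxes:
        return None
    bottoms = [b for _, _, b, _ in char_boxes]
    baseline = statistics.median_low(bottoms)  # robust to a few descenders
    aligned = sum(1 for b in bottoms if abs(b - baseline) <= tol)
    if aligned / len(bottoms) < min_fraction:
        return None
    zone = (min(t for t, _, _, _ in char_boxes), min(l for _, l, _, _ in char_boxes),
            max(b for _, _, b, _ in char_boxes), max(r for _, _, _, r in char_boxes))
    return baseline, zone
```

For four character boxes whose bottoms are at rows 10, 10, 13 (a descender) and 10, the function returns baseline 10 and the union of the boxes as the text zone, whereas boxes scattered at unrelated heights are rejected.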
The objective of the present invention is to remedy the above drawbacks by determining additional text characteristics of image pixels for detecting text data in images, thereby improving the efficiency and accuracy of text data detection.