1. Field of the Invention
The invention relates to a method and a system for displaying an image based on the text contained in the image, and more particularly to a method and a system that display such an image by automatically determining whether to show it at its real size or reduced to fit the screen.
2. Description of the Related Art
With the development of information transmission technologies such as computer and network technology, images have become an increasingly popular and intuitive way of expressing information in computing and networking. The information represented by images is extremely rich, including human faces, landscapes, schematic diagrams, maps, and so on; even text can be represented in image form. Moreover, a single image often contains more than one type of information: for example, an image that is mostly scenery may also include text, or a portrait may have a scene as its background. A typical instance is a map, which combines schematic labels with text. Computer and network users place different emphases on different kinds of images. For human faces and scenery, users normally want to take in the image as a whole first, whereas for an image whose information is conveyed mainly by text, users normally want to see the details, and the image is meaningful only if its main text is legible. However, existing computers and networks do not select an image display method according to the requirements of the image type, so the way images are displayed often fails to meet users' needs.
Earlier image display methods, such as the ACDSee (registered trademark) image viewing and management software from ACD Systems, display the image at its real size. In early versions such as version 2.4, when the image is larger than the display screen, the user initially sees only part of it; to view the image as a whole, the user must switch the display mode so that the image is shown at a size that fits the screen. This requires extra operations from the user on the one hand and increases the user's waiting time on the other.
At present, when the image to be displayed is larger than the display screen, Microsoft's Internet Explorer (registered trademark) browser reduces the image to a size that fits the screen before displaying it, so that the user can see the image as a whole first. When the image mainly shows human faces or scenery, this display mode meets the user's needs well. However, when the image contains a large amount of text and mainly conveys textual information, or is itself an image of a text document, such a reduced display is usually of no value to the user, who cannot read the text in it. The user then has to move the cursor over the image to find and click an enlargement icon, and wait while the image is enlarged to its real size. Again, this requires extra operations from the user and increases the user's waiting time.
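The fit-to-screen reduction described above can be sketched as follows. This is a minimal illustration that assumes uniform scaling which preserves the aspect ratio and never enlarges an image; it does not describe how Internet Explorer actually implements the behavior, and the function name `fit_to_screen` is hypothetical.

```python
from typing import Tuple

def fit_to_screen(img_w: int, img_h: int, screen_w: int, screen_h: int) -> Tuple[int, int]:
    """Return the display size: the real size if the image fits, else a uniform reduction."""
    if img_w <= 0 or img_h <= 0:
        raise ValueError("image dimensions must be positive")
    # Largest uniform scale at which the whole image fits; capped at 1.0 so it never enlarges.
    scale = min(screen_w / img_w, screen_h / img_h, 1.0)
    # Truncate so the result never exceeds the screen; keep at least one pixel per side.
    return max(1, int(img_w * scale)), max(1, int(img_h * scale))

print(fit_to_screen(800, 600, 1920, 1080))    # (800, 600): fits, shown at real size
print(fit_to_screen(3000, 2000, 1500, 1200))  # (1500, 1000): reduced to fit
```

Whether such a reduction leaves the text in an image legible is exactly what this kind of rule cannot tell.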
In short, in existing image display technologies, neither displaying the image at its real size nor reducing it to fit the screen can satisfy the user in all cases, and either way the user must perform extra operations and wait longer. There is therefore a need for a solution that automatically determines whether to display an image at its real size or reduced to fit the screen, and existing Optical Character Recognition technology makes it possible to achieve this object.
Optical Character Recognition (OCR) is a computer input technology that converts the characters of bills, newspapers, periodicals, books, documents, and other publications into image information by an optical input method such as scanning, and then converts that image information into a usable format by character recognition. It is applicable to fields such as bank bills, large volumes of text material, archive files, and document input and processing. OCR can automatically recognize printed characters, letters, and numbers, as well as handwritten characters, letters, numbers, and various symbols. OCR technology also provides automatic layout analysis, which analyzes the layout of a scanned page, partitions out the text regions to be recognized, and then performs recognition.
Optical Character Recognition comprises the following key blocks: image inputting, image preprocessing, character feature extraction, the comparative database, and comparative recognition.
Image inputting: the object to be processed by OCR is captured by an optical device, such as an image scanner, a facsimile machine, or any photographic equipment, and the resulting image is sent to a computer. With advances in technology, input devices such as scanners have become more refined, thinner, smaller, and of higher quality, and their higher resolution yields clearer images and improves the efficiency of OCR processing.
Image preprocessing: this covers everything from obtaining a binarized (black-and-white), gray-scale, or color image to separating the individual text and graphic portions. It involves image processing, such as image normalization, noise elimination, and image correction, and document preprocessing, such as analyzing images and text and separating text into rows and characters (that is, first separating the text into individual rows, and then separating each row into its characters). The theory and practice of image processing are mature, so many libraries are available commercially or online. In document preprocessing, the regions of graphics, tables, and text in the image are separated first; even the text direction, the outline, and the body of the article can be distinguished, and the sizes and fonts of the characters can be determined as in the original document.
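Binarization followed by row-then-character separation is often done with projection profiles, which is the technique sketched below. It assumes a gray-scale image given as a list of pixel rows (0 = black, 255 = white), a fixed global threshold (a hypothetical default of 128), and rows and characters separated by blank pixel lines; practical OCR engines use more robust methods, and all function names here are illustrative only.

```python
from typing import List, Optional, Tuple

Gray = List[List[int]]    # gray-scale pixels, 0 = black, 255 = white
Binary = List[List[int]]  # 1 = ink (dark) pixel, 0 = background
Box = Tuple[int, int, int, int]  # (top, bottom, left, right), bottom/right exclusive

def binarize(gray: Gray, threshold: int = 128) -> Binary:
    """Black-and-white binarization with a single global threshold."""
    return [[1 if p < threshold else 0 for p in row] for row in gray]

def runs(profile: List[int]) -> List[Tuple[int, int]]:
    """Return [start, end) ranges where the projection profile is non-zero."""
    spans: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, v in enumerate(profile):
        if v and start is None:
            start = i
        elif not v and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans

def segment(binary: Binary) -> List[List[Box]]:
    """Separate text rows first, then the characters within each row."""
    row_profile = [sum(r) for r in binary]  # ink count per pixel row
    result: List[List[Box]] = []
    for top, bottom in runs(row_profile):
        band = binary[top:bottom]
        col_profile = [sum(col) for col in zip(*band)]  # ink count per column in this row
        result.append([(top, bottom, left, right) for left, right in runs(col_profile)])
    return result

W, B = 255, 0
gray = [[B, W, B, W],
        [W, W, W, W],
        [B, B, W, B]]
print(segment(binarize(gray)))
# [[(0, 1, 0, 1), (0, 1, 2, 3)], [(2, 3, 0, 2), (2, 3, 3, 4)]]
```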
Character feature extraction: the choices made in feature extraction, such as which features to use and how to extract them, directly affect recognition accuracy. Features fall roughly into two types: statistical features, such as the ratio of black to white pixels in a text region, and structural features, such as the number and positions of stroke endpoints and crossing points, or the stroke segments obtained after the character image is thinned.
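As an illustration of statistical features, the sketch below computes the black-to-white pixel ratio of a binary character image, together with a zoned variant (the black-pixel density in each cell of a grid). The zoned variant and the 2×2 default grid are assumptions added for illustration, not taken from the text; structural features such as stroke endpoints require thinning and are not shown.

```python
from typing import List

Glyph = List[List[int]]  # binary character image, 1 = black, 0 = white

def black_white_ratio(glyph: Glyph) -> float:
    """Statistical feature: ratio of black pixels to white pixels."""
    black = sum(map(sum, glyph))
    white = sum(len(r) for r in glyph) - black
    return black / white if white else float("inf")

def zone_densities(glyph: Glyph, zones: int = 2) -> List[float]:
    """Fraction of black pixels in each cell of a zones x zones grid, row-major."""
    h, w = len(glyph), len(glyph[0])
    feats: List[float] = []
    for zr in range(zones):
        r0, r1 = zr * h // zones, (zr + 1) * h // zones
        for zc in range(zones):
            c0, c1 = zc * w // zones, (zc + 1) * w // zones
            cells = [glyph[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            feats.append(sum(cells) / len(cells) if cells else 0.0)
    return feats

glyph_l = [[1, 0, 0, 0],
           [1, 0, 0, 0],
           [1, 0, 0, 0],
           [1, 1, 1, 1]]  # a 4x4 letter "L"
print(round(black_white_ratio(glyph_l), 3))  # 0.778 (7 black, 9 white)
print(zone_densities(glyph_l))               # [0.5, 0.0, 0.75, 0.5]
```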
Comparative database: this stores the reference characters and character information that are treated as correct. Once the features of the input characters have been extracted, whether statistical or structural, they must be compared against this database. The database contains the set of all characters to be recognized, together with their feature groups, which are obtained by the same feature extraction method as is applied to the input characters.
Comparative recognition: a suitable mathematical distance function is selected according to the features used. Commonly used comparison methods include comparison in Euclidean space, relaxation matching, Dynamic Programming (DP) matching, building and comparing against a neural-network database, and Hidden Markov Models (HMM). To make recognition more stable, so-called expert systems have also been proposed, which exploit the differences and complementarity among various feature comparison methods to improve the reliability of the recognition result.
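A minimal sketch of a comparative database with comparison in Euclidean space follows. It assumes every character is a binary glyph of one fixed size and uses row and column ink densities as hypothetical feature extractors; the majority vote in `expert_vote` is only a simplified stand-in for the expert-system idea of combining complementary comparison methods, not any specific published system.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Tuple

Glyph = List[List[int]]                     # binary character image, 1 = ink
Extractor = Callable[[Glyph], List[float]]  # feature extraction function
Database = Dict[str, List[float]]           # reference character -> feature vector

def build_database(samples: Dict[str, Glyph], extract: Extractor) -> Database:
    """Comparative database: reference characters with features obtained by the
    same extractor that will later be applied to the input characters."""
    return {ch: extract(g) for ch, g in samples.items()}

def recognize(glyph: Glyph, db: Database, extract: Extractor) -> str:
    """Comparison in Euclidean space: return the nearest reference character."""
    feats = extract(glyph)
    return min(db, key=lambda ch: math.dist(feats, db[ch]))

def expert_vote(glyph: Glyph, experts: List[Tuple[Database, Extractor]]) -> str:
    """Simplified expert system: majority vote over several feature/comparison pairs."""
    votes = Counter(recognize(glyph, db, ex) for db, ex in experts)
    return votes.most_common(1)[0][0]

# Hypothetical extractors: ink density per pixel row and per pixel column.
def row_density(g: Glyph) -> List[float]:
    return [sum(r) / len(r) for r in g]

def col_density(g: Glyph) -> List[float]:
    return [sum(c) / len(g) for c in zip(*g)]

refs = {"I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
        "-": [[0, 0, 0], [1, 1, 1], [0, 0, 0]]}
noisy_i = [[0, 1, 0], [0, 1, 0], [0, 1, 1]]  # an "I" with one stray ink pixel
experts = [(build_database(refs, ex), ex) for ex in (row_density, col_density)]
print(recognize(noisy_i, experts[1][0], col_density))  # I
print(expert_vote(noisy_i, experts))                   # I
```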
In addition, the related art provides many methods for calculating the total area of an image and the area occupied by a single character. For a regular image, the length and width of the image can be obtained from an image library function, which gives the total area; for an image with complex edges and contours, an area partition method can be used, which divides the image into many small blocks, calculates the area of each block, and sums the results. For a single character, many methods can be used to obtain its size and its margins (top, bottom, left, and right), and a margin can be expressed in pixels. Suppose, for example, that a character has a font size of 5 pt and measures 80×80 pixels, and that its top, bottom, left, and right margins are all 5 pixels; since adjacent characters share the margin between them, the area occupied by this character is 85×85 pixels.
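The area calculations above can be sketched as follows. The pixel-membership predicate `inside` and the block size are hypothetical stand-ins for however the profile of an irregular image is actually represented, and `char_footprint` encodes the assumption that neighbouring characters share the margin between them (each character is charged half of each surrounding margin), which is what reproduces the 85×85-pixel figure in the example.

```python
from typing import Callable, Tuple

def image_area_regular(width: int, height: int) -> int:
    """Regular (rectangular) image: area from its width and height."""
    return width * height

def image_area_partitioned(inside: Callable[[int, int], bool],
                           width: int, height: int, block: int = 4) -> int:
    """Area partition method: split the bounding box into small blocks, count the
    pixels that belong to the image in each block, and sum the block areas."""
    total = 0
    for by in range(0, height, block):
        for bx in range(0, width, block):
            total += sum(inside(x, y)
                         for y in range(by, min(by + block, height))
                         for x in range(bx, min(bx + block, width)))
    return total

def char_footprint(w: int, h: int, top: int, bottom: int,
                   left: int, right: int) -> Tuple[float, float]:
    """Space taken by one character, assuming each margin is shared with the
    neighbouring character, so only half of each margin is charged to it."""
    return w + (left + right) / 2, h + (top + bottom) / 2

print(image_area_regular(1024, 768))                            # 786432
print(image_area_partitioned(lambda x, y: x <= y, 10, 10))      # 55 (a triangular image)
print(char_footprint(80, 80, 5, 5, 5, 5))                       # (85.0, 85.0)
```

Under the other common convention, where margins are not shared, the same character would occupy 90×90 pixels; which convention applies depends on how the margins are measured.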
US Patent Application Publication No. US2007/0104366A1 discloses a solution that extracts and reorders the text regions in an image so as to present text stored in image form to the user. In that solution, if the image contains one or more text regions with well-defined edges, OCR technology is used to extract each text region from the image as a separate sub-image. The sub-images are then reordered according to a preset order, such as the reading order of the text, and displayed to the user for reading. However, the solution does not address how to decide whether to display the image at its real size or reduced to fit the screen.
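Reordering extracted regions by reading order can be illustrated with a simple sort. The sketch below does not reproduce the method of US2007/0104366A1; it assumes horizontal, left-to-right text in a single column, represents each region by a hypothetical `Region` record, and treats two regions as lying on the same line when their tops differ by less than a hypothetical fraction `line_tol` of the taller region's height.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Region:
    name: str
    top: int
    left: int
    height: int

def reading_order(regions: List[Region], line_tol: float = 0.5) -> List[Region]:
    """Order regions top-to-bottom, then left-to-right within each line."""
    ordered = sorted(regions, key=lambda reg: (reg.top, reg.left))
    lines: List[List[Region]] = []
    for reg in ordered:
        if lines:
            anchor = lines[-1][0]  # first region of the current line
            if abs(reg.top - anchor.top) < line_tol * max(reg.height, anchor.height):
                lines[-1].append(reg)
                continue
        lines.append([reg])
    # Within each line, read from left to right.
    return [reg for line in lines for reg in sorted(line, key=lambda r: r.left)]

regions = [Region("A", top=0, left=100, height=20),
           Region("B", top=3, left=0, height=20),
           Region("C", top=40, left=0, height=20)]
print([r.name for r in reading_order(regions)])  # ['B', 'A', 'C']
```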
US Patent Application Publication No. US2002/0120653A1 discloses a solution for obtaining the text information in an image while a user browses a webpage. When an image browsed by the user contains text regions, the text regions are identified by a filter, the characters in them are recognized and extracted by OCR technology, and the characters are then enlarged for display. The solution assumes that the characters in the image are relatively small and hard for the user to read, and therefore extracts and enlarges them; accordingly, it does not address whether to adjust the image to fit the display screen. In practice, however, the characters in an image are usually legible at the image's real size, and the user cannot read them only because the browser automatically reduces the image. Enlarging the characters is therefore not necessary in every case, and the solution wastes computer system resources to a large extent. In particular, for an image whose text regions are closely tied to its graphic regions, such as a map, the texts representing place names and legends are meaningless to the user once they are extracted from the map image and displayed on their own.
Therefore, none of the existing solutions that use OCR technology to process text images is capable of automatically determining whether to display an image reduced to fit the display screen, which makes browsing inconvenient for the user and places unnecessary load on the computer system.