The present invention relates to processing techniques for patch recognition. More particularly, the present invention relates to systems and methods for using a new feature referred to herein as invisible junctions for image-based document patch recognition. Still more particularly, the present invention relates to systems and methods for retrieving information using invisible junction features and geometric constraints.
Computers and electronic documents were once restricted to use in desktop environments, where electronic documents were output by the computer to printers and printed on paper. Printers and copiers are used in private and commercial office environments, in home environments with personal computers, and in document printing and publishing service environments. Once an electronic document is printed out on paper, manipulations of the electronic version and the paper version are mostly independent. Printing and copying technology has not been able to bridge the gap between static printed media (i.e., paper documents) and the “virtual world” of interactivity that includes the likes of digital communication, networking, information provision, advertising, entertainment and electronic commerce.
The advent and ever-increasing popularity over the past few years of smaller portable computing devices and personal electronic devices, such as personal digital assistant (PDA) devices, cellular telephones (e.g., cellular camera phones) and digital cameras, has expanded the concept of documents to include their electronic versions, by making them available in an electronically readable and searchable form and by introducing interactive multimedia capabilities that are unparalleled by traditional printed media.
There continue to be problems in the prior art in bridging between the world of electronic documents on one hand and the world of paper documents on the other. A gap exists between the virtual multimedia-based world that is accessible electronically and the physical world of print media. In particular, it is still very difficult and/or computationally expensive to use a printed document to access or even find the electronic document from which the paper document was generated. A key obstacle to identifying and finding an electronic document corresponding to a printed document is the recognition of the image patch captured by the camera. While capturing an image of the printed document has become trivial with the proliferation of cell phones with cameras, there is no way to use such low-quality images for electronic document retrieval.
In other words, there is no existing method that can effectively identify from a database the document page the camera is looking at, pinpoint the exact camera look-at point on the recognized page, and estimate the frame box of the image on the recognized document page. This recognition task is made even more challenging considering that: 1) the input image is a small portion of the document page being looked at; 2) there are a large number of document pages in the database that look similar to each other; 3) the hand-held camera could have very different viewing conditions, including different viewing angles and distances with respect to the paper, as well as camera motion due to hand movement; 4) there are considerable photometric changes due to lighting changes; and 5) there may be non-planar geometric deformation if the user is holding the paper by hand.
While there have been attempts in the prior art, they suffer from a number of deficiencies. For example, the popular Scale-Invariant Feature Transform (SIFT) is not suitable for text documents. The SIFT key points are chosen from the extrema in scale space. More specifically, all scales and image locations are scanned for local maxima in scale space; these local maxima are chosen as key point candidates. This makes SIFT poor at discriminating between text, and SIFT is not stable and repeatable in noisy environments. Other prior art approaches focus on geometric features of the text block, but they are not suitable for Asian or ideographic languages.
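For illustration only, the scan for local maxima in scale space described above can be sketched as follows. This is a minimal sketch and not the SIFT reference implementation: it assumes the scale-space responses have already been computed and stored as a stack indexed by scale, row and column, and the function name `scale_space_maxima` and the strict 3x3x3 neighborhood test are assumptions made for this example.

```python
from typing import List, Tuple

# Assumed layout: stack[scale][row][col] holds a precomputed
# scale-space response (hypothetical input for this sketch).
Stack = List[List[List[float]]]


def scale_space_maxima(stack: Stack) -> List[Tuple[int, int, int]]:
    """Return (scale, row, col) for every strict local maximum.

    A sample is a key point candidate when it is strictly greater than
    all 26 neighbors in the adjacent scales and image locations.
    Border samples are skipped because they lack a full neighborhood.
    """
    n_scales = len(stack)
    n_rows = len(stack[0])
    n_cols = len(stack[0][0])
    candidates = []
    for s in range(1, n_scales - 1):
        for r in range(1, n_rows - 1):
            for c in range(1, n_cols - 1):
                center = stack[s][r][c]
                is_max = all(
                    center > stack[s + ds][r + dr][c + dc]
                    for ds in (-1, 0, 1)
                    for dr in (-1, 0, 1)
                    for dc in (-1, 0, 1)
                    if (ds, dr, dc) != (0, 0, 0)
                )
                if is_max:
                    candidates.append((s, r, c))
    return candidates


if __name__ == "__main__":
    # Tiny 3x3x3 stack with a single peak in the middle.
    demo = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
    demo[1][1][1] = 1.0
    print(scale_space_maxima(demo))
```

Because every sample that dominates its immediate neighborhood becomes a candidate, a page of dense, repetitive text can be expected to yield many similar candidates, which is consistent with the discrimination problem noted above.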
Yet another problem with the prior art is that the few types of recognition available have discrimination capabilities that work poorly on text and on combinations of images and text. This is in part due to the fact that there is some regularity in the geometric layout of all text documents. Often the prior art recognition methods return a large number of candidate matches with no ranking, or with a ranking that yields too many false positive matches.