1. Field of the Invention
The invention relates to techniques for indexing and searching for mixed media documents formed from at least two media types, and more particularly, to recognizing images and other data using Mixed Media Reality (MMR) recognition that uses printed media in combination with electronic media to retrieve mixed media documents. Still more particularly, the present invention relates to the creation and use of a hierarchical index for performing MMR recognition in systems having repeated image content.
2. Description of the Related Art
Document printing and copying technology has been used for many years in many contexts. By way of example, printers and copiers are used in commercial office environments, in home environments with personal computers, and in document printing and publishing service environments. However, printing and copying technology has not previously been thought of as a means to bridge the gap between static printed media (i.e., paper documents) and the “virtual world” of interactivity that includes digital communication, networking, information provision, advertising, entertainment and electronic commerce.
Printed media, such as newspapers and advertising, has been the primary means of communicating information for centuries. Over the past few years, the advent and ever-increasing popularity of personal computers and personal electronic devices, such as personal digital assistant (PDA) devices and cellular telephones (e.g., cellular camera phones), has expanded the concept of printed media by making it available in an electronically readable and searchable form and by introducing interactive multimedia capabilities, which are unparalleled by traditional printed media.
Unfortunately, a gap exists between the electronically accessible multimedia-based world and the physical world of print media. For example, although almost everyone in the developed world has access to printed media and to electronic information on a daily basis, users of printed media and of personal electronic devices do not possess the tools and technology required to form a link between the two (i.e., for facilitating a mixed media document).
Moreover, conventional printed media provides particular advantageous attributes, such as tactile feel, no power requirements, and permanency for organization and storage, which are not provided by virtual or digital media. Likewise, conventional digital media provides particular advantageous attributes, such as portability (e.g., carried in the storage of a cell phone or laptop) and ease of transmission (e.g., email).
One particular problem in the prior art is that the image capture devices that are most prevalent and common as part of mobile computing devices (e.g., cell phones) produce low-quality images. When such low-quality images are compared to pristine versions of printed documents, recognition is very difficult if not impossible. Thus, there is a need for a method for recognizing low-quality images.
A second problem in the prior art is that the image recognition process is computationally very expensive and can require seconds if not minutes to accurately recognize the page and location of a pristine document from an input query image. This can especially be a problem with a large data set, for example, millions of pages of documents. Thus, there is a need for mechanisms to improve the speed at which recognition can be performed.
A third problem in the prior art is that comparison and recognition of multiple document pages with similar content yields too many matches. Some common examples are: boilerplate text in contracts and other legal documents, forms or templates filled by hand or electronically, multiple versions of the same document, regional editions of a newspaper, reused figures in presentations, hankos, etc. When the layout of two or more different document pages is identical in some region, and an image patch is taken of that region, it is theoretically impossible to determine from which document the patch image was taken. Moreover, another problem with repeated content occurs when two document pages are very similar except in a very small area (this occurs, for example, in multiple form instances of the same form template). In this case, existing systems may get confused, because the usual document scoring metrics will assign very high scores to all form instances matching the correct template.
The prior art recognition systems fail to compare and recognize multiple document pages with similar content for a number of reasons. Some MMR systems (such as Path Coding) do not return any matches if two or more indexed documents closely match a given patch. This occurs because the ratio of the scores of the two most similar documents is used as a measure of confidence in the retrieval. Thus, in these systems, patches matching multiple documents will be rejected as not being sufficiently discriminative, and no matching documents are returned. Other recognition systems return only one of the matching documents at random when multiple documents match a given patch. This occurs because identical documents may get different scores during retrieval where heuristics are used to accelerate document scoring, or depending on the order in which documents were indexed (this occurs in invisible junction and brick wall coding). These problems are exacerbated when the matching parts among documents are not numerically identical (as would be obtained through printing a digital file) but merely very similar (as would occur when scanning forms).
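By way of illustration only, the score-ratio confidence measure described above can be sketched as follows. This is a minimal sketch under stated assumptions and not the implementation of any particular prior art system: the function name retrieve_with_ratio_test, the dictionary of per-document scores, and the threshold value min_ratio are all hypothetical and chosen only to show why a patch that matches several documents equally well yields no result.

```python
def retrieve_with_ratio_test(scores, min_ratio=1.5):
    """Return the best-scoring document id, or None if the match is ambiguous.

    scores: dict mapping a document id to a non-negative match score
            (hypothetical input format).
    min_ratio: hypothetical confidence threshold on best / second-best score.
    """
    if not scores:
        return None

    # Rank candidate documents from highest to lowest score.
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    best_id, best_score = ranked[0]
    if best_score <= 0:
        return None
    if len(ranked) == 1:
        return best_id

    second_score = ranked[1][1]
    # Confidence is the ratio of the two highest scores. When two documents
    # share the queried region, the ratio is close to 1 and the patch is
    # rejected as not sufficiently discriminative.
    if second_score > 0 and best_score / second_score < min_ratio:
        return None
    return best_id


# A distinctive patch is retrieved; a patch shared by two form instances is not.
distinct = retrieve_with_ratio_test({"contract_a": 90, "memo_b": 30})
repeated = retrieve_with_ratio_test({"form_1": 90, "form_2": 88})
```

In this sketch the distinctive patch returns a single document, while the patch taken from content repeated across two form instances returns nothing, which is the failure mode attributed above to systems that rely on the ratio of scores.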