Google Inc. has an application called Google Goggles that sends a customer image to their server, which scans the image and returns a Google search page based on the scanned image.
U.S. Pat. No. 7,751,805 (Neven et al.) discloses a mobile telephone, a remote recognition server, a remote media server, a camera in the mobile phone, a communication link for transmitting an image from the camera to the remote recognition server and for receiving mobile media content from the remote media server, matching an image from the phone with an object representation in a database using the remote recognition server, and forwarding an associated text identifier to the remote server based on the associated text identifier. Neven et al. discloses a VMS server including a visual recognition server and a media server. The visual recognition server recognizes objects within an image and enables storing of new objects in a database. The media server maintains content associated with a given ID and delivers the content to the client. Neven et al. discloses downloading applications to mobile devices using a VMS client. Application developers submit images to the VMS service.
Neven et al. states, “To implement an effective vision-based search engine it will be important to combine multiple algorithms in one recognition engine or alternatively install multiple specialized recognition engines that analyze the query images with respect to different objects.” Either all results are returned, a hierarchy among the recognition disciplines is returned, or one result is returned. Objects should be updated regularly. The recognition output may be an image description. Recognition may be of objects, faces, and characters. Time, location, user profile, recent phone transactions, and additional user inputs may be used to correctly identify the image. Neven et al. discloses sending a low resolution image first, then additional image detail if required.
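The three ways of returning results described above can be illustrated with a minimal sketch. It is not taken from Neven et al. The function name `run_engines`, the policy names, and the toy engines are hypothetical, and each engine is assumed to return either `None` or a `(label, score)` pair.

```python
def run_engines(image, engines, priority, policy="all"):
    """Run several specialized recognition engines on one query image.

    engines  : dict mapping an engine name to a callable (hypothetical stand-ins)
    priority : list of engine names, most trusted discipline first
    policy   : "all", "hierarchy", or "best"
    """
    results = []
    for name, engine in engines.items():
        outcome = engine(image)
        if outcome is not None:
            label, score = outcome
            results.append({"engine": name, "label": label, "score": score})

    if policy == "all":
        # Return every result, in the order the engines were run.
        return results
    if policy == "hierarchy":
        # Return every result, ordered by the priority of its discipline.
        return sorted(results, key=lambda r: priority.index(r["engine"]))
    if policy == "best":
        # Return only the single highest-scoring result, if any.
        return sorted(results, key=lambda r: r["score"], reverse=True)[:1]
    raise ValueError("unknown policy: " + policy)


# Toy engines standing in for face, character, and object recognition.
toy_engines = {
    "face": lambda image: None,
    "ocr": lambda image: ("SALE", 0.7),
    "object": lambda image: ("shoe", 0.9),
}
toy_priority = ["object", "ocr", "face"]
```

Calling `run_engines({}, toy_engines, toy_priority, "best")` with these toy engines returns only the object-recognition result, because it carries the highest score.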
Neven et al. mentions the SIFT feature approach of David Lowe (1999), and identifies extraction of feature vectors from key interest points, comparison of corresponding feature vectors, and similarity measurement with comparison to thresholds as the basic elements of any successful recognition system. In addition, an interest operator using phase congruency of Gabor wavelets is mentioned as being superior to affine Harris or DoG Laplace (Kovesi 1999). For feature vectors, Gabor wavelets are used instead of Lowe's SIFT features, augmented with learned features (Viola and Jones 1999). Additionally, dictionaries of feature vectors extracted from images taken under different viewing and lighting conditions are used. To cut down on search times, color histograms and texture descriptors such as those proposed under MPEG-7 are used.
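The basic elements named above (feature vector comparison, a similarity measure, a threshold, and a color histogram used to cut down search time) can be sketched as follows. This is an illustration under stated assumptions, not the method of Neven et al. Cosine similarity and histogram intersection are swapped in as simple stand-in measures, and all names, thresholds, and the database layout are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Similarity of two feature vectors, 1.0 meaning the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def count_matches(query_vectors, candidate_vectors, threshold):
    """Count query vectors whose best candidate similarity meets the threshold."""
    matches = 0
    for q in query_vectors:
        best = max((cosine_similarity(q, c) for c in candidate_vectors), default=0.0)
        if best >= threshold:
            matches += 1
    return matches


def histogram_intersection(h1, h2):
    """Overlap of two normalized histograms, between 0.0 and 1.0."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def recognize(query, database, hist_threshold=0.5, feat_threshold=0.9, min_matches=2):
    """Return the label of the best-matching database entry, or None."""
    best_label, best_count = None, 0
    for label, entry in database.items():
        # Cheap color-histogram test first, so that most entries are skipped
        # before the more expensive feature-vector comparison.
        if histogram_intersection(query["hist"], entry["hist"]) < hist_threshold:
            continue
        count = count_matches(query["vectors"], entry["vectors"], feat_threshold)
        if count >= min_matches and count > best_count:
            best_label, best_count = label, count
    return best_label
```

In this sketch the histogram test acts only as a prefilter. An entry that passes it must still reach `min_matches` feature matches before it is reported.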
Neven et al. discloses performing optical character recognition on the image, identifying pictures of products and associating the product with the user, identifying portions of a printed page and returning real-time information about the text, and converting a picture into a phone number, email address, SMS text message, or web address. Neven et al. discloses media bridging and mobile advertising, specifically searching on published pages: “together with publishing of the newspaper, magazine or book it will be necessary to submit digital pictures of the pages to the recognition servers so that each part of the printed material can be annotated.” From a picture of a billboard the user may enter a contest, or the advertiser may count clicks, and the advertisement may be adjusted based upon the clicks from users. Real-time data taken from billboards may be used to confirm that the billboard is targeting customers. The billboard may be electronic such that the advertisement may be changing in real-time. There are additional disclosures for use of these images in the patent. User feedback on the search results, or user inactivity, is used to score how well the image was found.
U.S. Pat. No. 7,676,117 (Rowley et al.) discloses a system of identifying similar images using histograms, image intensities, edge detectors or wavelets. Concatenated labels are assigned to the similar images. Rowley et al. discloses using wavelets to identify duplicate images, citing “Fast Multiresolution Image Querying” by Charles E. Jacobs, Adam Finkelstein, and David H. Salesin, Computer Graphics (Proceedings SIGGRAPH 1995). Rowley et al. discloses converting the image to YIQ space. Labels may be concatenated.
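A minimal sketch of the YIQ conversion and histogram comparison mentioned above follows. It is a simplification and not the method of Rowley et al. or of the cited wavelet paper: it uses only a luminance (Y) histogram and an L1 distance, and the function names, bin count, and distance threshold are assumptions chosen for illustration. The conversion itself uses `colorsys.rgb_to_yiq` from the Python standard library.

```python
import colorsys


def luminance_histogram(pixels, bins=8):
    """Normalized histogram of the Y channel for (r, g, b) pixels in [0, 1]."""
    counts = [0] * bins
    total = 0
    for r, g, b in pixels:
        y, _i, _q = colorsys.rgb_to_yiq(r, g, b)
        # Clamp the bin index so rounding at the extremes stays in range.
        index = max(0, min(int(y * bins), bins - 1))
        counts[index] += 1
        total += 1
    if total == 0:
        return [0.0] * bins
    return [c / total for c in counts]


def l1_distance(h1, h2):
    """Sum of absolute bin differences, 0.0 for identical histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def are_similar(pixels_a, pixels_b, max_distance=0.2, bins=8):
    """Treat two images as similar when their Y histograms are close."""
    hist_a = luminance_histogram(pixels_a, bins)
    hist_b = luminance_histogram(pixels_b, bins)
    return l1_distance(hist_a, hist_b) <= max_distance
```

A histogram discards spatial layout, so two different pages with the same overall brightness would pass this test. That limitation is one reason a wavelet or feature-based comparison would be layered on top of it.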
U.S. Pat. No. 7,565,139 (Neven Sr. et al.) discloses the remote server having an optical character recognition engine, an object recognition engine, a face recognition engine and an integrator module for generating a recognition output. In addition, they disclose a rigid texture object recognition engine and an articulate object engine.
U.S. Pat. No. 7,437,351 (Page) discloses scanned-in or electronically delivered published items stored in a searchable database. Ranked characterizations are returned for relevant web pages and published items. Hyperlinks to a more complete electronic representation of the published item may be returned. Publishers provide authorization to display copyrighted materials through a permission protocol. Figures show advertisements within text on pages.
U.S. Publication No. 2008/0107338 (Furmaniak et al.) discloses a media material analyzer that identifies block segments associated with columnar body text. Block segments belonging to a continuing article extending across multiple pages are identified. The identification is based on language statistics information and continuation transition information. Furmaniak et al. discloses analyzing pixel value change complexity along horizontal and vertical directions, language statistics information, layout transition information, or both statistics and transition information. In addition, they disclose a layout transition analyzer. U.S. Publication No. 2008/0107337 (Furmaniak et al.) discloses a system for searching media material having a layout over a network.
U.S. Pat. No. 7,174,031 (Rhoads et al.) discloses a camera phone with a 2D imager decoding a watermark, and decoding steganographic data on imaged objects. Rhoads et al. discloses moving the phone to generate gestural input. Also disclosed are sensing and responding to digital watermarks, bar codes, and RFID, and sensing 2D or 3D objects. Information may be visual or hidden. Sound feedback may be provided when information is “found.” Other applications include generating an automatic grocery store list and notification from a refrigerator, data processing on a computer system, composing a document, printing the document including machine readable indicia, and storing data in association with data identifying a location of the electronic version of the document. In addition, the reference discloses presenting a printed document to an optical capture device, processing image data, launching a software application based upon the data, and using the software to open an electronic version of the document. The data may be used for a reward program and may provide a secure method of authentication. Rhoads et al. also discloses a greeting card with data directing a computer to a web site with image, video, and audio that corresponds to the card. Also disclosed are a magazine with data, an advertisement page, data identifying an entry in a database, and a database containing an internet address of a web page associated with the advertisement. The reference discloses a print advertisement with data, and processing the print advertisement to extract data which directs the user via an internet web browser to a web site that provides consumer information related to a product or service promoted by the print advertisement. Linking traffic from the use of the invention to access the website may be monitored. Data may be acquired from an object, decoded, and a subset of the decoded data may be submitted to a remote computer which determines whether a prize should be awarded in response.
U.S. Pat. No. 6,449,377 (Rhoads) discloses modifying line widths or spacings of line art to encode information, then decoding the information and using it for security purposes.
The prior art may not identify which instance of an advertisement on a magazine page the user has selected. For instance, the same ad with the same image and content may run in multiple magazines over an extended length of time. The “click” rate measured for such an ad would be difficult to associate with a particular magazine. Sending the image, instead of decoding the image using the camera, requires additional bandwidth and time.
The prior art looks at all images from everywhere. It would be simpler and more accurate to have an application that is publication specific. A publication specific application does not need to recognize products or textures or faces if it can identify the correct page and publication. The publisher knows what is on each page and what is going to be on each page.
A more difficult problem with the prior art is that it requires an existing picture of the printed page in order to update the database of a recognition engine. In many cases the artwork includes component parts of the page to be printed such as graphics, text, and pictures. A printed proof of the printed page may not actually use the halftone methods or screening that will be used to print the publication when it goes to press. In many cases the proof may only be a virtual proof and only available on an electronic display.
The prior art is disadvantaged in that the recognition engine cannot be updated until the page is printed and a picture is submitted for recognition. This does not allow time for a website to be created and the links developed prior to the publication going to press.
Using the prior art, one cannot confirm whether similar pages in the publication are all uniquely identified prior to printing the publication. The prior art returns multiple results to the customer from which to choose. The returns may not be exclusive to the magazine publisher, so the magazine publisher does not always get feedback when the prior art is used.
The prior art of encoding a watermark within an image or advertisement requires an image. Further, the watermark changes the advertisement or the image. In addition, the watermark must be invisible or unobjectionable while being detectable by an inexpensive camera. Lastly, watermarking text is difficult as there is no image content in which to embed the watermark.
The prior art of adding a barcode or glyph detracts from the readability and visual appearance of the magazine or published page. A barcode or glyph uses valuable space on each page and changes the visual intent of the graphic artist.