The present invention relates generally to a method and apparatus for automatically combining a digital image with text data, and more particularly to a personalized document system that uses the method.
Combining an image with text data is mainly used for two different purposes. On the one hand, the image can serve as an illustration of the content of the text data. On the other hand, an image can be provided with additional information in text form.
It is a common problem in document creation, such as for newspapers or magazines, to retrieve a suitable photo from image collections on the basis of text queries and on the basis of similar-image queries. Typically, the image is retrieved once the subject for the text of an article has already been determined. In current practice, retrieval is based on using text keywords to search manually labeled image data (see, e.g., M. Markkula and E. Sormunen, “Searching for Photos—Journalists' Practices in Pictorial IR”, Challenge of Image Retrieval, Newcastle upon Tyne, 1998).
In the above situation, an image is retrieved given some text. The present invention, by contrast, is concerned with the case in which an image is given and a suitable text is sought.
A photographic system for enabling interactive communication between a camera and an attraction site is disclosed in U.S. Patent Application Publication No. 2002/0030745 A1. In this photographic system, the site stores content data related to the site, and the user communicates with the attraction site through a camera capable of communicating with the site. When a photograph is taken at an image spot, the camera communicates personality data to the image spot, and the image spot communicates the relevant information regarding the spot to the camera. In this photographic system, however, the information sent to the camera is independent of the photograph being taken, since the communication of the information is triggered automatically upon actuation of the camera in the vicinity of an image spot. In other words, a true connection between the image and the text is not established.
In addition, an information retrieval apparatus and method is disclosed in U.S. Pat. No. 5,926,116. According to this method, URLs of WWW servers are stored in a relationship with corresponding position data and image data. A user uses a portable terminal to acquire an image and a corresponding position. A host machine then receives the image and the positional data, compares them with the stored image and the stored positional data, and, when both the position and the image match, displays a corresponding home page.
Further, U.S. Pat. Nos. 6,055,536 and 6,389,182 disclose systems for linking PCs with small video cameras to supply augmented information about real-world situations. These systems allow a user to view the real world together with context-sensitive information generated by the computer. The systems use color codes, 2D matrix codes, or infrared beacons to recognize these real-world situations. Thus, information is provided upon detection of a code or an infrared beacon related to the information data.
It is a drawback of these existing systems that information is provided only if the camera is in the neighborhood of a specific attraction site (i.e., at a specific position) or if a specific code or image is detected. Further, these existing systems are not flexible because the relation between an image and the information is completely determined in advance. Accordingly, it would be advantageous to provide a method for combining a digital image with text data that has improved flexibility and, in particular, that allows an image to be combined with text data even if the location or context of the image is not known in advance.