1) Field of the Invention
The present invention relates generally to indexing of digitized entities in a large and comparatively unstructured data collection such that a relevant search result can be obtained. More particularly, the invention relates to a method of indexing digitized entities, such as images, video or audio files. The invention also relates to a computer program, a computer readable medium, a database and a server/client system for indexing digitized entities.
2) Description of Related Art
Search engines and index databases for automatically finding information in digitized text banks have been known for decades. In recent years, the rapid growth of the Internet has intensified development in this area. Consequently, there are today many examples of very competent tools for finding text information in large and comparatively unstructured data collections or networks, such as the Internet.
As the use of the Internet has spread to a widened group of users, the content of web pages and other resources has diversified to include not only text, but also other types of digitized entities, like graphs, images, video sequences, audio sequences and various other types of graphical or acoustic files. An exceptionally wide range of data formats may represent these files. However, they all have one feature in common, namely that they per se lack text information. Naturally, this fact renders a text search for the information difficult. Various attempts to solve this problem have nevertheless already been made.
For instance, U.S. Pat. No. 6,084,595 describes an indexing method for generating a searchable database from images, such that an image search engine can find content-based information in images that match a user's search query. Feature vectors are extracted from visual data in the images. Primitives, such as color, texture and shape, constitute parameters that can be distilled from the images. A feature vector is based on at least one such primitive. The feature vectors associated with the images are then stored in a feature database. When a query is submitted to the search engine, a query feature vector is specified, as well as a distance threshold indicating the maximum distance that is of interest for the query. All images having feature vectors within that distance are identified by the query. Additional information, which can be used as a search index, is computed from the feature vector associated with each image.
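The threshold query described above can be illustrated with a minimal sketch. It is not the method of the cited patent: the color histogram used as the primitive, the Euclidean distance measure, and the names color_histogram, euclidean and threshold_query are all assumptions chosen for illustration.

```python
import math


def color_histogram(pixels, bins=4):
    """Build a normalized per-channel histogram from (r, g, b) pixels.

    Assumption: a coarse color histogram stands in for the "color" primitive.
    """
    vector = [0.0] * (3 * bins)
    for r, g, b in pixels:
        for channel, value in enumerate((r, g, b)):
            index = min(value * bins // 256, bins - 1)
            vector[channel * bins + index] += 1.0
    total = float(len(pixels)) or 1.0  # avoid division by zero for empty input
    return [count / total for count in vector]


def euclidean(a, b):
    """Euclidean distance between two equally long feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def threshold_query(feature_db, query_vector, max_distance):
    """Return ids of all images whose feature vector lies within max_distance.

    feature_db is a hypothetical in-memory mapping of image id to vector.
    Results are ordered by increasing distance, then by id.
    """
    hits = []
    for image_id, vector in feature_db.items():
        distance = euclidean(vector, query_vector)
        if distance <= max_distance:
            hits.append((distance, image_id))
    hits.sort()
    return [image_id for _, image_id in hits]


# Hypothetical feature database built from three tiny single-color "images".
feature_db = {
    "red": color_histogram([(255, 0, 0)] * 4),
    "dark_red": color_histogram([(200, 0, 0)] * 4),
    "blue": color_histogram([(0, 0, 255)] * 4),
}
query = color_histogram([(250, 0, 0)] * 4)
close_hits = threshold_query(feature_db, query, max_distance=0.5)
```

In this sketch the linear scan over the whole database is a simplification; the additional search index mentioned in the cited patent would serve to avoid comparing the query against every stored vector.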
An alternative image search and retrieval system is disclosed in the international patent application WO99/22318. The system includes a search engine, which is coupled to an image analyzer that in turn has access to a storage device. Feature modules define particular regions of an image and measurements to make on pixels within the defined region as well as any neighboring regions. The feature modules thus specify parameters and characteristics which are important in a particular image match/search routine. As a result, a relatively rapid comparison of images is made possible.
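The notion of a feature module, i.e. a region paired with a measurement on its pixels, can be sketched as follows. This is an illustrative assumption rather than the design of the cited application: images are taken to be nested lists of grayscale values, neighboring regions are ignored, and the names mean_intensity, make_feature_module and compare_images are hypothetical.

```python
def mean_intensity(values):
    """Measurement: average grayscale value of the pixels in a region."""
    return sum(values) / len(values)


def make_feature_module(top, left, height, width, measure):
    """Return a callable that applies `measure` to one rectangular region."""
    def module(image):
        region = [
            image[row][col]
            for row in range(top, top + height)
            for col in range(left, left + width)
        ]
        return measure(region)
    return module


def compare_images(image_a, image_b, modules):
    """Sum of absolute differences between the module measurements.

    Only the regions named by the modules are inspected, which is what
    keeps the comparison cheap relative to a pixel-by-pixel match.
    """
    return sum(abs(m(image_a) - m(image_b)) for m in modules)


# Two hypothetical 2x2 grayscale images that differ only in the bottom row.
image_a = [[0, 0], [10, 10]]
image_b = [[0, 0], [20, 20]]
modules = [
    make_feature_module(0, 0, 1, 2, mean_intensity),  # top row
    make_feature_module(1, 0, 1, 2, mean_intensity),  # bottom row
]
difference = compare_images(image_a, image_b, modules)
```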
The international patent application WO00/33575 describes a search engine for video and graphics. The document proposes the creation and storage of identifiers by searching an area within a web page near a graphic file or a video file for searchable identification terms. Areas on web pages near links to graphic or video files are also searched for such identification terms. The identification terms found are then stored in a database with references to the corresponding graphic and video files. A user can find graphic or video files by performing a search in the database.
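The proximity-based approach described above can be sketched as follows. The sketch makes several assumptions that are not taken from the cited application: "near" is approximated by a fixed window of words before and after the tag, only img and video tags carrying a src attribute are considered, no stop words are filtered, and the names NearbyTermCollector, index_nearby_terms and search are hypothetical.

```python
import re
from html.parser import HTMLParser


class NearbyTermCollector(HTMLParser):
    """Record the page's words in order, plus the word position of each media tag."""

    def __init__(self):
        super().__init__()
        self.words = []
        self.media = []  # list of (src, position in self.words)

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag in ("img", "video") and attributes.get("src"):
            self.media.append((attributes["src"], len(self.words)))

    def handle_data(self, data):
        self.words.extend(re.findall(r"[a-z0-9]+", data.lower()))


def index_nearby_terms(html, window=5):
    """Map each term found within `window` words of a media tag to that file."""
    collector = NearbyTermCollector()
    collector.feed(html)
    collector.close()
    index = {}
    for src, position in collector.media:
        nearby = collector.words[max(0, position - window):position + window]
        for term in nearby:
            index.setdefault(term, set()).add(src)
    return index


def search(index, term):
    """Return the media files stored under a term, in sorted order."""
    return sorted(index.get(term.lower(), set()))


page = (
    "<p>Sunset over the harbor</p>"
    '<img src="harbor.jpg">'
    "<p>Photo by a local artist</p>"
)
index = index_nearby_terms(page)
hits = search(index, "Harbor")
```

Because every word inside the window is stored as an identification term, incidental words such as "photo" or "local" are indexed alongside descriptive ones, which illustrates why terms gathered purely by proximity may be inaccurate.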
However, the search result will, in general, still not be of sufficiently high quality, because the identification terms are not accurate enough. Hence, relevant files may either end up comparatively far down in the hit list or be missed completely in the search.