As the available amount of multimedia content grows, so does the need for effective search and retrieval systems. Search and retrieval systems (“search engines”) typically operate by receiving (e.g., from a user) a textual query of terms and/or phrases. The search engine compares the search terms (“keywords”) against a body of searchable content and returns an indication of the most relevant items in that content. The classic example of a search engine is an Internet search engine that uses user-provided keywords to find relevant web pages and returns a listing of the most relevant ones.
As the amount of digital data increases, search engines are being deployed not only for Internet search, but also for proprietary, personal, or special-purpose databases, such as personal multimedia archives, user-generated content sites, proprietary data stores, workplace databases, and others. For example, personal computers may host search engines to find content anywhere on the computer or in special-purpose archives (e.g., a personal music or video collection). User-generated content sites, which host multimedia and/or other types of content, may provide custom search functionality tailored to that type of content.
To facilitate efficient search and retrieval, search engines typically pre-process content by creating indices. Once created, an index allows a search engine to map search terms to relevant content without rescanning all of the content on each search. The quality of search results therefore depends heavily on the quality of the generated index. Accordingly, more effective techniques for generating accurate search indices are needed.
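By way of illustration only, the role of an index described above can be sketched as a simple inverted index, in which each term maps to the set of content items that contain it. The sketch below is a minimal Python example under stated assumptions: the function names (tokenize, build_index, search), the sample item identifiers, and the sample text are all hypothetical, and the text stands in for whatever searchable description a system associates with each item. It is not a description of any particular search engine's implementation.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Pre-processing step: map each term to the ids of items containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the sorted ids of items that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return []
    # Look up each term in the index; the content itself is not rescanned.
    postings = [index.get(term, set()) for term in terms]
    return sorted(set.intersection(*postings))


# Hypothetical content items, keyed by an illustrative identifier.
documents = {
    "clip1": "Birthday party video with music",
    "clip2": "Concert recording, live music",
    "clip3": "Lecture video on search engines",
}

index = build_index(documents)
print(search(index, "music video"))
```

In this sketch the content is scanned once, when the index is built, and each query is then answered by set lookups alone. A term that was never indexed for an item cannot be matched to that item at query time, which is why the quality of the index bounds the quality of the results.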