Searching refers to a process in which a user submits a query, such as a list of keywords, and receives in return a search result. The search result is a set of one or more resources in a search domain that are determined to be responsive to the query by a search algorithm. It is typical for queries to be processed against a search index that corresponds to a desired search domain. For example, a query may be applied against a search index corresponding to a search domain of web documents, such as web pages or other textual documents available via the Internet, to produce a result containing a list of links to web documents satisfying the query. A query may similarly be applied against a search index corresponding to a multimedia search domain, to produce a result containing a list of links to multimedia resources satisfying the query. For example, the search query “Santana” applied against a multimedia search index could identify a link to a multimedia resource corresponding to the song “Smooth” by the band Santana.
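By way of illustration only, the processing of a keyword query against a search index may be sketched as follows. This is a minimal sketch assuming a simple in-memory inverted index; the function names `build_index` and `search`, the example.com links, and the resource descriptions are hypothetical and are not drawn from any particular search system.

```python
from collections import defaultdict


def build_index(resources):
    """Map each lowercase keyword to the set of resource links whose description contains it."""
    index = defaultdict(set)
    for link, description in resources.items():
        for word in description.lower().split():
            index[word].add(link)
    return index


def search(index, query):
    """Return the links whose descriptions contain every keyword in the query."""
    keywords = query.lower().split()
    if not keywords:
        return set()
    # Start from the matches for the first keyword, then intersect with the rest.
    results = set(index.get(keywords[0], set()))
    for word in keywords[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical multimedia search domain: link -> descriptive text.
multimedia_resources = {
    "http://example.com/media/smooth": "Smooth Santana",
    "http://example.com/media/other": "Another Artist Track",
}
multimedia_index = build_index(multimedia_resources)
santana_links = search(multimedia_index, "Santana")
```

In this sketch, the query “Santana” yields the single link whose description includes that keyword, mirroring the example in the preceding paragraph.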
The conventional approach to indexing a search domain involves populating the search index using web crawling techniques. Such web crawling techniques typically involve automatically examining all or portions of the World Wide Web in a methodical manner to identify resources, and queuing information about visited resources for later processing to add these resources to an index. Depending on the scope of the crawl, such techniques can involve long update cycles. For example, if the web crawl runs every three days, the index is updated only every three days. Because of the time required to complete the crawl, this method of indexing is better suited to some types of resources than to others. For example, this method of indexing is generally quite suitable for identifying multimedia resources that are available for relatively long periods of time. An example of a multimedia resource that may be accessible for an extended period is a link to a Santana song on Santana's homepage (santana.com). As long as the band Santana continues to enjoy consumer success, it is likely that a user will be able to select and play a Santana audio or video clip from the website at any time in the future (whether weeks, months, or years later).
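The crawl-and-queue process described above may be sketched, purely for illustration, as a breadth-first traversal. The sketch assumes a small in-memory link graph in place of the World Wide Web; `LINK_GRAPH`, `crawl`, and the example.com addresses are hypothetical, and a real crawler would fetch pages over the network and honor many additional constraints.

```python
from collections import deque

# Hypothetical link graph standing in for a portion of the Web: page -> outgoing links.
LINK_GRAPH = {
    "http://example.com/": ["http://example.com/a", "http://example.com/b"],
    "http://example.com/a": ["http://example.com/b"],
    "http://example.com/b": [],
}


def crawl(seed, link_graph):
    """Visit pages breadth-first and return them in the order queued for later indexing."""
    frontier = deque([seed])   # pages discovered but not yet visited
    seen = {seed}              # prevents visiting the same page twice
    pending_for_index = []     # visited pages awaiting addition to the index
    while frontier:
        url = frontier.popleft()
        pending_for_index.append(url)
        for link in link_graph.get(url, []):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return pending_for_index


queued = crawl("http://example.com/", LINK_GRAPH)
```

Because every page in scope must be visited before the queued resources are processed, the time between successive index updates grows with the size of the crawl.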
In contrast to such “static” multimedia resources that are available for long periods of time, some publishers make available transient multimedia resources as part of a data stream. For example, a publisher may offer programming from a radio station over the Internet as a digitized stream so that listeners do not have to be in geographic proximity to a broadcast tower in order to enjoy the station programming. When provided over the Internet as an audio stream, the programming is typically “real-time” in the sense that the station does not store or archive the transmitted content in a manner that can be easily accessed by users in the future. Instead, users are only able to receive and play the content that is presently being broadcast by the station. That is, a user accessing a station via the Internet halfway through the broadcast of a song is usually precluded from “rewinding” or otherwise listening to the song from the start. Instead, the user is only able to listen to the remaining portion of the song that has not yet been broadcast. In this sense, all of the multimedia resources contained in a real-time data stream (whether they are songs, interviews, traffic reports, weather updates, or any other content in audio, video, audio/video, and/or other form) may be considered to be “transient” since they are only accessible via the Internet for a brief period of time. Unless stored by the user, each multimedia resource can only be accessed for the period during which the resource is broadcast. Of course, real-time multimedia data streams vary significantly between publishers and may be produced in any of a variety of media types, in various encoding formats, and at various quality levels. Some multimedia resources broadcast in a real-time data stream may be “live” in the sense that the media content is being created contemporaneously with its broadcast, while other multimedia content may be preexisting content that is presented in the real-time data stream.
Because transient multimedia resources are only accessible for a brief period of time, conventional indexing techniques used to identify static multimedia resources are not suitable for transient multimedia resources for at least two reasons. First, conventional indexing techniques that identify a real-time multimedia data stream are often only able to index the data stream based on general metadata associated with that stream. For example, a stream might be identified as “swing music from the 40s,” but the indexing would not be able to identify particular artists or song titles that are included in the stream. As a result, users searching for a particular artist may not know about the inclusion of the artist in a particular data stream because the metadata associated with the stream does not include such information. Second, even if the indexing technique were able to identify a particular artist that was being played on the stream at the time of indexing, such an index would quickly become out of date because of the delay associated with most crawls. Users searching for a particular artist would be unable to find the artist in a particular data stream for either of two reasons. All or some of the results may be stale, meaning that a user selecting a search result and being redirected to a stream would not find the artist the user was seeking in the stream. Alternatively, no results may be returned even though the artist would be found in a stream if the stream were being indexed in a timely fashion. In each case, the user is unable to locate a desired result responsive to the search query.
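The second difficulty may be illustrated with a small simulation. This is a sketch under stated assumptions: the stream schedule, the four-minute segment lengths, the placeholder names “Artist B” and “Artist C,” and the helper functions are all hypothetical, and the crawl is modeled as a single snapshot of whatever is playing at crawl time.

```python
# Hypothetical schedule for one real-time stream: (start_second, end_second, artist).
STREAM_SCHEDULE = [
    (0, 240, "Santana"),
    (240, 480, "Artist B"),
    (480, 720, "Artist C"),
]


def now_playing(schedule, t):
    """Return the artist being broadcast at time t, or None if nothing is scheduled."""
    for start, end, artist in schedule:
        if start <= t < end:
            return artist
    return None


def crawl_snapshot(schedule, crawl_time):
    """Model a crawl as recording only the artist playing at the moment of the crawl."""
    return {"artist": now_playing(schedule, crawl_time), "indexed_at": crawl_time}


def search_snapshot(snapshot, artist):
    """Return True if the index entry claims the stream is playing the given artist."""
    return snapshot["artist"] == artist


def is_stale(snapshot, schedule, query_time):
    """An entry is stale when the stream no longer plays what the index recorded."""
    return now_playing(schedule, query_time) != snapshot["artist"]


snapshot = crawl_snapshot(STREAM_SCHEDULE, crawl_time=100)
stale_hit = search_snapshot(snapshot, "Santana") and is_stale(snapshot, STREAM_SCHEDULE, 300)
missed_hit = not search_snapshot(snapshot, "Artist B") and now_playing(STREAM_SCHEDULE, 300) == "Artist B"
```

With a crawl at second 100 and a query at second 300, a search for “Santana” returns a stale result, while a search for “Artist B” returns nothing even though that artist is then playing. With a crawl interval measured in days rather than minutes, both failures would be the normal case.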
In view of the above-discussed disadvantages of conventional approaches, a more effective approach to indexing transient multimedia resources would have substantial utility.