As background to understanding the invention, one aspect of the Internet, and in particular the World Wide Web (the “Web”), that contributes to its popularity is the plethora of multimedia and streaming media files available to users. However, finding a specific multimedia or streaming media file buried among the millions of files on the Web is often extremely difficult. The volume and variety of informational content available on the Web is likely to keep increasing at a substantial pace. This growth, combined with the highly decentralized nature of the Web, makes locating particular informational content substantially more difficult.
Streaming media refers to audio, video, multimedia, textual, and interactive data files that are delivered to a user's computer via the Internet or another network environment and begin to play before delivery of the entire file is complete. Because playback starts before the whole file has been downloaded, streaming spares users the long wait typically associated with downloading the entire file first. Digitally recorded music, movies, trailers, news reports, radio broadcasts, and live events have all contributed to the growth of streaming content on the Web. In addition, less expensive high-bandwidth connections such as cable, DSL, and T1 give Internet users faster, more reliable access to streaming media content from news organizations, Hollywood studios, independent producers, record labels, and even home users.
A user typically searches for specific information on the Internet via a search engine. A search engine comprises a set of programs accessible at a site within a network, for example a local area network (LAN) or the Internet and World Wide Web. One program, called a “robot” or “spider”, traverses the network in advance in search of documents (e.g., web pages) and builds large index files of the keywords found in those documents; this keyword index later serves as the search engine's database. Typically, a user formulates a query comprising one or more search terms and submits it to another program of the search engine. In response, the search engine inspects the index files and displays a list of documents that match the query, typically as hyperlinks. The user may then activate one of the hyperlinks to view the information contained in the corresponding document.
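To make the crawl, index, and query cycle described above more concrete, the following is a minimal Python sketch of a keyword (inverted) index. It is illustrative only and is not presented as the method of the invention. The function names `tokenize`, `build_index`, and `search` are hypothetical. The `documents` dictionary stands in for pages a spider has already fetched, so actual crawling is omitted. The rule that a document must contain every query term (AND matching) is likewise an assumption, and real search engines also rank their results.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-alphanumeric characters (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents: dict[str, str]) -> dict[str, set[str]]:
    """Map each keyword to the set of document URLs that contain it.

    `documents` is a hypothetical stand-in for pages a spider has already
    fetched; link following and page retrieval are not modeled here.
    """
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in documents.items():
        for word in tokenize(text):
            index[word].add(url)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> list[str]:
    """Return URLs containing every query term, sorted for stable output."""
    terms = tokenize(query)
    if not terms:
        return []
    # Start from the documents matching the first term, then intersect with the rest.
    matches = set(index.get(terms[0], set()))
    for term in terms[1:]:
        matches &= index.get(term, set())
    return sorted(matches)


if __name__ == "__main__":
    pages = {
        "http://example.com/a": "Live concert streaming video",
        "http://example.com/b": "Streaming radio broadcasts and news",
        "http://example.com/c": "Text-only news archive",
    }
    idx = build_index(pages)
    print(search(idx, "streaming news"))  # ['http://example.com/b']
```

Note that the index is built once, ahead of any query, so answering a query requires only dictionary lookups and set intersections rather than a fresh pass over every document. This mirrors the pre-traversal by the spider described above. The sketch also shows why such an engine is oriented toward text: only words that can be extracted from a document ever reach the index.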
Typical search engines, however, have drawbacks. For example, many search engines are oriented toward textual information only. In particular, they are not well suited to searching information contained in structured databases (e.g., relational databases), voice-related information, audio-related information, multimedia, or streaming media. In addition, conventional search engines have difficulty combining data from incompatible data sources, such as heterogeneous data stores.
Moreover, within the workflow of a search engine, data from many different data stores are brought together and processed into a database that is used when a search query is executed. Because these data and their sources may differ vastly, applying the same generalized extraction and refinement processes to all of the obtained data may leave the database with little useful information.
There is therefore a need for a way to define rules for a search engine's workflow that govern the extraction and data refinement processes of that workflow according to the data associated with a data source, thereby overcoming the drawbacks and disadvantages described above.