1. Field of the Invention
The present invention relates to a system for searching a remote database, and more particularly to a system which indexes documents in the database, which identifies documents in the index that include user-specified data, and which outputs a list of documents that contain such data and, optionally, excerpts from those documents. The invention has particular utility in connection with text indexing and retrieval systems, such as World Wide Web search engines.
2. Description of the Related Art
In general, search engines search through a database for specific data and retrieve titles of documents in the database which contain that data. For example, World Wide Web search engines, such as Altavista™ and Yahoo!®, provide users with the ability to search the Web for documents containing user-specified words, phrases, or the like. However, conventional search engines, and Web search engines in particular, suffer from a drawback in that they do not allow a user to direct a search to a single database.
More specifically, conventional Web search engines, such as those noted above, operate by generating an index for all sites on the Web, and then retrieving data from that index in response to user queries. Since these search engines generate an index for all sites on the Web, however, they are limited to searching the entire Web. This is disadvantageous, particularly for those users who only want to search specific sites.
In response to the foregoing drawbacks in the art, software manufacturers have developed site-specific searching systems, such as Ultraseek™, which ostensibly allow users to limit their searches to specific Web sites. To be used, however, these systems must be installed at each Web site, e.g., by the Web site's provider. Once installed, the systems create an index of the Web site at the provider's location. Thereafter, when a user accesses the Web site and inputs a search query, code at the site searches that index for the query, and relays the results of the search back to the user.
While the foregoing types of site-specific searching systems address some of the problems associated with conventional Web search engines, such as Yahoo!® and the like, they have several drawbacks. For example, their installation and subsequent maintenance can be costly and time consuming. As a result, Web site providers often choose not to install such systems at their sites. Additional problems arise with these systems in cases where a Web site is maintained by a Web site hosting company, as opposed to by the provider itself. That is, in these cases, in addition to the above problems, problems relating to licensing and the like arise, which make it difficult to implement conventional site-specific searching systems in a cost-effective manner.
In addition to the foregoing drawbacks, conventional database searching systems provide the user with only names/titles of documents in response to a query. For example, conventional Web search engines provide only the name of a document containing a search term, together with a uniform resource locator (“URL”) for that document. As a result, it is not always possible for the user to determine which of the retrieved documents is relevant without actually linking to, and opening, the document. This can slow down searching significantly.
Thus, there exists a need for a searching system which provides the user with the ability to search a portion of a database, such as one or more sites on the World Wide Web, and which is more cost effective, efficient, and easy to use than the conventional systems described above. In addition, there exists a need for a database searching system which is able to provide a user with the context of each search term in documents retrieved as a result of the search.
The present invention addresses the foregoing needs by providing a way to search through a database at one network site (e.g., a Web site) using a host computer which is at another network site. By hosting the search at a separate site, the present invention facilitates site-specific searching, as described below.
More specifically, according to the present invention, a Web site provider, for example, is able to create a search engine for the Web site simply by accessing the present invention via the Web and entering a request for a new account. In response to this request, the invention assigns the Web site a provider identifier, and then extracts URL(s) from the Web site. Thereafter, the invention “crawls” through the site in order to create an index of the site, which comprises data from the site indexed by document (e.g., Web page) and provider identifier. Once the indexing process has been completed, the site provider need simply copy a few (e.g., 10) lines of code into any sites for which searching capabilities are desired.
Following the foregoing (i.e., the setup), each time the Web site is visited, it will automatically transmit its provider identifier to the visiting user's site. In addition, the Web site will display a search line, from which the visiting user may enter queries to search the site for specific data. When such a query is entered, the query, together with the provider identifier, is passed from the user's site to the host computer's site, where the actual searching takes place. Specifically, at the host computer's site, an index corresponding to the provider identifier is retrieved from memory and searched for the data specified in the user's query. Thereafter, a list of documents which contain the data (including URLs in the case of the Web) is output from the host computer's site to the user's site and displayed there.
By conducting the search at the host computer's site, rather than at the Web site itself, the present invention reduces the difficulties involved with installing and maintaining an entire software application at the Web site. As a result, the present invention provides a way to search specified Web sites (and other types of databases as well), which is more efficient and less costly and time consuming than the conventional site-specific searching systems described above.
Thus, according to one aspect, the present invention is a system (i.e., a method, an apparatus, and computer-executable process steps) for initiating a search at a first network site for user-specified data in a remote database at a second network site and for conducting the search at a third network site (e.g., at a host computer's site). To begin, the system receives, at the first network site, a provider identifier associated with the database from the second network site. Thereafter, the user-specified data is input at the first network site, following which the user-specified data and the provider identifier are output from the first network site to the third network site. The system then searches for the user-specified data in a database at the third network site using the provider identifier. In the invention, this database at the third network site includes data that corresponds to data stored in the remote database at the second network site.
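By way of illustration only, the hand-off of the user-specified data and the provider identifier from the first network site to the third network site might be sketched as follows. The sketch assumes that both values travel as URL query parameters; the host address `search-host.example`, the parameter names `pid` and `q`, and both function names are hypothetical and do not come from the description above.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical address of the host computer's site (the third network site).
HOST_SEARCH_URL = "http://search-host.example/search"


def build_search_request(provider_id, query):
    """First network site: bundle the user's query with the provider
    identifier previously received from the second network site."""
    return HOST_SEARCH_URL + "?" + urlencode({"pid": provider_id, "q": query})


def parse_search_request(url):
    """Third network site: recover the provider identifier and the query."""
    params = parse_qs(urlsplit(url).query)
    return params["pid"][0], params["q"][0]


request_url = build_search_request("P0001", "remote database")
print(request_url)
print(parse_search_request(request_url))
```

Carrying the provider identifier in every request is what lets a single host serve many sites: the host needs no other knowledge of which site the user was visiting.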
According to another aspect, the present invention is a way to configure a computerized searching system (such as the searching system resident at the host computer's site described above) so that the searching system can be used to search a database. In this aspect of the invention, information identifying the database is input, a provider identifier is assigned to the database, and a search through the database is conducted using the input information in order to identify locations of documents in the database. Thereafter, the locations of the documents in the database are stored in memory together with the provider identifier, and the documents in the database are indexed. An index of the documents is then stored in memory together with the provider identifier; and data corresponding to data in the database is also stored in memory together with the provider identifier. The provider identifier is then output to the database. As noted above, this provider identifier is transmitted to those who visit the site.
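The configuration steps just described might be sketched, under stated assumptions, as follows. The sketch replaces real network access with a small in-memory stand-in for a Web site, assigns sequential identifiers of the form `P0001`, and builds a simple word-to-document index. The site contents, the identifier format, and every name in the code are hypothetical.

```python
import itertools
import re
from collections import defaultdict

# In-memory stand-in for a Web site: URL -> (page text, outgoing links).
SITE = {
    "http://site.example/": (
        "Welcome to our database search demo",
        ["http://site.example/about", "http://site.example/faq"],
    ),
    "http://site.example/about": (
        "About the remote database",
        ["http://site.example/"],
    ),
    "http://site.example/faq": (
        "Search questions and answers",
        ["http://other.example/"],  # off-site link, not part of SITE
    ),
}

_next_id = itertools.count(1)
locations = {}  # provider identifier -> list of document URLs
documents = {}  # provider identifier -> {URL -> stored page text}
indices = {}    # provider identifier -> {word -> set of URLs}


def crawl(start_url, fetch):
    """Follow links breadth-first from start_url; pages that fetch()
    cannot return (here, anything off-site) are skipped."""
    seen, queue, pages = {start_url}, [start_url], {}
    while queue:
        url = queue.pop(0)
        page = fetch(url)
        if page is None:
            continue
        text, links = page
        pages[url] = text
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages


def configure(start_url, fetch=SITE.get):
    """Assign a provider identifier, locate the documents, index them,
    and store everything keyed by that identifier."""
    provider_id = f"P{next(_next_id):04d}"
    pages = crawl(start_url, fetch)
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(url)
    locations[provider_id] = sorted(pages)
    documents[provider_id] = pages
    indices[provider_id] = dict(index)
    return provider_id  # this value is what is output back to the site


pid = configure("http://site.example/")
print(pid, locations[pid])
```

Keying all three stores by the provider identifier mirrors the text: the locations, the index, and the stored data are each held in memory together with the identifier.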
According to still another aspect, the present invention is a system for identifying which documents in a database contain user-specified data. The system stores, in memory, indices of data in plural databases, such as those noted above. The system then receives the user-specified data and a provider identifier which corresponds to one of the plural databases, and retrieves, from memory, an index of data for a database that corresponds to the provider identifier. Thereafter, documents in the retrieved index that contain the user-specified data are identified, and identities thereof are output to the user.
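As a minimal sketch of this aspect, the stored indices can be pictured as a mapping from provider identifier to a per-database index, with the identifier selecting which index is searched. The index contents and the name `find_documents` are hypothetical, and the choice to require every query word to appear in a document (AND semantics) is an assumption made for the example rather than something the description specifies.

```python
import re

# provider identifier -> {word -> set of document URLs}; illustrative data only.
INDICES = {
    "P0001": {
        "remote": {"http://a.example/1"},
        "database": {"http://a.example/1", "http://a.example/2"},
    },
    "P0002": {
        "database": {"http://b.example/x"},
    },
}


def find_documents(provider_id, query):
    """Retrieve the index for provider_id and return the sorted URLs of
    documents containing every word of the query."""
    index = INDICES.get(provider_id)
    if index is None:
        raise KeyError(f"unknown provider identifier: {provider_id}")
    words = re.findall(r"\w+", query.lower())
    if not words:
        return []
    matches = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(matches)


print(find_documents("P0001", "remote database"))
print(find_documents("P0002", "database"))
```

The same query yields different results for different identifiers, which is how one host can answer searches for many separate databases.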
The present invention also provides an optional feature for displaying excerpts from documents identified by a database search. In this aspect of the invention, the index of documents in the database is stored, and pointers to data segments in the database are generated based on the index. These data segments comprise target data together with data surrounding the target data. That is, assuming that the target data comprises a word which matches an input user query, the data segment for that word might comprise, e.g., five words to the left of the word, the word itself, and five words to the right of the word. When the invention searches the index for the word, it compiles a list of pointers to data segments which include the word. These data segments may then be extracted and passed to a user's site for display along with the list of documents.
Thus, according to this aspect, the invention is a system for retrieving a list of documents in a database which include user-specified data, and for retrieving one or more data segments from each document on the list. The system includes storing an index of documents from the database, the index including pointers corresponding to data in the database, where the pointers define data segments having a predetermined size. One or more documents in the database that contain the user-specified data are then identified based on the index; and a list is created which includes one or more pointers corresponding to each occurrence of the user-specified data in the identified documents. Each data segment in the database that contains the user-specified data is extracted based on the list of pointers created in the creating step, whereafter a list of the documents in the database that contain the user-specified data is output, together with the extracted data segments.
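The excerpt feature might be sketched as follows, purely for illustration. The sketch assumes that each pointer is a (document, word position) pair and that a data segment is the matched word plus up to five words on either side, following the example given above. The document text and all names in the code are hypothetical.

```python
import re
from collections import defaultdict

WINDOW = 5  # words kept on each side of the matched word


def build_index(docs):
    """Return (tokens, index): tokens maps each document to its word list;
    index maps each lowercased word to a list of (document, position) pointers."""
    tokens, index = {}, defaultdict(list)
    for doc_id, text in docs.items():
        words = re.findall(r"\w+", text)
        tokens[doc_id] = words
        for pos, word in enumerate(words):
            index[word.lower()].append((doc_id, pos))
    return tokens, index


def search_with_excerpts(tokens, index, term):
    """Collect the pointers for term, then extract the data segment
    around each pointer, grouped by document."""
    pointers = index.get(term.lower(), [])
    results = defaultdict(list)
    for doc_id, pos in pointers:
        words = tokens[doc_id]
        segment = words[max(0, pos - WINDOW): pos + WINDOW + 1]
        results[doc_id].append(" ".join(segment))
    return dict(results)


docs = {
    "doc1": "The host computer stores an index of every page "
            "so that a search can run remotely",
    "doc2": "No match here",
}
tokens, index = build_index(docs)
print(search_with_excerpts(tokens, index, "index"))
```

The keys of the returned mapping serve as the list of matching documents, and the values are the excerpts that could be displayed beside each entry. Segments near the start or end of a document are simply shorter.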
By virtue of the foregoing, the invention makes it possible to display data excerpts (i.e., segments) from each document found in the search. A user may then refer to these excerpts in order to determine whether each document is relevant, instead of actually opening the document. As a result, the present invention facilitates database searching.
This brief summary has been provided so that the nature of the invention may be understood quickly. A more complete understanding of the invention can be obtained by reference to the following detailed description of the preferred embodiment thereof in connection with the attached drawings.