The present invention relates to a method and system for searching information databases and, more particularly, to a method and system for electronically searching indexed information databases of information sources accessible over the Internet with automatic user registration.
Electronically searchable information databases interconnected through communication links, computers and computer networks, such as the Internet, provide consumers or others who desire to access, i.e., search for or retrieve, information concerning a topic of interest with a vast, although decentralized, data depository from which information related to the topic can be accessed. These information databases constitute sources of information which are constantly growing in number. The information sources can contain information which is in text, image, audio, video and multi-media formats and which is, preferably, arranged on graphical web sites or web pages accessible on the World Wide Web via the Internet. An Internet user can choose from one of a number of search services or search engines to search for information on a topic of interest and to retrieve web pages corresponding to web page titles identified in the search results of such searches as being related to the topic of interest. Often, the Internet user encounters obstacles in the quest to search, with relative ease and speed, web pages which are likely to contain highly relevant and high quality information concerning a topic of interest. The information sources which can be searched rapidly and easily over the Internet usually are those that do not require payment of a fee or registration before access is permitted. Furthermore, such information sources usually have too much irrelevant information, insufficient relevant information and information which is not of high quality. Also, search engines typically use primitive and undeveloped search procedures that return a large number of irrelevant web page hits that a user must view individually. It is common that an Internet user will only retrieve and view the first few web page hits returned for a particular search.
Moreover, if an Internet user desires to search for information concerning a narrowly defined and specialized topic of interest, such as information concerning a particular medical ailment, the shortcomings of Internet searching described above do not allow for easy and rapid access to highly relevant and high quality information concerning such specialized topic of interest. For example, a layperson or a trained health care professional, such as a physician, nurse or medical technician, who desires to obtain specialized information related to a narrowly defined topic of interest, such as cardiopulmonary edema, does not have available a search engine which can be used to search the World Wide Web effectively, quickly and easily for information on such narrowly defined topic of interest in a large number of information sources containing highly relevant and high quality information content.
Some owners of information sources containing premium content or specialized information have made their information databases available for searching over the Internet in the form of fee-based subscription services requiring registration. An Internet user often is reluctant to and typically does not access such subscription services because the procedure of initially registering with such a service and subsequently providing a password or login information to access the information database of the service as a registered user is too cumbersome and time consuming. Although some subscription services allow searching of titles or the bodies or full text content of web pages contained in their information databases without registration, registration is required subsequently when the user clicks on a web page hit displayed on the user's browser to retrieve the web pages corresponding to the web page hit where such web pages are linked to a subscription service. Further, the time overhead associated with having a user provide registration information several times during a search or for each search performed is very burdensome, especially for a person, such as a busy and time-pressed physician, who may require immediate access to specialized information or desire that the same searches be repeated subsequently to ensure that the person remains informed of new developments in a specialized area of interest. Consequently, an Internet user generally limits a search for specialized information to information sources which do not have registration requirements or possibly to only a single subscription service, thereby decreasing the opportunity of identifying the most pertinent and highest quality information for a search query.
Therefore, there exists a need for a search engine which (i) is capable of accessing information sources whose information databases have been indexed, so that high quality and highly relevant information concerning a topic of interest is identified for a search query, (ii) automatically registers a user with an information source requiring registration for access, so that such source can be searched and information can be retrieved therefrom without the user providing any user identification data to such source, and (iii) can update the search results for a search query with relative ease and minimal time expenditure by the user.
In accordance with the present invention, method and system for electronically searching information databases of information sources, which can be accessed for free or on a subscription fee basis, provide for access to information on a topic of interest using a search engine which searches information databases whose data records have been indexed into index fields, such as title, full text content and classification category with a plurality of selections, and where indexing data is stored at an indexed database coupled to the search engine. The search engine, in addition, utilizes user identification data obtained a single time from a user and stored in a user identification database coupled to the search engine to register the user automatically with an information source requiring registration for access, such as a subscription service, without requiring any submission of user identification data by the user when access to such registration information source is desired, thereby increasing the speed and ease with which a large number of indexed information sources, including fee-based premium content subscription information sources, can be accessed.
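The automatic registration described above can be illustrated with a minimal Python sketch. It is not the claimed implementation: the class names `InformationSource` and `SearchEngine`, the `register` and `fetch` methods, and the in-memory dictionary standing in for the user identification database are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class InformationSource:
    """Hypothetical information source that may require registration."""
    name: str
    requires_registration: bool
    registered_users: set = field(default_factory=set)

    def register(self, user_id: str, profile: dict) -> None:
        # Stand-in for whatever registration interface a real source exposes.
        self.registered_users.add(user_id)

    def fetch(self, user_id: str, page: str) -> str:
        if self.requires_registration and user_id not in self.registered_users:
            raise PermissionError(f"{user_id} is not registered with {self.name}")
        return f"{self.name}:{page}"


class SearchEngine:
    """Hypothetical search engine holding user identification data."""

    def __init__(self) -> None:
        # Assumed in-memory stand-in for the user identification database.
        self.user_identification_db: dict = {}

    def enroll(self, user_id: str, profile: dict) -> None:
        # The user supplies identification data a single time.
        self.user_identification_db[user_id] = profile

    def retrieve(self, user_id: str, source: InformationSource, page: str) -> str:
        # Register on the user's behalf when the source requires it,
        # using only the stored identification data.
        if source.requires_registration and user_id not in source.registered_users:
            source.register(user_id, self.user_identification_db[user_id])
        return source.fetch(user_id, page)
```

In this sketch the user calls `enroll` once, and every later `retrieve` against a registration-requiring source proceeds without further input from the user.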
In a preferred embodiment, the system is an Internet search engine for searching the World Wide Web and includes a server engine which is interconnected with a user identification database, an indexed web page database and a user web page database. The user identification database stores user identification data for system users and registration compatibility information for information sources requiring registration for access to provide for automatic registration of a system user with an information source requiring registration for access, for example, a medical and health care information fee-based subscription service, and whose web pages are linked to the system and indexed at the indexed web page database. The user identification database, preferably, stores transaction data representative of the transactions, e.g., searches for or retrievals of information, that are performed by a user at information sources requiring registration for access to provide for accounting and subscriber service management for such information sources, as may be required. The classification categories for web pages indexed in the indexed web page database can include categories obtained by automatic web-traversing programs called robots or spiders and, preferably, categories and respective selections generated by review of the content of web pages by a human viewer.
In one preferred embodiment, the server engine includes a query server containing a search processor which performs searching of the indexed database based on the search query entered and expansion words generated from the search query using semantic network expansion. The query server uses selections in the respective classification categories selected by the user to limit the web page hits returned in the search results. The classification categories can include: type of web page having selections such as text, image, audio, video, multimedia, etc.; subject matter description of web page; and target audience of web page having selections such as health care professional or patient. Preferably, the searching is performed on web pages indexed into selections of respective classification categories by a human viewer. The query server ranks the relevancy of web pages identified as web page hits based on the match which has been identified between the original search query or the expansion words and the indexed data in the index fields of the indexed database.
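The combination of query expansion and classification filtering described above might be sketched as follows. The semantic network contents, the sample indexed records, and the function names `expand` and `search` are assumptions made for illustration; the source does not specify how the semantic network is built or stored.

```python
# Hypothetical semantic network: each term maps to related expansion words.
SEMANTIC_NETWORK = {
    "edema": {"swelling"},
}

# Hypothetical indexed records with title, full text, and category selections.
INDEXED_PAGES = [
    {"url": "a", "title": "Pulmonary edema overview",
     "text": "fluid in the lungs",
     "audience": "patient", "type": "text"},
    {"url": "b", "title": "Managing swelling",
     "text": "clinical guidance on swelling",
     "audience": "health care professional", "type": "text"},
    {"url": "c", "title": "Heart anatomy",
     "text": "chambers and valves",
     "audience": "patient", "type": "image"},
]


def expand(query):
    """Return the query words plus any expansion words from the network."""
    words = set(query.lower().split())
    expanded = set(words)
    for word in words:
        expanded |= SEMANTIC_NETWORK.get(word, set())
    return expanded


def search(query, selections=None):
    """Return URLs of pages matching the expanded query and the selections."""
    terms = expand(query)
    selections = selections or {}
    hits = []
    for page in INDEXED_PAGES:
        # Skip pages whose classification selections differ from the user's.
        if any(page.get(cat) != sel for cat, sel in selections.items()):
            continue
        indexed_words = set((page["title"] + " " + page["text"]).lower().split())
        if terms & indexed_words:
            hits.append(page["url"])
    return hits
```

For example, under these assumed records, `search("edema")` matches both the page titled with the original word and the page found only through the expansion word, while `search("edema", {"audience": "patient"})` limits the hits to the patient-oriented page.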
In a preferred embodiment, the rank value for a web page hit identified as a match is computed based on whether the identification resulted from a match between the search query or expansion words and those words contained in a title or full text content index field for web pages and, more preferably, also based on the relative proximity and frequency of occurrence of search query or expansion words within the web page body of a web page for which a match has been identified.
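The source does not give a ranking formula, so the following Python sketch is only one plausible way to combine the factors named above: which index field matched, frequency of occurrence, and proximity of terms in the web page body. The function name `rank_value`, the weights, and the way the three factors are summed are all assumptions.

```python
def rank_value(terms, title, body, title_weight=2.0, body_weight=1.0):
    """Score a page from field matches, term frequency, and term proximity.

    The weights and the additive combination are illustrative assumptions.
    """
    terms = {t.lower() for t in terms}
    title_words = title.lower().split()
    body_words = body.lower().split()

    score = 0.0

    # Field match: a title match is assumed to count more than a body match.
    if terms & set(title_words):
        score += title_weight

    positions = [i for i, word in enumerate(body_words) if word in terms]
    if positions:
        score += body_weight
        # Frequency: fraction of body words that are query or expansion words.
        score += len(positions) / len(body_words)
        # Proximity: inverse of the smallest gap between two distinct terms.
        gaps = [
            j - i
            for i in positions
            for j in positions
            if i < j and body_words[i] != body_words[j]
        ]
        if gaps:
            score += 1.0 / min(gaps)

    return score
```

With these assumed weights, a body in which two query words appear side by side scores higher than one in which the same two words are several words apart, and a title match alone contributes the title weight.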
In a further preferred embodiment, the search engine includes an automatic search component to provide (i) that a search for a search query automatically is repeated at user defined intervals, (ii) that the web page hits identified from a repeated search for a search query are stored on a user web page maintained at the user web page database, and (iii) that the user is notified when the search results for a search query have been updated, preferably, electronically by such means as email, facsimile or automatic telephone messaging. Each user web page preferably includes the user's search queries and the links of the web page hits for the original search results and the updated search results, respectively.
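The three functions of the automatic search component can be sketched in Python as follows. The names `SavedSearch` and `run_due_searches`, the use of seconds for the interval, and the callback parameters `search_fn` and `notify_fn` are hypothetical; a real system would supply its own search routine and notification channel.

```python
from dataclasses import dataclass, field


@dataclass
class SavedSearch:
    """Hypothetical record of a query kept on a user web page."""
    query: str
    interval: float                      # user defined interval, in seconds
    last_run: float = float("-inf")      # never run yet
    hits: list = field(default_factory=list)  # links stored on the user web page


def run_due_searches(saved_searches, now, search_fn, notify_fn):
    """Repeat each search that is due, store new hits, and notify the user."""
    for saved in saved_searches:
        # (i) Repeat the search only once the user defined interval has elapsed.
        if now - saved.last_run < saved.interval:
            continue
        saved.last_run = now
        new_hits = [h for h in search_fn(saved.query) if h not in saved.hits]
        if new_hits:
            # (ii) Store the newly identified hits with the saved search.
            saved.hits.extend(new_hits)
            # (iii) Notify the user that the results have been updated.
            notify_fn(saved.query, new_hits)
```

Passing the current time in as `now` keeps the sketch independent of any particular scheduler; a deployment might call `run_due_searches` periodically with the current clock value.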
In still a further preferred embodiment, the server engine utilizes user identification data in the user identification database to provide that the user can purchase services and products offered for sale by a registration information source which is also an e-commerce web site, and whose web pages may or may not be indexed in the indexed database, without requiring the user to provide any user information data, such as a credit card number, to such e-commerce web site. Based on a pre-established data exchange protocol agreed upon between the respective owners of the search engine and the e-commerce web site, the server engine automatically, without any submission of user information by the user, submits user identification data to and receives transaction data from the e-commerce web site to facilitate payment for products or services purchased by the user.