In recent years, computers have taken the world by storm. Today, most businesses rely entirely on computers to conduct daily operations. In the academic world, computers have become essential tools for learning, teaching, and research. In homes, computers are used to perform daily tasks ranging from paying bills to playing games. The one unifying requirement for all computer applications is the user's ability to locate the particular information or data that the user desires.
During the past few years, the quantity and diversity of information and services available over public (e.g., the Internet) and private (e.g., an intranet) local and wide area networks have grown substantially. In particular, the variety of information accessible through Internet-based services is growing rapidly in both scope and depth. In simple terms, the Internet is a massive collection of individual computer networks operated by government, industry, academia, and private parties that are linked together to exchange information. While the Internet was originally used mostly by scientists, the advent of the World Wide Web has brought the Internet into mainstream use. The World Wide Web (hereafter "WWW") is an international, virtual-network-based information service composed of Internet host computers that provide on-line information in a specific hypertext format. WWW host servers provide hypertext markup language (HTML) formatted documents using the hypertext transfer protocol (HTTP). Information on the WWW is accessed with a hypertext browser, such as Netscape Navigator or Microsoft Internet Explorer. Web sites are collections of interconnected WWW documents.
Typically, users communicate with the Internet through a communication gateway that may be implemented and controlled by an Internet service provider (i.e., an ISP)--a company that offers a user access to the Internet and the WWW through a software application that controls communication between the user's computer and the communication gateway. The role of the ISP may also be taken directly by a particular organization that provides Internet access to its employees or members. The user can access and navigate the WWW using a hypertext browser application residing on, and executed by, the user's computer.
No hierarchy exists in the WWW, and the same information may be found by many different approaches. Hypertext links in WWW HTML documents allow readers to move from one place in a document to another (or even between documents) as they wish. One of the advantages of the WWW is that there is no predetermined order that must be followed in navigating through various WWW documents. Readers can explore new sources of information by following links from place to place. Following a link is as easy as clicking a mouse button on the link related to the subject the user wants to access. Each WWW document also has a unique uniform resource locator ("URL") that serves as an "address" that, when followed, leads the user to the document or file location on the WWW. Using the browser, the user can also mark and store "favorites"--URLs of particular WWW documents that interest the user--so that the user can quickly and easily return to these documents in the future by selecting them from the favorites list in the browser.
Because of the vastness of the Internet and the WWW, locating specific information desired by the user can be very difficult. To facilitate the search for information, a number of "search engines" have been developed and implemented. A search engine is a software application that searches the Internet for web sites containing information on the subject in which the user is interested. These searches are accomplished in a variety of ways--all well-known in the art. Typically, a user first inputs into the hypertext browser a "search string" containing key words representative of the information desired by the user. The search engine then applies the search string to a previously constructed index of a multitude of web sites to locate a certain number of web sites having content that matches the user's search string.
The located web site URLs are then presented to the user in order of relevance to the key words in the user's search string. For example, a user providing the key word PLANT would obtain an exhaustive list of all registered sites that refer to plants. This list, however, would be so large that the user would want to limit the search. Depending on the search engine used, the user could limit the search by entering a combination of key words such as the following: PLANT AND FLOWER AND GARDEN. This would limit the search to only those Internet sites that contain all three words. In addition, the user could further limit the search by entering PLANT AND FLOWER AND GARDEN NOT TREE NOT ORCHID. The results from this search would be further limited to exclude sites in which TREE or ORCHID is listed as a key word.
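The key word narrowing described above can be illustrated with a minimal sketch. It assumes a simple in-memory inverted index that maps each key word to the set of URLs containing it; the function names (`build_index`, `boolean_search`), the sample pages, and the example.com URLs are hypothetical and are not taken from any particular search engine. Relevance ordering is not modeled.

```python
from collections import defaultdict


def build_index(pages):
    """Map each upper-cased key word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.upper().split():
            index[word].add(url)
    return index


def boolean_search(index, required, excluded=()):
    """Return URLs containing every word in `required` and no word in `excluded`."""
    if not required:
        return set()
    # AND: intersect the URL sets of all required key words.
    results = set.intersection(*(index.get(w.upper(), set()) for w in required))
    # NOT: drop any URL that lists an excluded key word.
    for word in excluded:
        results -= index.get(word.upper(), set())
    return results


# Hypothetical sample data, for illustration only.
PAGES = {
    "http://example.com/roses": "plant flower garden rose care",
    "http://example.com/orchids": "plant flower garden orchid care",
    "http://example.com/oaks": "plant tree garden oak",
    "http://example.com/factory": "plant machinery factory",
}

INDEX = build_index(PAGES)

# PLANT
broad = boolean_search(INDEX, ["PLANT"])
# PLANT AND FLOWER AND GARDEN
narrow = boolean_search(INDEX, ["PLANT", "FLOWER", "GARDEN"])
# PLANT AND FLOWER AND GARDEN NOT TREE NOT ORCHID
narrowest = boolean_search(
    INDEX, ["PLANT", "FLOWER", "GARDEN"], excluded=["TREE", "ORCHID"]
)
```

In this sketch, each added AND term can only shrink the result set, and each NOT term removes further sites. Note that the result still depends solely on the search string and not on who is asking.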
A number of approaches have been developed to improve the performance and accuracy of typical key word searches. For example, U.S. Pat. No. 5,845,278, issued to Kirsch et al., teaches approaches to establishing a quantitative basis for selecting client database sets (i.e., Internet documents or web sites) that include the use of comprehensive indexing strategies, ranking systems based on training queries, expert systems using rule-based deduction methodologies, and inference networks. These approaches were used to examine knowledge base descriptions of client document collections or databases.
However, the key word searching approaches utilized by previously known search engines suffer from a number of significant disadvantages. Most search systems are often viewed as ineffective at identifying the documents most likely to be relevant. Accordingly, users are often presented with overwhelming amounts of information in response to their key words. Thus, using proper key word searching techniques becomes an art in itself--an art that is outside the capabilities of most Internet users.
Most importantly, typical key word searches, and even more advanced searches, only provide the user with search results that depend entirely on the search string entered by the user, without any regard to the user's cultural, educational, or social background or the user's psychological profile. The results returned by the search engines are tailored only to the search string provided by the user and not to the user's background. None of the previously known search engines tailors the results of a user's searches based on his or her background and unexpressed interests. For example, a twelve-year-old child using key word searches on the Internet for some information on computers may be presented with a multitude of documents that are far above the child's reading and educational level. In another example, a physician searching the Internet for information on a particular disease may be presented with dozens of web sites that contain very generic information, while the physician's "unexpressed" interest was to find web sites about the disease that are at the physician's educational and professional level.
It would thus be desirable to provide a system and method for extracting and using linguistic patterns of textual data to assist a user in locating requested data that, in addition to matching the user's specific request, also corresponds to the user's professional, cultural, educational, and social backgrounds as well as to the user's psychological profile and thus addresses the user's "unexpressed" requests.