The present invention relates generally to locating computer documents and more particularly to determining topics of interest to a user and locating documents related to those topics.
In the past, personal computer systems worked as stand-alone units, and information was stored on local hard disks or floppy disks. Correspondingly, information retrieval methods were developed for users to locate information or data that they themselves had earlier created and stored on their own computers. For example, hierarchical directory and file systems were developed which enabled users to manually store related information in files, and then organize related files together within a particular directory. Database systems were also developed to store large amounts of information that could be accessed by a user.
As computer usage has become increasingly prevalent, a vast amount of information has become available from computers. In particular, interconnecting computers via computer networks has allowed computer users to not only access information that is stored locally on their computers, but also to access information stored on an enormous number of other computers and storage devices. The recent growth in intranets, the World Wide Web (WWW), and the Internet has greatly expanded the amount of computer-accessible information. With information from other computers available, computer users will often want to access information of interest that they themselves did not create. However, it is often difficult or impossible for a user to even discover all of the information that is available, let alone to select the information that is of interest to them from among these vast possibilities. Various methods have been developed to assist with these problems.
While manually indexing and organizing large amounts of information is possible, this technique does not generally provide a satisfactory solution to the problem of locating computer-accessible information of interest. Instead, various companies have developed information search engines which can automatically index and organize information that is accessible from a computer. This accessible information may be located on any networked computer or storage device that the computer can access, or may be located on the computer system itself. After the information is indexed or organized, these search engines can then search the indexed or organized information to locate particular information of interest.
Information that is located by a search engine is typically made available as one or more computer documents. A computer document comprises related information which is grouped together by human users, either physically or conceptually. A single computer file or a particular entry in a computer database can represent a physical grouping of information into a document, but a document could also be a portion of a file, multiple files, or portions of multiple files. Similarly, a document could be a single database entry, multiple database entries, or portions of multiple database entries. The contents of a computer document will typically be related to one or more topics, and will be composed of various terms that are related to these topics. Such terms include text, but may also include other forms of information such as symbols, images, video clips, audio clips, embedded executable programs, etc. In addition, computer documents on systems such as the WWW are often interconnected with other documents, with the contents of these WWW documents containing references to other accessible documents (these interconnected documents are often referred to as a web of information).
Most information search engines (also referred to as "spiders" because they traverse this web of information) operate by performing two separate functions, indexing and searching. During the indexing function, a search engine will be given one or more computer documents. The search engine will analyze the contents of the documents, and create an index of some or all of the terms in the documents. The search engine may also attempt to identify one or more general topics to which the entire document relates. The search engine will next search the documents for references to other computer documents. Upon finding such references, the search engine will access those referenced documents and continue the same process. In this manner, the search engines can eventually traverse and index all computer documents that are interconnected with the first documents given to the search engine. After creating this comprehensive index, the search engine can locate documents by receiving a search query containing terms or topics of interest to a user, and by searching the index to locate documents with corresponding terms or topics.
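The indexing-and-traversal process described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any particular search engine's implementation: the in-memory `WEB` dictionary stands in for networked documents and their references, a regular-expression tokenizer stands in for term extraction, and `crawl_and_index` and `search` are hypothetical names chosen for the example.

```python
import re
from collections import deque

# Hypothetical in-memory "web": document id -> (contents, ids of referenced documents).
WEB = {
    "home": ("Apple Computer announces a new laptop", ["news", "recipes"]),
    "news": ("Computer industry news and laptop reviews", ["home"]),
    "recipes": ("Apple pie recipes using fresh fruit", []),
    "orphan": ("Never referenced, so never indexed", []),
}

def tokenize(text: str) -> list[str]:
    """Lower-cased alphanumeric terms; a stand-in for real term extraction."""
    return re.findall(r"[a-z0-9]+", text.lower())

def crawl_and_index(seeds: list[str], web: dict) -> dict[str, set[str]]:
    """Breadth-first traversal from the seed documents, building an inverted index."""
    index: dict[str, set[str]] = {}
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        doc_id = queue.popleft()
        contents, links = web[doc_id]
        # Indexing: record which documents contain each term.
        for term in tokenize(contents):
            index.setdefault(term, set()).add(doc_id)
        # Traversal: follow references to documents not yet visited.
        for ref in links:
            if ref in web and ref not in seen:
                seen.add(ref)
                queue.append(ref)
    return index

def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the documents that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))  # copy so the index is not mutated
    for term in terms[1:]:
        results &= index.get(term, set())
    return results

index = crawl_and_index(["home"], WEB)
print(sorted(search(index, "apple")))   # → ['home', 'recipes']
print(sorted(search(index, "laptop")))  # → ['home', 'news']
print(search(index, "never"))           # → set()
```

Starting from the single seed `home`, the traversal reaches `news` and `recipes` through references but never indexes `orphan`, because no indexed document refers to it. Note that the query "apple" matches both the computer-related and the fruit-related document.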
However, current search engines suffer from numerous drawbacks. For example, if a user wishes to locate information on Apple Computer Corporation, the user may request computer documents containing the term "apple." If the user does so, the search engine is likely to locate a significant number of documents which are related to the term "apple" but which are not related to the computer company. For example, the search engine is likely to locate documents that contain information related to fruit. A search on the WWW is likely to return thousands of documents that contain the term "apple," many of which will not be of interest to the user. Unfortunately, a user typically must access a particular document and begin to view it in order to determine their level of interest in the document. This is a time-consuming process, and it is typically not practical for large numbers of documents. Thus, current search engines can be highly inefficient in locating only relevant documents (i.e., those of interest to a user).
One method of increasing the efficiency of a search is for the user to create a more specific search query. However, in order to perform an efficient search, the user must formulate highly sophisticated search queries. Such queries typically use a form of Boolean logic, requiring the use of AND, OR and NOT terms. Most search engines have implemented even more sophisticated options for their search queries, such as searching only certain portions of documents. Obviously, a user must understand a document's structure to use such an option. Even sophisticated computer users can have difficulty formulating efficient search queries. For novice computer users, the problem can be overwhelming.
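To make the effect of AND, OR and NOT concrete, the following Python sketch evaluates Boolean queries over a tiny inverted index. It is illustrative only: to keep it short, a query is written as a nested tuple such as `("AND", "apple", ("NOT", "fruit"))` rather than parsed from a query string (an assumption; real engines accept textual syntax), and `DOCS`, `build_index`, and `evaluate` are hypothetical names.

```python
import re

# Hypothetical document collection: id -> contents.
DOCS = {
    1: "Apple Computer releases a new laptop",
    2: "How to bake an apple pie with fresh fruit",
    3: "Computer networks and the World Wide Web",
}

def build_index(docs):
    """Map each lower-cased term to the set of document ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            index.setdefault(term, set()).add(doc_id)
    return index

def evaluate(query, index, all_ids):
    """Evaluate a query tree: a string is a term; a tuple is (operator, *operands)."""
    if isinstance(query, str):
        return set(index.get(query.lower(), set()))  # copy, so callers may mutate it
    op, *args = query
    if op == "AND":
        result = evaluate(args[0], index, all_ids)
        for sub in args[1:]:
            result &= evaluate(sub, index, all_ids)
        return result
    if op == "OR":
        result = set()
        for sub in args:
            result |= evaluate(sub, index, all_ids)
        return result
    if op == "NOT":
        # NOT is taken relative to the whole collection.
        return all_ids - evaluate(args[0], index, all_ids)
    raise ValueError(f"unknown operator: {op}")

index = build_index(DOCS)
all_ids = set(DOCS)
print(sorted(evaluate("apple", index, all_ids)))                              # → [1, 2]
print(sorted(evaluate(("AND", "apple", ("NOT", "fruit")), index, all_ids)))  # → [1]
print(sorted(evaluate(("OR", "laptop", "web"), index, all_ids)))             # → [1, 3]
```

The bare term "apple" matches both the computer and the fruit document; excluding "fruit" isolates the computer document, but only because the user knew which distinguishing term to negate.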
Users typically have access to a small number of documents that are of interest to them, and often desire to access additional documents which have related contents. However, many computer users are unable to even identify the relevant terms or topics for a particular document which they can already access, and even when given the appropriate terms and topics they are not generally capable of creating an appropriate search query that would allow them to locate additional documents that are of interest.
Some embodiments of the present invention provide a method and system for locating information of interest to a user, such as computer documents or data, without specification by the user of topics of interest. The system detects when the user of the system selects computer documents, and monitors the user's interactions with the selected computer documents. The system also analyzes the contents of the selected computer documents to identify relevant terms in the contents of the documents, and more generally to identify topics to which the contents are related. The system then proceeds without user intervention, and uses the identified terms and topics and the monitored user interaction information to generate topics of interest to the user. The system next determines a level of user interest in the various generated topics, and prioritizes the generated topics on the basis of these levels so that the topics of most interest receive the highest priority. The system then attempts to locate additional computer documents, on any computer or device that is accessible to the system, whose contents are related to these prioritized generated topics of user interest. One method that the system may use to locate these documents involves identifying a computer document search engine, generating an appropriate search query, and requesting the search engine to perform the search on the generated search query. After additional documents are located, they are made available to the user for selection.
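The summary above does not specify how monitored interactions become levels of interest, so the following Python sketch fills that gap with an assumed, hypothetical scheme: viewing time in minutes plus fixed bonuses for saving or printing a document. As a simplification it treats individual terms as topics, weights each term by the interest level of the documents containing it, prioritizes terms by total score, and joins the top terms into a disjunctive query string. `Interaction`, `interest_level`, `prioritized_topics`, `generate_query`, and the stop-word list are all hypothetical.

```python
import re
from collections import Counter
from dataclasses import dataclass

# Hypothetical stop-word list; a real system would use a much larger one.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "on", "with", "is"}

@dataclass
class Interaction:
    """Monitored interaction with one selected document (hypothetical signals)."""
    contents: str
    seconds_viewed: float
    saved: bool = False
    printed: bool = False

def interest_level(event: Interaction) -> float:
    """Assumed weighting: minutes viewed, +2 if saved, +1 if printed."""
    return (event.seconds_viewed / 60.0
            + (2.0 if event.saved else 0.0)
            + (1.0 if event.printed else 0.0))

def relevant_terms(contents: str) -> Counter:
    """Term frequencies after dropping stop words and tokens of two or fewer characters."""
    tokens = re.findall(r"[a-z0-9]+", contents.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS and len(t) > 2)

def prioritized_topics(events: list[Interaction], top_k: int = 3) -> list[tuple[str, float]]:
    """Score each term by interest-weighted frequency across documents; highest first."""
    scores: Counter = Counter()
    for event in events:
        weight = interest_level(event)
        for term, count in relevant_terms(event.contents).items():
            scores[term] += weight * count
    return scores.most_common(top_k)

def generate_query(topics: list[tuple[str, float]]) -> str:
    """Combine the prioritized topics into a simple OR query for a search engine."""
    return " OR ".join(term for term, _ in topics)

events = [
    Interaction("Apple laptop with long battery life", seconds_viewed=180, saved=True),
    Interaction("Apple laptop prices", seconds_viewed=120),
    Interaction("Laptop battery tips", seconds_viewed=30, printed=True),
]
topics = prioritized_topics(events)
print(topics)                  # → [('laptop', 8.5), ('apple', 7.0), ('battery', 6.5)]
print(generate_query(topics))  # → laptop OR apple OR battery
```

Weighting term counts by interest level means that a document the user read for several minutes and saved contributes far more than one glanced at briefly; a fuller system would likely identify topics with richer methods than single terms.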
In another embodiment, a group of documents or data has been designated by others for selection by the user and the system uses the prioritized generated topics of user interest to prioritize the order of selection within the group so that the documents or data of most interest to the user can be selected first. In yet another embodiment, the system monitors live data feeds (where data is accessible for only a short time), uses the prioritized generated topics of user interest to identify data or documents of interest, and selects the identified data or documents for the user while the data or document is still accessible.
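Both embodiments reuse the prioritized topics of user interest. A minimal Python sketch, assuming the prioritized topics are available as a hypothetical mapping from topic term to weight: `order_group` sorts a designated group of documents so the most relevant can be selected first, and `select_from_feed` models a live data feed as an iterator, yielding each item that meets an assumed relevance threshold at the moment it arrives, while it is still accessible. `relevance`, `order_group`, `select_from_feed`, and the threshold parameter are hypothetical.

```python
import re
from typing import Iterable, Iterator

def relevance(contents: str, topic_weights: dict[str, float]) -> float:
    """Sum of the weights of the prioritized topics that appear in the contents."""
    terms = set(re.findall(r"[a-z0-9]+", contents.lower()))
    return sum(weight for topic, weight in topic_weights.items() if topic in terms)

def order_group(group: dict[str, str], topic_weights: dict[str, float]) -> list[str]:
    """Order a designated group of documents so the most relevant come first."""
    return sorted(group, key=lambda doc_id: relevance(group[doc_id], topic_weights),
                  reverse=True)

def select_from_feed(feed: Iterable[str], topic_weights: dict[str, float],
                     threshold: float) -> Iterator[str]:
    """Select feed items as they arrive, before they stop being accessible."""
    for item in feed:
        if relevance(item, topic_weights) >= threshold:
            yield item

weights = {"laptop": 8.5, "apple": 7.0, "battery": 6.5}
group = {
    "memo": "Quarterly budget memo",
    "spec": "New laptop battery specification",
    "note": "Apple orchard visit",
}
print(order_group(group, weights))  # → ['spec', 'note', 'memo']

feed = iter(["Stock update", "Apple laptop launch today", "Weather: rain"])
print(list(select_from_feed(feed, weights, threshold=7.0)))  # → ['Apple laptop launch today']
```

Because `select_from_feed` is a generator, each feed item is examined once, as it arrives, rather than after the whole feed has been collected.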