1. Field of the Invention
This invention relates to networks of computational devices, and more particularly to enhancement of identification of relevant information accessible using such a network.
2. Description of the Related Art
The following descriptions and examples are not admitted to be prior art by virtue of their inclusion within this section.
The continuing proliferation of powerful, convenient computational devices has been accompanied by an increase in the use of networks connecting these devices. Computational devices include computers and other, often portable, devices such as wireless telephones, personal digital assistants, and automobile-based computers. Such portable computational devices are also sometimes termed “pervasive devices”. “Computer”, as used herein, may refer to any of such computational devices. The networks connecting computational devices may be “wired” networks, formed using “land lines” such as copper wire or fiber optic cable, wireless networks employing earth and/or satellite-based wireless transmission links, or combinations of wired and wireless network portions. Many networks are organized using a client/server architecture, in which “server” computational devices manage resources, such as files, peripheral devices or processing power, which may be requested by “client” computational devices. The client device is often operated by a user of the network. Computational devices not operated directly by a user, such as “proxy servers” which act on behalf of other machines, may act as either clients or servers.
Currently a very widely used network is the Internet, a global network of computational devices which communicate using a set of protocols called TCP/IP (transmission control protocol/Internet protocol). An especially popular aspect of the Internet is the World Wide Web (WWW, or “web”), a collection of interlinked documents formatted in hypertext markup language (HTML). These documents, or “web pages”, may incorporate text, graphics, audio, and/or video content, and may include convenient links to one another, often called “hyperlinks” or simply “links”. Documents or files are requested by client computers through an application program called a web browser. The files are requested from server computers, or “web servers”. The transmission of the files over the web uses an additional Internet protocol called hypertext transfer protocol (HTTP).
An important feature of the Internet is that it is substantially free of central organization. A server hosting, for example, web pages can be connected to the Internet easily and at relatively low cost. Although this decentralization allows extremely wide access, and an ever-increasing variety and availability of information, identifying and locating specific pieces of information (or, e.g., information on a specific topic) can be extremely difficult. For this reason, various database systems, including search engines and directories, have been developed to aid users in finding specific types of information. These database systems typically contain an entry for each web page (or other file or document) included in the database, so that the database can be searched relatively quickly without the need to retrieve actual web pages in order to perform a search. An entry may include, for example, a network address of the page (typically a Uniform Resource Locator, or URL) along with one or more keywords associated with the page content and possibly a brief summary of the page content. Although “search engine” is often used to describe any such database system, a “search engine” is sometimes distinguished from a “directory” in that with a search engine the database entries are collected automatically using programs (often called “spiders”, “robots” or “crawlers”) which visit web pages and collect the needed data (typically by downloading a page file and subsequently processing the file). Examples of currently available search engines include AltaVista and Excite. “Directory” may be used to describe a database for which entry information is submitted manually, typically by a web site developer. Yahoo! is an example of a currently available Internet directory (Yahoo! also currently provides an automated search engine called Inktomi).
“Search engine” as used herein may refer to either of these types of database system.
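The kind of database entry described above, and a search that consults only the stored entries rather than the pages themselves, might be sketched as follows. This is a minimal illustration only: the class name IndexEntry, its field names, the function search_index, and the example.com addresses are all hypothetical and do not come from any particular search engine.

```python
from dataclasses import dataclass, field


@dataclass
class IndexEntry:
    """One database entry describing a page (all names here are illustrative)."""
    url: str                                        # network address of the page
    keywords: set[str] = field(default_factory=set)  # keywords tied to the content
    summary: str = ""                               # optional brief summary


def search_index(index: list[IndexEntry], query_terms: set[str]) -> list[IndexEntry]:
    """Return entries sharing at least one keyword with the query.

    Only the stored entries are examined; no web page is retrieved.
    """
    terms = {t.lower() for t in query_terms}
    return [e for e in index if terms & {k.lower() for k in e.keywords}]


# Hypothetical three-entry database.
index = [
    IndexEntry("http://example.com/jaguar-cars", {"jaguar", "car"}, "Sports cars"),
    IndexEntry("http://example.com/big-cats", {"jaguar", "cat"}, "Wild cats"),
    IndexEntry("http://example.com/recipes", {"cooking"}, "Recipes"),
]
hits = search_index(index, {"Jaguar"})
print([e.url for e in hits])
```

Because matching relies solely on the stored keywords, the quality of the results depends on how well those keywords describe each page.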
An illustration of a client-server network containing a search engine server is shown in FIG. 1. Network 11 connects various computational devices, such as search engine server 15, clients 17, and information content servers 42. Although these computational devices are shown outside of the oval representing network 11 for clarity, these devices are actually a part of the network as well. In the embodiment of FIG. 1, network 11 is the World Wide Web, and may include millions of web content servers 42, represented collectively by the surrounding dashed line, located anywhere in the world. Each content server may store, or xe2x80x9chostxe2x80x9d, one or more web sites 13. Each web site includes one or more web pages, as described above. Transmission media 26 are used to connect the search engine server, clients, and content servers hosting the web sites to network 11, which includes other transmission media and computational devices interconnected all over the world. In a typical searching sequence, communication is established over network 11 between a client 17 and search engine server 15, typically using a web browser program on the client. Search criteria are entered by a user of the client machine, and transmitted to the search engine server. The search engine server searches the information available on the network, including the information content servers 42, for documents relevant to the search criteria. This is typically done by searching a database stored on the search engine server, where the database includes previously-formed entries corresponding to web pages accessible over network 11. The results of the search are transmitted back to the requesting client. The client may then, for example, access particular web pages included in the results directly over network 11, as desired by the user.
The large and constantly expanding size of the Internet and World Wide Web presents difficult challenges for using search engines in the above-described process. For example, most engines are currently unable to index (create database entries for) the entirety of the documents available on the Internet, or even a substantial fraction of these documents. The storage space and computational time constraints which limit the ability of the engines to index documents may also limit the complexity of the database entries for documents which are indexed, such that only a rough categorization and/or analysis of search results may be performed. This may create various problems for a user of search engines, including excessive numbers of “hits”, or documents matching the search query, returned by the engine. The documents returned in response to a search query may also be incorrectly matched to the subject of the query, and incorrectly or insufficiently categorized (if categorized at all in a way apparent to the user). In some cases, for example, priority of search results returned may be influenced by factors such as advertising revenue to a search engine’s web site. Resource limitations on the search engine server often prevent the use of algorithms which might improve the accuracy and categorization of search results. The available algorithms for categorization and searching are themselves often very good, particularly those utilizing techniques from the field of artificial intelligence; the fundamental problem is believed to be the resource restrictions on the search engines. Hence the search engines often use “quick and dirty” algorithms for categorization and searching. Typically this involves examining only a few keywords or hyperlinks in a Web page and rapidly returning the results. Incremental classification is often done in a semi-automated or manual fashion.
It would therefore be desirable to develop a system and method to improve the accuracy of network search results without increasing resource requirements for a search engine server. The desired method would allow improvement of the results of both a current search and future searches involving a topic.
The problems outlined above are in large part addressed by a system, method and program for improving initial search results obtained by a client from a server by utilizing computing resources on the client. The initial search results are further processed on the client machine, which may produce a more refined set of search results. For example, the search results may be re-prioritized and/or categorized, or less relevant results may be discarded. The usefulness of the search results to a user may therefore be improved, without the need for additional server resources (e.g., computation time or storage space). The additional search results produced by the processing on the client, or at least information derived from the additional results, are preferably sent back to the server. The server may then update a database according to the additional search results, so that results for subsequent searches on the server may be improved. This updating could involve, for example, adding an additional keyword to some database entries, or removing an inappropriate keyword from an entry. The additional processing of the search results on the client may in some embodiments be done at times of low utilization of the client machine, or as a background process, rather than being done real-time during a user’s search session.
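As a rough illustration of the client-side processing just described, the following sketch re-prioritizes a set of initial results and discards the less relevant ones. The scoring rule (a count of query terms found among an entry's keywords and summary words), the function name refine_results, the dictionary layout, and the example.com addresses are assumptions made for illustration; the description above does not prescribe any particular algorithm.

```python
def refine_results(initial_results, query_terms, min_score=1):
    """Re-rank server results on the client and drop weak matches.

    The score is the number of query terms found among an entry's keywords
    and summary words (an assumed, deliberately simple relevance measure).
    """
    terms = {t.lower() for t in query_terms}
    scored = []
    for result in initial_results:
        words = {k.lower() for k in result["keywords"]}
        words |= set(result["summary"].lower().split())
        score = len(terms & words)
        if score >= min_score:          # discard less relevant results
            scored.append((score, result))
    # Highest score first; the sort is stable, so ties keep the server's order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [result for _, result in scored]


# Hypothetical initial results as received from the server.
initial_results = [
    {"url": "http://example.com/big-cats", "keywords": ["jaguar", "cat"],
     "summary": "Wild cats of South America"},
    {"url": "http://example.com/jaguar-cars", "keywords": ["jaguar", "car"],
     "summary": "Classic sports car reviews"},
    {"url": "http://example.com/recipes", "keywords": ["cooking"],
     "summary": "Weeknight recipes"},
]
refined = refine_results(initial_results, {"jaguar", "car"})
print([r["url"] for r in refined])
```

Since this work runs entirely on the client, a more expensive scoring function could be substituted without adding to the load on the server.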
In an embodiment of a method for identifying stored information accessible over a network, initial results corresponding to an initial search of the stored information are transferred from a network server to a network client. An additional search of the initial results is performed using the client to produce additional results (e.g., a refined version of the initial results), and data associated with the additional results is transferred from the client to the server. The initial search results transferred to the client may include descriptions of files accessible over the network, and such a description may include, for example, a network address and one or more keywords associated with the corresponding file. Performance of the additional search may include comparing the initial results to user-entered search criteria. In some embodiments, the stored information corresponding to the initial results (e.g., network-accessible files or documents) may be downloaded by the client, and the stored information compared to the user-entered search criteria. The additional search may in some embodiments include ranking and/or categorizing of the initial results by a user. This ranking or categorizing by the user may be done “manually”, for example by manipulation within a graphical user interface of icons representing documents (or descriptions of documents) returned by the initial search. The transfer of data associated with the additional results may, in an embodiment, include transferring the additional results or a subset of the additional results. Alternatively, the transferred data may include instructions for updating a database on the server, where the instructions are derived from the additional results. The data associated with the additional results could also include, for example, an indicator of the relevance of one or more of the documents returned by the initial search.
The method may also include updating a search database on the server, such that the results of future searches may be improved.
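One possible form of the update instructions mentioned above is sketched below, under stated assumptions. The client compares the text of each downloaded page to the user-entered search terms and derives "add" or "remove" keyword instructions, which the server then applies to its database. The tuple format of an instruction, the function names derive_instructions and apply_instructions, the dictionary-based database, and the sample page texts are all hypothetical.

```python
def derive_instructions(results, page_texts, query_terms):
    """Client side: compare downloaded page text to the search terms.

    Emits (url, action, keyword) tuples; this format is an assumption.
    """
    instructions = []
    for result in results:
        page_words = set(page_texts[result["url"]].lower().split())
        for term in sorted(t.lower() for t in query_terms):
            in_page = term in page_words
            in_entry = term in result["keywords"]
            if in_page and not in_entry:
                instructions.append((result["url"], "add", term))
            elif in_entry and not in_page:
                instructions.append((result["url"], "remove", term))
    return instructions


def apply_instructions(database, instructions):
    """Server side: update keyword sets; unknown URLs are ignored."""
    for url, action, keyword in instructions:
        if url not in database:
            continue
        if action == "add":
            database[url].add(keyword)
        elif action == "remove":
            database[url].discard(keyword)


# Hypothetical initial results and downloaded page contents.
results = [
    {"url": "http://example.com/jaguar-cars", "keywords": {"jaguar", "car"}},
    {"url": "http://example.com/recipes", "keywords": {"jaguar", "cooking"}},
]
page_texts = {
    "http://example.com/jaguar-cars": "the jaguar convertible is a sports car",
    "http://example.com/recipes": "easy cooking recipes for weeknights",
}
instructions = derive_instructions(results, page_texts, {"jaguar", "convertible"})

# Server database modeled as a mapping from URL to a set of keywords.
database = {r["url"]: set(r["keywords"]) for r in results}
apply_instructions(database, instructions)
print(instructions)
print(database)
```

Sending compact instructions rather than the full refined results keeps the data transferred back to the server small, and leaves the decision of whether to apply each change with the server.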
An embodiment of a system for identifying stored information accessible over a network includes a network client adapted to perform an additional search on initial search results received from a network server, where the network client is further adapted to transfer additional search results to the network server. The network client may include a processor, a storage device, a browser program, and a client-side search program. The client-side search program may be adapted to perform the additional search by comparing the initial search results to user-entered search criteria. Alternatively or in addition, the client-side search program may be adapted to compare user-entered search criteria to downloaded files or documents corresponding to the initial search results. The system may further include the network server, where the server is adapted to transfer the initial search results to the client and receive data associated with additional search results from the client. The network server may include a processor, a storage device and a server-side search program, as well as a search database including data characterizing the stored information (e.g., entries describing web pages available through the network). The server may be further adapted to use the data received from the client to update the database, such that the results of future searches may be improved.
In addition to the method and system described above, a computer-usable carrier medium is contemplated herein. The carrier medium may be a storage medium, such as a magnetic or optical disk, a magnetic tape, or a memory. In addition, the carrier medium may be a transmission medium, such as a wire, cable, or wireless medium along which data or program instructions are transmitted, or a signal carrying the data or program instructions along such a wire, cable or wireless medium. The carrier medium may contain program instructions executable for carrying out embodiments of the methods described herein. For example, a carrier medium may contain program instructions executable for receiving from a network server initial results of an initial search of stored information over a network, performing an additional search of the initial results to produce additional results, and transferring data associated with the additional results to the server. In such an embodiment, the program instructions may form part of a client-side search program. Alternatively, a carrier medium may contain program instructions executable for transferring to a network client initial results of an initial search of stored information available over a network, and for receiving from the network client data associated with additional search results. The carrier medium may further contain program instructions executable for using the data received from the client to update a database used in performing the initial search. Such program instructions may form part of a server-side search program.