The present invention relates generally to identifying and accessing information stored by communication and information networks. More particularly, the present invention describes techniques for identifying and accessing information of interest to a user while preserving the privacy of the user.
With the widespread use of computers, an expanding telecommunication network, and the rising popularity of communication networks such as the Internet, an increasing amount of information is contained in documents stored by computer systems coupled to the communication networks. Users can access these documents by using computer systems coupled to the communication networks. For example, a user can browse the Internet and access web pages stored by servers coupled to the Internet.
Computer systems connected to communication networks such as the Internet can generally be classified as “clients” or “servers” depending on the role the computer systems play with respect to requesting information or storing/providing information. Computer systems which are used by users to access information are typically called “client” computers. Computer systems which store information and provide the information to client computers are usually referred to as “server” systems. Accordingly, server systems are responsible for receiving information requests from client systems, performing processing required to satisfy the requests, and forwarding the results/information corresponding to the information requests back to the requesting client systems. The processing required to satisfy the client request may be performed by a single server system or may alternatively be delegated to other servers connected to the communication network, such as the Internet. It should be apparent that a particular computer system may function both as a server and a client.
In the World Wide Web (“Web”) environment, information resources are typically stored in the form of hypertext documents called “web pages” which can be accessed and read by users of the Web. A web page may incorporate any combination of text, graphics, audio and video content, software programs, and other data. Web pages may also contain hypertext links to other web pages. Web pages are typically stored on web servers or content servers coupled to the Internet. Each web page is uniquely identified by an address called a Uniform Resource Locator (URL) that enables users to access the web page.
Users typically access web pages using a program called a “web browser” which generally executes on a client computer coupled to the Internet. The web browser is a type of client application that enables users to select, retrieve, and perceive information contained in web pages. Examples of browsers include the Internet Explorer browser program provided by Microsoft Corporation, the Netscape Navigator browser provided by Netscape Corporation, and others. Users generally access web pages by providing URL information to the browser, either directly or indirectly, and the browser responds by retrieving the web page corresponding to the user-provided URL from the Internet. The retrieved web page is then displayed to the requesting user on the client computer.
Due to the vast volume of information available via communication networks such as the Internet, it is becoming increasingly difficult for a user to identify documents which contain information of interest to the user or documents which are relevant to the user. For example, in a Web environment, a user may be interested in locating web pages containing information on a particular topic, e.g., Thai cooking. In a Web environment, the user may locate the relevant web pages by accessing one or more web servers, and browsing through web pages stored by the one or more web servers to identify web pages containing information related to Thai cooking. However, searching for web pages in this manner is a non-trivial task because the user does not typically know which web servers store information of interest to the user. Further, since each web server may store a vast number of web pages, in order to find web pages containing information of interest to the user (e.g., web pages containing information related to Thai cooking), the user is often forced to sift through large volumes of information and web pages, most of which are irrelevant to the user. As a result, the task of identifying relevant web pages can be very time consuming and frustrating to the user, and may not yield the results desired by the user.
In order to alleviate the above problem, most users generally use programs which help identify relevant documents from a large pool of documents. These programs are commonly referred to as search engines and are generally executed by servers coupled to the communication network. Examples of search engines in the Internet environment include search engines provided by Yahoo, Google, Lycos, Excite, Altavista, and the like which enable users to identify web pages of interest to the user.
Search engines typically use a crawler or a spider to find information about documents stored by the communication network which are accessible to the search engine and which can be located and searched using the search engine. For example, in a Web environment, a crawler may access web pages and URL links to other web pages embedded in the web pages, and so on. For each web page accessed by the crawler, the crawler discovers information about the web page including the URL of the web page, the contents of the web page, the web server storing the web page, and the like. The information collected by a crawler is usually stored by the server providing the search engine in the form of an index.
An index built by a search engine generally facilitates identification of documents based on criteria related to the documents or their contents. The criteria may include words occurring in the documents, concepts or topics to which the documents relate, subject matter of the documents, and the like. The structure of an index may vary based on the search engine. For example, in a Web environment, a particular search engine may prepare an index mapping words found in a plurality of web pages to the URLs corresponding to the web pages. In another index, the information may be indexed based on titles, headings, subheadings, etc. found in the web pages, or based upon concepts and topics extracted from the contents of the web pages, and so on. In general, indices are built in a way that facilitates the identification of the documents and/or locations of the documents. In a Web environment, the locations of documents may be identified by URLs corresponding to the web pages.
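By way of illustration only, an index that maps words to URLs can be sketched as follows. This is a minimal sketch, not the structure of any particular search engine; the function names `tokenize` and `build_index`, the sample URLs, and the sample page text are all hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric words (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of URLs of the pages that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


# Hypothetical pages, keyed by URL, standing in for crawled web pages.
pages = {
    "http://example.com/thai": "Thai cooking with basil and chili",
    "http://example.com/bread": "Baking bread at home",
}
index = build_index(pages)
```

An index keyed on titles, headings, or extracted topics would follow the same pattern with a different choice of key.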
A search engine also provides a search tool which allows users to identify documents of interest using information stored in the index generated by the search engine. In order to identify documents of interest, a user generally configures a query using a client computer. The query may contain query terms which describe, for example, a topic or concept for which the user is interested in finding more information. For example, if the user is interested in finding information on Thai cooking, the query terms may include the words “Thai” and “cooking.”
The user-configured query is then communicated from the user's client computer to a remote server system executing a search engine. Upon receiving the search query, the search engine executing on the remote server identifies documents (or locations of the documents) which match or satisfy the user query based upon information stored in the index used by the search engine. The search engine may use various techniques to determine documents which are relevant to the search query received from the user's client system. Information identifying the relevant documents or their locations determined by the search engine is then communicated from the search engine server to the user's client computer. The user may then use the information received from the search engine to access one or more of the relevant documents.
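One simple matching technique, shown here only as an illustrative assumption, is to return the documents that contain every query term. Real search engines may use quite different relevance techniques; the function name `search`, the sample index, and the sample URLs are hypothetical.

```python
def search(index, query_terms):
    """Return the sorted URLs of documents containing every query term."""
    terms = [term.lower() for term in query_terms]
    if not terms:
        return []
    # Start from the documents matching the first term, then intersect
    # with the documents matching each remaining term.
    matches = set(index.get(terms[0], set()))
    for term in terms[1:]:
        matches &= index.get(term, set())
    return sorted(matches)


# Hypothetical word-to-URL index of the kind a search engine might hold.
sample_index = {
    "thai": {"http://example.com/thai", "http://example.com/travel"},
    "cooking": {"http://example.com/thai", "http://example.com/bread"},
}
results = search(sample_index, ["Thai", "cooking"])
```

In the conventional arrangement described above, this matching step runs on the remote server, which is why the query terms must leave the user's client computer.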
Some search engines also perform searches implicitly without receiving specific user input based on the contents of documents (e.g., web pages) viewed by the user. These search engines use the contents of the document being browsed/viewed by the user as a search query which is communicated from the user computer to the search engine server. Based on the contents of the document being viewed by the user and based upon index information used by the search engine, the search engine identifies documents of interest to the user. Information related to the documents identified by the search engine is then communicated to the user system. The information may then be presented to the user via a pop-up screen which appears on an output device of the user's computer system. For example, in a Web environment, a window may appear on the user's display device listing URLs corresponding to documents identified by the search engine to be of interest to the user based on the contents of the documents presently viewed by the user. Examples of companies which provide such implicit search engines include Nano (http://www.nano.com/), Kenjin (http://www.autonomy.com), Third Voice (http://www.thirdvoice.com/), Flyswat (http://www.flyswat.com), Gurunet (http://www.gurunet.com), Annotate (http://www.annotate.net/) and Alexa (http://www.alexa.com/).
In a Web environment, the relevant documents may be web pages which may be identified by URLs. Accordingly, the search engine may communicate a list of URLs of interest to the user to the user's client system in response to the user query. The user may then select one or more URLs from the list of URLs and access web pages corresponding to the selected URLs. When the user selects a URL, the URL request is sent to a web server storing the web page corresponding to the URL, and the web server responds by communicating the requested web page to the user's client computer system. The server executing the search engine may act as a conduit forwarding the selected web page received from the web server to the user client computer system.
While conventional search engines simplify the process of identifying documents containing information of interest to a user, they also compromise the user's privacy. This is because conventional search engine servers frequently track and/or mine the user's browsing activities and track information provided by the user to the search engine. For example, several conventional search engines mine, without the user's permission, information contained in user search queries (which may contain information of a sensitive and private nature) provided to the search engines. Several conventional search engines also track the contents of documents (e.g., web pages) accessed by the user using the search engine. For example, in a Web environment, conventional search engines track the web pages accessed by the user, the content of the web pages, transactions performed by the user using the web pages, and other like information without the user's permission.
The information mined or tracked by conventional search engines is then used to ascertain information about the user's interests, likes/dislikes, the user's shopping preferences, information related to the user's use of the Internet, and other information related to the user and the user's behavior. Since users generally tend to use a particular search engine to perform searching, over a period of time, the particular search engine is capable of building a fairly detailed profile of the user and the user's behavior.
The user information collected by the search engines and the user profile information built by the search engines, which may be sensitive in nature and contain confidential information, may then be distributed or even sold by providers of search engines to entities such as advertising agencies, government agencies, insurance companies, business entities, and the like. This may result in the user being subjected to unsolicited spam mail messages, unwelcome advertisements, credit card fraud, mail fraud, banking fraud, and other unwelcome activities. As a result, the use of a conventional search engine executing on a remote server can severely compromise a user's privacy and security. Further, since the information collected by the search engines is typically stored on a server system which is located at a remote location from the user's computer system, the user has very little control over the collection and dissemination of the information.
In light of the above, there is a need for techniques which allow a user to identify and access documents of interest to the user (e.g., web pages in a Web environment) without compromising the user's privacy and security.
According to the present invention, techniques are provided which allow a user to identify and access documents (e.g., web pages) of interest to the user in a network environment without compromising the user's privacy. More particularly, according to an embodiment of the present invention, the user system receives index information which is used to identify documents of interest to the user at the user system itself without having to provide any user-related information to search engines executing on remote servers. The present invention preserves user privacy by controlling and minimizing the communication and collection of user-related information from the user system. Merely by way of example, the present invention allows users to identify and access web pages from web servers coupled to a communication network such as the Internet without compromising user privacy.
According to an embodiment of the present invention, techniques are provided which enable a user system to access a first document from a plurality of documents stored by a plurality of web servers. In this embodiment, an index server determines index information to be communicated to the user system, the index information comprising information identifying the plurality of documents stored by the plurality of web servers and information related to the contents of the plurality of documents. The index server communicates the index information to the user system. The user system is configured to identify a first set of documents from the plurality of documents using the index information received from the index server, the first set of documents including the first document, to receive a signal indicating selection of the first document from the first set of documents, and responsive to the signal, to access the selected first document from a web server storing the first document. According to the teachings of the present invention, the user system is configured to identify the first set of documents substantially free from interaction with the index server and the plurality of web servers.
According to another embodiment, the present invention provides techniques for identifying and accessing a first document from a plurality of documents stored by a plurality of servers using a data processing system. In this embodiment, the data processing system is configured to receive index information from an index server, the index information comprising information identifying the plurality of documents stored by the plurality of servers and information related to the contents of the plurality of documents. The data processing system is configured to identify a first set of documents from the plurality of documents using the index information received from the index server, the first set of documents including the first document. According to the teachings of the present invention, the data processing system is configured to identify the first set of documents substantially free from any interaction with the plurality of servers and the index server. The data processing system is also configured to receive a signal indicating selection of the first document from the first set of documents, and to access the selected first document from a server storing the first document in response to the signal.
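Merely by way of illustration, the flow described in the embodiments above can be sketched as follows. This is a hedged sketch under stated assumptions, not the claimed implementation: the JSON payload format, the class name `UserSystem`, its methods, and the sample URLs are all hypothetical, and network access is simulated by recording which URLs would be requested.

```python
import json

# Hypothetical index information as it might arrive from an index server:
# document identifiers plus content-related terms. The format is an assumption.
INDEX_PAYLOAD = json.dumps({
    "documents": {
        "http://example.com/thai": "Thai cooking basics",
        "http://example.com/bread": "Baking bread at home",
    },
    "terms": {
        "thai": ["http://example.com/thai"],
        "cooking": ["http://example.com/thai"],
        "bread": ["http://example.com/bread"],
    },
})


class UserSystem:
    """Holds the received index and identifies documents locally."""

    def __init__(self, payload):
        data = json.loads(payload)
        self.titles = data["documents"]
        self.terms = {word: set(urls) for word, urls in data["terms"].items()}
        self.outbound_requests = []  # URLs that would be sent over the network

    def identify(self, query):
        """Identify matching documents without contacting any server."""
        words = query.lower().split()
        if not words:
            return []
        matches = set(self.terms.get(words[0], set()))
        for word in words[1:]:
            matches &= self.terms.get(word, set())
        return sorted(matches)

    def select(self, url):
        """Simulate accessing the selected document from its web server."""
        self.outbound_requests.append(url)
        return self.titles[url]


user = UserSystem(INDEX_PAYLOAD)
first_set = user.identify("Thai cooking")   # computed entirely on the user system
requests_before_selection = list(user.outbound_requests)
title = user.select(first_set[0])           # only now does a request go out
```

In this sketch the query text never leaves the user system; the only outbound request is for the document the user selects.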
Various additional objects, features and advantages of the present invention can be more fully appreciated with reference to the detailed description and accompanying drawings that follow.