A. Field of the Invention
This invention relates generally to methods for browsing network information and, more particularly, to a method for organizing information from network documents in a conceptual index to facilitate browsing.
B. Description of the Related Art
The Internet, fueled by the phenomenal popularity of the World Wide Web (WWW or Web), has exhibited exponential growth over the past few years. In the case of the WWW, the ease of self-publication has helped generate an estimated 50-120 million documents.
To access all this information, users need only standard computer equipment, such as a home personal computer with a display and modem, and an Internet connection. Several types of Internet connections are available, including connections through Internet Service Providers (ISPs). To use an Internet connection from an ISP, for example, the user dials into a computer at the ISP's facility using the modem and a standard telephone line. The ISP's computer in turn provides the user with access to the Internet.
Through this Internet connection, the user accesses information on the Web using a computer program called a “Web browser,” such as the Netscape Navigator™ from Netscape Communications Corporation. To accomplish this, the user gives the Web browser a Uniform Resource Locator (URL) for an object on the Internet, for example, a document containing information of interest. The document is referred to as a “Web page,” and the information contained in the Web page is called “content.” Web pages often refer to other Web pages using “hypertext links” or “hyperlinks” that include words or phrases representing the other pages in a form that gives the browser a URL for the corresponding Web page when a user selects a hyperlink. Hyperlinks are made possible by building Web pages using the Hypertext Markup Language (HTML).
The URL identifies a specific computer on the Internet, called a “Web Server,” and, more particularly, the location of a Web page located on the Web Server. The Web browser retrieves the Web page and displays it for the user.
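By way of illustration only, the two parts of a URL described above (the Web Server and the location of the Web page on that server) can be separated as in the following minimal sketch. The function name `split_url` and the example URL are assumptions made for this illustration and are not taken from the description.

```python
from urllib.parse import urlsplit


def split_url(url):
    """Return (web_server, page_location) for a URL.

    `split_url` is a hypothetical helper written for this illustration.
    """
    parts = urlsplit(url)
    # An empty path refers to the server's root page.
    return parts.hostname, parts.path or "/"


# Example URL is illustrative only.
server, location = split_url("http://www.example.com/docs/zoology.html")
print(server)    # the Web Server
print(location)  # the location of the Web page on that server
```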
The virtually instantaneous and cost-free publication inherent in the WWW leads to problems with information overload. Search engines help users locate specific information on the Web; however, these engines typically have time only for keyword searches. As a result, one keyword search engine, Alta Vista™ from Digital Equipment Corporation, returns nearly 90,000 hits or URLs for a search for the word “zoology.” Thus, the user must review the long list of URLs and access many of the corresponding Web pages to find those that contain sought-after information. This demonstrates the relative lack of utility associated with using keyword search engines available on the Internet.
Researchers are, however, experimenting with intelligent agents to facilitate browsing by “learning” the user's interests based on prior sessions surfing the Web. Two of the better-known research prototypes are WebWatcher and Letizia.
WebWatcher is a server-based interface agent that resides between the user and the Web. Any user running a browser can enter the system simply by typing a topic of interest in WebWatcher's FrontDoor page. WebWatcher replaces the current page with a modified page that embeds WebWatcher command menus, enables WebWatcher to follow the user browsing the Web, and presents the user with a highlighted listing of recommended hyperlinks. Because WebWatcher is a server-based system, it logs data from thousands of users to “train” itself and refine its search knowledge. If a user signals that a particular search was successful, WebWatcher annotates each explored hyperlink with user keywords, adding to the knowledge base from previous sessions. WebWatcher uses information retrieval techniques based on the frequency of weighted terms and documents for all hyperlinks on a page, as well as user statistics associated with those links.
Letizia is a client-side personal agent and thus resides on the computer running the user's browser, as opposed to on a separate server. Letizia collects information about the user's browsing habits and tries to anticipate additional items of interest. Making inferences about user interests and using various heuristics, Letizia conducts a resource-limited search of the Web during idle times looking for promising links to suggest when prompted.
While both prototypes try to anticipate a user's interest in accessing certain information, neither addresses the problem of organizing available information on the Web to facilitate browsing. There is therefore a need for a system that organizes or indexes available network information in a structure that permits users to pinpoint the location of information likely to be of interest.
Accordingly, systems and methods consistent with the present invention substantially obviate one or more of the problems due to limitations, shortcomings, and disadvantages of the related art by incrementally indexing conceptual information in network documents, and integrating the information in a manner usable by the user in a browsing session.
Consistent with the present invention, a method for accessing information from a network comprises the steps, performed by a processor, of: receiving a document from the network containing content; extracting conceptual information from the content of the document; analyzing the extracted conceptual information semantically; and assembling an index of the extracted conceptual information that reflects relations based on semantic data in a stored lexicon.
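For illustration only, the steps recited above might be sketched as follows. This is a simplified sketch under stated assumptions, not the claimed implementation: the toy `LEXICON`, the word-matching extraction, the single "broader concept" relation, and all names (`extract_concepts`, `broader_chain`, `ConceptIndex`) are hypothetical and chosen only to make the flow from document to index concrete.

```python
import re
from collections import defaultdict

# Hypothetical stored lexicon: each term maps to its broader concept
# (None marks a top-level concept). A real lexicon would be far richer.
LEXICON = {
    "animal": None,
    "mammal": "animal",
    "dog": "mammal",
    "cat": "mammal",
    "bird": "animal",
}


def extract_concepts(content):
    """Extract conceptual information: here, simply lexicon terms in the text."""
    found = set()
    for word in re.findall(r"[a-z]+", content.lower()):
        if word in LEXICON:
            found.add(word)
        elif word.endswith("s") and word[:-1] in LEXICON:
            found.add(word[:-1])  # crude plural handling, an assumption
    return found


def broader_chain(concept):
    """Analyze a concept semantically: list its broader concepts, nearest first."""
    chain = []
    parent = LEXICON.get(concept)
    while parent is not None:
        chain.append(parent)
        parent = LEXICON.get(parent)
    return chain


class ConceptIndex:
    """Index whose structure reflects the relations found in the lexicon."""

    def __init__(self):
        self.documents = defaultdict(set)  # concept -> URLs mentioning it
        self.narrower = defaultdict(set)   # concept -> more specific concepts

    def add_document(self, url, content):
        """Receive one document and fold its concepts into the index."""
        for concept in extract_concepts(content):
            self.documents[concept].add(url)
            child = concept
            for parent in broader_chain(concept):
                self.narrower[parent].add(child)
                child = parent

    def lookup(self, concept):
        """Return URLs indexed under a concept or any narrower concept."""
        urls = set(self.documents.get(concept, ()))
        for child in self.narrower.get(concept, ()):
            urls |= self.lookup(child)
        return urls


index = ConceptIndex()
index.add_document("http://a.example/dogs", "Dogs are loyal.")
index.add_document("http://b.example/birds", "A bird sings.")
print(sorted(index.lookup("animal")))
```

In this sketch, documents are added one at a time, so the index grows incrementally, and a user browsing from a broad concept such as "animal" reaches documents that mention only narrower concepts such as "dog."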
Both the foregoing general description and the following detailed description are exemplary and explanatory only, and merely provide further explanation of the claimed invention.