The present invention relates to Internet-based information retrieval. More particularly, the present invention relates to systems and methods for concept-based Internet searching.
The Web has blossomed as a means of access to a variety of information by remote individuals. The Web is an open system in that virtually any individual or organization with a computer connected to a telephone line may use the Web to present information concerning almost any subject. To accomplish this, the Web utilizes a body of software, a set of protocols, and a set of defined conventions for presenting and providing information over the Web. Hypertext and multimedia techniques allow users to gain access to information available via the Web.
Users typically operate personal computers (PCs) executing browser software to access information stored by an information provider computer. The user's computer is commonly referred to as a client, and the information provider computer is commonly referred to as a Web server. The browser software executing on the user's computer requests information from Web servers using a defined protocol. One protocol by which the browser software specifies information for retrieval and display from a Web server is known as Hypertext Transfer Protocol (HTTP). HTTP is used by the Web server and the browser software executing on the user's computer to communicate over the Internet.
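By way of illustration only, the client-server exchange described above can be sketched in Python. The function names `build_get_request` and `parse_status_line`, the host `www.example.com`, and the choice of HTTP/1.1 framing are assumptions made for this sketch and are not taken from the present disclosure. The sketch shows only the text a browser might send to request a page and how the first line of a server's reply might be read; it does not open a network connection.

```python
from typing import Tuple


def build_get_request(host: str, path: str = "/") -> bytes:
    """Return the bytes of a minimal HTTP/1.1 GET request for `path` on `host`.

    Hypothetical helper for illustration; a real browser sends more headers.
    """
    lines = [
        f"GET {path} HTTP/1.1",  # request line: method, resource, protocol version
        f"Host: {host}",         # names the Web server being addressed
        "Connection: close",     # ask the server to close after one response
        "",                      # blank line terminates the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_status_line(response: bytes) -> Tuple[str, int, str]:
    """Split the first line of a server response into version, code, and reason."""
    status_line = response.split(b"\r\n", 1)[0].decode("ascii")
    version, code, reason = status_line.split(" ", 2)
    return version, int(code), reason


if __name__ == "__main__":
    print(build_get_request("www.example.com", "/index.html").decode("ascii"))
    sample_reply = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html></html>"
    print(parse_status_line(sample_reply))
```

In this sketch the client role is played by `build_get_request` and the server's reply is a hard-coded sample, so the request and response formats can be inspected without access to the Internet.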
Web servers often operate using the UNIX operating system, or some variant of the UNIX operating system. Web servers transmit information requested by the browser software to the user's computer. The browser software displays this information on the user's computer display in the form of a Web page. The Web page may display a variety of text and graphic materials, and may include links that provide for the display of additional Web pages. A group of Web pages provided by a common entity, and generally through a common Web server, forms a Web site.
A specific location of information on the Internet is designated by a Uniform Resource Locator (URL). A URL is a string expression representing a location identifier on the Internet or on a local Transmission Control Protocol/Internet Protocol (TCP/IP) computer system. The location identifier generally specifies the location of a server on the Internet, the directory on the server where specific files containing information are found, and the names of the specific files containing information. Certain default rules apply so that the specific file names, and even the directory containing the specific files, need not be specified. Thus, if a user knows that specific information desired by the user is located at a location pointed to by a URL, the user may enter the URL on the user's computer in conjunction with execution of the browser software to obtain the desired information from a particular Web server. Users, or the browser software executing on the user's computer, must always at a minimum know the Internet address portion of the URL for a particular Web server.
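As a non-limiting sketch, the decomposition of a URL into a server, a directory, and a file name can be illustrated with the Python standard library. The function name `describe_url` and the example URLs are hypothetical and chosen only for illustration. The treatment of an empty path as the server's root directory is one assumed reading of the default rules mentioned above; which file a server actually returns for a directory is decided by the server's configuration.

```python
import posixpath
from typing import Dict, Optional
from urllib.parse import urlsplit


def describe_url(url: str) -> Dict[str, Optional[str]]:
    """Break a URL into the server, directory, and file name it designates.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    # Assumed default rule: an empty path refers to the server's root directory.
    path = parts.path or "/"
    directory, filename = posixpath.split(path)
    return {
        "scheme": parts.scheme,      # e.g. "http"
        "server": parts.hostname,    # Internet address portion of the URL
        "directory": directory,      # directory on the server
        "file": filename or None,    # None when no file name is specified
    }


if __name__ == "__main__":
    print(describe_url("http://www.example.com/docs/patents/search.html"))
    print(describe_url("http://www.example.com"))
```

The second call shows a URL in which only the Internet address portion is given, so the directory falls back to the root and no file name is reported.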
However, often the user does not know the URL of a site containing desired information. Even if the user once knew the proper URL, the user may have forgotten, mistyped, or otherwise garbled a URL for a specific location, as URLs can often be lengthy strings with a variety of special characters. To allow increased ease in locating Web sites containing desired information, search engines identifying Web sites likely to contain the desired information are widely available. A search engine using a well-constructed search may often allow a user to quickly and accurately locate Web sites with desired information. Due to the multiplicity of Web sites, and indeed due to the unstructured nature of the Web, a poorly constructed search may make locating a Web site with the desired information virtually impossible.
An inability of a user to quickly and easily locate a Web site poses difficulties with respect to some commercial uses of the Web. Commercial entities have found the Web a useful medium for the advertisement and sale of goods and services. A variety of commercial entities have created home pages for the commercial entity as a whole, and for particular products sold and marketed by the commercial entity. The effectiveness of advertising in such a way on the Web is dependent on users accessing a commercial entity's Web site and viewing the information located there. The user must undertake two critical actions for this to occur. The user must first access a commercial entity's Web site, and then the user must actually view the material displayed there. A user who desires to view a Web page advertising or selling a particular product, but who is a poor Web searcher, may represent a lost sale of the product.
The huge amounts of poorly accessible information frustrate consumers, analysts and content providers alike. Existing navigation devices often fail to connect people and content, limiting the growth of Web-based information services and e-commerce.
What is needed is an improved method that allows a user to easily obtain information via the Web. The method should allow a user to use natural language and to search based on concepts, rather than strict Boolean strings.