The World Wide Web (WWW) is comprised of an expansive network of interconnected computers upon which businesses, governments, groups, and individuals throughout the world maintain inter-linked computer files known as web pages. Users navigate these pages by means of computer software programs commonly known as Internet browsers. The vastness of the unstructured WWW causes users to rely primarily on Internet search engines to retrieve information or to locate businesses. These search engines use various means to determine the relevance of a user-defined search to the information retrieved.
The authors of web pages provide information, known as metadata, within the body of the hypertext markup language (HTML) document that defines the web pages. A computer software product known as a web crawler systematically accesses web pages by sequentially following hypertext links from page to page. The crawler indexes the pages for use by the search engines using information about a web page as provided by its address or Uniform Resource Locator (URL), metadata, and other criteria found within the page. The crawler is run periodically to update previously stored data and to append information about newly created web pages. The information compiled by the crawler is stored in a metadata repository or database. The search engines search this repository to identify matches for the user-defined search rather than attempt to find matches in real time.
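By way of illustration only, the crawling and indexing just described can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the implementation of any particular crawler: the in-memory WEB table, the example URLs, and the crawl function are all hypothetical, and no network access is performed.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (metadata keywords, outgoing links).
# A real crawler would fetch and parse HTML documents instead.
WEB = {
    "http://a.example/": (["medical", "services"],
                          ["http://b.example/", "http://c.example/"]),
    "http://b.example/": (["hospital", "research"], ["http://a.example/"]),
    "http://c.example/": (["leather", "purse"], []),
    "http://d.example/": (["unlinked"], []),  # no page links here
}


def crawl(start_url, web):
    """Follow links page to page and store each page's metadata once."""
    repository = {}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        # Skip pages already indexed and links that lead nowhere.
        if url in repository or url not in web:
            continue
        keywords, links = web[url]
        repository[url] = {"url": url, "keywords": list(keywords)}
        queue.extend(links)
    return repository


repository = crawl("http://a.example/", WEB)
```

In this sketch, a page that no other page links to (the hypothetical "http://d.example/") never enters the repository, so a search engine that consults only the repository cannot return it.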
Typically, each search result rendered by the search engine includes a list of individual entries that have been identified by the search engine as satisfying the user's search expression. Each entry or “hit” includes a hyperlink that points to a Uniform Resource Locator (URL) location or web page. In addition to the hyperlink, certain search result pages include a short summary or abstract that describes the content of the web page.
A common technique for accessing textual materials on the Internet is by means of a “keyword” combination, generally with Boolean operators between the words or terms, where the user enters a query comprised of an alphanumeric search expression or keywords. In response to the query, the search engine sifts through available web sites to match the words of the search query to words in a metadata repository, in order to locate the requested information.
This word match based search engine parses the metadata repository to locate a match by comparing the words of the query to indexed words of documents in the repository. If there is a word match between the query and words of one or more documents, the search engine identifies those documents and returns the search results in the form of HTML pages.
This type of search engine is thus very sensitive to the words selected for the query. The terminology used in a query reflects each individual user's view of the topic for which information is sought. In other words, the content of the query, and the resulting response from a word-based search engine, are highly dependent upon each individual user's expression of the query terms, and different users may obtain different search results when searching for the same or similar information. For example, to locate information about medical services, a first user may compose the query “doctors and services”, and a second user may compose the query “hospital and medical and research”.
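The word matching described above, and its sensitivity to the wording of the query, can be illustrated with the following minimal Python sketch. It assumes a simple inverted index and treats the word “and” as the only Boolean operator; the function names and the three sample documents are hypothetical and chosen solely for illustration.

```python
from collections import defaultdict


def build_index(documents):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search_all(index, query):
    """Return ids of documents that contain every term of an AND query."""
    # Treat "and" as the Boolean operator rather than as a search term.
    terms = [w for w in query.lower().split() if w != "and"]
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical repository contents, for illustration only.
documents = {
    "page1": "family doctors offering medical services",
    "page2": "hospital for medical research",
    "page3": "leather purse store",
}
index = build_index(documents)

first_user = search_all(index, "doctors and services")
second_user = search_all(index, "hospital and medical and research")
```

With these sample documents, the two queries from the example above retrieve disjoint sets of pages even though both users seek information about medical services.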
Furthermore, not only is the quantity of the WWW material increasing, but the types of digitized material are also increasing. For example, it is possible to store alphanumeric texts, data, audio recordings, pictures, photographs, drawings, images, video and prints as various types of digitized data. However, such large quantities of material are of little value unless the desired information is readily retrievable. While certain techniques have been developed for accessing specific types of textual materials, these techniques are at best moderately adequate for accessing graphic, audio or other specialized materials. Consequently, there are large bodies of published materials that still remain inaccessible and thus unusable or significantly underutilized.
Attempts have been made to construct a search and retrieval system that is not highly dependent upon the exact words chosen for the query, and that generates a similar response for different queries that have similar meanings. An exemplary attempt is illustrated in U.S. Pat. No. 5,953,718 to Wical, titled “Research Mode for a Knowledge Base Search and Retrieval System”.
The Wical patent describes a search and retrieval system that generates a research document which infers an answer to a query from multiple documents. The search and retrieval system includes point of view gists for documents to provide a synopsis for a corresponding document with a slant toward a topic. To generate a research document, the search and retrieval system processes a query to identify one or more topics related to the query, selects document themes relevant to the query, and then selects the point of view gists, based on the document themes, that have a slant towards the topics related to the query. A knowledge base, which includes categories arranged hierarchically, is configured as a directed graph to link those categories having a lexical, semantic or usage association. Through use of the knowledge base, an expanded set of query terms is generated, and research documents are compiled that include the point of view gists relevant to the expanded set of query terms. A content processing system identifies the themes for a document, and classifies the document themes in categories of the knowledge base.
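The general idea of expanding a set of query terms through a directed graph of associated categories can be sketched as follows. This Python sketch is an illustrative assumption only and is not taken from the Wical patent: the graph contents, the expand_query function, and the max_depth parameter are all hypothetical.

```python
# Hypothetical knowledge base: a directed edge runs from a term to each
# term associated with it. The contents are invented for illustration.
KNOWLEDGE_BASE = {
    "doctors": ["medical", "hospital"],
    "hospital": ["medical"],
    "medical": ["health"],
}


def expand_query(terms, graph, max_depth=1):
    """Add terms reachable within max_depth edges of the original terms."""
    expanded = list(dict.fromkeys(terms))  # drop duplicates, keep order
    seen = set(expanded)
    frontier = list(expanded)
    for _ in range(max_depth):
        next_frontier = []
        for term in frontier:
            for neighbor in graph.get(term, []):
                if neighbor not in seen:
                    seen.add(neighbor)
                    expanded.append(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return expanded


one_hop = expand_query(["doctors", "services"], KNOWLEDGE_BASE)
two_hops = expand_query(["doctors", "services"], KNOWLEDGE_BASE, max_depth=2)
```

Under these assumptions, the query terms “doctors” and “services” are expanded to include “medical” and “hospital” after one step, so that differently worded queries about the same topic can share terms. The user must nevertheless still supply the initial keywords.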
However, this search and retrieval system and other similar conventional systems rely on the user entering alphanumeric keyword queries, and are thus still prone to rendering ineffective and inaccurate results that might not fully satisfy the user's need. For example, if a user is searching for a leather purse with a specific design and a peculiar color that she is unable to express in terms of an alphanumeric query, the user will endure the aggravation of successive compound searches to locate the desired purse. Even then, the user desiring to obtain a comparative price report will face a tedious task collecting the desired information.
Attempts have been proposed to facilitate purchases over the Internet. One such attempt is described in U.S. Pat. No. 6,016,504 to Arnold et al., titled “Method and System for Tracking the Purchase of a Product and Services over the Internet”.
The Arnold et al. patent describes a method for establishing and maintaining a virtual outlet (“VO”) relationship on the Internet between an entity that controls and manages a web site constituting a VO and a merchant that controls and manages a different web site. The VO presents a series of VO web pages to customers that contain descriptive information about products from one or more merchants. Customers can link through the VO web pages directly to a merchant web page provided to the customer computer by the merchant computer for the purpose of obtaining more detailed information about the product and for ordering the product. When the customer has finished ordering a product, the customer computer returns to a VO web page. To the customer, it appears that the entire ordering process is conducted entirely within the VO web pages. The merchant then credits the VO for the sale of the product to the customer, charges the purchase to the customer, and sends the ordered product to the customer.
However, these attempted solutions still rely on conventional keyword searching with limited input from the users. Further, these solutions do not allow for the automatic formulation of queries to improve the users' search capability. There is therefore a still unsatisfied need for a system and method that address the concerns with conventional search and marketing strategies, and that significantly increase the users' input choices and improve the search efficiency.