For almost as long as computers have existed, their designers and users have sought improvements to the user interface. Especially as computing power has increased, a greater portion of the available processing capacity has been devoted to improved interface design. Recent examples include Microsoft Windows variants and Internet web browsers. Graphical interfaces provide significant flexibility to present data using various paradigms, and modern examples support the use of data objects and applets. Traditional human-computer interfaces have emphasized uniformity and consistency; thus, experienced users had a shortened learning curve for new software and systems, while novice users often required extensive instruction before they could use a system profitably. More recently, intuitive, adaptable and adaptive software interfaces have been proposed, which potentially allow faster adoption of a system by new users but which require continued attention by experienced users due to the possibility of interface transformation.
While many computer applications are used both on personal computers and on networked systems, the field of information retrieval and database access for casual users has garnered considerable interest. The Internet presents a vast, relatively unstructured repository of information, leading to a need for Internet search engines and access portals based on Internet navigation. At this time, the Internet is gaining popularity because of its “universal” access, low access and information-distribution costs, and suitability for conducting commercial transactions. However, this popularity, in conjunction with non-standardized methods of presenting data and a fantastic growth rate, has made locating desired information and navigating the vast space difficult. Thus, improvements in human-computer interfaces for relatively unstructured data sets are desirable, wherein both subjective improvements and wholesale adoption of new paradigms may be valuable, including improved methods for searching and navigating the Internet.
Generally speaking, search engines for the World Wide Web (WWW, or simply “Web”) aid users in locating resources among the estimated one billion presently addressable sites on the Web. Search engines for the Web generally employ a type of computer software called a “spider” to scan the Web and build a proprietary database covering a subset of the resources available on the Web. Major known commercial search engines include such names as Yahoo, Excite, and Infoseek. Also known in the field are “metasearch engines,” such as Dogpile and Metasearch, which compile and summarize the results of other search engines without generally controlling an underlying database or running their own spider. Search engines and metasearch engines are servers; they operate with the aid of a browser, which is a client, and deliver to the client a dynamically generated web page that includes a list of hyperlinked uniform resource locators (URLs) through which the web browser may directly access the referenced documents.
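By way of a non-limiting illustration, the sketch below models a search engine as a lookup into a small proprietary index and a metasearch engine as a layer that merely compiles and de-duplicates the results of several such engines; the engine names, index contents, and query are hypothetical placeholders rather than features of any particular commercial engine.

    # Illustrative sketch: a "search engine" answers queries from its own
    # proprietary index, while a "metasearch engine" merely compiles and
    # summarizes the results of several engines. All data here is hypothetical.

    def search(index, query):
        """Return the URLs the index associates with any term of the query."""
        results = []
        for term in query.lower().split():
            for url in index.get(term, []):
                if url not in results:
                    results.append(url)
        return results


    def metasearch(indexes, query):
        """Compile and de-duplicate the results of several engines."""
        compiled = []
        for index in indexes:
            for url in search(index, query):
                if url not in compiled:
                    compiled.append(url)
        return compiled


    # Hypothetical proprietary databases, each covering a subset of the Web.
    engine_a = {"search": ["http://a.example/1", "http://a.example/2"]}
    engine_b = {"search": ["http://b.example/9", "http://a.example/1"]}

    print(metasearch([engine_a, engine_b], "search"))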
A Uniform Resource Identifier (RFC 1630) is the name for the standard generic object in the World Wide Web. Internet space is inhabited by many points of content. A URI (Uniform Resource Identifier) is the way to identify any of those points of content, whether it be a page of text, a video or sound clip, a still or animated image, or a program. The most common form of URI is the Web page address, which is a particular form, or subset, of URI called a Uniform Resource Locator (URL). A URI typically describes: the mechanism used to access the resource; the specific computer on which the resource is housed; and the specific name of the resource (a file name) on that computer. Another kind of URI is the Uniform Resource Name (URN). A URN is a form of URI that has “institutional persistence,” which means that its exact location may change from time to time, but some agency will be able to find it.
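For example, the standard-library call below splits a URL of the common form into the three parts just described: the access mechanism (scheme), the computer housing the resource (network location), and the name of the resource on that computer (path); the URL shown is an arbitrary illustrative value, not a reference to a real resource.

    # Splitting a URL into the parts described above. The example URL is
    # an arbitrary placeholder.
    from urllib.parse import urlsplit

    parts = urlsplit("http://www.example.com/docs/report.html")
    print(parts.scheme)   # access mechanism, e.g. "http"
    print(parts.netloc)   # computer housing the resource, e.g. "www.example.com"
    print(parts.path)     # name of the resource on that computer, e.g. "/docs/report.html"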
The structure of the World Wide Web includes multiple host computers at distinct nodes of the Internet, each of which runs a web server that transmits web pages in hypertext markup language (HTML), extensible markup language (XML), or a similar scheme using the hypertext transfer protocol (HTTP). Each web page may include embedded hypertext linkages, which direct the client browser to other web pages, which may be hosted on any server on the network. A domain name server translates a domain name, including its top-level domain (TLD), into an Internet protocol (IP) address, which identifies the appropriate server. Internet web resources, which are typically the aforementioned web pages, are thus referenced with a URL, which provides the domain name or IP address of the server as well as a hierarchical address defining a resource on the server, e.g., a directory path on a server system.
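As a concrete sketch of the name-to-address translation and retrieval just described, the fragment below resolves a host name to an IP address and then retrieves the resource named by a URL from that server over HTTP; the host name is an illustrative assumption.

    # Resolving a domain name to an IP address, then retrieving a resource
    # from that server over HTTP. The host name is illustrative only.
    import socket
    from urllib.request import urlopen

    host = "www.example.com"
    ip_address = socket.gethostbyname(host)     # domain name server lookup
    print(f"{host} resolves to {ip_address}")

    page = urlopen(f"http://{host}/")           # HTTP request to that server
    print(page.status, len(page.read()), "bytes received")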
A hypermedia collection may be represented by a directed graph having nodes that represent resources and arcs that represent embedded links between resources. Typically, a user interface, such as a browser, is utilized to access hyperlinked information resources. The user interface displays information “pages” or segments and provides a mechanism by which the user may follow the embedded hyperlinks. Many user interfaces allow selection of hyperlinked information via a pointing device, such as a mouse. Once a link is selected, the system retrieves the information resource corresponding to the embedded hyperlink.
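A minimal way to capture this directed-graph view is an adjacency list mapping each resource (node) to the resources its embedded links (arcs) point to, as in the sketch below; the collection itself is hypothetical.

    # A hypermedia collection as a directed graph: nodes are resources,
    # arcs are embedded links. The resources and links are hypothetical.
    hypermedia = {
        "home.html":    ["about.html", "catalog.html"],
        "about.html":   ["home.html"],
        "catalog.html": ["item1.html", "item2.html"],
        "item1.html":   [],
        "item2.html":   ["catalog.html"],
    }

    def reachable(graph, start):
        """Resources a reader can reach from 'start' by following links."""
        visited, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in visited:
                visited.add(node)
                stack.extend(graph.get(node, []))
        return visited

    print(sorted(reachable(hypermedia, "home.html")))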
One approach to assisting users in locating information of interest within a collection is to add structure to the collection. For example, information is often sorted and classified so that a large portion of the collection need not be searched. However, this type of structure often requires some familiarity with the classification system to avoid elimination of relevant resources by improperly limiting the search to a particular classification or group of classifications. Another approach used to locate information of interest to a user is to couple resources through cross-referencing. Conventional cross-referencing of publications using citations provides the user enough information to retrieve a related publication, such as the author, title of publication, date of publication, and the like. However, the retrieval process is often time-consuming and cumbersome. A more convenient, automated method of cross-referencing related documents utilizes hypertext or hyperlinks. Hyperlink systems allow authors or editors to embed links within their resources to other portions of those resources or to related resources in one or more collections that may be accessed locally, or remotely via a network. Users of hypermedia systems can then browse through the resources by following the various links embedded by the authors or editors. These systems greatly simplify the task of locating and retrieving documents when compared to a traditional citation, since the hyperlink is usually transparent to the user. Once a link is selected, the system utilizes the embedded hyperlink to retrieve the associated resource and present it to the user, typically in a matter of seconds. The retrieved resource may contain additional hyperlinks to other related information that can be retrieved in a similar manner.
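The following sketch illustrates the retrieval step: embedded hyperlinks are extracted from an HTML resource, and a selected link, which may be relative, is resolved against the page's own address before the associated resource is fetched; the sample markup and base URL are assumptions made only for the example.

    # Extracting embedded hyperlinks from a resource and resolving the
    # resource a selected link points to. Markup and URLs are hypothetical.
    from html.parser import HTMLParser
    from urllib.parse import urljoin
    from urllib.request import urlopen


    class HyperlinkCollector(HTMLParser):
        """Collects the href targets of <a> tags found in an HTML page."""

        def __init__(self):
            super().__init__()
            self.hrefs = []

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                self.hrefs.extend(v for k, v in attrs if k == "href" and v)


    sample_page = '<p>See the <a href="/related/report.html">related report</a>.</p>'
    collector = HyperlinkCollector()
    collector.feed(sample_page)

    base_url = "http://www.example.com/articles/main.html"
    for href in collector.hrefs:
        target = urljoin(base_url, href)        # resolve the relative link
        print("Following link to", target)
        # resource = urlopen(target).read()     # retrieval step (requires network access)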
A well-recognized problem with existing search engines is their tendency to return hits for a query that are so numerous, sometimes in the hundreds, thousands, or even millions, that it is impractical for a user to wade through them and find relevant results. Many users, probably the majority, would say that the existing technology returns far too much “garbage” in relation to pertinent results. This has led to a desire among many users for an improved search engine, and in particular an improved Internet search engine.
In response to the garbage problem, search engines have sought to develop unique proprietary approaches to gauging the relevance of results in relation to a user's query. Such technologies employ algorithms for limiting the records returned in the selection process (the search) and/or for sorting selected results from the database according to a rank or weighting, which may be predetermined or computed on the fly. Known techniques include counting the frequency or proximity of keywords, measuring the frequency of user visits to a site or the persistence of users on that site, using human librarians to estimate the value of a site and to quantify or rank it, measuring the extent to which the site is linked to other sites through ties called “hyperlinks” (see, e.g., Google.com and Clever.com), measuring how much economic investment is going into a site (Thunderstone.com), taking polls of users, or even ranking relevance in certain cases according to advertisers' willingness to bid the highest price for good position within ranked lists. As a result of relevance testing procedures, many search engines return hits in presumed rank order of relevance, and some place a percentage next to each hit which is said to represent the probability that the hit is relevant to the query, with the hits arranged in descending percentage order.
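To make the ranking idea concrete, the sketch below scores each document in a toy collection by keyword frequency, breaks ties with a simple count of inbound links, and reports the scores as descending percentages; the documents, link counts, and weighting are hypothetical and merely stand in for the proprietary algorithms mentioned above.

    # Toy relevance ranking: score documents by query-term frequency,
    # break ties by inbound-link count, report descending percentages.
    # The documents, link counts, and weighting are hypothetical.

    documents = {
        "http://a.example/engines": ("search engine relevance ranking for search queries", 12),
        "http://b.example/garden":  ("unrelated page about gardening", 3),
        "http://c.example/ranking": ("ranking search results by relevance", 45),
    }

    def rank(query, docs):
        terms = query.lower().split()
        scored = []
        for url, (text, inbound_links) in docs.items():
            words = text.lower().split()
            frequency = sum(words.count(t) for t in terms)   # keyword frequency
            scored.append((frequency, inbound_links, url))
        scored.sort(reverse=True)                            # most relevant first
        top = scored[0][0] or 1
        return [(url, round(100 * freq / top)) for freq, _, url in scored]

    for url, percent in rank("search relevance", documents):
        print(f"{percent:3d}%  {url}")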
However, despite the apparent sophistication of many of the relevance testing techniques employed, the results typically fall short of the promise. Thus, there remains a need for a search engine for uncontrolled databases that provides results accurately corresponding to the information sought by the user.
Therefore, the art requires improved searching strategies and tools to provide increased efficiency in locating a user's desired content, while preventing dilution of the best records with those that are redundant, off-topic or irrelevant, or directed to a different audience.