According to Wikipedia (2016), a heuristic is a technique designed for solving a problem more quickly when classic methods are too slow. A heuristic may also be advantageously employed to find an approximate solution when classic methods fail to find any exact solution. Thus a heuristic produces a solution in a reasonable time frame that is ‘good enough’ for the problem at hand. One way of achieving the kind of computational performance gain expected of a heuristic is just solving a simpler problem whose solution is also a solution (even if only approximate) to the initial problem. In computer searching, a heuristic acts to select branches (of the search) more likely to produce meaningful outcomes than other branches.
One of the continuing problems in computer searching (which is typically more sifting than searching) through vast amounts of data for relevant and useful data to retrieve is the sheer volume of ‘hits’ or returns possible on any given key word search. (According to the official Google blog, the number of unique URLs indexed by them passed the one trillion mark in 2008 and billions more pages have been added every day since then. Interestingly, they also say that “size of the web really depends on your definition of what's a useful page” and that there is therefore no exact answer.)
Clearly, “useful” as a descriptor (like its cognate “meaningful”) is determined almost solely by the person for whom the search is being conducted, and not by any scheme or combination of rankings and relevancy algorithms employed by a favorite search engine. The problem is that for many users, the results returned from a search are not only not terribly useful, they are downright useless.
Most current conventional search and retrieval is done on a key word or key phrase basis, and proprietary ‘relevance’ algorithms are employed to rank the retrieved resource URLs primarily in terms of number of occurrences of the searched-for term on the returned page, position of the term on that page, freshness of the results, quality of the website, age of the domain, and other like metrics. There is equivocal discussion about whether things known as linkage or back-linkage contribute to relevance, but what appears to have wide agreement is that such cross-linking is related, if at all, only to topical relevance. In other words, key word and key phrase returns are really just ‘voting’ by the search engine.
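As a rough illustration of the occurrence-count and position metrics mentioned above, the following minimal sketch scores a page for a single key word. The function name keyword_score, the sample pages, and the scoring formula are hypothetical and chosen only for illustration; actual search engine relevance algorithms are proprietary and far more elaborate.

```python
def keyword_score(text: str, term: str) -> float:
    """Toy relevance score: occurrence count plus a bonus for an early first hit.

    The formula is an assumption made for illustration, not any engine's method.
    """
    words = text.lower().split()
    term = term.lower()
    positions = [i for i, word in enumerate(words) if word == term]
    if not positions:
        return 0.0
    count = len(positions)
    # Bonus in (0, 1]: 1.0 when the term is the first word, smaller when later.
    position_bonus = 1.0 - positions[0] / len(words)
    return count + position_bonus


# Hypothetical pages keyed by a short label.
pages = {
    "a": "heuristic search uses a heuristic",
    "b": "search is slow without any heuristic",
    "c": "nothing relevant here",
}

# Rank the pages from highest to lowest score for the key word.
ranked = sorted(pages, key=lambda k: keyword_score(pages[k], "heuristic"), reverse=True)
print(ranked)
```

Nothing in such a score reflects what a particular user would consider useful, which is the limitation discussed in the surrounding text.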
It should be noted that Google and other major search engines use something called natural language processing or NLP (typically executed on a search engine server) which turns normal English questions or phrases entered in a browser into boolean strings that are displayed, if at all, on the browser's address bar without further input by the user. It is for the most part all done invisibly to the eye of the user or querier. No computations are displayed. The point is that the user does not enter any boolean terms, and generally does not have to.
So the question of whether ‘relevance’ as determined by a search engine can produce a particular relevance for a particular user often remains a mystery until the hundreds of pages of hits are presented for the user's own review. That may, however, be the best that can reasonably be expected from current search intelligence. What is needed, for at least some users or a class of users, is a way to heuristically skip the conventional key word or key phrase tallying and associated relevance algorithm calculations, and go directly to resources already known to be both useful and of particular relevance to the user.
To review selected aspects of computerized searching and resource acquisition: conventionally, a Uniform Resource Identifier (URI) is a string of characters used to locate and identify a specific resource on a network. There are two classifications of URIs: the Uniform Resource Name (URN), which is simply a unique identifier of a particular resource; and the Uniform Resource Locator (URL), which specifies the address (location, i.e., domain and/or path) of a resource on a particular network (e.g., the World Wide Web), and occasionally also a means for retrieving that resource.
It may be useful to consider the following conventional example URI:
http://example.com/city/development?name=building#electrical
in which:
http:// (Hyper Text Transfer Protocol) is referred to as the scheme or protocol
example.com is referred to as the authority or domain
city/development is referred to as the path (within the identified domain)
(all of the above sometimes also referred to collectively as the URL though properly speaking, the sum of the above components is the URI)
?name=building is an optional so-called query portion of the URI
#electrical is an optional so-called fragment portion of the URI. (See FIG. 1.)
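The component breakdown above can be reproduced with a short sketch using the Python standard library function urllib.parse.urlsplit. The variable names are illustrative only. Note that the library reports the path with its leading slash.

```python
from urllib.parse import urlsplit

# The example URI from the text.
uri = "http://example.com/city/development?name=building#electrical"

parts = urlsplit(uri)

print(parts.scheme)    # scheme or protocol
print(parts.netloc)    # authority or domain
print(parts.path)      # path within the domain (includes the leading "/")
print(parts.query)     # optional query, without the "?"
print(parts.fragment)  # optional fragment, without the "#"
```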
According to the HTTP scheme, the domain is parsed from the URI by the client browser and, with reference to a DNS, the true domain is identified and a session connection established. The path is passed to the connected domain server which uses the path to identify and navigate to a particular part of the storage managed by the server.
The optional fragment is separated from the domain and path (and from the query, if present) by a hash (#). The fragment contains a fragment identifier providing direction to an internal secondary location in the particular primary resource either identified by the preceding URI or found by the query if present. When the primary resource is an HTML document, the fragment is typically an ID attribute of a specific element inside the primary resource or a section heading internal to the primary resource.
The fragment identifier functions differently than the rest of the URI: namely, its processing is exclusively client-side (browser-side) with no participation from (and no parsing by) the web server and it directs the client web browser to scroll this element or heading into view after the primary resource is returned by the server to the browser.
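The client-side nature of the fragment can be sketched as follows. The helper request_target is a hypothetical name that approximates what a browser places in the HTTP request line: the path and query only. The fragment is split off with the standard library function urllib.parse.urldefrag and stays on the client.

```python
from urllib.parse import urldefrag, urlsplit


def request_target(uri: str) -> str:
    """Approximate the path-and-query a client sends to the server.

    Hypothetical helper for illustration; the fragment is never included.
    """
    parts = urlsplit(uri)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target


uri = "http://example.com/city/development?name=building#electrical"

# What the server sees in the request.
sent_to_server = request_target(uri)

# What the client keeps: the fragment, used after the resource is returned
# to scroll the matching element or heading into view.
uri_without_fragment, kept_by_client = urldefrag(uri)

print(sent_to_server)
print(kept_by_client)
```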
The optional query is typically separated from the domain and path by a question mark (?) and contains a query string of non-hierarchical data. Unlike a fragment (preceded by the hash (#) mark), a query is seldom if ever parsed or acted upon by the client browser. The server at the connected domain, using conventional logic and parsing software generally well known to those skilled in the art, identifies and handles all queries and query parsing. Generally, the parsed query is then used by the server-side software, perhaps along with a server-side index of all data resources at the specified data location on the server, to create and return a list of all resources (for example, web pages) that match the elements of the query.
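Server-side query handling of this kind might look like the minimal sketch below. The in-memory RESOURCES list stands in for a server-side index, and the handle_query function and its matching rule are assumptions made for illustration; only urlsplit and parse_qs are actual standard library calls.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical stand-in for a server-side index of resources.
RESOURCES = [
    {"name": "building", "title": "Building permits"},
    {"name": "building", "title": "Building codes"},
    {"name": "roads", "title": "Road works"},
]


def handle_query(uri: str) -> list[str]:
    """Parse the query portion and return titles of matching resources."""
    # parse_qs maps each key to a list of values, e.g. {"name": ["building"]}.
    params = parse_qs(urlsplit(uri).query)
    wanted = params.get("name", [])
    return [r["title"] for r in RESOURCES if r["name"] in wanted]


matches = handle_query("http://example.com/city/development?name=building#electrical")
print(matches)
```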
It is important to note that while queries can be a part of a URI, they are typically not input by users as part of the URI, but are instead created by programming run on the client or on a server. Typically, buttons clicked by a user, natural language entered on a search bar, or a combination of those two and/or other well-known components cause client agents (browsers) to run code that creates the query component of the URI.
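A minimal sketch of such client-side query construction follows. The function build_search_uri, the parameter name q, and the base address http://example.com/search are hypothetical; the encoding itself uses the standard library function urllib.parse.urlencode, which by default replaces spaces with plus signs.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def build_search_uri(base: str, user_text: str) -> str:
    """Attach a query built from the user's search-bar text to a base URI.

    Hypothetical helper; the parameter name "q" is an assumption.
    """
    scheme, netloc, path, _query, _fragment = urlsplit(base)
    query = urlencode({"q": user_text})
    return urlunsplit((scheme, netloc, path, query, ""))


# Text a user might type into a search bar; the user never writes the "?q=..." part.
search_uri = build_search_uri("http://example.com/search", "city development permits")
print(search_uri)
```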
Generally, URIs must specify the protocol (the rules for the transmission of information) by which the communicating entities in the network abide. For example, HTTP is a client-server protocol by which a client, e.g., a web browser, exchanges messages (requests and responses) with hosting servers (e.g., over the Internet). In a typical HTTP URI (e.g., https://www.nyu.edu), the browser (via DNS) resolves the domain name, www.nyu.edu, to an IP address, which identifies the server with which the browser intends to communicate. The URI https://www.nyu.edu/students.html refers more specifically to a location on this server, namely the resource at the “/students.html” address on the nyu.edu server.
Occasionally a computer user would like resources that are returned by the server to be processed in some way client-side (e.g., and continuing the above example, by the web browser on the client, and not by the server hosting the nyu.edu website). To this effect, URIs sometimes provide the hash mark fragment identifier (#). The hash mark, introduced at the end of a URL, typically identifies a portion of a primary document returned by the web server.
For example, if the URI in question is https://www.nyu.edu/students.html#awards, the web browser operates on the returned resource, the students.html document, to then bring to focus for the user the “awards” portion denoted in the students.html document.