The development of the EDVAC computer system of 1948 is often cited as the beginning of the computer era. Since that time, computer systems have evolved into extremely sophisticated devices, and computer systems may be found in many different settings. Computer systems typically include a combination of hardware, such as semiconductors and circuit boards, and software, also known as computer programs. As advances in semiconductor processing and computer architecture push the performance of the computer hardware higher, more sophisticated and complex computer software has evolved to take advantage of the higher performance of the hardware, resulting in computer systems today that are much more powerful than just a few years ago.
Years ago, computers were isolated devices that did not communicate with each other. Today, however, computers are often connected in networks, such as the Internet or World Wide Web, and a user at one computer, often called a client, may wish to access information at multiple other computers, often called servers, via a network. Searching is the primary mechanism used to retrieve information from the Internet. Users typically search the web pages of the Internet using a search engine, such as AltaVista, Yahoo, or Google. These search engines index hundreds of millions of web pages and respond to tens of millions of queries every day.
To accomplish this formidable task, search engines typically employ three major elements. The first is an agent, often called a spider, robot, or crawler. The crawler visits a web page, reads it, and then follows links to other pages within the site. The crawler typically returns to the site on a regular basis, such as every month or two, to look for changes. The crawler stores the information it finds in the second part of the search engine, which is the index. New pages or changes that the crawler finds may take some time to be added to the index. Thus, a web page may have been “crawled” but not yet “indexed.” Until the web page has been added to the index, it is not available to those searching with the search engine. Search engine software is the third part of a search engine. This is the program that interrogates the millions of pages recorded in the pre-created index to find matches to a search and ranks them in order of what the program believes is most popular, which is often referred to as the page rank. Page rank is extremely important to the user because a simple search using common terms may match thousands or even tens of thousands of pages, which would be virtually impossible for the user to sort through individually in an attempt to determine which pages best serve the user's needs.
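The three elements described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the design of any actual search engine: the in-memory PAGES table, the URLs, the page text, and the function names crawl, build_index, and search are all hypothetical, and the search step only matches terms without ranking them.

```python
from collections import defaultdict, deque

# Hypothetical in-memory "site": URL -> (page text, outgoing links).
# A real crawler would fetch pages over a network instead.
PAGES = {
    "/home": ("welcome to our home builder site", ["/about", "/contact"]),
    "/about": ("we are a home builder in springfield", ["/home"]),
    "/contact": ("contact our springfield office", []),
    "/orphan": ("no page links here", []),
}


def crawl(start):
    """Element 1 (crawler): visit pages reachable from `start` by following links."""
    seen, queue, crawled = {start}, deque([start]), {}
    while queue:
        url = queue.popleft()
        text, links = PAGES[url]
        crawled[url] = text
        for link in links:
            if link in PAGES and link not in seen:
                seen.add(link)
                queue.append(link)
    return crawled


def build_index(crawled):
    """Element 2 (index): map each term to the set of URLs that contain it."""
    index = defaultdict(set)
    for url, text in crawled.items():
        for term in text.split():
            index[term].add(url)
    return index


def search(index, query):
    """Element 3 (search software): return URLs containing every query term."""
    terms = query.lower().split()
    if not terms:
        return []
    matches = set.intersection(*(index.get(term, set()) for term in terms))
    return sorted(matches)
```

In this sketch, a page that the crawler never reaches (here "/orphan"), or one that has been crawled but not yet passed to build_index, cannot be returned by search, which mirrors the distinction between “crawled” and “indexed” pages.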
To aid the user, search engines typically determine relevancy by following a set of rules, commonly known as a page-ranking algorithm. Exactly how a particular search engine's algorithm works is usually a closely kept trade secret. Nevertheless, all major search engines follow the same generally accepted methods described below. One of the main methods in a page-ranking algorithm involves the location and frequency of keywords on a web page, which is known as the location/frequency method. For example, page-ranking algorithms often assume that terms appearing in a title control-tag are more relevant than terms appearing in other locations in the page. Further, many page-ranking algorithms also determine whether the search keywords appear near the top of a web page, such as in the headline or in the first few paragraphs of text. They assume that any page relevant to the topic will mention those words at the beginning. Frequency of terms is the other major factor that page-ranking algorithms use to determine relevancy. The page-ranking algorithm analyzes how often keywords appear in relation to other words in a web page and deems pages with a higher keyword frequency more relevant.
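As a rough illustration of the location/frequency method, the following Python sketch scores a single page against a query. Because actual page-ranking algorithms are trade secrets, everything specific here is an assumption made for illustration: the function name location_frequency_score, the weights title_weight and top_weight, and the top_words cutoff that stands in for "near the top of the page".

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def location_frequency_score(title, body, query,
                             title_weight=3.0, top_weight=2.0, top_words=50):
    """Score one page for a query using keyword location and frequency.

    The weights and the `top_words` cutoff are arbitrary, assumed values.
    """
    terms = set(tokenize(query))
    if not terms:
        return 0.0
    title_words = set(tokenize(title))
    body_words = tokenize(body)
    top_of_page = set(body_words[:top_words])

    # Location: keywords in the title count most, then keywords near the top.
    title_hits = sum(1 for term in terms if term in title_words)
    top_hits = sum(1 for term in terms if term in top_of_page)

    # Frequency: share of body words that are query keywords.
    keyword_count = sum(1 for word in body_words if word in terms)
    frequency = keyword_count / len(body_words) if body_words else 0.0

    return title_weight * title_hits + top_weight * top_hits + frequency
```

Under these assumed weights, a page whose title contains the keyword outscores an otherwise identical page whose title does not, and of two pages with the same title, the one in which the keyword makes up a larger share of the body text scores higher.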
In addition to the location/frequency method, which is an on-the-page ranking criterion, search engines also typically make use of off-the-page ranking criteria. Off-the-page criteria are those that use data external to the page itself. Chief among these is link analysis. By analyzing how pages link to each other, the page-ranking algorithm attempts to determine the relative importance of the page with respect to other pages. For example, page-ranking algorithms typically assume that a page to which many other pages link is an important page and deserves to have a high page rank. In addition, some page-ranking algorithms use recursive page-ranking, where the rank of the pages that link to the linked-to page also factors into the ranking of the linked-to page.
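Both forms of link analysis can be sketched briefly in Python. The first function simply counts inbound links; the second is a simplified iterative scheme in the style of the publicly described PageRank approach, used here only as one plausible reading of "recursive page-ranking" and not as the method of any particular search engine. The function names, the damping value, the iteration count, and the handling of pages with no outgoing links are assumptions.

```python
def inbound_link_counts(links):
    """Count, for each page, how many other pages link to it.

    `links` maps every page to the list of pages it links to.
    """
    counts = {page: 0 for page in links}
    for source, targets in links.items():
        for target in set(targets):
            if target in counts and target != source:
                counts[target] += 1
    return counts


def recursive_rank(links, damping=0.85, iterations=50):
    """Rank pages so that a link from a highly ranked page counts for more.

    Assumed scheme: each page splits `damping` of its current rank evenly
    among the pages it links to; the rest is shared equally by all pages.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for source, targets in links.items():
            valid = [t for t in set(targets) if t in links and t != source]
            if valid:
                share = damping * rank[source] / len(valid)
                for target in valid:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                for page in pages:
                    new_rank[page] += damping * rank[source] / n
        rank = new_rank
    return rank
```

For a hypothetical link graph in which three pages link to a page "referral" and only one page links to a page "local", both functions place "referral" above "local", consistent with the assumption that a page to which many other pages link is important.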
A problem with link analysis occurs with searches that attempt to find pages that are local to a particular area, such as a city. For example, when a user enters search terms for a service (e.g., a home builder, a plumber, or a real estate agent) and a city name, the pages with a high page rank are often not local to the searched-for city. In the home-builder case, the first pages returned are not those for home builders who have an address within the city. Instead, the pages with a high page rank are referral services that merely mention the searched-for city, along with many other cities, and advertise that the referral service can recommend a home builder in the searched-for city. These referral-service pages have a high page rank because they are partners with many services around the country or around the world, all of which link to the referral-service page. These many cross links give the referral service a high page rank that dominates any page rank that the local service might have.
Thus, a need exists for a better technique for searching pages that are local to an area.