Many content providers offer a vast quantity of information that can be accessed via various kinds of global or local communications networks (the Internet, the World Wide Web, local area networks and the like). The available information includes a variety of content types, such as photos, video, audio and the like, and relates to a wide range of topics, such as, but not limited to, news, weather, traffic, entertainment, finance and the like. The information can be accessed using a wide range of electronic devices, such as desktop computers, laptop computers, smartphones, tablets and the like.
Generally, in order to access a given piece of information via a communications network, a user can either provide the specific address where that information is located or conduct a search, using a search engine, to locate it. The latter is particularly suitable in circumstances where the user does not know the exact address where the information is stored. When using a search engine to locate a given piece of information, the user generally desires to locate the most relevant results and to locate them relatively quickly.
For search engine providers, therefore, it is important to be able to fulfil these objectives (relevancy of results, speed, etc.) and to do so without using extensive resources such as bandwidth and processing capacity. In a typical search engine, a crawler is used to index various resources on the Internet. In order to index a web page, the crawler needs to “visit” that page. There are various means available for the crawler to “learn” which pages to visit, i.e. which URL to request in order to visit the page and index it. Most crawlers use links within documents and web resources to find URLs on the Internet to visit. However, there are several web resources that have too many “potential pages” (i.e. a combinatorial explosion, as it is known in the art). For example, on a web site for booking air tickets, the number of potential pages is enormous: each combination of departure point, destination point, date and price range is assembled at a particular URL. It is impractical, and hugely resource intensive, for a search engine provider to “crawl” and index all available resources. There is thus a need for a faster and more efficient method and system for providing relevant search results to a user.
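By way of a non-limiting illustration only, the scale of the air-ticket example above can be sketched as a rough count of potential pages. The figures, the function names and the URL pattern below are hypothetical assumptions chosen for illustration; they are not taken from any actual web site or crawler.

```python
from itertools import product

# Hypothetical figures for an air-ticket booking site (assumptions, not real data).
NUM_AIRPORTS = 500       # assumed number of departure/destination points
NUM_DATES = 365          # assumed number of bookable dates
NUM_PRICE_RANGES = 10    # assumed number of price-range filters


def count_search_pages(airports: int, dates: int, price_ranges: int) -> int:
    """Count distinct (departure, destination, date, price range) combinations."""
    # Departure and destination differ, giving airports * (airports - 1) ordered pairs.
    return airports * (airports - 1) * dates * price_ranges


def sample_urls(departures, destinations, dates, price_ranges):
    """Yield one hypothetical URL per parameter combination."""
    for dep, dst, date, price in product(departures, destinations, dates, price_ranges):
        if dep != dst:  # skip trips that start and end at the same point
            # tickets.example is a placeholder host, not a real service.
            yield f"https://tickets.example/search?from={dep}&to={dst}&date={date}&price={price}"


if __name__ == "__main__":
    total = count_search_pages(NUM_AIRPORTS, NUM_DATES, NUM_PRICE_RANGES)
    print(f"Potential pages to crawl: {total:,}")
```

Under these assumed figures, the count is 910,675,000 potential pages for a single web site, which suggests why visiting and indexing every such URL would be impractical for a crawler.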