The present invention generally relates to searching documents and more specifically to resiliently searching for a desired element in a document.
With the advent of the internetwork of networks generally referred to as the Internet, the amount of information (“content”) and the number of web pages on which the content is found have increased enormously. The Internet has allowed users access to information that was previously inaccessible or hard to find. However, the increase in the number of web pages makes finding specific content more difficult.
One way a user system may easily access content from a variety of web pages is to create a portal page, in which a user's desired content is aggregated from various web pages. Typically, a user desires a snippet of content from each of a variety of web pages. A portal gathers the desired content from the web pages and displays the aggregated content in the portal page. Thus, multiple web pages do not need to be accessed, and users may simply access their portal page to receive the desired content.
The task of aggregating the content a user desires is difficult because of the ephemeral nature of the Internet. Web pages may be static information stored on a web server and served upon request, or may be dynamic pages that are generated in whole or in part in response to a request for the page. Whether static or dynamic, a given web page may change from time to time, such as daily or in real time. For any given web page, the content, structure, or layout may change, often at irregular times and normally without notice to subsequent requestors of the page, thus making automated querying of the pages difficult.
In one example, a user may desire a snippet of a page that shows the top news stories of the day, but the layout of the page including the snippet and the location of the snippet within that page may change over time. Typically, search methods that aggregate snippets of content rely heavily on the hierarchical structure of the software code used for developing the web pages. As a result, relatively small changes in the web page will cause a search query to fail, and the desired snippet will not be found. Additionally, the search query required to gather the snippets of content may become complex and hard to define, and may require knowledge of the code structure for the web page. Thus, search methods used by portals may not be able to define what content is desired by a user. Accordingly, the portal may not be able to display a user's desired content.
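By way of illustration only, the fragility of a structure-dependent query may be sketched as follows. The two page fragments, the positional query, and the function name find_stories are hypothetical and are not taken from any particular portal or web page; the sketch assumes well-formed markup so that it can be parsed with Python's standard xml.etree.ElementTree module. The same query that locates the top stories in the original page returns nothing once an advertisement and a wrapper element are added, even though the desired snippet itself is unchanged.

```python
import xml.etree.ElementTree as ET

# Hypothetical page as first published: the news block is the second
# <div> directly under <body>.
ORIGINAL = """<html><body>
<div id="header">Site banner</div>
<div id="news"><h2>Top Stories</h2><ul><li>Story A</li><li>Story B</li></ul></div>
</body></html>"""

# Hypothetical page after a small redesign: an advertisement is inserted
# and the unchanged news block is nested inside a wrapper <div>.
REDESIGNED = """<html><body>
<div id="header">Site banner</div>
<div id="ad">Sponsored</div>
<div id="wrapper"><div id="news"><h2>Top Stories</h2><ul><li>Story A</li><li>Story B</li></ul></div></div>
</body></html>"""

# Assumed query that depends on the exact position of the news block
# within the page hierarchy.
POSITIONAL_QUERY = "./body/div[2]/ul/li"


def find_stories(page: str, query: str) -> list:
    """Return the text of every element matched by the query."""
    root = ET.fromstring(page)
    return [item.text for item in root.findall(query)]


if __name__ == "__main__":
    print(find_stories(ORIGINAL, POSITIONAL_QUERY))    # stories are found
    print(find_stories(REDESIGNED, POSITIONAL_QUERY))  # nothing is found
```

In this sketch the query fails silently on the redesigned page because the second div under the body element is now the advertisement, which contains no list of stories.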