The present invention relates to techniques for generating a script to scrape a web page based on actions of a user while navigating through the web page.
Online account systems (such as Internet banking) are increasingly popular. The websites that host these accounts allow easy access to account balance and transaction information for a single online merchant or financial institution. However, for a complete understanding of an individual's or a business's financial position, data from multiple merchants and financial institutions may need to be aggregated.
Several existing web services provide aggregated financial information that is collected from online accounts using Open Financial Exchange (OFX). However, not all of the websites that host online accounts support OFX. To address this problem, a central server can be used to aggregate the financial information. Using customer credential information (such as a username and password) to log in to an online account on a website, this server can collect or scrape the appropriate data from the returned formatted web pages, and can thus aggregate the financial information.
Typically, the financial information is collected from websites using scraping scripts. A scraping script usually includes commands that parse and interact with one or more web pages via a network, such as the Internet. For a scraping script to function properly, it is typically designed based on the details of a given web page (such as the sequence of web pages used to log in and access data), so that the relevant customer financial information can be located and collected. For example, a scripting engineer may manually analyze the financial institution's web page to determine the sequence of commands needed to navigate this web page and obtain specific data from it. Therefore, creating a scraping script can be time-consuming and expensive.
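By way of illustration only, the following minimal Python sketch shows the kind of page-specific parsing that a hand-written scraping script may perform. It is not taken from any particular system. The sample markup, the class names "name" and "balance", and the names SAMPLE_PAGE, BalanceScraper, and scrape_balances are hypothetical, and the sketch assumes the page has already been retrieved after login.

```python
from html.parser import HTMLParser

# Hypothetical markup for an account summary page; a real page will differ.
SAMPLE_PAGE = """
<html><body>
  <table id="accounts">
    <tr><td class="name">Checking</td><td class="balance">$1,234.56</td></tr>
    <tr><td class="name">Savings</td><td class="balance">$10,000.00</td></tr>
  </table>
</body></html>
"""


class BalanceScraper(HTMLParser):
    """Collects (account name, balance) pairs from cells with assumed class names."""

    def __init__(self):
        super().__init__()
        self._field = None   # class of the <td> currently being read, if any
        self._row = {}       # fields gathered for the current <tr>
        self.accounts = []   # extracted (name, balance) results

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            cls = dict(attrs).get("class")
            if cls in ("name", "balance"):
                self._field = cls

    def handle_data(self, data):
        # Accumulate text only while inside a cell of interest.
        if self._field:
            self._row[self._field] = self._row.get(self._field, "") + data

    def handle_endtag(self, tag):
        if tag == "td":
            self._field = None
        elif tag == "tr":
            # Emit a result only when both expected cells were found in the row.
            if "name" in self._row and "balance" in self._row:
                text = self._row["balance"].replace("$", "").replace(",", "")
                self.accounts.append((self._row["name"].strip(), float(text)))
            self._row = {}


def scrape_balances(page_html):
    """Returns a list of (account name, balance) tuples found in page_html."""
    scraper = BalanceScraper()
    scraper.feed(page_html)
    return scraper.accounts
```

In this sketch, the extraction logic depends entirely on the assumed class names. If the markup were changed so that the balance cell used a different class, scrape_balances would return an empty list.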
In addition, if the financial institution modifies a particular website and/or if there are changes to a customer's online account, a scraping script may not function correctly. When this occurs, a scripting engineer typically has to access the website to reproduce the exact problem that the server encountered, and then update the scraping script accordingly. This process can also be time-consuming and expensive.