In data searching and gathering over the Internet, using the structure of a Web page to extract data is referred to as scraping. A scraper includes rules that capture the structure of Web pages. The scraper browses the Web pages on a Web site and applies these rules to extract specific data from them. In a client-server model, this technique is applied in two configurations: (1) the entire scraping application is installed on the client device, or (2) the scraping application resides on a Web server, wherein the Web server extracts the data and provides it to the client device.
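As an illustration only, a rule-based scraper of the kind described above can be sketched as follows. The rule format (a field name mapped to a tag and class attribute), the names RULES, RuleScraper and scrape, and the sample page are all assumptions made for this sketch and are not taken from any particular scraper.

```python
from html.parser import HTMLParser

# Hypothetical rules: field name -> (tag, class attribute) that locates the field.
RULES = {
    "title": ("h1", "product-title"),
    "price": ("span", "price"),
}


class RuleScraper(HTMLParser):
    """Collects the text of the first element matching each rule."""

    def __init__(self, rules):
        super().__init__()
        self.rules = rules
        self.data = {}
        self._field = None  # field whose text is currently being captured

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        for field, (rule_tag, rule_cls) in self.rules.items():
            if tag == rule_tag and cls == rule_cls:
                self._field = field

    def handle_data(self, data):
        text = data.strip()
        if self._field is not None and text:
            # Keep only the first match for each field.
            self.data.setdefault(self._field, text)

    def handle_endtag(self, tag):
        # Simplification: any closing tag ends the capture (no nested elements).
        self._field = None


def scrape(html, rules=RULES):
    parser = RuleScraper(rules)
    parser.feed(html)
    parser.close()
    return parser.data


# Hypothetical page used only to exercise the sketch.
PAGE = (
    '<html><body><h1 class="product-title">Widget</h1>'
    '<span class="price">$9.99</span></body></html>'
)

if __name__ == "__main__":
    print(scrape(PAGE))
```

The rules are kept as data, separate from the parsing logic, so that in principle only the rules need to change when the page structure changes.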
However, once the Web site changes the structure of its Web pages, new rules that capture the new structure must be used for the scraper to function correctly. In the first configuration above, the user of the client device (e.g., a PC) has to update the scraper application on the client. This is a download, update and install paradigm.
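The effect of a structural change on existing rules can be illustrated with a minimal sketch. Here each rule is assumed to be a regular expression with one capture group. The page fragments, the rule sets and the extract function are hypothetical and exist only to show that old rules stop matching once the markup changes.

```python
import re

# Hypothetical rules written against the original page structure.
OLD_RULES = {"price": r'<span class="price">([^<]+)</span>'}
# Hypothetical replacement rules written against the changed structure.
NEW_RULES = {"price": r'<div class="cost" data-amount="([^"]+)">'}

OLD_PAGE = '<span class="price">$9.99</span>'
NEW_PAGE = '<div class="cost" data-amount="$9.99"></div>'


def extract(html, rules):
    """Apply each rule to the page; fields whose rule does not match are omitted."""
    result = {}
    for field, pattern in rules.items():
        match = re.search(pattern, html)
        if match:
            result[field] = match.group(1)
    return result


if __name__ == "__main__":
    print(extract(OLD_PAGE, OLD_RULES))  # old rules on the old structure
    print(extract(NEW_PAGE, OLD_RULES))  # old rules on the new structure
    print(extract(NEW_PAGE, NEW_RULES))  # updated rules on the new structure
```

In this sketch the old rules extract nothing from the changed page, and extraction resumes only after the updated rules are put in place, which is the update step the client would have to perform.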
Such a paradigm works for PCs because they are homogeneous compared to consumer electronics (CE) devices. Homogeneity here refers to the fact that the variety of operating systems and hardware architectures available for PCs is small compared to that of CE devices. Also, because a PC is a general-purpose device with large amounts of persistent storage, main memory and processing power, it allows the installation and update of an essentially unlimited number of programs (e.g., scrapers).
CE devices, on the other hand, are heterogeneous and are designed for specific uses. They also have limited storage, memory and computational power. This makes it difficult to apply the download, update and install paradigm to CE devices. Installation as performed on PCs is not suitable for CE devices. The installation or update process on a PC makes use of a mouse and keyboard, and sometimes assumes that the device has a file system that the installer or updater can manipulate. This assumption does not hold for many CE devices.
Further, in the second configuration above, a significant amount of infrastructure has to be set up on the server side to make the service available.