The Internet has emerged as a critical communication infrastructure, carrying traffic for a wide range of important scientific, business and consumer applications. Most large businesses spend significant time and effort developing and maintaining a customer-friendly website. In general, these websites are well designed and may cover many aspects of a company's core business. An individual, e.g., a salesperson or a vendor, who needs to understand the core business of a company and of competing companies can learn about them from these websites. Alternatively, a user may want to acquire specific information from these websites, but the user must spend a large amount of time perusing numerous pages to find the pertinent information. Moreover, given how vast the information on various websites is, learning about the specific businesses of one or more companies is very time consuming.
Websites can serve two kinds of pages: static and dynamic. Static pages provide information to the user but do not accept information from the user, whereas dynamic pages present an interface, such as a form, that accepts information from the user and can respond to requests made by the user. Currently, network service providers and enterprise network operators provide connectivity but cannot help the user detect and extract the essential data that the user is interested in. For example, a user may use a search engine to locate the website of a business that offers a particular product. The search engine explores the Internet and collects all the static pages that match the search criteria. Upon locating the target website, the user then requests the location of the closest store by entering a zip code on a form. The dynamic web page responds with the store location but also bombards the user with excess information, including advertisements and menus that the user does not need. The user then has to sort through the web page's response to find the pertinent information. This process is time consuming and often annoying to users.
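The store-locator example above can be illustrated with a minimal sketch. It assumes that the dynamic page's response is available as an HTML string and that the pertinent information sits in an element with a known id. The sample markup, the `store-location` id, and the names `ElementTextExtractor` and `extract_pertinent_text` are hypothetical and chosen only for illustration; they are not taken from any particular website or from any method described herein.

```python
from html.parser import HTMLParser

# Hypothetical response to a store-locator form submission. The markup and
# the "store-location" id are assumptions made for this illustration.
SAMPLE_RESPONSE = """
<html><body>
  <div class="menu"><a href="/products">Products</a><a href="/deals">Deals</a></div>
  <div class="ad">Save 20% this weekend only!</div>
  <div id="store-location">
    <p>123 Main Street</p>
    <p>Springfield, 07001</p>
  </div>
  <div class="ad">Sign up for our newsletter</div>
</body></html>
"""

# Void elements never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ElementTextExtractor(HTMLParser):
    """Collect the text inside the first element whose id equals target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0       # nesting depth inside the target; 0 means outside
        self.done = False    # set once the target element has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.done or tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        text = data.strip()
        if self.depth > 0 and text:
            self.chunks.append(text)


def extract_pertinent_text(html, target_id):
    """Return the target element's text, or an empty string if it is absent."""
    parser = ElementTextExtractor(target_id)
    parser.feed(html)
    parser.close()
    return ", ".join(parser.chunks)


print(extract_pertinent_text(SAMPLE_RESPONSE, "store-location"))
```

In this sketch the advertisements and menus are discarded simply because they fall outside the targeted element. A real dynamic page would rarely label its pertinent content so conveniently, which is why locating that content is the difficult part.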
Consequently, a need exists for a method and apparatus for detecting and extracting information from a dynamically generated website.