When integrating or reusing existing web-based systems, it is often found that these systems were not designed to be accessed by external systems. There is no API layer for other systems to interact with. Worse, the presentation logic, business logic, and data logic are usually mixed together without a clear separation of concerns. The only way for others to interact with such systems is therefore through web pages, in HTML/XHTML or another markup language, over HTTP/HTTPS, as a web browser does. In such a situation, how can the human-machine interaction be turned into machine-machine interaction, so that the functions and data of these systems can be exposed as reusable services for application integration?
There are some existing solutions for such a scenario:
1. System re-engineering: re-engineer the existing system to expose the business logic through a well-designed machine-to-machine accessible interface, such as EJB or Web Services.
2. Customizing adapters: develop custom adapters for specific pages to extract the useful information; for every web page, write hundreds of lines of code to parse the structure of the HTML tree, extract the useful information, and return the result; then combine the results from each page to compose the useful information.
The drawbacks of the above approaches are as follows:
System re-engineering usually requires analyzing the architecture and code of the existing legacy system in order to understand it deeply, mining and restructuring the existing assets so that they are well structured and layered, and exposing the business logic through well-designed interfaces. The re-engineering work is therefore often too heavy or complicated to afford. When the system is not well designed, the effort needed may exceed that of rebuilding the system from scratch. For older systems, maintenance documents and detailed descriptions of the code design can usually hardly be found. Either there is no design documentation, or the documentation is so outdated relative to the evolving code that it is meaningless. Re-engineering then becomes high risk, since the re-engineering developers often did not take part in the development of the existing system at all. Sometimes only the binary image is available, with no source code, which leaves no way to do the re-engineering.
The custom adapter approach requires different code to be written for each scenario, and achieves its objective by analyzing web pages and simulating a flow. This approach is not model-based, and it suffers from insufficient automation, low efficiency, and a great deal of repetitive work. Further, if the flow or the page structure changes, the effort needed to modify the code is great. Specifically, this approach works well for extracting information from a few specific pages, but when dealing with a large number of pages it is very time-consuming and inefficient, because it has no unified way to deal with all kinds of situations. Worse, this approach has no mechanism for exposing functions as a logically cohesive and complete service for use in application integration, to say nothing of security and session management.
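To make the fragility of the custom adapter approach concrete, the following is a minimal sketch of one page-specific adapter, written in Python with the standard library's html.parser module. The page markup, the class names "price" and "cost", and the names PriceAdapter and extract_prices are all hypothetical and chosen only for illustration; they are not taken from any particular system.

```python
from html.parser import HTMLParser


class PriceAdapter(HTMLParser):
    """Page-specific adapter: collects the text of every <td class="price"> cell."""

    def __init__(self):
        super().__init__()
        self._in_price_cell = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        # Hard-coded knowledge of this one page's structure (hypothetical markup).
        if tag == "td" and dict(attrs).get("class") == "price":
            self._in_price_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_price_cell = False

    def handle_data(self, data):
        # Keep only non-empty text found inside a price cell.
        if self._in_price_cell and data.strip():
            self.prices.append(data.strip())


def extract_prices(html):
    """Run the adapter over one page and return the extracted price strings."""
    adapter = PriceAdapter()
    adapter.feed(html)
    adapter.close()
    return adapter.prices


# Hypothetical page as originally published, and the same page after a redesign
# that renames the CSS class of the price cell.
ORIGINAL_PAGE = '<table><tr><td class="item">Widget</td><td class="price">9.50</td></tr></table>'
REDESIGNED_PAGE = '<table><tr><td class="item">Widget</td><td class="cost">9.50</td></tr></table>'

print(extract_prices(ORIGINAL_PAGE))    # ['9.50']
print(extract_prices(REDESIGNED_PAGE))  # []
```

In this sketch, a one-word change in the page markup makes the adapter silently return nothing, and a second page with a different layout would need its own adapter class. Nothing here addresses session handling, security, or composing the per-page results into a service.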