1. Field of the Invention
The present invention generally relates to computer systems, and more particularly to an apparatus, a visual method and a system for rapid construction and delivery of distributed applications, such as web applications and web services.
2. Background Art
With the widespread adoption of the Internet and related forms of computer networking, the term “application” has increasingly come to mean a combination of hardware and software integrated into a computing system that exposes a web-based interface to the network. FIG. 1 illustrates a computer network on which such applications operate.
The need to serve a large number of simultaneous users accessing the application from anywhere on the network requires modern applications to be scalable beyond the capacity of any single computer system. As a result, these applications are predominantly designed and implemented as distributed software systems deployed on clusters of commodity servers. FIG. 2 illustrates one possible topology of such a cluster, while FIG. 3 illustrates a typical distributed application.
Distributed applications are by their nature more complex than traditional computer applications, which are designed to execute on a single machine and usually have a single memory space. A scalable distributed application that is capable of operating reliably 24 hours a day, 7 days a week, is a very complex computing system. Nevertheless, the explosion of complexity experienced today by enterprises that develop, deploy and operate distributed web applications and web services cannot be explained only by the inherent complexity of the functionality of these applications.
A big part of the complexity comes from the fact that the need to scale on-line applications caused a mass transition from traditional “big-box” enterprise servers, such as mainframes and SMP Unix servers, to commodity clusters in which different hardware and software components are delivered and supported by different vendors. The big-box enterprise servers were vertically integrated computing systems where the vendors spent billions of R&D dollars ensuring that all components and subsystems that go into the server interoperate well and no significant bottlenecks exist. Thus, integration of complex computing systems was and remains a key competency of every large server vendor.
When applications are deployed on commodity clusters, the responsibility for integrating servers, networks, storage, operating systems, middleware, database engines, web servers, monitoring systems, management systems, backup systems, application-specific code and data, and all other moving parts that go into the finished system rests entirely with the enterprise IT department. Moreover, unlike the big-box vendor, who was able to spread the cost and time of system integration over hundreds of nearly identical systems sold to different customers, the enterprise must repeat system integration for commodity clusters over and over again, typically on each significant release of each application.
Over the years, there have been multiple attempts to develop an approach that reduces the complexity of such systems. These attempts can be classified into three broad categories: single system image (SSI) systems, distributed component systems and network-based systems.
The SSI systems attempt to rein in complexity by abstracting a distributed hardware system, such as a cluster, and presenting it to the application software as a single, large enterprise server with shared resources, in the hope that the benefits of a scalable commodity cluster can be combined with the simplicity of operating a big-box enterprise server. Naturally, there is no free lunch: neither the operating systems nor the applications designed for shared-memory servers scale better than the SMP hardware itself. Performance penalties become severe in systems with as few as 8 processors, and only a rare application scales well to 64 processors, which is impressive by big-box standards but represents an entry-level system for many, if not most, web applications.
The distributed component systems, such as CORBA, Microsoft .NET and DCOM, attempt to abstract the distributed nature of the underlying hardware system by changing the way the functionality of the application translates into software code. In these systems, the application is developed as a set of interoperating “component objects” on the assumption that every object is remote relative to every other object, and the system is left to distribute the running set of objects transparently and to assist their interactions. While these systems should theoretically scale linearly to large system sizes, in practice they rarely do. This is due as much to the fact that such a system typically requires a single vendor's software to execute on all nodes and to deliver most of the infrastructure the application might need, as to the fact that all aspects of the application have to be rewritten specifically for the given system at great expense.
The evolution of distributed applications over the last 15 or so years has shown clearly that the only successful approach to building such applications is network-based systems, also known as multi-tier architectures. With this approach, the application is constructed as a network of servers, configured to run mostly pre-existing software engines, such as web servers, database servers, Java application servers and the like, and specialized appliances, such as firewalls, load balancers and network attached storage. The application-specific content and code are deployed to the appropriate servers, and each server is typically configured to execute a single function of the application.
The “one server—one function” principle is key to making network-based systems work. With this approach, the logical structure of the distributed application and the physical structure of the hardware system on which it executes become isomorphic, allowing one to use network monitoring and management tools and systems to gain visibility into the application and control its execution.
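The correspondence between servers and application functions described above can be illustrated with a minimal sketch. It is only an illustration: the host names, the function names and both helper functions are hypothetical and are not taken from any particular system.

```python
from collections import Counter

# Hypothetical server-to-function assignment for a small multi-tier
# application. All names here are illustrative assumptions.
ASSIGNMENT = {
    "lb-01": "load balancer",
    "web-01": "web server",
    "web-02": "web server",
    "app-01": "application server",
    "db-01": "database server",
}


def follows_one_server_one_function(placements):
    """Return True if no server is assigned more than one function.

    placements is an iterable of (server, function) pairs; repeated
    identical pairs are counted once.
    """
    per_server = Counter(server for server, _ in set(placements))
    return all(count == 1 for count in per_server.values())


def function_of(server, assignment):
    """Map a server-level observation to the application function.

    When each server runs exactly one function, knowing which server
    an event came from is enough to know which function it concerns.
    """
    return assignment[server]
```

Under this assumption, an alert raised by a network monitoring tool for the host "db-01" can be attributed directly to the database function, whereas a server hosting two functions would make that attribution ambiguous.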
It is not accidental that most truly scalable Internet applications today, such as Google, Amazon.com, eBay, Yahoo! and many others, are implemented as network-based systems.
Despite its evolutionary success to date, the network-based approach to building distributed applications has three fundamental shortcomings, which aggravate each other and significantly limit the ability to deliver new applications to market.
First, the network-based approach results in tightly-coupled configurations of servers, network switches, appliances, storage and software. Each cluster is built to fit the architecture of the individual application; configuring the application requires coordinated changes in the configurations of all of the above elements, which, in turn, require multiple specialists to effect the changes. The resulting system is very fragile, difficult to modify, and extremely difficult to tune and troubleshoot.
Second, the one server—one function principle, which is the only way by which these systems can be reasonably constructed, leads to a proliferation of underutilized servers that have to be individually maintained and administered, and consume huge amounts of power, air conditioning and physical space.
Finally, the large number of servers used to build manageable network-based systems makes proprietary operating systems and infrastructure software exceedingly expensive, since those products are usually licensed per server or per processor and are value-priced on the assumption of deployment on very few servers. As a result, typical network-based systems make widespread use of open source software, including operating systems, web servers and database engines. The providers of open source software, however, derive revenue primarily from support and consulting, which leaves them with little incentive to make their products easy to install, configure and operate. The complexity of network-based systems is thus amplified by the complexity and fragility of installing and configuring dozens of instances of open source software packages.
All this means that network-based systems are brought to market only through massive application of highly qualified manpower. While this approach is acceptable and justified when constructing unique and large services, such as Google, its impact on a typical business application is nothing short of devastating. The capital, effort and money that a typical enterprise spends between the time the application code is complete and the time the application is successfully deployed to operations today exceed, by a wide margin, the money and time spent developing the application itself, and often exceed the total spending on development and operations together.
The negative results of the enormous complexity of today's application delivery process are easily visible. Over 40% of the defects found in the applications escape the testing cycles and are reported by end users as negative experiences. Over 50% of the attempted deployments of such applications fail due to hard-to-find configuration errors and have to be rolled back. Finally, the long and expensive process of delivering distributed applications means that enterprise IT departments become extremely risk-averse and resist changes that are required for the enterprise to respond to market conditions.
There is clearly a tremendous need for a solution that can significantly simplify and accelerate the process of delivering distributed applications on commodity hardware systems, while preserving the ability to use existing, widely available software, particularly open source infrastructure, in the construction of such applications. Such a solution must also make it easy to implement a fluid, iterative process of modifying the applications and adding functionality to them, so that new business services can be delivered to market within the same quarter in which the need for them is identified rather than 3 or 4 quarters later, as is frequently the case today.