1. Field of the Invention
The present invention relates generally to the art of designing multi-tier system architectures, and more particularly to producing a design or set of designs meeting a high level set of performance and availability requirements.
2. Description of the Related Art
Certain businesses or other organizations deploying Internet and Enterprise services utilize components performing within multiple tiers. In such an environment, service downtime and poor performance either among individual components or within tiers can reduce overall productivity, revenue, and client satisfaction. The challenge in such an environment is to operate at efficient or sufficiently optimal levels of availability, where availability is defined as the fraction of time the service delivers a specified acceptable level of performance. Acceptable levels of performance may vary depending on the organization's business mission.
Component failure within the infrastructure supporting a service can adversely impact service availability. A “service” is a process that may run on one or more computing hardware components, and perhaps a large number of such components, including servers, storage devices, network elements, and so forth. Many of the hardware components run various collections and layers of software components, such as operating systems, device drivers, middleware platforms, and high-level applications. Performance of these components may be characterized by quantifiable statistics, including but not limited to component failure rates. For an individual component, if the component has a low failure rate in isolation, in total the combined infrastructure having multiple components can experience a significant rate of component failures. This significant component failure rate can in turn lead to frequent or extended periods of unplanned service downtime or poor performance.
The challenge in such an environment is to assess service availability and performance as a function of the different design choices including the type of components to be used, the number of these components and associated hardware and software configurations, and to select the appropriate design choice that satisfies the performance and availability requirements of the service at a relatively minimum cost.
Previously available assessment tools have been unable to automatically find a solution from this multi-dimensional design space that provides an enhanced cost-benefit tradeoff assessment to the user.
Currently available tools to select a design typically only enable evaluation of a single design. Since previous tools only evaluate single designs, system design has entailed employing human experts to manually define alternative designs satisfying the specific availability requirements. A primary disadvantage of the current approach is the need to employ an expert to carry out the design. Such experts may be in scarce supply or be relatively expensive. In addition, assessment and design according to the expert process is largely manual and likely slow. Finally, the final results of the manual design process are not necessarily optimal since they are guided mostly by experience and intuition rather than based on a systematic algorithm for searching the large, multi-dimensional space of candidate designs.
Automating the design and configuration of systems to meet user's availability requirements exists in very few situations. One system, an Oracle database design, implements a function that automatically determines when to flush data and logs to persistent storage such that the recovery time after a failure is likely to meet a user-specified bound. Automated design of storage systems to meet user requirements for data dependability have been considered, encompassing both data availability and data loss. Such technologies for automating subsystems, such as databases and storage systems tend to be domain specific and generally cannot be applied to designing multi-tier systems.
Certain previous attempts to manage component and configuration availability have been limited to automated monitoring and automated response to failure events and other such triggers. For example, cluster failover products such as HP MC/Serviceguard, Sun Cluster, and Trucluster detect nodes that fail, automatically transition failed application components to surviving nodes, and reintegrate failed nodes to active service upon recovery from the failure condition. IBM Director detects resource exhaustion in its software components and automates the rejuvenation of these components at appropriate intervals. Various utility computing efforts underway will also automatically detect failed components and automatically replace them with equivalent components from a free pool. Most notably, none of these products or processes provide an overall assessment for particular architectures, but merely react upon failure of a process, component, or tier.
Based on the foregoing, it would be advantageous to offer a system and method for designing multi-tier systems