It is no secret that Internet usage has exploded over the past few years and continues to grow rapidly. People have become very comfortable with many services offered on the World Wide Web (or simply “Web”), such as electronic mail, online shopping, gathering news and information, listening to music, viewing video clips, looking for jobs, and so forth. To keep pace with the growing demand for Internet-based services, there has been tremendous growth in the computer systems dedicated to hosting Websites, providing backend services for those sites, and storing data associated with the sites.
One type of distributed computer system is an Internet data center (IDC), which is a specially designed complex that houses many computers for hosting Internet-based services. IDCs, which also go by the names “Webfarms” and “server farms”, typically house hundreds to thousands of computers in climate-controlled, physically secure buildings. These computers are interconnected to run one or more programs supporting one or more Internet services or Websites. IDCs provide reliable Internet access, reliable power supplies, and a secure operating environment.
FIG. 1 shows an Internet data center 100. It has many server computers 102 arranged in a specially constructed room. The computers are general-purpose computers, typically configured as servers. An Internet data center may be constructed to house a single site for a single entity (e.g., a data center for Yahoo! or MSN), or to accommodate multiple sites for multiple entities (e.g., an Exodus center that hosts sites for multiple companies).
The IDC 100 is illustrated with three entities that share the computer resources: entity A, entity B, and entity C. These entities represent various companies that want a presence on the Web. The IDC 100 has a pool of additional computers 104 that may be used by the entities at times of heavy traffic. For example, an entity engaged in online retailing may experience significantly more demand during the Christmas season. The additional computers give the IDC flexibility to meet this demand.
While there are often many computers, the Internet service or Website may run only a few programs. For instance, one Website may have 2000–3000 computers that run only 10–20 pieces of software. Computers can be added daily to provide scalability as the Website receives more and more visitors, but the underlying programs change less frequently. Rather, there are simply more computers running the same software in parallel to accommodate the increased volume of visitors.
Managing the physical resources of an Internet service is difficult today. Decisions such as when to add (or remove) computers to carry out functionality of the Internet service are made by human operators. Often, these decisions are made based on the operators' experience in running the Internet service. Unfortunately, with the rapid growth of services, there is a shortage of qualified operators who can make real-time decisions affecting the operation of a Website. Accordingly, it would be beneficial if some of the managerial or policy aspects of running an Internet service could be automated.
Today, there is no conventional way to automate policy aspects of running an Internet service in a way that abstracts the functionality of the policy from the underlying physical deployment of the application. Perhaps this is because the industry has grown so fast that everyone's focus has been simply to keep up with the exploding demand by adding computers to Website applications. Not much thought has gone into how to model a policy mechanism that is scale-invariant.
At best, most distributed applications rely on human intervention and/or manual control for: (a) installing application components, (b) configuring the components, (c) monitoring the health of the overall application and individual components, and (d) taking reactive measures to maintain good overall application and component health. Where these tasks are automated, they operate in the context of isolated physical components, without a big picture of how the components of the application relate to one another.
The downside of such traditional approaches to implementing policy is that Website operators must maintain constant vigilance over the operation of the application. Upon detecting a change in the application's operation that is contrary to the overall health of the application, Website operators are typically required to consult one or more documents that essentially detail the actions the operators should take upon the occurrence of the condition. Moreover, such documents must continually be updated as the Website grows in physical resources and/or as the policy changes. Accordingly, it would be beneficial if some of the managerial or policy aspects of running an Internet service could be automated.