It is no secret that Internet usage has exploded over the past few years and continues to grow rapidly. People have become very comfortable with many services offered on the World Wide Web (or simply “Web”), such as electronic mail, online shopping, gathering news and information, listening to music, viewing video clips, looking for jobs, and so forth. To keep pace with the growing demand for Internet-based services, there has been tremendous growth in the computer systems dedicated to hosting Websites, providing backend services for those sites, and storing data associated with the sites.
One type of distributed computer system is an Internet data center (IDC), which is a specifically designed complex that houses many computers for hosting Internet-based services. IDCs, which also go by the names “Webfarms” and “server farms”, typically house hundreds to thousands of computers in climate-controlled, physically secure buildings. These computers are interconnected to run one or more programs supporting one or more Internet services or Websites. IDCs provide reliable Internet access, reliable power supplies, and a secure operating environment.
FIG. 1 shows an Internet data center 100. It has many server computers 102 arranged in a specially constructed room. The computers are general-purpose computers, typically configured as servers. An Internet data center may be constructed to house a single site for a single entity (e.g., a data center for Yahoo! or MSN), or to accommodate multiple sites for multiple entities (e.g., an Exodus center that host sites for multiple companies).
The IDC 100 is illustrated with three entities that share the computer resources: entity A, entity B, and entity C. These entities represent various companies that want a presence on the Web. The IDC 100 has a pool of additional computers 104 that may be used by the entities at times of heavy traffic. For example, an entity engaged in online retailing may experience significantly more demand during the Christmas season. The additional computers give the IDC flexibility to meet this demand.
While there are often many computers, an Internet service or Website may only run a few programs. For instance, one Website may have 2000–3000 computers that run only 10–20 distinct software components. Computers may be added daily to provide scalability as the Website receives increasingly more visitors, but the underlying programs change less frequently. Rather, there are simply more computers running the same software in parallel to accommodate the increased volume of visitors.
Today, there is no conventional way to architect Internet Services in a way that abstracts the functionality of the Service from the underlying physical deployment. Little thought has gone into how to describe a complete Internet Service in any manner, let alone a scale-invariant manner. At best, Internet Service operators might draft a document that essentially shows each and every computer, software program, storage device, communication link, and operational relationship in the Website as of a specific time and date. The downside with such physical schematics is, of course, that the document is always out of date, must be updated as the Service grows in physical resources and hence, it is of limited usefulness as a management tool. Furthermore, while a human may understand such a document, it holds no meaning to a computer.
Moreover, managing the physical resources of the distributed computer system for an Internet Service is difficult today. Decisions such as when to add (or remove) computers to carry out portions of the Internet Service are made by human operators. Often times, these decisions are made based on the operators' experience in running the Internet Service. Unfortunately, with the rapid growth of services, there is a shortage of qualified operators who can make real-time decisions affecting the operation of a Website. Accordingly, it would be beneficial if some of the managerial aspects of running a Internet service could be automated.