The great progress in online services during the last decade has led to a dramatic increase in system complexity. Nowadays, typical systems such as GOOGLE and AMAZON contain thousands of components, including operating systems, application software, servers, and networking and storage devices, which handle millions of user queries or transactions every day. Users of online services always expect a high Quality of Service (QoS), and short latency is one important QoS criterion among others.
However, a sudden increase in user workload is likely to create a bottleneck at some components, thereby degrading QoS. Sufficient hardware resources must be deployed to ensure high QoS, but an oversized system can waste a large amount of resources, particularly at large scale. Therefore, it is very important to match the resources assigned to each component in a large scale system with that component's capacity needs.
Although static capacity planning for standalone software may be simple, it is difficult to specify the capacity needs of online service systems because those needs usually vary with user workloads. In other words, fixed capacity values cannot adequately specify them. Due to the dynamic nature of user workloads, a mathematical model that captures the hidden relationship between user workloads and system capacity needs is highly desirable. Moreover, the model should be general enough to apply to different types of system resources, such as CPU, disk I/O, and memory.
Co-pending U.S. patent application Ser. No. 11/860,610, filed Sep. 25, 2007, and assigned to the assignee herein, addresses the capacity planning problems associated with large scale computer systems. U.S. patent application Ser. No. 11/860,610 discloses a method and apparatus that extract pair-wise relationships between computer system measurements, but it does not address relationships involving multiple system measurements.
Accordingly, a method and apparatus are needed for capacity planning and resource optimization in large scale computer systems that address the relationships among multiple system measurements.