The present invention relates generally to the field of information data management and, more particularly, to allocating resources on a computer server to perform background management activities and foreground primary activities.
When a background backup job starts on a personal computer, users often experience slow response times, slower network connections, and even an unresponsive keyboard. Similar impacts can occur in computer servers that manage data. Managing a large-scale data storage system holding terabytes or even petabytes of enterprise data presents daunting challenges. Data management tasks are usually either scheduled periodically by the system or initiated manually by an operator. Periodic scheduling typically does not respond to the system's changing needs, so a scheduled management job may not run at the best time. Manual scheduling lacks any mechanism to isolate the performance impact of management jobs from the primary applications. As a result, these approaches either degrade the service delivered to users of the primary applications or over-provision resources to ensure a given level of service quality.
Prior art in related fields generally falls into three categories: systems that provide performance isolation among processes, scheduling methods, and storage management software systems. The idea of performance isolation arises in several contexts: process scheduling in operating systems, resource sharing among virtual machines on the same physical device, and application-level workload management. To differentiate various classes of applications, UNIX-like operating systems allow users to set the priority of an application through the nice command. A lower priority process always yields to a higher priority process. A drawback of priority-based scheduling is that it provides neither a predictable finish time nor a predictable resource share for any process.
QLinux is a real-time Linux® operating system that provides quality-of-service guarantees, focusing on multimedia applications. The virtual machine systems Xen® and VMware® address the sharing of resources among multiple virtual machines running on the same physical machine. Xen® and VMware® provide basic resource (CPU, network, and storage) isolation among virtual machines. This isolation is enforced at the virtual machine level, not at the application or process level. IBM® Enterprise Workload Manager (EWLM) provides performance monitoring and resource management among applications on various platforms. EWLM provides performance isolation at the application level.
Many scheduling methods have been proposed for resource sharing with performance isolation on various types of physical resources, including networks, CPU, and disk bandwidth. Start-time Fair Queuing (SFQ) is a scheduling algorithm that achieves fair, work-conserving performance isolation among multiple resource competitors. It provides proportional resource sharing according to the weight each stream reserves. SFQ has also been adapted for resource sharing on storage services, including SFQ(D), a refinement of SFQ; Four-tag Start-time Fair Queuing (FSFQ(D)), a further refinement of SFQ(D); and the Request Windows algorithm. These methods all share the assumption that the resource capacity is unknown or cannot be accurately determined. Sledrunner, an IBM® memory management system, provides performance guarantees through I/O rate throttling. Sledrunner can be non-work-conserving when its automatic throttling mode is enabled: in this mode, the admitted request rate may be lower than the actual resource capacity of the storage server. The adaptive rate control approach introduced by J. Zhang, P. Sarkar, and A. Sivasubramaniam (SIGMETRICS Performance Evaluation Review, 33, 4 (2006), 11-16) has a non-work-conserving problem similar to that of Sledrunner. Work conservation, meaning that the resource is never left idle while requests are waiting, is a key requirement for maximizing resource utilization. Because the workload is dynamic, a non-work-conserving scheme may fail to track frequently changing load patterns and consequently leave throughput under-utilized by 5% to 40%, depending on the case.
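As a rough illustration of the proportional, work-conserving sharing attributed to SFQ above, the following Python sketch applies the SFQ tagging rules: each request receives a start tag equal to the larger of the current virtual time and the finish tag of its stream's previous request, and a finish tag equal to its start tag plus its cost divided by the stream's weight; pending requests are dispatched in increasing start-tag order. The class name, the stream names "foreground" and "background", the weights, and the unit request costs are all hypothetical, and service is modeled as instantaneous, so this is a sketch of the prior-art technique under those assumptions rather than a storage scheduler.

```python
import heapq


class SFQScheduler:
    """Minimal Start-time Fair Queuing sketch (hypothetical; service is instantaneous)."""

    def __init__(self, weights):
        self.weights = dict(weights)                        # stream -> reserved weight
        self.last_finish = {s: 0.0 for s in self.weights}   # finish tag of each stream's last request
        self.virtual_time = 0.0
        self.max_finish_served = 0.0
        self._heap = []                                     # entries: (start_tag, seq, stream, finish_tag)
        self._seq = 0                                       # FIFO tie-breaker for equal start tags

    def submit(self, stream, cost=1.0):
        if not self._heap:
            # Server idle: reset virtual time to the largest finish tag served so far.
            self.virtual_time = self.max_finish_served
        start = max(self.virtual_time, self.last_finish[stream])
        finish = start + cost / self.weights[stream]
        self.last_finish[stream] = finish
        heapq.heappush(self._heap, (start, self._seq, stream, finish))
        self._seq += 1

    def dispatch(self):
        """Serve the pending request with the smallest start tag; never idles while work waits."""
        if not self._heap:
            return None
        start, _, stream, finish = heapq.heappop(self._heap)
        self.virtual_time = start  # virtual time = start tag of the request in service
        self.max_finish_served = max(self.max_finish_served, finish)
        return stream


sched = SFQScheduler({"foreground": 2.0, "background": 1.0})
for _ in range(6):
    sched.submit("foreground")
for _ in range(6):
    sched.submit("background")

first = [sched.dispatch() for _ in range(9)]
print(first.count("foreground"), first.count("background"))  # 6 3
rest = [sched.dispatch() for _ in range(3)]
print(rest)  # ['background', 'background', 'background']
```

While both streams are backlogged, the weight-2 stream receives twice the service of the weight-1 stream. Once the foreground queue drains, the remaining background requests are served immediately instead of being held back, which is the work-conserving behavior the text contrasts with rate throttling.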
Existing storage management software systems include IBM Tivoli® Storage Manager (TSM), HP OpenView® Management software, and Veritas® Volume Manager. Their management processes typically run periodically at a preset time of day. While running, these processes often lack a mechanism to share resources proportionally with the primary applications. Microsoft® MS Manners is a feedback-based method that improves the performance of high-importance jobs and reduces contention. However, MS Manners provides neither a completion-time guarantee nor proportional resource sharing.
A scheduling system that effectively controls background data management activities in a large-scale storage system would improve efficiency while minimizing adverse impacts on foreground jobs. What is needed is a better method of scheduling management jobs, one that performs background management work while minimizing its adverse impact on foreground primary activities.