The present invention relates to backing up individual data sets, and more particularly, to a computer-implemented method for determining an optimal number of tasks, executable in parallel, for processing data backup requests.
In the past, the Data Facility Storage Management Subsystem hierarchical storage manager (“DFSMShsm”) product has been used to supervise the processing of data storage requests. However, the hierarchical storage manager would start a data storage task for each request. A task would start up, process the request and then terminate. In such an environment, a task may be idle while waiting to process a request, or a request may await a tape mount of a newly starting task. The request might then be required to wait for that mount to complete as opposed to having a request processed by a task just finishing with a tape already mounted. This type of system operates with no regard for overhead costs in terms of either time or resources that become unavailable for processing other requests. Thus, the DFSMShsm product provides no method for minimizing overhead costs on a per request basis, nor does it optimize the use of available resources, resulting in unnecessary operation costs for end users. A need therefore exists for a data storage system that determines an optimal number of tasks to be executed so as to minimize or reduce the average overhead and processing costs of the requests that comprise the executing tasks.
The invention provides a computer-implemented method that employs a parallel assessment algorithm for determining an optimal number of concurrently executing tasks for processing requests in the context of a computer hardware system. In general, the parallel assessment algorithm simulates the assignment of a request to a task representation based, in part, on the estimated processing times of simulated execution of task representations, which represent tasks but are not tasks themselves. An actual executing task is a sequence of executable instructions that causes work or requests to be processed, waits to process a request, or starts to process a request. There are two types of task representations. One type is an existing task representation, which represents an actual executing task. The other type is a projected task representation, which represents a proposed task that is being considered for execution but is not being executed. A projected task representation may be pseudo-assigned one or more requests, none of which is a request that is being processed. Each task representation and actual executing task has an estimated task busy time (or task processing time) associated with it. The task busy time is the estimated time for completing the processing of all requests pseudo-assigned to a particular task representation or assigned to a particular actual executing task. A request is pseudo-assigned to the task representation having the least task busy time if that task busy time is less than a minimum work threshold. The minimum work threshold represents a predetermined minimum processing time that justifies starting a task. In general, pseudo-assigning a request is the process of simulating the assignment of the processing time of a request to a task representation.
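The pseudo-assignment rule described above can be illustrated with a minimal Python sketch. It is not the claimed implementation: the function name `pseudo_assign`, the use of a plain list of task busy times, and the choice to return `None` when no task representation qualifies are all assumptions made for illustration.

```python
from typing import List, Optional


def pseudo_assign(busy_times: List[float],
                  request_time: float,
                  min_work_threshold: float) -> Optional[int]:
    """Simulate assigning one request to the least-busy task representation.

    busy_times[i] holds the estimated task busy time of task representation i
    (hypothetical layout). The list is updated in place.

    Returns the index of the task representation that received the request,
    or None if even the least-busy one has already reached the threshold.
    """
    if not busy_times:
        return None

    # Find the task representation with the least task busy time.
    least = min(range(len(busy_times)), key=lambda i: busy_times[i])

    # Pseudo-assign only while that busy time is below the minimum work threshold.
    if busy_times[least] < min_work_threshold:
        busy_times[least] += request_time
        return least
    return None
```

In this sketch a `None` result signals that every task representation already carries at least the minimum work threshold, which is the condition under which a new projected task representation would be considered.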
If the task busy time of every task representation equals or exceeds the minimum work threshold, and the number of task representations does not exceed a maximum task index, a new projected task representation is defined, to which the next non-pseudo-assigned request from the queue is pseudo-assigned. A non-pseudo-assigned request is a request that has not been pseudo-assigned to a task representation. A value representing the estimated processing time of the non-pseudo-assigned request and a value representing the estimated start-up overhead time of starting a new task are then added to an array value representing the task busy time for the new projected task representation. If the estimated processing time, i.e., the task busy time, associated with a projected task representation meets or exceeds a minimum threshold value, one embodiment of the parallel assessment algorithm indicates the projected task representations that may be transformed into actual executing tasks, and another embodiment of the parallel assessment algorithm transforms the projected task representations into actual executing tasks. Transforming a task representation into an actual executing task means that a task is created that actually processes the requests that were pseudo-assigned to the task representation in the simulation.
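The overall simulation can be sketched in Python as follows. This is a hedged illustration of the "indicating" embodiment only, not the patented code. The names `assess_parallelism`, `startup_overhead`, and `max_tasks` are hypothetical, and the fallback used when the task limit has been reached (the request goes to the least-busy task representation anyway) is an assumption, since the text does not specify that case.

```python
from typing import List, Sequence, Tuple


def assess_parallelism(existing_busy: Sequence[float],
                       request_times: Sequence[float],
                       startup_overhead: float,
                       min_work_threshold: float,
                       max_tasks: int) -> Tuple[List[float], List[int]]:
    """Simulate pseudo-assignment of queued requests to task representations.

    Returns (busy, startable): busy[i] is the simulated task busy time of task
    representation i, and startable lists the indices of projected task
    representations whose busy time justifies starting an actual task.
    """
    if max_tasks < 1:
        raise ValueError("max_tasks must be at least 1")

    busy = list(existing_busy)      # existing task representations come first
    n_existing = len(busy)

    for t in request_times:         # requests are taken in queue order
        least = min(range(len(busy)), key=lambda i: busy[i]) if busy else None

        if least is not None and busy[least] < min_work_threshold:
            # The least-busy representation is still under the threshold.
            busy[least] += t
        elif len(busy) < max_tasks:
            # Every representation has enough work: define a new projected
            # task representation, charged with start-up overhead plus request.
            busy.append(startup_overhead + t)
        else:
            # Assumption: at the task limit, fall back to the least-busy one.
            busy[least] += t

    # Indicate projected representations that meet the minimum threshold.
    startable = [i for i in range(n_existing, len(busy))
                 if busy[i] >= min_work_threshold]
    return busy, startable
```

For example, with one existing task representation holding 30 time units of work, five queued requests of 10 units each, a start-up overhead of 5, a threshold of 20, and at most three tasks, the sketch defines two projected task representations and reports both as worth starting. With only a single queued request, the one projected task representation accumulates 15 units and is not reported.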
The foregoing features may be implemented in a number of different ways. For example, the invention may be implemented to provide a method for determining an optimal number of tasks for processing requests. In another embodiment, the invention may be implemented to provide a signal-bearing medium tangibly embodying a program of machine-readable instructions executable by a digital data processing apparatus. Another embodiment concerns logic circuitry having multiple interconnected electrically conductive elements configured for determining an optimal number of tasks for processing requests. In still another embodiment, the invention may be implemented as a computer hardware system.