It is often difficult to balance the conflicting demands of storage capacity requirements and performance requirements (e.g., access times). Multi-tier storage environments, such as two-tier storage systems, typically provide a performance tier that employs memory selected based on performance considerations and a capacity tier that employs storage selected based on capacity considerations. In this manner, multi-tier storage systems balance the relative costs of memory and other storage against their relative speeds. Such multi-tier storage environments typically allow a particular level of performance to be achieved at a significantly lower cost than would otherwise be possible.
It is often desirable to provide such multi-tier storage environments transparently to users and applications. In some circumstances, however, applications can obtain performance improvements when the multiple tiers are visible to them.
MapReduce is a programming model for processing large data sets, for example, as distributed computing tasks on clusters of computers. During the map phase, a master node receives an input, divides the input into smaller sub-tasks, and distributes the smaller sub-tasks to worker nodes, each of which processes its sub-task and returns an answer. During the reduce phase, the master node collects the answers to the sub-tasks and combines the answers to form an output (i.e., the answer to the initial problem).
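By way of illustration only, the map and reduce phases described above can be sketched as a single-process word count in Python. The function names (split_input, map_word_count, reduce_counts, run_job) and the round-robin division of input lines are assumptions made for this sketch; they are not taken from any particular MapReduce framework, and a real deployment would distribute the sub-tasks to separate worker nodes rather than run them in a loop.

```python
from collections import Counter
from functools import reduce


def split_input(text, num_workers):
    """Master step: divide the input lines into num_workers sub-tasks (round-robin)."""
    lines = text.splitlines()
    return [lines[i::num_workers] for i in range(num_workers)]


def map_word_count(lines):
    """Worker step: count the words in one sub-task and return a partial answer."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def reduce_counts(partials):
    """Master step: combine the partial answers into the final output."""
    return reduce(lambda a, b: a + b, partials, Counter())


def run_job(text, num_workers=3):
    """Run the map phase and then the reduce phase (hypothetical driver)."""
    sub_tasks = split_input(text, num_workers)
    # In a cluster, each sub-task would be sent to a different worker node.
    partials = [map_word_count(task) for task in sub_tasks]
    return reduce_counts(partials)


if __name__ == "__main__":
    print(run_job("a b a\nb c\na"))
```

The sketch keeps the division of work (split_input), the per-worker computation (map_word_count), and the combination of answers (reduce_counts) as separate functions so that each corresponds to one step of the description above.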
A number of job schedulers exist that allocate computational tasks, e.g., batch jobs, among the available computing resources in such distributed computing environments. A need exists for improved job schedulers that assign different classes of jobs or tasks to different classes of storage resources.