The present invention relates to the field of data processing and, more specifically, to a data assignment method and a data assignment apparatus for a physical machine.
With the development of virtualization techniques, it is common to run multiple virtual machines (VMs) simultaneously on the same physical machine (PM). Because these VMs share the hardware resources of the PM and operate in coordination with each other, the hardware of the PM can be utilized more fully, and each VM can process its data independently without affecting the others, so that the work efficiency of the PM can be improved significantly.
Common virtualization techniques currently include Xen, KVM (Kernel-based Virtual Machine), etc. Taking Xen as an example, U+1 VMs, namely Dom 0, Dom 1, Dom 2, . . . , Dom U, may run simultaneously on a PM A. Dom 0 acts as a control VM and is capable of recognizing to which of Dom 1 to Dom U the data received by PM A belongs. Each of Dom 1 to Dom U has its own queue. Dom 0 stores data in the queue of Dom i (where i is an integer from 1 to U) for processing by Dom i. For example, Dom 0 assigns data to Dom 1 by storing the data to be assigned into a reference storage page Page 1 corresponding to Dom 1, and then swapping the data in Page 1 with the data in a VM storage page Page 1′ corresponding to Dom 1. Similarly, the data assigned to each of Dom 2 to Dom U is stored in its respective queue. Thus, Dom 1 to Dom U may run in parallel, each fetching data from its own queue.
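The assignment flow described above can be illustrated with a minimal Python sketch. It is only a toy model under stated assumptions, not Xen's actual mechanism or API: the names `GuestVM`, `assign_data`, `reference_page`, and `vm_page` are hypothetical, pages are modeled as plain lists, and the step that moves swapped-in data from the VM page into the per-VM queue is an assumption made to connect the two structures mentioned in the text.

```python
from collections import deque


class GuestVM:
    """Hypothetical stand-in for one of Dom 1 .. Dom U."""

    def __init__(self, index):
        self.index = index
        self.reference_page = []  # models "Page i", filled by the control VM
        self.vm_page = []         # models "Page i'", owned by the guest VM
        self.queue = deque()      # per-VM queue the guest fetches data from


def assign_data(guests, received):
    """Model Dom 0: route each (vm_index, payload) item to its guest VM."""
    # Step 1: the control VM recognizes the owner of each item and stages
    # the item in that guest's reference page.
    for vm_index, payload in received:
        guests[vm_index].reference_page.append(payload)

    # Step 2: swap the contents of the reference page and the VM page, then
    # (assumption) move the swapped-in data into the guest's queue.
    for guest in guests.values():
        guest.reference_page, guest.vm_page = guest.vm_page, guest.reference_page
        guest.queue.extend(guest.vm_page)
        guest.vm_page.clear()


# Three guest VMs (U = 3) and three received items, two of them for Dom 1.
guests = {i: GuestVM(i) for i in (1, 2, 3)}
assign_data(guests, [(1, "a"), (2, "b"), (1, "c")])
```

In this sketch each guest would then consume items from its own `queue` independently of the others, which corresponds to the parallel operation of Dom 1 to Dom U described above.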
However, a job may comprise several tasks, and the data that each VM handles while processing a job may correspond to tasks of different complexity. Therefore, even when every VM has the same amount of data to process, the processing times of the VMs may differ, because different data sets involve different operation and computation complexities. For example, for a certain MapReduce workload in which multiple VMs on a single PM take different times to handle the same amount of data, 97% of the tasks may be completed within one hour, while some of the remaining tasks may take considerably longer, for example more than 10 hours.
Because the VMs on the PM take different times to process their data, some VMs may have completed their data processing while others are still handling data. The VMs that finish early then remain in a waiting state for a long time, so that PM resources cannot be sufficiently utilized and the processing efficiency of the PM may be affected.
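The effect of such unequal processing times on utilization can be made concrete with a short calculation. The following Python sketch is illustrative only: the function name `idle_report` and the per-VM times (three VMs finishing in 1 hour and one in 10 hours) are hypothetical values chosen to echo the orders of magnitude mentioned above, not measured data.

```python
def idle_report(processing_hours):
    """Return (makespan, idle time per VM, overall utilization).

    processing_hours: hypothetical time each VM needs for its share of data.
    """
    # The PM is busy with the job until the slowest VM has finished.
    makespan = max(processing_hours)
    # Each VM that finishes early waits for the remainder of the makespan.
    idle = [makespan - t for t in processing_hours]
    # Fraction of the available VM-hours actually spent processing data.
    utilization = sum(processing_hours) / (makespan * len(processing_hours))
    return makespan, idle, utilization


# Assumed example: three VMs finish in 1 hour, one VM needs 10 hours.
makespan, idle, utilization = idle_report([1.0, 1.0, 1.0, 10.0])
```

Under these assumed values, three of the four VMs would wait 9 hours each while the fourth finishes, and only about a third of the available VM time would be spent on useful work.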