Modern computer systems may require that a task, application, or process be started, run, and stopped hundreds of times. As a consequence, a computer system can quickly exhibit poor performance and suffer usage issues when it has to constantly recreate an environment in which to run the task. To alleviate this overhead, pre-started jobs may be utilized.
A job is generally an execution path through an address space of a computer system. The job may be as simple as a set of program instructions loaded in memory or as complex as an instance of an application or program. A pre-started job is a job that has been loaded from a storage device, such as a hard disk drive or electronic memory, and that is active in memory and ready for immediate use. With a pre-started job, the computer system can quickly process a task specific to that pre-started job without waiting for program load times, allocation of memory, configuration of the program, configuration of data, processor scheduling, or other overhead associated with starting the job from scratch. Generally speaking, pre-started jobs are kept active in memory in a “pool” and are chosen by the computer system for a particular task. The management of pre-started jobs is known colloquially as “job pooling”.
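By way of illustration only, job pooling as described above may be sketched as follows. The names `PreStartedJob` and `JobPool`, and the choice to return `None` when no job is idle, are assumptions made for this sketch and do not describe any particular system.

```python
from collections import deque


class PreStartedJob:
    """Hypothetical job whose start-up cost is paid once, ahead of time."""

    def __init__(self, job_id):
        self.job_id = job_id
        # In a real system, program load, memory allocation, and
        # configuration would happen here, before any task arrives.
        self.tasks_run = 0

    def run(self, task):
        # Run a task (any zero-argument callable) without start-up overhead.
        self.tasks_run += 1
        return task()


class JobPool:
    """Holds idle pre-started jobs so that tasks can use them immediately."""

    def __init__(self, size):
        self._idle = deque(PreStartedJob(i) for i in range(size))

    def acquire(self):
        # Assumption: return None when every job is busy.
        return self._idle.popleft() if self._idle else None

    def release(self, job):
        # Return the job to the pool so it can serve another task.
        self._idle.append(job)

    def idle_count(self):
        return len(self._idle)


pool = JobPool(size=2)
job = pool.acquire()
result = job.run(lambda: 2 + 3)
pool.release(job)
```

In this sketch the same job object is reused across tasks, which is the source of the time savings: only `run` is paid per task, not the constructor.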
In a typical computer system, the processor, or central processing unit (“CPU”), is coupled to a multi-level memory architecture that includes a main memory typically implemented using Dynamic Random Access Memory (DRAM) solid state devices along with one or more smaller, faster Static Random Access Memory (SRAM) cache memories that are used to reduce the average time to access data by temporarily storing copies of data from the most frequently used portions of the main memory. Caches are often integrated onto the same processor chip as the CPU, and thus provide significantly faster performance than a main memory, which is generally external to the chip upon which the CPU is fabricated. When data required by the CPU is not present in a cache (i.e., there has been a cache “miss”), the main memory must be accessed to retrieve the data. Consequently, the performance of a computer is often dependent upon how often data used by a CPU needs to be retrieved from a slower main memory instead of accessed from a cache.
In a computer with a plurality of CPUs, a non-uniform memory access (“NUMA”) configuration may be utilized to effectively distribute the main memory across multiple nodes. NUMA configurations originated from the need to provide the plurality of CPUs with sufficient memory without decreasing the performance or otherwise “starving” the plurality of CPUs with slow memory access. In a typical NUMA configuration at least one CPU, one or more CPU caches, and a portion of the main memory (e.g., a set of DRAM memory devices) are connected to a memory bus to form a “node.” Typically, a plurality of nodes are connected by means of a high speed interconnect to form a NUMA configuration. The portion of the main memory resident on the same node as a CPU is typically considered to be the “local memory” for the CPU, while portions of main memory resident on other nodes are typically referred to as “remote memories” relative to the CPU.
In a computer system with a NUMA configuration (a “NUMA system”), a data access by a CPU that is satisfied by the contents of a local CPU cache or a local memory is referred to as a “local node” access. Accordingly, a “remote node” access is typically an access satisfied by accessing data that is stored on a remote node. Data accesses to remote nodes are associated with a significantly higher latency than local node accesses. As such, NUMA systems are typically configured to “compartmentalize” processing to local nodes whenever possible.
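The distinction between local node and remote node accesses may be illustrated with the following minimal sketch. The latency constants are assumed, illustrative values rather than measurements, and the names `Access`, `classify`, and `total_latency_ns` are hypothetical.

```python
from dataclasses import dataclass

# Illustrative latencies in nanoseconds; assumed values, not measurements.
LOCAL_LATENCY_NS = 100
REMOTE_LATENCY_NS = 300


@dataclass(frozen=True)
class Access:
    cpu_node: int   # node on which the requesting CPU resides
    data_node: int  # node whose cache or memory holds the data


def classify(access):
    """Return 'local' when the data resides on the CPU's own node."""
    return "local" if access.cpu_node == access.data_node else "remote"


def total_latency_ns(accesses):
    """Sum the assumed latency of each access according to its class."""
    return sum(
        LOCAL_LATENCY_NS if classify(a) == "local" else REMOTE_LATENCY_NS
        for a in accesses
    )


accesses = [Access(0, 0), Access(0, 1), Access(1, 1)]
kinds = [classify(a) for a in accesses]
total = total_latency_ns(accesses)
```

Under these assumed constants, a single remote access costs as much as three local accesses, which is why keeping processing on the local node is generally preferred.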
When a pre-started job is utilized in a NUMA system, the pre-started job will typically execute faster if it is configured to perform the task on a local node with the data required for operation in a local CPU cache. Manual assignment of pre-started jobs to particular nodes is known in the art. In the typical manual configuration, the pre-started job is assigned to a node and may be referred to as a “local pre-started job” in reference to its node.
Upon a request for execution of a task, a NUMA system consistent with the prior art typically matches the task to a particular pre-started job and dispatches the task to the node with the particular pre-started job. In this way, a typical NUMA system that utilizes pre-started jobs assigns every task a “home node” with a “home” pre-started job. Upon a subsequent attempt to process the task, such a NUMA system typically attempts to assign that task to its home node and home pre-started job because the data for the task is already present on the home node. As such, the task will typically have affinity towards the node on which it is initially assigned to run. When the home pre-started job for the task is busy, the NUMA system will queue the task on its home node.
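The home node behavior described above may be sketched, under stated assumptions, as follows. The names `Node` and `HomeNodeDispatcher` are hypothetical, each node is assumed to hold a single pre-started job, and the round-robin choice of an initial home node is an assumption made only for this sketch; the matching policy of any actual system may differ.

```python
from collections import deque


class Node:
    def __init__(self, node_id):
        self.node_id = node_id
        self.job_busy = False  # state of this node's single pre-started job
        self.queue = deque()   # tasks waiting for this node's job


class HomeNodeDispatcher:
    """Sends every task to its home node; queues it there if the job is busy."""

    def __init__(self, node_count):
        self.nodes = [Node(i) for i in range(node_count)]
        self.home = {}   # task name -> home node id
        self._next = 0   # next node to hand out as a home (assumed policy)

    def dispatch(self, task):
        if task not in self.home:
            # First request for this task: assign a home node.
            self.home[task] = self._next
            self._next = (self._next + 1) % len(self.nodes)
        node = self.nodes[self.home[task]]
        if node.job_busy:
            # The task is never moved to another node, only queued at home.
            node.queue.append(task)
            return ("queued", node.node_id)
        node.job_busy = True
        return ("running", node.node_id)


d = HomeNodeDispatcher(node_count=2)
first = d.dispatch("query_a")
second = d.dispatch("query_b")
third = d.dispatch("query_a")  # home job on node 0 is still busy
```

In this sketch the third request waits on node 0 regardless of whether the job on any other node is idle, reflecting the affinity of a task for its home node.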
One drawback of the current art occurs when multiple pre-started jobs in different nodes require access to the same data. In that event, the local pre-started job in one node may not have data that is required for its associated task, causing data to be transferred from node to node. Transferring data from node to node results in severe latencies and a performance hit. This poses a distinct problem, as the performance hit for a NUMA system may be greater than the performance hit associated with access to data in a typical computer system. For example, suppose that there are three distinct queries, each operable to execute in a different pre-started job, with each pre-started job placed in a different node as per the current art, and that two of the queries operate on the same data. The two pre-started jobs that require the same data will compete for access to that data, resulting in a performance hit as the data is transferred from node to node.
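The three-query example above may be illustrated with a simplified model. The function `count_transfers`, the data keys `"T"` and `"U"`, and the assumption that data migrates to whichever node last accessed it are all hypothetical simplifications for this sketch; actual NUMA hardware may instead serve the access remotely or replicate the data in caches.

```python
def count_transfers(schedule, data_home):
    """Count node-to-node moves of data.

    schedule:  list of (node, data_key) pairs in execution order.
    data_home: dict mapping each data_key to the node that initially holds it.
    """
    location = dict(data_home)
    transfers = 0
    for node, key in schedule:
        if location[key] != node:
            transfers += 1
            # Simplifying assumption: the data migrates to the accessing node.
            location[key] = node
    return transfers


home = {"T": 0, "U": 2}

# Queries 1 and 2 share data "T" but run on nodes 0 and 1; query 3 uses "U".
separate = [(0, "T"), (1, "T"), (0, "T"), (1, "T"), (2, "U")]

# For comparison: the same accesses if both "T" queries ran on node 0.
colocated = [(0, "T"), (0, "T"), (0, "T"), (0, "T"), (2, "U")]

separate_transfers = count_transfers(separate, home)
colocated_transfers = count_transfers(colocated, home)
```

Under this model, the alternating accesses to the shared data cause it to move between nodes on nearly every access, whereas the co-located comparison incurs no transfers.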
Another drawback of the current art occurs when there are multiple task execution requests for a particular pre-started job. In that event, tasks in the queue for the particular pre-started job on the particular node will back up. Even if the tasks could be performed in other pre-started jobs, those other pre-started jobs may remain underutilized. Thus, there is currently no way to dynamically reconfigure the pre-started jobs to execute tasks more efficiently.
Consequently, there remains a need for selecting pre-started jobs to prevent performance impairments that may be caused by the transfer of data from one node to another in NUMA computer systems.