Data processing applications often include task schedulers, which coordinate and control execution of the tasks performed by the data processing application. A task is generally implemented in the form of one or more instructions that together perform a corresponding function. An example task is transferring a segment of data from one part of a memory storage medium to another.
More specifically, consider the example of transferring 1 Megabyte (MB) of memory. Conceptually, a single task could be assigned to perform this operation. The underlying operating system or computer hardware may not, however, permit the full 1 MB transfer in one execution. Such a transfer holds system resources for a relatively long time, so the amount of data that any particular task may transfer per execution is consequently limited by design. In this case, a data transfer task must be executed multiple times to complete the required 1 MB transfer. If an application performs many tasks, and each is allocated a time slot in a “round robin” manner, the transfer task must wait for its slot before every execution, so completing the 1 MB transfer takes longer and data throughput falls. This degrades system performance.
Suppose now that the operating system limits each task to transferring 4 Kilobytes (KB) per execution. To transfer the proposed 1 MB (taking 1 MB as 1,000 KB), a data transfer task has to be executed 250 times, which is necessarily time consuming. An alternative is to have multiple tasks perform the same operation.
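As a quick check of the arithmetic, the sketch below computes the number of executions with ceiling division. It assumes decimal units (1 MB = 1,000 KB), which is what gives 250 executions; with binary units (1 MB = 1,024 KB) the figure would be 256. The function name is hypothetical.

```python
def executions_needed(total_kb: int, limit_kb: int) -> int:
    """Executions of one transfer task needed to move total_kb, at most limit_kb each time."""
    # Ceiling division: a final partial chunk still costs a full execution.
    return -(-total_kb // limit_kb)


print(executions_needed(1000, 4))  # 250 (decimal units, as in the example)
print(executions_needed(1024, 4))  # 256 (binary units)
```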
For example, if 10 tasks are assigned to a data transfer operation, the 1 MB transfer is performed by 10 tasks each executing 25 times. This approach reduces the total execution time because the tasks perform the same operation in parallel. So, if the execution load is large and multiple tasks are available, distributing the execution across multiple tasks is advantageous.
Referring to the example above, the 250 executions are distributed equally among 10 task registers (each holding the execution count of one task), so that each register holds 25 executions. The number of executions, and the number of task registers, may vary depending on the application. For example, if 100 executions are required for an application and there are 9 task registers available, dividing 100 by 9 assigns a count of 11 to each task register, accounting for 99 executions. The remaining execution is added to the first task register. Accordingly, the first task register executes 12 times, and the remaining 8 task registers each execute 11 times.
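The two worked examples above reduce to an integer division with remainder. A minimal sketch using Python's built-in `divmod`; the variable names are illustrative only:

```python
# 250 executions over 10 task registers: divides exactly.
print(divmod(250, 10))  # (25, 0)

# 100 executions over 9 task registers: quotient 11, remainder 1.
quotient, remainder = divmod(100, 9)
counts = [quotient] * 9          # 11 executions per register, 99 in total
for i in range(remainder):       # leftover execution(s) go to the first register(s)
    counts[i] += 1
print(counts)       # [12, 11, 11, 11, 11, 11, 11, 11, 11]
print(sum(counts))  # 100
```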
A more formal description of this existing technique is now provided with reference to FIGS. 1 and 2. FIG. 1 is a flow chart of the steps used for task distribution, and FIG. 2 is a schematic diagram of a hardware implementation of the task distribution technique described with reference to FIG. 1.
Consider an application that requires X executions, in this case 23, using Y task registers, in this case 5. These values are read in step 120. Where possible, the registers share the execution load equally. A check is made in step 130 of whether the number of executions X is zero. If so, no further action is required. Otherwise, a check is made in step 140 of whether the number of task registers Y is zero, in which case no further action is required either (the division in step 150 would otherwise be undefined).
Having made these two preliminary checks in steps 130 and 140, X is divided by Y in step 150, and the quotient and remainder are stored. In the following step 160, the X executions are distributed among the Y task registers using this “division method”. For 23 executions over 5 task registers, the quotient from step 150 is 4 and the remainder is 3, so each task register is first assigned 4 executions. The three “excess” remainder executions are then distributed one each to the first three task registers. Thus, in this example, the 5 task registers are assigned 5, 5, 5, 4 and 4 executions respectively.
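The FIG. 1 flow can be sketched in software as follows. This is an illustration under assumptions, not the described implementation: the function name is hypothetical, and returning `None` is this sketch's stand-in for "no further action is required".

```python
from typing import Optional


def distribute_executions(x: int, y: int) -> Optional[list[int]]:
    """Sketch of the FIG. 1 flow: distribute X executions over Y task registers."""
    # Step 130: no executions, nothing to distribute.
    if x == 0:
        return None
    # Step 140: no task registers (this check also prevents division by zero).
    if y == 0:
        return None
    # Step 150: divide, storing the quotient and remainder.
    quotient, remainder = divmod(x, y)
    # Step 160: every task register receives the quotient...
    counts = [quotient] * y
    # ...and the first `remainder` registers receive one extra execution each.
    for i in range(remainder):
        counts[i] += 1
    return counts


print(distribute_executions(23, 5))  # [5, 5, 5, 4, 4]
```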
When this scheme is realized in hardware, as presented in FIG. 2, the number of executions 205 and the number of task registers 210 are copied to divide logic 260. Besides divide logic 260, the hardware implementation also requires subtract logic 225, control logic 230 and adder logic 245. Adder logic 245 and subtract logic 225 are required because arbitrary combinations of executions 205 and task registers 210 can leave a non-zero remainder after division.
After division, control logic 230 generates an enable signal DIN_SELECT so that the RESULT 265 value is selected onto DIN 255. This RESULT 265, which is in binary form, is then copied into each task count holding register 240. If the REMAINDER 220 is non-zero, the control logic 230 generates an enable signal to the remainder select 215, causing the remainder select 215 to pass a new value to the remainder register 220. This new value is calculated by subtract logic 225, which subtracts “1” from the previous value of the remainder 220.
Correspondingly, control logic 230 also generates the enable signal DIN_SELECT to route the “DOUT+1” value from ADDER LOGIC 245 to DIN 255. DOUT is the value read from the selected task count holding register 240, which initially holds the RESULT 265.
The control logic 230 is synchronized to generate the enable signals to the REMAINDER SELECT 215 and DIN 255. Every time the REMAINDER 220 receives a new value (its previous value less one), the value in the selected task count holding register 240 is incremented by 1 via ADDER LOGIC 245. The control logic 230 then selects the next task count holding register 240.
The above process of subtracting “1” from the remainder 220, adding “1” (incrementing) to the task count holding register 240 and selecting the next task count holding register 240 continues until the REMAINDER 220 becomes zero.
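The register-level behaviour described above can be modelled sequentially. The sketch below is a behavioural approximation only, assuming one register update per loop iteration; the class and attribute names are hypothetical, and the reference numerals in the comments map each step back to FIG. 2.

```python
class TaskDistributorModel:
    """Behavioural sketch of the FIG. 2 datapath; class and attribute names are hypothetical."""

    def __init__(self, num_registers: int) -> None:
        if num_registers <= 0:
            raise ValueError("at least one task count holding register is required")
        self.ram = [0] * num_registers  # task count holding registers 240
        self.remainder = 0              # REMAINDER register 220

    def run(self, executions: int) -> list[int]:
        y = len(self.ram)
        # DIVIDE LOGIC 260: produces RESULT 265 and the raw remainder.
        result, raw_remainder = divmod(executions, y)
        # REMAINDER SELECT 215 first passes the divider's remainder.
        self.remainder = raw_remainder
        # DIN_SELECT chooses RESULT 265: copy it into every holding register.
        for addr in range(y):
            self.ram[addr] = result
        # Remainder loop: increment one register, decrement REMAINDER, select the next.
        addr = 0
        while self.remainder != 0:
            dout = self.ram[addr]         # read DOUT from the selected register
            self.ram[addr] = dout + 1     # ADDER LOGIC 245 supplies DOUT+1 on DIN 255
            self.remainder -= 1           # SUBTRACT LOGIC 225 output, via REMAINDER SELECT 215
            addr += 1                     # control logic 230 selects the next register
        return list(self.ram)


print(TaskDistributorModel(5).run(23))   # [5, 5, 5, 4, 4]
print(TaskDistributorModel(9).run(100))  # [12, 11, 11, 11, 11, 11, 11, 11, 11]
```

Because the remainder is always smaller than the number of registers, the selected address never runs past the last holding register.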
If the division performed by DIVIDE LOGIC 260 leaves a REMAINDER 220 of zero, only the RESULT 265, in binary form, is copied into each task count holding register 240 via DIN 255.
Task count holding register 240 is implemented as a Random-Access Memory (RAM), and the control logic 230 generates the appropriate address and the read and write signals for this RAM. The data input to the RAM has two sources: the RESULT of the division, and the incremented value DOUT+1 from adder logic 245. The control logic 230 first selects the RESULT as input, and the selected input is written to the respective task count holding register 240. After the result has been written into the selected task count holding register 240, the REMAINDER distribution occurs.
The remainder register 220 has two sources of input. One is the remainder of the division from the divide logic 260, and the other is the content of the remainder register decremented by 1, which is produced every time the remainder is distributed to a task count holding register 240. Control logic 230 generates a select signal for the remainder select 215 to choose one of these inputs.
The remainder distribution is done by adding “1” to the contents of successive task count holding registers 240 in the RAM until the remainder becomes zero (“1” is subtracted from the remainder every time the RAM contents are incremented). The control logic 230 generates the select signals for the multiplexers (MUXs), the read-write signal for the RAM and an enable signal for divide logic 260. The control logic 230 also generates the address of the task count holding register 240 for copying the RESULT and distributing the REMAINDER.
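To make the RAM write schedule concrete, the sketch below logs each write the control logic would issue, recording the address, which DIN source is selected, the value written, and the REMAINDER after the write. It assumes one RAM write per holding register for the RESULT copy and one per unit of remainder; the function name and tuple layout are hypothetical.

```python
def ram_write_log(executions: int, registers: int) -> list[tuple[int, str, int, int]]:
    """Hypothetical log of RAM writes issued by control logic 230.

    Each entry is (address, DIN source, value written, REMAINDER after the write).
    `registers` must be non-zero (the step 140 check in FIG. 1).
    """
    result, remainder = divmod(executions, registers)  # divide logic 260, enabled once
    ram = [0] * registers
    log = []
    # Phase 1: DIN source is RESULT; one write per task count holding register.
    for addr in range(registers):
        ram[addr] = result
        log.append((addr, "RESULT", result, remainder))
    # Phase 2: DIN source is DOUT+1; REMAINDER takes its decremented value on each write.
    addr = 0
    while remainder > 0:
        ram[addr] += 1
        remainder -= 1
        log.append((addr, "DOUT+1", ram[addr], remainder))
        addr += 1
    return log


for entry in ram_write_log(23, 5):
    print(entry)
```

For 23 executions over 5 registers this gives 8 writes: five with the RESULT source, followed by three with the DOUT+1 source, ending with `(2, 'DOUT+1', 5, 0)`. Under this model the write count is Y plus the remainder.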
“Area critical” applications, in which the silicon area of the hardware implementation is a key consideration, require all unnecessary logic components to be minimized. A need therefore exists for an improvement upon existing designs.