A typical computer system includes, among other components, a processing unit (i.e., a microprocessor or Central Processing Unit, CPU) that is responsible for performing, executing or interpreting a number of different tasks or processes which process data in some manner within the system. By way of example, a networked data communications device such as a router or switch may include a processor that performs a primary task of transferring data in packets, cells, frames or other units between various network interfaces or ports in the data communications device. Besides the primary data transfer task, the processor may also need to perform other tasks such as protocol processing, statistics gathering, fault monitoring/handling, and so forth. While the main purpose of the processor in the data communications device is to perform the data transfer task to transfer data through a network, the device must allow the processor to perform each of the other tasks from time to time and each task may have a different level or class of importance or priority with respect to the other tasks.
Typically, a computer system that performs multiple tasks also includes an operating system that incorporates a task scheduler algorithm, routine, procedure or process. The task scheduler is generally responsible for determining the priority, class or importance of each task and for selecting and scheduling various tasks for performance and initiating the tasks on the processor in a time sliced or multiplexed manner based on the priorities of the various tasks. As an example, perhaps the processor in the data communications device only needs to perform a statistics computation task occasionally and should only perform such a task during low priority situations (e.g., situations that can allow for interruptions). The statistics computation task may have a lower assigned priority than, for example, the data transfer task which carries out the main purpose of the device of transferring data. The task scheduler may thus schedule the statistics task for performance less frequently than the data transfer task. Conversely, the processor may need to perform a fault or exception handling task only occasionally and for only a brief period of time, but when such a task is required, it is essential that the device perform it immediately in response to a condition such as a network fault to prevent a loss of data, interruption of network service or complete device failure. The fault handling task may thus have the highest assigned priority of any task. As such, when a fault occurs that must be handled by the fault handler task, the task scheduler in the operating system will schedule the fault handling task for performance over all other tasks including the data transfer task.
Some implementations of computerized devices provide multiple processors to perform different tasks or groups of tasks at the same time, one task per processor. Other arrangements provide a single processor which is responsible for performing all of the required tasks in the device. In either case, when more than one task can be performed on a processor, prior art task scheduling algorithms generally divide up a total amount of available processing cycles or processor time available on the processing unit between the various tasks that must be performed. This is typically called time slicing or time division multiplexing (TDM). Under the general theory of operation of operating systems that use prior art task scheduler algorithms, each task will execute using a segment of processor time to which that task is assigned. The task scheduler may adjust time segments assigned to tasks based on priorities of those tasks. A common example of a prior art task scheduling algorithm is called the "Round Robin" algorithm. A task scheduler within an operating system that uses round robin generally divides a block of time representing a total amount of available processor time into equal time slices and designates or assigns one time slice to each task that is to be performed. The task scheduler then allows the processing unit to execute the first task during the first time slice. When the task scheduler detects that the first time slice for the first task is complete (i.e., has elapsed), the task scheduler removes the first task from execution and places it at the end of a queue of tasks that are ready to execute and then allows the next (i.e., the second) task to execute during its assigned time slice. This process repeats for each task to be performed, hence the name round robin.
A version of the round robin task scheduling algorithm that takes priorities into account may, for example, increase the size of a time slice assigned to higher priority tasks and may decrease the time slices provided to lower priority tasks.
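By way of illustration only, the round robin behavior described above, including the priority-weighted variant, might be sketched as follows. The function name `round_robin`, the task names and the slice lengths are hypothetical and are chosen only for this example.

```python
from collections import deque

def round_robin(work, slice_for):
    """Simulate round robin scheduling.

    work      -- maps task name to the time units that task still needs
    slice_for -- maps task name to its time slice (assumed values)
    Returns a log of (task, time_run) entries in execution order.
    """
    queue = deque(work)              # ready queue, in insertion order
    remaining = dict(work)
    log = []
    while queue:
        task = queue.popleft()
        ran = min(slice_for[task], remaining[task])
        remaining[task] -= ran
        log.append((task, ran))
        if remaining[task] > 0:
            queue.append(task)       # slice elapsed: go to the end of the queue
    return log

# Equal slices: the two tasks simply alternate.
equal = round_robin({"transfer": 4, "stats": 4}, {"transfer": 2, "stats": 2})

# Weighted slices: the higher priority task gets a longer slice.
weighted = round_robin({"transfer": 4, "stats": 4}, {"transfer": 3, "stats": 1})
```

With weighted slices, the hypothetical `transfer` task finishes its work after two turns while `stats` needs four, which is the effect of enlarging the slice of a higher priority task.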
Another prior art task scheduling technique is called "First-In-First-Out" or FIFO. Using FIFO, a task scheduler places tasks that are ready for execution into a FIFO queue. The order in which the task scheduler enters a task determines when that task performs in relation to other tasks. The processor may perform the task at the head of the queue (or the next ready task) until either the task completes, or is blocked for an event such as an input/output interrupt. At that point, the processor begins performance of the next task in the ready queue. Thus in FIFO scheduling, there is no particular fixed time slice or number of cycles for which a particular task is performed.
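A minimal sketch of FIFO scheduling, under simplifying assumptions, is given below. Each task is modeled as a list of run lengths ("bursts"), where a burst ends when the task blocks or completes. As a simplification that the text does not state, a blocked task is assumed to become ready again immediately and rejoin the tail of the queue. All names are hypothetical.

```python
from collections import deque

def fifo_schedule(tasks):
    """tasks is a list of (name, bursts) in arrival order.

    Returns the order in which (name, burst_length) segments run.
    No time slice is imposed: a task runs until it blocks or completes.
    """
    ready = deque((name, deque(bursts)) for name, bursts in tasks)
    order = []
    while ready:
        name, bursts = ready.popleft()
        order.append((name, bursts.popleft()))   # run until block or completion
        if bursts:
            # Assumption: the task unblocks at once and re-enters at the tail.
            ready.append((name, bursts))
    return order

fifo_order = fifo_schedule([("A", [5, 1]), ("B", [2]), ("C", [3])])
```

Task A runs for its full five-unit burst before B or C can run at all, which shows that no fixed time slice bounds how long a task holds the processor.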
A third example of a prior art task scheduling algorithm is called "priority based pre-emptive scheduling." In a priority based approach, the task scheduler assigns a relative priority to each task that the processor must perform based upon various factors such as how critical the task is to the operation of the computerized device. The task scheduler generally schedules tasks that are ready to perform based upon the relative task priorities. Thus, if three tasks indicate that they are each available to perform on the processing unit, the priority based task scheduler will generally select the task with the highest priority for performance. The task scheduler may select a task for performance from two or more tasks with equivalent priorities, for example, based on a round robin approach used for multiple tasks that share the same priority. This allows the processor to execute equal priority tasks for equal amounts of time.
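The selection rule just described might be sketched as follows: pick from the highest priority level that has a ready task, and rotate among tasks of equal priority. The function `pick_next`, the numeric priorities (a larger number meaning a higher priority) and the task names are assumptions made for illustration.

```python
from collections import deque

def pick_next(ready_by_priority):
    """ready_by_priority maps a priority (int) to a deque of ready task names.

    Returns the next task to run, or None if nothing is ready.
    Tasks sharing the top priority are served round robin.
    """
    levels = [p for p, queue in ready_by_priority.items() if queue]
    if not levels:
        return None
    queue = ready_by_priority[max(levels)]
    task = queue.popleft()
    queue.append(task)        # rotate so equal-priority tasks take turns
    return task

ready = {2: deque(["transfer_a", "transfer_b"]), 1: deque(["stats"])}
picks = [pick_next(ready) for _ in range(4)]
```

In this sketch the lower priority `stats` task is never selected while the two higher priority tasks remain ready.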
In most computer systems, various events can cause a task to be blocked during performance. By way of example, a disk drive experiencing disk latency may cause a task to be blocked while waiting for data. The task scheduler can remove the blocked task from a ready queue or adjust the task's status to "waiting." During the period of blocking, the task scheduler is able to instruct the processor to execute other tasks. When the disk drive is ready with the data for the waiting task, the disk drive processor can signal the processor via an interrupt, for example. The task scheduler can detect this condition and stop the current task from performance in order to have the processor resume performance of the waiting or blocked task by moving it back to the ready queue and setting the task's status to "ready." Priority or pre-emptive scheduling allows tasks to execute based on their priorities and is designed to allow a device to perform tasks of high priority in favor of tasks having lower priorities.
Prior art implementations of task scheduling algorithms suffer from a number of deficiencies. Generally, these deficiencies arise since system designers typically design and create prior art operating systems for general purpose uses. In other words, the designers employ prior art task scheduling algorithms in computer devices to handle a wide range of performance scenarios that can vary from use to use of the particular computer device into which they are incorporated. However, when operating systems employ prior art task scheduling techniques for performance of highly specialized applications and/or in highly specialized or dedicated computerized devices, problems frequently arise.
By way of example, in a data communications device such as a router, switch, hub or the like, the primary task of the device is to transfer data. In a router, for instance, packets arrive on various ports or interfaces and must be transferred through the router to destination ports or interfaces to move the packets along the network towards their destination. While the overall operation of the router may involve a number of different tasks such as maintaining routing tables, error checking, statistics gathering and so forth, the primary task performed by the router involves transferring the data packets from one port or interface to another. Under heavy network traffic conditions where the router must transfer many packets, the priority of the data transfer task must remain very high to keep up with the packet flow in order to avoid network bottlenecks causing delays or lost data. As such, a data communications device employing a prior art task scheduling algorithm schedules the data transfer task ahead of most other tasks. This can cause a situation called "task starvation" in which a processor is unable to execute many lower priority tasks each for a sufficient amount of time to allow each low priority task to effectively perform. Essentially, in situations involving task starvation, during heavy load conditions, the device "starves" lower priority tasks of processor cycles in favor of one or more high priority tasks (the data transfer task in this example). Accordingly, in heavy traffic load situations, a system that allocates a small amount of time to lower priority tasks might be able to properly perform one or two of such non-critical tasks, but when the processor must perform several of such non-critical tasks, each is unable to effectively perform. When a non-critical task cannot perform sufficiently, the output of that task may not be available when required for some other more important task. This can result in a thrashing situation where non-critical tasks that fail eventually begin affecting performance of the critical tasks that rely on their output and therefore the entire system degrades.
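The arithmetic behind task starvation can be illustrated with a small sketch. It assumes, purely for the example, that a fixed budget of processor time is left over for non-critical tasks and is split equally among them; the budget, the per-task needs and the function name `starved` are hypothetical.

```python
def starved(budget_ms, need_ms_by_task):
    """Return the names of tasks whose equal share of the leftover budget
    falls below the time they need in order to perform effectively.

    budget_ms       -- processor time per second left for non-critical tasks
    need_ms_by_task -- maps task name to the time per second it needs
    """
    if not need_ms_by_task:
        return []
    share = budget_ms / len(need_ms_by_task)
    return sorted(name for name, need in need_ms_by_task.items() if need > share)

# Two non-critical tasks fit within an assumed 100 ms budget (50 ms each).
two_tasks = {"stats": 40, "routing": 40}

# Five such tasks get only 20 ms each, so every one of them is starved.
five_tasks = {f"task{i}": 40 for i in range(5)}
```

Here `starved(100, two_tasks)` is empty, while `starved(100, five_tasks)` lists all five tasks, matching the observation that one or two non-critical tasks may perform properly but several cannot.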
Even in task scheduling systems where the device fixes the time quantum or time slice for each task, the number of non-critical or lower priority tasks that must execute can change dramatically in the system. This can result in the data transfer task not having enough time to perform. Also, in situations where the device raises the priority of the non-critical tasks, this can result in the data transfer or packet switching tasks not having enough time to perform. Effects of this can range from packet delays to packet drops because the non-critical tasks are assigned too much time for performance.
The present invention significantly overcomes many of the deficiencies of prior art task scheduling algorithms. More specifically, in one arrangement, the invention provides a method for performing a plurality of tasks of varying priorities in a computerized device. The method is preferably implemented in a yielding scheduler within an operating system in a computing device configured according to the invention. Though examples herein include a data communications device, the invention is applicable to any computing device that performs multiple tasks. The method includes the operations of initiating performance of a first task and then upon occurrence of a first time period, receiving a yield signal initiated by the first task during performance of the first task. The first task may be, for example, a primary task such as a data transfer task in a data communications device. The yield signal, under control of the first task, allows the first task to share processor time with other tasks. In response to receiving the yield signal, the method includes the operations of temporarily disabling performance of the first task and performing at least one second task for a second time period, such that the first task yields performance time to one or more second tasks after the first time period, irrespective of a performance time slice provided to the first task by the computerized device. The second task or tasks may be, for example, lower priority tasks grouped into a non-critical class of tasks. When the first task yields performance time, depending upon the embodiment, just one second task may be performed during the second time period, or more than one second task may be performed during the second time period. Allowing a task to yield time to one or more other tasks is advantageous over conventional systems since the decision to yield processor time is given to the task itself and thus task starvation can be better controlled.
Another arrangement includes the steps of detecting an event indicating expiration of the second time period, and in response thereto, stopping performance of the second task(s) and enabling performance of the first task. This arrangement also includes a step of repeating the steps of performing a first task for the first time period, receiving a yield signal, temporarily disabling performance of the first task, performing at least one second task for the second time period, detecting an event indicating expiration of the second time period and stopping performance of the at least one second task, such that no matter what priorities are assigned to the first and second task(s), each can perform on the processing unit for some time irrespective of a scheduling algorithm used to determine task performance. This can avoid processor starvation situations that are apparent in conventional task scheduling systems.
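The repeating cycle described above (first task for the first time period, then second task(s) for the second time period) might be visualized with the following sketch. The names `yield_timeline`, `y` and `x` and the chosen durations are illustrative assumptions, not values taken from the invention.

```python
def yield_timeline(y, x, total):
    """Alternate a first task, which runs y time units and then yields,
    with second task(s), which run for x time units.

    Returns a list of (who, start, end) segments covering [0, total).
    """
    timeline, t = [], 0
    while t < total:
        run = min(y, total - t)
        timeline.append(("first", t, t + run))
        t += run
        if t >= total:
            break
        run = min(x, total - t)
        timeline.append(("second", t, t + run))
        t += run
    return timeline

timeline = yield_timeline(y=8, x=2, total=20)
second_time = sum(end - start for who, start, end in timeline if who == "second")
```

In this example the second task(s) receive 4 of every 20 time units regardless of the relative priorities of the tasks, because the first task itself gives up the processor after each period of 8 units.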
In another arrangement, the processing unit controls the operation of a data communications device and the first task is a data transfer task designed to transfer data between ports of the data communications device and the second task is a lower priority task than the data transfer task. In such an arrangement, the steps of performing the first task for the first time period, receiving a yield signal from the first task, temporarily disabling performance of the first task, performing the at least one second task for the second time period, detecting expiration of the second time period and stopping performance of the second task(s) and repeating allow data transfer operations performed by the first task to take place without starving operations associated with the second task(s).
In yet another arrangement, the step of performing a first task on the processing unit for the first time period further includes, upon the start of performance of the first task, a step of setting an event to occur after an elapsed performance time equal to Y and performing data processing operations associated with the first task. During the step or operation of performing, a step of detecting an occurrence of the event indicating that the performance time Y has elapsed is provided, thus indicating an end of the first time period, and in response thereto, a step of generating the yield signal from within the first task to indicate that the first task is able to yield performance time to another task or tasks is also provided.
In one arrangement, the step of generating the yield signal is performed from within the first task by calling a yield function which then performs on the processing unit. The yield function, when performing on the processing unit, causes the processing unit to perform the step of temporarily disabling performance of the first task.
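As a hedged sketch of a task that generates its own yield signal, the example below models the first task as a Python generator. It assumes, for simplicity, that each packet costs one time unit, and it represents the yield signal as a tuple emitted by the task; the names `transfer_task` and `run` and the event labels are hypothetical.

```python
def transfer_task(packets, y, x):
    """First task: process packets, and after y time units of work emit a
    yield signal offering x time units to other tasks."""
    elapsed = 0
    for packet in packets:
        elapsed += 1                     # assumed cost: one time unit per packet
        yield ("processed", packet)
        if elapsed >= y:                 # event: performance time Y has elapsed
            elapsed = 0
            yield ("yield_signal", x)    # the task itself decides to yield

def run(packets, y, x):
    """Stand-in for the yield function: reacts to signals from the task."""
    log = []
    for kind, value in transfer_task(packets, y, x):
        if kind == "processed":
            log.append(f"transfer:{value}")
        else:
            log.append(f"others run for {value}")
    return log

log = run(["p1", "p2", "p3", "p4", "p5"], y=2, x=1)
```

The point of the sketch is that the decision to yield originates inside the task, after its own timer event, rather than being imposed by the expiry of a scheduler time slice.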
In yet another arrangement, the step of receiving a yield signal from the first task is performed by a yield function that performs on the processing unit. The yield function, when performed on the processing unit, causes the processing unit to perform the step of temporarily disabling performance of the first task by performing the steps of (i) setting a yield event to detect an elapsed performance time X corresponding to the end of the second time period, (ii) temporarily disabling performance of the first task until the occurrence of the yield event, and (iii) calling a scheduling task in order to schedule performance of another task or tasks during the second time period until the occurrence of the yield event.
In another arrangement, the step of temporarily disabling performance of the first task further includes the steps of dequeuing the first task from a ready queue and setting a status of the first task to a yield condition and enqueuing the first task to a yield queue where it remains until the occurrence of the yield event. This allows other tasks of other priorities in the ready queue to be selected for performance.
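One possible shape for the ready queue and yield queue handling is sketched below. The class `YieldingScheduler`, its method names and the use of an integer clock are assumptions for illustration. Placing a woken task at the head of the ready queue is likewise an assumption; the text only requires that the task return to the set of tasks available for performance.

```python
from collections import deque

class YieldingScheduler:
    def __init__(self, tasks):
        self.ready = deque(tasks)        # task names in scheduling order
        self.yield_queue = []            # (wake_time, task) pairs
        self.status = {task: "ready" for task in tasks}

    def yield_task(self, task, now, x):
        """Dequeue task from the ready queue and park it until now + x."""
        self.ready.remove(task)
        self.status[task] = "yield"
        self.yield_queue.append((now + x, task))

    def on_tick(self, now):
        """Yield event check: move expired tasks back to the ready queue."""
        waiting = []
        for wake_time, task in self.yield_queue:
            if now >= wake_time:
                self.status[task] = "ready"
                self.ready.appendleft(task)   # assumption: resume at the head
            else:
                waiting.append((wake_time, task))
        self.yield_queue = waiting

    def pick(self):
        return self.ready[0] if self.ready else None

sched = YieldingScheduler(["transfer", "stats", "routing"])
before = sched.pick()
sched.yield_task("transfer", now=10, x=3)
during = sched.pick()
sched.on_tick(12)
still_during = sched.pick()
sched.on_tick(13)
after = sched.pick()
```

While the hypothetical `transfer` task sits in the yield queue, the scheduler can only select other tasks; once the yield event occurs at time 13, `transfer` is selectable again.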
In another arrangement, the step of temporarily disabling performance of the first task temporarily disables performance of all tasks having a priority equivalent to the first task thus removing all tasks having a priority equivalent to the first task from a set of tasks available for performance, such that only tasks having a priority that is not equivalent to the first task remain in the set of tasks selectable by the scheduling tasks for performance during the second time period. This is beneficial, for example, in cases where there may be more than one primary (e.g., first) task.
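A small sketch of disabling every task that shares the priority of the yielding task follows. The function name `yield_priority_class`, the numeric priorities and the task names are hypothetical.

```python
def yield_priority_class(ready, yielding_task):
    """ready maps task name to priority.

    Returns (still_ready, yielded): every task whose priority equals that of
    yielding_task is removed from the set available for performance.
    """
    priority = ready[yielding_task]
    yielded = {name: p for name, p in ready.items() if p == priority}
    still_ready = {name: p for name, p in ready.items() if p != priority}
    return still_ready, yielded

ready_set = {"transfer_in": 2, "transfer_out": 2, "stats": 1, "routing": 1}
still_ready, yielded = yield_priority_class(ready_set, "transfer_in")
```

Without this step, a second primary task of the same priority (here `transfer_out`) could be selected immediately and consume the time that the first task intended to give to the other tasks.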
In yet another arrangement, the step of performing the first task is initiated by a scheduler task performing on the processing unit, and the scheduler task, when performing on the processing unit, causes the processing unit to perform the step of determining if any tasks are available for performance during a task time period determined according to a task scheduling algorithm. If tasks are available, this arrangement performs the steps of selecting a task for performance according to the task scheduling algorithm such that tasks having the first priority that are available for performance are selected for performance over tasks having the second priority that are available for performance, and then performing the selected task for the task time period determined by the task scheduling algorithm. However, also in this arrangement, if it is determined that no tasks are available for performance in the step of determining, then the arrangement performs the step of detecting if any tasks have been temporarily stopped from performance, and if so, the arrangement enables performance of at least one task that has been stopped from performance for the task time period determined by the task scheduling algorithm. Also, the arrangement in this case detects an end of the task time period used to perform a task according to the task scheduling algorithm and repeats the steps of determining and detecting.
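The decision logic of the scheduler task described above might be sketched as follows. The function `schedule_next`, the tuple layout and the numeric priorities (a larger number meaning a higher priority) are assumptions; the sketch covers only task selection, not the timing of the task time period.

```python
def schedule_next(ready, yielded):
    """ready and yielded are lists of (priority, name) tuples.

    If any task is ready, select the highest priority one. Otherwise, if a
    task has been temporarily stopped (yielded), re-enable it early rather
    than leave the processor idle. Returns the task name, or None.
    """
    if ready:
        return max(ready, key=lambda task: task[0])[1]
    if yielded:
        task = yielded.pop(0)
        ready.append(task)       # back into the set available for performance
        return task[1]
    return None

normal_pick = schedule_next([(1, "stats"), (2, "transfer")], [])

ready_now, yielded_now = [], [(2, "transfer")]
early_pick = schedule_next(ready_now, yielded_now)

idle_pick = schedule_next([], [])
```

The second case shows the fallback: when no other task is available, the yielded task is enabled again so that processor time is not wasted.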
In other arrangements of the invention, the end of the first time period as determined by the first task occurs before the end of the task time period determined by the task scheduling algorithm and the yield signal generated by the first task causes the at least one second task to be performed during the second time period which elapses at least in part in conjunction with the task time period assigned to the first task by the task scheduling algorithm, such that the first task can yield a portion of time within the task time period assigned to the first task, as determined by the task scheduling algorithm, to the at least one (i.e., one or more) second task. In other words, the first time period may be a time period Y which occurs concurrently with a time slice assigned to the first task, which may be a primary task. The second time period may be a time period X, that can also elapse in whole or in part during the time slice assigned to the first task by the task scheduling algorithm. One or more second tasks can be performed during the second time period X.
In yet another arrangement, the step of enabling performance of at least one task removes the task from a yield queue and returns the task to a set of tasks available for performance.
Other arrangements based on the above can include the step of classifying each of the plurality of tasks by assigning a priority to each task, such that tasks having the first priority are preferred for performance in the processing unit over tasks having the second priority. In a related arrangement, the first task is a higher priority task and the at least one second task is a lower priority task (or tasks) and the first time period during which the higher priority task performs before yielding performance to a lower priority task is set to be approximately equal to a response time requirement of the lower priority task(s).
In another arrangement, the first task is a higher priority task and the second task(s) are lower priority task(s) and the second time period during which the higher priority task yields performance to a lower priority task is set to be approximately equal to a response time requirement of the higher priority task.
In other arrangements, the first time period is greater than the second time period such that the first task performs longer than the second task(s). In an alternative to this, an arrangement is provided in which the second time period is greater than the first time period such that the second task(s) perform longer than the first task.
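The relationship between the two time periods and the resulting processor shares can be illustrated with a short calculation. The sketch assumes an idealized cycle in which the first task always yields exactly after Y and resumes exactly after X, with no other scheduling delays; the function name `yield_budget` and the values are hypothetical.

```python
from fractions import Fraction

def yield_budget(y, x):
    """For a repeating cycle of y units (first task) and x units (second
    task(s)), return the processor shares and the longest wait of each side."""
    cycle = y + x
    return {
        "first_share": Fraction(y, cycle),
        "second_share": Fraction(x, cycle),
        "max_wait_first": x,     # the first task is off the processor for X
        "max_wait_second": y,    # second task(s) wait at most Y for a turn
    }

budget = yield_budget(y=8, x=2)
```

Under these assumptions, setting Y near the response time requirement of the lower priority task(s) bounds how long they wait for the processor, and setting X near the response time requirement of the higher priority task bounds how long it is kept off the processor.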
In still other arrangements, if the step of initiating performance of a first task initiates performance of a system critical task, the system critical task is performed to completion and is performed with a highest priority of all tasks in the computerized device. This may be the case, for example, where there are three classes of tasks, critical, primary, and non-critical. In such cases, for example, the first task may be a primary task and the second task(s) may be non-critical tasks. In many device and system configurations, critical tasks such as fault handlers are rarely needed, but when they are, they can perform in such arrangements immediately and can perform to completion. In such an embodiment, a critical task may even interrupt the operation of a second task performing during the second time period X. That is, even though the invention allows a first task to yield performance time to one or more second tasks, depending upon the main scheduling algorithm in use (i.e., the scheduling algorithm that originally selected the first task for performance), a critical task might be needed that can interrupt or block any task from performance, and that critical task may be so important to the system that it is allowed to run to completion.
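A sketch of selection among the three classes of tasks is given below. The class labels, the ranking table `RANK` and the function `select_task` are illustrative assumptions; the sketch shows only which task would be chosen, not how a running task is interrupted.

```python
RANK = {"critical": 0, "primary": 1, "non_critical": 2}   # lower rank runs first

def select_task(ready, primary_yielding):
    """ready is a list of (name, task_class) tuples.

    While a primary task is yielding (the second time period X), primary
    tasks are skipped, but a critical task is always eligible.
    """
    candidates = [
        task for task in ready
        if not (primary_yielding and task[1] == "primary")
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda task: RANK[task[1]])[0]

ready_tasks = [("stats", "non_critical"), ("transfer", "primary")]
normal = select_task(ready_tasks, primary_yielding=False)
in_window = select_task(ready_tasks, primary_yielding=True)

with_fault = ready_tasks + [("fault_handler", "critical")]
fault_in_window = select_task(with_fault, primary_yielding=True)
```

The last case corresponds to a critical task taking the processor even during the second time period, ahead of the non-critical task that would otherwise run.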
This system of the invention also provides for a method for performing tasks in a data communications device by dividing processor time between the tasks according to a task scheduling algorithm. This method includes the steps of executing at least one data transfer task. The data transfer task generally performs the steps of processing data through the data communications device until an event occurs indicating a time period Y has elapsed. Thereafter, the method includes detecting, from within the data transfer task, that the event indicating the time period Y has elapsed and providing a yield signal from within the data transfer task to the data communications device, the yield signal indicating a time period X in which at least one other task may perform.
In another arrangement, the data transfer task is executed during a task time period allocated to the data transfer task by the task scheduling algorithm, and wherein the event indicating that the time period Y has elapsed is detected by the data transfer task and occurs within the task time period, and, upon the detection of the event, the data transfer task provides the yield signal to yield at least a portion of processor time that occurs during the task time period to at least one other task. The portion generally is not required to exceed the time period X.
Another arrangement includes the steps of executing an operating system task to perform the steps of receiving the yield signal and temporarily disabling performance of the data transfer task. Then, the method selects at least one other task besides the data transfer task to perform during the time period X and performs that task or tasks for the time period X. Next, upon detecting expiration of the time period X, the method includes enabling performance of the data transfer task and repeating the step of executing at least one data transfer task.
Besides the aforementioned methods, embodiments of the invention include a data communications device comprising a processing unit, a memory system encoded with a plurality of tasks that can perform in the processing unit, and an interconnection mechanism that couples the memory system and the processing unit. In this configuration, the processor performs a first task in the memory system having a first priority, the first task performing for a first time period determined by the first task and upon expiration of the first time period, the first task generating a yield signal. In response to the yield signal, the processor temporarily disables performance of the first task and selects at least one second task for performance and performs the second task in the memory system having a second priority for a second time period, such that the first task yields performance time in the processing unit to the second task(s) irrespective of priorities and scheduling of the first and second tasks.
In another configuration, the processing unit detects expiration of the second time period, and in response thereto, stops performance of the selected second task(s). The processor repeatedly performs the first task for the first time period and in response to the yield signal generated by the first task, temporarily disables performance of the first task and selects and performs at least one second task (e.g., a lower priority task) for the second time period, and detects expiration of the second time period and stops performance of the selected second task(s), such that the first task can control when one or more other selected second task(s) can perform and can provide time for performance of those second task(s) on the processing unit for the second time period by generation of the yield signal.
In another arrangement of the data communications device, the memory system is encoded with a ready queue and a yield queue. When the first task generates the yield signal, the processor removes the first task from the ready queue and places the first task in the yield queue for the duration of the second time period, and upon expiration of the second time period, the processor removes the first task from the yield queue and places the first task in the ready queue such that the first task can be scheduled for performance.
Yet another arrangement of the data communications device includes ports to receive and transmit data. The ports are coupled to the interconnection mechanism. Also, the first task is a data transfer task responsible for processing data through the data communications device using at least one port. The memory system is encoded with an operating system task which controls scheduling of tasks for performance on the processing unit. During performance of the first task, the first task processes data through the data communications device for a time period Y and detects an event indicating the occurrence of the end of the time period Y and in response thereto, provides the yield signal from within the data transfer task to the operating system task (e.g., to a yielding scheduler or yield function within the operating system). The yield signal indicates a time period X in which the selected second task(s) may perform during a task time period assigned by the operating system task to the data transfer task. In another example embodiment, the one or more second tasks that can perform during the second time period may perform sequentially during the second time period, and are typically of lower priority than the first task. There may be only one such second task, or there may be more than one.
In another arrangement of the data communications device, during conditions of heavy network traffic when the data transfer task is scheduled to perform frequently by the operating system task to ensure that data is processed through the ports of the data communications device, the first and second time periods are set such that no matter how frequently the operating system task attempts to schedule the data transfer task, a task or tasks selected as the second selected task(s) can still perform for a period of time equal to the second time period when the yield signal is received from the data transfer task, thus preventing processor time starvation of the second task(s).
In another embodiment of the invention, a method is provided that prevents a task scheduling algorithm from starving lower priority tasks of processor time. The method includes the steps of performing a task scheduling algorithm to schedule a plurality of tasks having varying priorities in a time sliced manner for performance on a processing unit. During performance of a higher priority task in the processing unit, after a first time period, the method includes allowing the higher priority task to generate a yield signal to yield processing time to at least one lower priority task and also includes the step of receiving the yield signal from the higher priority task. In response thereto, the method includes the steps of temporarily disabling performance of the higher priority task and starting performance of at least one selected lower priority task for a second time period, upon the expiration of which, performance of the at least one lower priority task is stopped and the task scheduling algorithm selects another task for performance.
In another arrangement, the step of temporarily disabling performance of the higher priority task removes the higher priority task from a set of tasks available for performance and uses the task scheduling algorithm to select another task to perform as the second task from the set of tasks available for performance. During the second time period, more than one second task may be performed.
In another arrangement, the higher priority task is a data transfer task for transferring data in a data communications device, and the first and second time periods are dynamically adjustable based on data traffic loads experienced by the data transfer task.
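The text does not specify how the time periods are adjusted, so the following is only one hypothetical policy, offered to make the idea concrete. It assumes a fixed cycle length, a traffic load normalized to the range 0 to 1, and a first time period Y that grows linearly with load between assumed bounds; every name and number here is an illustrative assumption.

```python
def adjust_periods(load, y_min=2, y_max=16, cycle=20):
    """Hypothetical policy: scale Y linearly with traffic load and give the
    remainder of a fixed cycle to X.

    load -- traffic load, clamped to the range [0.0, 1.0]
    Returns (y, x). Because y_max < cycle, x never drops to zero.
    """
    load = min(max(load, 0.0), 1.0)
    y = round(y_min + load * (y_max - y_min))
    return y, cycle - y

light = adjust_periods(0.0)
medium = adjust_periods(0.5)
heavy = adjust_periods(1.0)
```

Keeping the upper bound on Y below the cycle length preserves the property that the lower priority task(s) always receive some processor time, even under the heaviest load.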
In yet another arrangement, the first time period during which the higher priority task performs before yielding performance to a lower priority task is set to be approximately equal to a response time requirement of the lower priority task(s). Such a response time requirement may, for example, be the minimum response time requirement of the lower priority task(s).
In still another arrangement, the second time period during which the higher priority task yields performance to a lower priority task is set to be approximately equal to a response time requirement of the higher priority task. Such a response time requirement may, for example, be the minimum response time requirement of the higher priority task(s).
Embodiments of the invention also include computer program products such as disks, or other readable media that have a computer-readable medium including computer program logic encoded thereon for scheduling tasks in a computerized device, such that the computer program logic, when executed on at least one processing unit within the computerized device, causes the at least one processing unit to perform any or all of the aforementioned methods.
In another similar arrangement, a computer program product is provided having a computer-readable medium including computer program logic encoded thereon for performing tasks in a data communications device by dividing processor time between the tasks according to a task scheduling algorithm, such that the computer program logic, when executed on at least one processing unit within the computerized device, causes the at least one processing unit to perform the above-summarized methods.
The methods and arrangements of the invention are preferably implemented primarily by computer software and hardware mechanisms within a data communications device apparatus. The computer program logic embodiments, which are essentially software, when executed on at least one processing unit within the data communications device, cause the at least one processing unit to perform the techniques outlined above, as well as all operations discussed herein that can be performed by software program(s) executing on computer hardware. In other words, these arrangements of the invention are generally manufactured as a computer program stored on a disk, memory, card, or within a prepackaged operating system or other such media that can be loaded into a computer or data communications device to make the device perform according to the operations of the invention.
The features of the invention, as summarized above, may be employed in data communications devices and other computerized devices and software systems for such devices such as those manufactured by Cisco Systems, Inc. of San Jose, Calif.