FIG. 1A illustrates a conventional software state machine in a parallel processing system. As shown in FIG. 1A, the conventional software state machine may include four states, namely erase 102, suspend erase 104, read 106, and resume erase 108. The software state machine transitions from one state to another state when certain state transition conditions are met. For example, upon observing a first set of transition conditions, the software state machine may transition from erase 102 to suspend erase 104. Similarly, upon observing a second set of transition conditions, the software state machine may transition from suspend erase 104 to read 106. In the state read 106, the software state machine may remain in this state to execute a series of reads; when the series of reads is completed (which can be a third set of transition conditions), the software state machine may transition from read 106 to resume erase 108. In the state resume erase 108, upon observing a fourth set of transition conditions, the software state machine may return to the state erase 102.
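The four-state cycle described above can be sketched as a small transition table. This is only an illustrative sketch: the names `State`, `NEXT_STATE`, `step`, and `run` are hypothetical, and each set of transition conditions is reduced to a single boolean, since the text does not specify what the conditions are.

```python
from enum import Enum


class State(Enum):
    # Values reuse the reference numerals of FIG. 1A for readability only.
    ERASE = 102
    SUSPEND_ERASE = 104
    READ = 106
    RESUME_ERASE = 108


# Hypothetical transition table: each state has exactly one successor, taken
# when that state's set of transition conditions is observed.
NEXT_STATE = {
    State.ERASE: State.SUSPEND_ERASE,   # first set of transition conditions
    State.SUSPEND_ERASE: State.READ,    # second set of transition conditions
    State.READ: State.RESUME_ERASE,     # third set: the series of reads is done
    State.RESUME_ERASE: State.ERASE,    # fourth set of transition conditions
}


def step(state, conditions_met):
    """Return the successor state, or the same state if conditions are not met."""
    return NEXT_STATE[state] if conditions_met else state


def run(events, start=State.ERASE):
    """Apply a sequence of booleans (conditions met or not) and record the path."""
    path = [start]
    for conditions_met in events:
        path.append(step(path[-1], conditions_met))
    return path
```

For example, `run([True, True, False, False, True, True])` starts in erase, moves through suspend erase to read, remains in read for two further steps while the series of reads is still in progress, and then passes through resume erase back to erase.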
FIG. 1B illustrates an implementation of the conventional software state machine of FIG. 1A in a parallel processing system. In the approach shown in FIG. 1B, a CPU and its associated components are dedicated to implementing each software state of FIG. 1A. In this example, CPU1 112 and its associated components can be dedicated to implementing the function (i.e., software state) erase 102; CPU2 116 and its associated components can be dedicated to implementing the function (i.e., software state) suspend erase 104; CPU3 118 and its associated components can be dedicated to implementing the function (i.e., software state) read 106; CPU4 120 and its associated components can be dedicated to implementing the function (i.e., software state) resume erase 108; and so on. The associated components of a CPU, such as CPU1 112, may include a plurality of first-in-first-out random access memories or registers (shown as 113a to 113z) and a plurality of hardware components (shown as 115a to 115z). In addition, the CPUs communicate with each other through inter-processor communication (IPC) units, such as IPC1, IPC2, IPC3, and IPC4.
There are at least two drawbacks associated with the software state machine shown in FIG. 1A and FIG. 1B. First, although the CPUs may operate independently of each other, the inter-processor communications among the CPUs, typically carried out through software interrupts, add inefficiencies to the system and thus adversely impact the performance of the system. As the number of CPUs in the system increases, the performance benefit of each additional CPU becomes less significant, because that benefit is lost to the inefficiencies of inter-processor communications. Second, since each CPU in the software state machine of FIG. 1B is dedicated to implementing a particular function, when one CPU is performing its function, such as an erase operation or a read operation, the other CPUs are typically idle, which may introduce further inefficiencies to the system.
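The second drawback can be illustrated with a toy utilization model. The sketch below assumes, as a simplification, that exactly one dedicated CPU is busy at any time and that all others are idle; it does not model inter-processor communication overhead. The function name `cpu_utilization` and the durations in the example trace are hypothetical and are not taken from any measurement.

```python
def cpu_utilization(trace, cpus):
    """Compute the busy fraction of each CPU.

    trace: list of (cpu_name, duration) pairs, one per active state.
    Only the CPU dedicated to the active state is counted as busy.
    """
    total = sum(duration for _, duration in trace)
    busy = {cpu: 0 for cpu in cpus}
    for cpu, duration in trace:
        busy[cpu] += duration
    return {cpu: busy[cpu] / total for cpu in cpus}


# Hypothetical durations (arbitrary time units) for one pass through the cycle
# erase -> suspend erase -> read -> resume erase.
cpus = ["CPU1", "CPU2", "CPU3", "CPU4"]
trace = [("CPU1", 4), ("CPU2", 1), ("CPU3", 2), ("CPU4", 1)]
utilization = cpu_utilization(trace, cpus)
mean_utilization = sum(utilization.values()) / len(cpus)
```

Under this assumption the busy fractions always sum to 1, so the mean utilization across N dedicated CPUs is 1/N regardless of the durations chosen; with four CPUs it is 0.25.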
FIG. 1C illustrates a conventional hardware state machine. Similar to FIG. 1A, the hardware state machine includes states erase 102, suspend erase 104, read 106, and resume erase 108. In the conventional hardware state machine shown in FIG. 1C, the hardware states and the transitions among the hardware states are typically implemented with application specific hardware or dedicated CPUs and their associated components. There are at least two drawbacks with the conventional hardware state machine. First, since the implementation of the hardware states and the transitions among the hardware states is fixed in application specific hardware or dedicated CPUs and their associated components, if there is a mistake in the implementation, the entire system needs to be manufactured again to include the fixes, which can be extremely costly and may cause months of delay to the development of the system. Second, for the same reason, this implementation prevents adding another state to the hardware state machine in the event that there is a need to add a new function to the system or to temporarily add a new state to the system for debugging purposes.
FIG. 7 illustrates a conventional arbitration scheme in a parallel processing system. As shown in FIG. 7, the parallel processing system includes a plurality of task queues labeled as 702, 704, to 706. Each task queue may include tasks having priorities in a certain priority range. For example, task queue 702 includes tasks having priorities in range A; task queue 704 includes tasks having priorities in range B; and task queue 706 includes tasks having priorities in range C. One arbitration scheme is round-robin, where the arbitrator/controller 708 visits all task queues one at a time in sequence, taking a task from the visited task queue and allowing that task to access data 710. One drawback with this scheme is that the task queues having high priorities would be visited at the same frequency as the task queues having low priorities, which may adversely impact users' experience of the parallel processing system. Another arbitration scheme is to arbitrate based on the priorities of the task queues. For example, tasks in task queue 702 that have priorities in range A would be served before tasks in task queue 706 that have priorities in range C (assuming priorities in range A are higher than priorities in range C). In this case, tasks having higher priorities would be served first, while tasks having lower priorities would have to wait until all higher priority tasks have been served. A drawback with this scheme is that it would lead to a pile-up of lower priority tasks, or to an excessively long wait time for some of the lower priority tasks, which leaves the processors that handle the lower priority tasks idle and in turn compromises the performance of the system.
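The two conventional arbitration schemes described above can be contrasted with a minimal sketch. The function names `round_robin` and `strict_priority`, the task labels, and the queue contents are hypothetical. The sketch also assumes a static set of tasks with no new arrivals, which understates the waiting problem in the priority-based scheme.

```python
from collections import deque


def round_robin(queues):
    """Visit the queues in a fixed cycle, taking one task from each non-empty queue."""
    queues = [deque(q) for q in queues]
    order = []
    while any(queues):
        for q in queues:
            if q:
                order.append(q.popleft())
    return order


def strict_priority(queues):
    """Always serve the highest-priority non-empty queue (queues[0] is highest)."""
    queues = [deque(q) for q in queues]
    order = []
    while any(queues):
        for q in queues:
            if q:
                order.append(q.popleft())
                break  # restart the scan from the highest-priority queue
    return order


# Hypothetical queue contents: range A (highest), range B, range C (lowest).
queues = [["A1", "A2", "A3"], ["B1", "B2"], ["C1"]]
rr_order = round_robin(queues)      # A1, B1, C1, A2, B2, A3
sp_order = strict_priority(queues)  # A1, A2, A3, B1, B2, C1
```

With round-robin, the high priority task A3 is served last, after every lower priority task. With priority-based arbitration, the low priority task C1 is served last, and if tasks in range A kept arriving, C1 could wait indefinitely.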
Therefore, there is a need for methods and systems that address the issues of the conventional arbitration scheme described above. Specifically, there is a need for data flow control in a parallel processing system.