Prior art wireless communication systems are defined in IEEE protocol 802.11 and its various derivatives 802.11a, 802.11b, and 802.11m. In a typical wireless communications system, an RF signal is heterodyned to an intermediate frequency, and signal processing is performed to generate a stream of data forming a frame. The device which performs this processing is known as the physical layer device (PHY) in the OSI layer definitions. The PHY acts as an interface between the RF signal and the stream of unframed data moving to the media access controller (MAC). The MAC layer receives the unframed data and separates header information and CRC information to perform data integrity checking, producing a data stream to a host interface, where such data may be moved via a FIFO interface or into a packet buffer in which data is held in structures which contain pointers to the next data structure, as is typical for PCI host adapters. In a prior art system, the signal processing from an antenna to the packet memory may be called a wireless host adapter, and each processing stage of the host adapter requires specialized circuitry for the performance of its specific function. If multiple simultaneous wireless sessions are desired, the user must have more than one wireless host adapter, and each host adapter contains its own circuitry, which performs the required PHY and MAC functions independently of any other host adapter. Each host adapter carries one wireless session and consumes a particular amount of space and power, so each additional host adapter linearly increases the requirement for space and power. Additionally, there are several different protocols for wireless LANs, and other protocols are under development. Presently, each protocol may require its own host adapter which operates for that particular protocol only.
In a wireless communications system, two types of processors are often used: a micro-controller for handling data movement to and from a host adapter memory, and a DSP to handle signal processing calculations done on incoming signals. Compared to prior art uses of micro-controllers and DSPs, the bandwidths involved in wireless communications are lower; however, most modern micro-controllers and DSPs have a surplus of bandwidth available, which translates into higher power dissipation. The higher power dissipation and the inseparability of the DSP function and IO function result in both types of processors being used in a typical system, which also contributes to higher power dissipation and shorter battery life.
In addition to the need for a hybrid DSP and micro-controller, there is also the need to be able to separate processing of two channels into fixed-bandwidth processing threads. In the current art of multi-tasking real-time operating systems, multiple instances of a program are executed using separate storage contexts and a Real-Time Operating System (RTOS) which allocates a certain amount of time to each task. The overhead of an RTOS is fairly high, and context switching from one task to another takes hundreds to thousands of processor clock cycles. Because of the high overhead of context switching and the requirement of guaranteed processing bandwidth in a digital signal processor, real-time operating systems with task switching are not implemented in current DSP processors, since the processing needs to be done in something much closer to real-time and without one task blocking the others. Currently, RTOS task switching is accomplished by buffering data after the task of interest is switched out of context, which means switching to an inactive state, either in memory or in some form of storage, for recovery when the task is switched back into context at some time in the future. For this reason, a DSP typically runs a single context performing computations, while a micro-controller handling IO uses an RTOS and performs task switching.
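The cost of coarse context switching described above can be illustrated with a short calculation. The following is a minimal sketch, not a measurement of any particular RTOS: the function name and the cycle counts in the loop are hypothetical values chosen only to show the trend, with the switch cost taken from the "hundreds to thousands of processor clock cycles" range stated above.

```python
def overhead_fraction(switch_cycles: int, slice_cycles: int) -> float:
    """Fraction of processor cycles spent on context switching.

    Assumes one context switch of `switch_cycles` is paid for every
    time slice of `slice_cycles` of useful work (an illustrative model).
    """
    if switch_cycles < 0 or slice_cycles <= 0:
        raise ValueError("switch_cycles must be >= 0 and slice_cycles > 0")
    return switch_cycles / (switch_cycles + slice_cycles)


if __name__ == "__main__":
    # Hypothetical switch cost of 1000 cycles; shrinking the time slice
    # toward fine-grained switching makes the overhead dominate.
    for slice_cycles in (100_000, 10_000, 1_000, 100):
        frac = overhead_fraction(1_000, slice_cycles)
        print(f"slice={slice_cycles:>7} cycles  overhead={frac:.1%}")
```

Under these assumed numbers, the overhead grows from about one percent of the processor at the longest slice to the large majority of it at the shortest, which is consistent with the reason given above for not using RTOS task switching in a DSP.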
It is desired to enable finer-grained context switching which is optimized for the needs of a small plurality of channels of wireless communications links. Each of these links requires two kinds of processing tasks: performing DSP calculations on incoming data, and moving data from one network layer to the next.
FIG. 1 shows a prior art pipelined processor 10. Each stage performs an operation in a single stage clock cycle, although the clocks within a single stage may operate at higher rates than the stage clock. The stages are separated by registered boundaries shown as dashed lines, such that anything crossing a dashed line in FIG. 1 is fed through a clocked register such as a D flip-flop on each clock cycle. As known to one skilled in the art, data is generally available from one stage to the next on each clock cycle, unless a condition known as “stall” occurs. In a stall condition, for example when accessing slow external memory 42, the entire pipeline receives a stall signal 46 and remains in this state until data becomes available from external memory, at which point movement of data across stage boundaries resumes. The interval of time spent waiting for external memory to become available is known as “pipeline stall time”. When a pipeline stall condition occurs, all data processing comes to a halt until the stall condition is cleared, as indicated by the stall signal 46.
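The stall behavior described above can be sketched as a small cycle-level simulation. This is an illustrative model only, not the circuit of FIG. 1: the function `run_pipeline`, the instruction labels, and the choice of which cycles are stalled are all hypothetical, and the stage names are simply labels for the registered stages of FIG. 1. Each list slot stands for one registered stage boundary, and a stalled cycle freezes every slot at once.

```python
from typing import List, Optional, Set, Tuple

# Labels for the registered stages of the modeled pipeline.
STAGES = ["Fetch Address", "Program Access", "Decode",
          "EX1", "EX2", "Memory Access", "Write Back"]


def run_pipeline(instructions: List[str],
                 stall_cycles: Set[int]) -> Tuple[int, List[str]]:
    """Return (total stage clock cycles, instructions in completion order).

    `stall_cycles` holds the cycle numbers on which the global stall
    signal is asserted; on those cycles no stage register is updated.
    """
    regs: List[Optional[str]] = [None] * len(STAGES)
    pending = list(instructions)
    completed: List[str] = []
    cycle = 0
    while pending or any(r is not None for r in regs):
        if cycle not in stall_cycles:
            # Normal cycle: every stage hands its contents to the next one.
            next_in = pending.pop(0) if pending else None
            regs = [next_in] + regs[:-1]
            if regs[-1] is not None:
                # The instruction in the last stage completes this cycle.
                completed.append(regs[-1])
                regs[-1] = None
        # Stalled cycle: all registers hold their values; only time passes.
        cycle += 1
    return cycle, completed


if __name__ == "__main__":
    program = ["i0", "i1", "i2"]
    print(run_pipeline(program, set()))    # no stalls
    print(run_pipeline(program, {4, 5}))   # two stalled cycles
```

In this model every stalled cycle adds one full cycle to the completion time of every instruction still in the pipeline, which is the "pipeline stall time" penalty described above.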
In the prior art processor 10 of FIG. 1, a program counter 12 provides a memory address to a Fetch Address Stage 14, which passes the address along to a Program Memory 18 via an address bus 16. A data bus 20 returns the program data on the next stage clock to the Program Access Stage 22. The Decode stage 28 separates the data returned from the Program Access Stage 22 into opcodes and data, where the opcode comprises a specific instruction to perform a particular operation using either registers 24, immediate data associated with the opcode, or data memory 40. The Decode stage 28 may determine that a data value accompanying the opcode is to be loaded into a particular register location 24, or that the contents of a particular register are to be rotated, etc. The decoded operation is passed to a first execution stage EX1, which may include some multiplier operations, and to a second execution stage EX2, which contains an arithmetic logic unit (ALU) 36 for performing arithmetic operations such as add, subtract, rotate, and other functions known to one skilled in the art of processor design. Data memory 40 which is to be written or read is accessed by providing an address, and the returned data is recovered by Memory Access stage 38. Memory Access stage 38 is also responsible for reading and writing external shared memory, which is typically much slower than data memory 40 or register memory 26. The Write Back stage 44 writes data back to the register controller 26.
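The separation of a program word into an opcode and its accompanying data, as performed by the Decode stage described above, can be sketched as follows. The 16-bit instruction layout, the opcode values, and the 8-bit register width are assumptions made purely for illustration; the description above does not specify an encoding.

```python
from typing import List, NamedTuple

# Hypothetical opcodes for the two operations mentioned in the text.
OP_LOAD_IMM = 0x1     # load an immediate value into a register
OP_ROTATE_LEFT = 0x2  # rotate a register's contents left


class Decoded(NamedTuple):
    opcode: int
    reg: int
    imm: int


def decode(word: int) -> Decoded:
    """Split a word using an assumed layout:
    bits [15:12] opcode, bits [11:8] register index, bits [7:0] immediate."""
    if not 0 <= word <= 0xFFFF:
        raise ValueError("instruction word must fit in 16 bits")
    return Decoded((word >> 12) & 0xF, (word >> 8) & 0xF, word & 0xFF)


def execute(d: Decoded, registers: List[int]) -> None:
    """Apply a decoded operation to an 8-bit register file (in place)."""
    if d.opcode == OP_LOAD_IMM:
        registers[d.reg] = d.imm
    elif d.opcode == OP_ROTATE_LEFT:
        value = registers[d.reg] & 0xFF
        n = d.imm % 8
        registers[d.reg] = ((value << n) | (value >> (8 - n))) & 0xFF
    else:
        raise ValueError(f"unknown opcode {d.opcode:#x}")


if __name__ == "__main__":
    regs = [0] * 16
    execute(decode(0x13A5), regs)  # load 0xA5 into register 3
    execute(decode(0x2301), regs)  # rotate register 3 left by one bit
    print(hex(regs[3]))
```

In a hardware pipeline the decode and execute steps would occupy separate registered stages; here they are ordinary function calls so that the opcode and data fields can be inspected directly.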
The prior art processor of FIG. 1 performs many functions well. However, any stall condition, such as may occur when data is read from external memory 42, stops the entire data path: the Stall signal 46 is asserted, directing all pipeline stages to stop forwarding information until the stall condition is cleared. For time-sensitive calculations, this stall condition can be catastrophic. It is desired to provide an architecture which allows more than one thread to proceed simultaneously through the core pipeline during a stall condition.