Microprocessors, as is well-known in the art, are integrated circuit (IC) devices that are enabled to execute code sequences which may be generalized as software. During execution, most microprocessors are capable of both logic and arithmetic operations, and modern microprocessors typically have on-chip resources (functional units) for such processing.
Microprocessors in their execution of software strings typically operate on data that is stored in memory. This data needs to be brought into the memory before the processing is done, and sometimes needs to be sent out to a device that needs it after its processing.
There are in the state of the art two well-known mechanisms to bring data into the memory and send it out to a device when necessary. One mechanism is loading and storing the data through a sequence of Input/Output (I/O) instructions. The other is through a direct memory access (DMA) device.
In the case of a sequence of I/O instructions, the processor spends significant resources in explicitly moving data in and out of the memory. In the case of a DMA system, the processor programs external hardware circuitry to perform the data transfer. The DMA circuitry performs all of the memory accesses required to transfer the data to and from the memory, and sends an acknowledgement to the processor when the transfer is completed.
In both of these cases the processor has to explicitly perform the management of the memory, that is, to decide whether or not the desired data structure fits into the available memory space, and where in the memory to store the data. To make such decisions the processor needs to keep track of the regions of memory wherein useful data is stored, and regions that are free (available for data storage). Once that data is processed, and sent out to another device or location, the region of memory formerly associated with the data is free to be used again by new data to be brought into memory. If a data structure fits into the available memory, the processor needs to decide where the data structure will be stored. Also, depending on the requirements of the processing, the data structure can be stored either consecutively, in which case the data structure must occupy one of the empty regions of memory; or non-consecutively, wherein the data structure may be partitioned into pieces, and the pieces are then stored into two or more empty regions of memory.
An advantage of consecutively storing a data structure into memory is that the accessing of this data becomes easier, since only a pointer to the beginning of the data is needed to access all the data.
When data is not consecutively stored into the memory, access to the data becomes more difficult because the processor needs to determine the explicit locations of the specific bytes it needs. This can be done either in software (i.e., the processor spends its own resources on the task) or in hardware (using special circuitry). A drawback of consecutively storing the data into memory is that memory fragmentation occurs. Memory fragmentation happens when each of the available chunks of memory is smaller than the data structure that needs to be stored, but the combined space of the available chunks is larger than the space needed by the data structure. Thus, even though enough space exists in the memory to store the data structure, it cannot be consecutively stored.
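The fragmentation condition described above can be illustrated with a minimal sketch. The free-region list, the byte sizes, and the helper names `fits_consecutively` and `fits_partitioned` are hypothetical, chosen only for illustration; they are not taken from any of the referenced applications.

```python
# Illustrative only: free regions are modeled as (start, length) pairs.
# The layout and the helper names are assumptions, not part of the source.

def fits_consecutively(free_regions, size):
    """True if some single free region can hold the whole structure."""
    return any(length >= size for _, length in free_regions)


def fits_partitioned(free_regions, size):
    """True if the free regions together can hold the structure in pieces."""
    return sum(length for _, length in free_regions) >= size


# Three free chunks of 64, 32 and 48 bytes (144 bytes free in total).
free_regions = [(0, 64), (128, 32), (256, 48)]
structure_size = 100

# Fragmentation: enough total space, but no single chunk is large enough.
fragmented = (not fits_consecutively(free_regions, structure_size)
              and fits_partitioned(free_regions, structure_size))
```

In this assumed layout a 100-byte structure could only be stored non-consecutively, which is the trade-off against simple single-pointer access noted above.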
In the provisional patent application listed as one of the references in the Cross-Reference to Related Documents above, there are descriptions and drawings for a preferred architecture for a dynamic multi-streaming processor (DMS) for, among other tasks, packet processing. One of the functional areas in that architecture is a queue and related methods and circuitry, comprising a queuing system. The dynamic queuing system and its related components are described in priority patent application Ser. No. 09/737,375. In priority application Ser. No. 09/737,375, a novel packet management unit (PMU) is described that offloads a processing core (termed a streaming processor unit, or SPU) from having to upload packets into or download packets from memory, as well as relieving the unit of some other functions such as memory allocation. This is accomplished by providing a local packet memory (LPM) that is hardware-controlled, wherein data packets that fit therein are uploaded and downloaded by a hardware mechanism.
A background memory manager (BMM) for managing a memory in a data processing system is known to the inventor. The memory manager has circuitry for transferring data to and from an outside device and to and from a memory, a memory state map associated with the memory, and a communication link to a processor. The BMM manages the memory, determining if each data structure fits into the memory, deciding exactly where to place the data structure in memory, performing all data transfers between the outside device and the memory, maintaining the memory state map according to memory transactions made, and informing the processor of new data and its location. In preferred embodiments the BMM, in the process of storing data structures into the memory, provides an identifier for each structure to the processor. The system is particularly applicable to Internet packet processing in packet routers.
Because software-managed memory is costly in terms of developing instructions to figure out which portions of memory within a memory block are free and which are in use, a hardware mechanism such as the one described with reference to Ser. No. 09/602,279 enables greater efficiency and, therefore, cost savings. However, in order to optimize the function of such a hardware controller, a process must be provided to enable integrated and optimum function between hardware control and software control of memory. One of the preferred areas of use for such innovation is in the area of packet processing in data routing over networks.
A system described with reference to Ser. No. 09/881,934 is known to the inventor for allocating storage space for incoming data packets into a memory (LPM) of a packet processor. The system is implemented in hardware and has a number of capabilities for pre-configuring an LPM with atomic and virtual memory blocks or pages, which are allocated for packet storage.
One of the components of the system described above ascertains the packet size of incoming data packets and determines whether or not they fit into LPM. This is accomplished by checking the allocation state for virtual pages of the smallest size that is equal to or larger than the packet size, then checking the allocation state for the next larger virtual page size, and so on, until an available (not used or allocated) virtual page is found of a size that accommodates the next data packet.
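The search order described above can be modeled in software as a minimal sketch. The referenced system performs this check in hardware; the page sizes, the allocation-state table, and the function name `find_virtual_page` below are hypothetical. The sketch also assumes, as a simplification, that pages of different sizes are tracked independently.

```python
# Illustrative only: a software model of the search order described above.
# Page sizes and the allocation table are assumed values, not from the source.

def find_virtual_page(page_state, packet_size):
    """Return (page_size, index) of the first free virtual page that fits.

    page_state maps a virtual page size to a list of booleans, one per page
    of that size, where True means the page is already allocated.
    Returns None when no free page of a sufficient size exists.
    """
    # Visit page sizes from smallest to largest, skipping sizes too small.
    for page_size in sorted(page_state):
        if page_size < packet_size:
            continue
        for index, allocated in enumerate(page_state[page_size]):
            if not allocated:
                return page_size, index
    return None


page_state = {
    256: [True, True],    # both 256-byte pages are in use
    512: [True, False],   # the second 512-byte page is free
    1024: [False],        # the single 1024-byte page is free
}

choice = find_virtual_page(page_state, 200)    # 256 would fit but is full
too_big = find_virtual_page(page_state, 4096)  # no page size is large enough
```

Under these assumed values, the 200-byte packet is placed in the next larger page size because every page of the smallest sufficient size is taken, and a `None` result models a packet that does not fit.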
In this process, software is notified of the packet's presence and is provided the correct information for “core processing” of packet information. In some cases, however, data packets may arrive at a DMS processor that are of a size that does not fit in LPM under configured restrictions, and they must therefore either be dropped, delayed until LPM space is available, or uploaded into some other memory. Causing a data packet to wait until LPM has a storage block of a size to store the data packet is not a desirable option because of priority concerns, in that higher priority packets may be held up behind a packet waiting for LPM storage. Dropping data packets that do not fit is an option, but not the most desirable option, as many dropped packets could have a negative effect on a particular packet flow.
What is clearly needed is an efficient method of diverting or overflowing data packets that do not fit into LPM into a software-controlled memory. Such a method would enable efficient processing while minimizing problems in packet management and accounting.