Packet processing apparatuses such as switches and routers have been in use. A packet processing apparatus generally includes a plurality of interface cards that input and output packets, switch cards that control packet input and output between the respective interface cards, and a control card that manages an apparatus status based on control information. The interface cards receive optical signals from an external device via optical fibers. The optical signals are converted into electrical signals by an optical module and then inputted into a device having a function to perform physical/media access control (PHY/MAC) processing. This device extracts packets from the electrical signals and inputs the packets into a subsequent packet processing circuit. Examples of the packet processing circuit include a field programmable gate array (FPGA), an application specific integrated circuit (ASIC), and a network processing unit (NPU).
The packets that have reached an interface card on the packet receiving side are subjected to flow control by a policer, so that their input rate is limited to a specified rate or lower. The packets that have passed the receiving-side interface card travel through a switch card and then reach an interface card on the packet transmitting side. The transmitting-side interface card performs copy processing, and then executes quality of service (QoS) processing. In the QoS processing, the packets are subjected to bandwidth control processing (shaping) and/or priority control processing based on contracts. In the priority control processing, for example, voice packets are outputted with priority over Web packets.
Hereinafter, the packet control in the QoS processing (hereinafter referred to as "QoS control") will be described in detail with reference to FIGS. 20 and 21. FIG. 20 is a diagram illustrating input and output orders changed by the QoS control. As illustrated in FIG. 20, the packets inputted in the order of the letters A to G are divided into flows F0 to Fn (n being a natural number), one flow per QoS control unit (hereinafter referred to as "flow"), and queued into the packet processing circuit. Each of the queued packets is subjected to priority control processing (such as Strict Priority (SP)) by a scheduler SC. Packets are outputted in sequence from the flow selected based on a scheduling result. In the example illustrated in FIG. 20, a higher priority is assigned in ascending order of flow IDs. Accordingly, the packets of the letters A, C, E, and G are outputted from the flow F0 first. Next, the packet of the letter F is outputted from the flow F1. Next, the packet of the letter D is outputted from the flow F2. Then, the packet of the letter B is outputted from the flow Fn. In this manner, the scheduler SC changes the order of the packets from the input order (A to G) to an output order (A, C, E, G, F, D, and B), and outputs the packets in the output order.
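The reordering performed by the scheduler SC can be illustrated with a minimal sketch of Strict Priority scheduling, assuming (as in FIG. 20) that a lower flow ID means a higher priority; the flow IDs and queue contents below are taken from the example, with 3 standing in for n.

```python
from collections import deque

# Hypothetical sketch of Strict Priority (SP) scheduling over per-flow
# queues F0..Fn, where a lower flow ID means a higher priority.
def sp_schedule(flows):
    """Drain packets from the highest-priority non-empty flow first."""
    out = []
    for fid in sorted(flows):          # ascending flow ID = descending priority
        while flows[fid]:
            out.append(flows[fid].popleft())
    return out

# Input order A..G divided into flows as in FIG. 20 (3 stands in for n).
flows = {0: deque("ACEG"), 1: deque("F"), 2: deque("D"), 3: deque("B")}
print("".join(sp_schedule(flows)))  # ACEGFDB
```

The output order A, C, E, G, F, D, B matches the one described for FIG. 20.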
However, packets have various packet lengths, and some packets have an information amount as large as about 10 Kbytes. Accordingly, it is inefficient for the packet processing apparatus to queue the packets themselves inside the packet processing circuit. Therefore, the packet processing apparatus generally queues into the packet processing circuit, as a packet pointer (hereinafter simply referred to as "pointer"), only the minimum information for use in the QoS control (for example, a flow ID, a packet length, and a buffer address). In this case, the packets are stored in a large-capacity dynamic random access memory (DRAM) provided separately from the packet processing circuit. The buffer address is information indicative of a storage location of a packet inside the DRAM that is used as a packet buffer. For example, the buffer address includes an ID of a cell array (hereinafter referred to as "bank") in the DRAM and an address in each bank.
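The pointer described above can be sketched as a small record; the field names below are illustrative, not from the source.

```python
from dataclasses import dataclass

# Minimal sketch of the per-packet pointer queued inside the packet
# processing circuit; field names are illustrative assumptions.
@dataclass
class PacketPointer:
    flow_id: int     # QoS control unit (flow) the packet belongs to
    length: int      # packet length in bytes
    bank: int        # bank ID inside the DRAM packet buffer
    addr: int        # address within that bank

# Only this small pointer, not the up-to-10-Kbyte packet itself,
# is queued; (bank, addr) locates the packet in the DRAM buffer.
ptr = PacketPointer(flow_id=0, length=64, bank=3, addr=0x1000)
```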
FIG. 21 is a diagram illustrating processing to write and read packets by using pointers. As illustrated in FIG. 21, in the QoS control, the packet processing apparatus queues not the packets themselves but only their pointers, and then reads out the pointers from the flow (the flow F0 in this example) selected by the scheduler SC based on the QoS scheduling result. Further, a read unit reads the packets stored at the buffer addresses indicated by the pointers read out by the scheduler SC.
Patent Document 1: Japanese Laid-open Patent Publication No. 2010-88102 and Patent Document 2: Japanese Laid-open Patent Publication No. 2009-157680 are introduced as the Related Art Documents.
It is possible to produce DRAMs with larger capacity and higher speed by forming a plurality of banks therein. However, because of the structure of the DRAMs, successive accesses to the same bank must be spaced apart by a specified interval called the random cycle time (tRC). Accordingly, it is desirable to access the banks sequentially as much as possible so as to achieve the maximum access speed of the DRAMs. Hereinafter, such access is referred to as bank sequential access. While the tRC varies depending on the operating frequencies, grades, and types of the DRAMs, the tRC is about ten clk (clock) cycles.
For example, assume the case where a DRAM includes eight banks and tRC=10 clk as illustrated in FIG. 21. In this case, if the packet processing circuit can read or write one pkt (packet) per access, and all the different banks in the DRAM are accessed during ten clk cycles, write or read access of 8 pkt/10 clk can be executed at the maximum. However, when the packet processing circuit successively accesses a plurality of packets stored in the same bank (for example, the bank B0 in FIG. 21) in the DRAM, only write or read access of about 1 pkt/10 clk can be executed because of the above-stated latency of the tRC (=10 clk). Hereinafter, the situation where the same bank is successively accessed within the tRC is referred to as "bank conflict."
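The two cases above can be checked with a rough timing model, assuming eight banks, tRC = 10 clk, and one access issued per clk once the target bank is free; the model is a sketch, not an account of any specific DRAM controller.

```python
# Rough model of DRAM bank timing: an access to a bank stalls until
# tRC clocks have elapsed since the previous access to that same bank.
TRC = 10

def cycles_to_serve(bank_ids, trc=TRC):
    """Return the total clk cycles needed to issue one access per entry."""
    last = {}          # bank ID -> clk of its previous access
    clk = 0
    for b in bank_ids:
        if b in last and clk - last[b] < trc:
            clk = last[b] + trc      # wait out the remaining tRC
        last[b] = clk
        clk += 1                     # one access per clk once the bank is free
    return clk

print(cycles_to_serve(list(range(8))))   # bank sequential: 8 pkt in 8 clk
print(cycles_to_serve([0] * 8))          # same bank: 8 pkt in 71 clk, ~1 pkt/10 clk
```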
Here, assume a 100-gigabit Ethernet (registered trademark) for example. In this case, when packets have the shortest length of 64 bytes, the packet processing circuit theoretically needs to process the packets with a performance of about 150 M pkt/s. Accordingly, when the packet processing apparatus uses a DRAM having an operating frequency of, for example, 300 MHz (number of banks=8, tRC=10 clk), the packet processing circuit needs to process the packets with a performance of 1 pkt/2 clk, i.e., 5 pkt/10 clk. Therefore, if the packet processing circuit can uniformly access all the banks as in the bank sequential access, the performance requirement of 5 pkt/10 clk can be fulfilled. However, when the banks are not uniformly accessed, bank arbitration is performed.
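The ~150 M pkt/s figure can be reproduced with a back-of-the-envelope calculation, assuming the usual 20 bytes of preamble and inter-frame gap per frame on the wire (an assumption; the source states only the resulting rate).

```python
# Back-of-the-envelope check of the 100-gigabit Ethernet requirement.
LINE_RATE = 100e9                     # bit/s
WIRE_BYTES = 64 + 20                  # shortest frame + assumed wire overhead
pkt_rate = LINE_RATE / (WIRE_BYTES * 8)
print(f"{pkt_rate / 1e6:.0f} M pkt/s")          # 149 M pkt/s, i.e. ~150 M pkt/s

DRAM_FREQ = 300e6                     # 300 MHz DRAM operating frequency
print(f"{pkt_rate / DRAM_FREQ:.2f} pkt/clk")    # 0.50 pkt/clk = 1 pkt/2 clk = 5 pkt/10 clk
```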
FIG. 22 is a diagram illustrating the bank arbitration performed at the time of writing and reading the packets. As illustrated in FIG. 22, when a bank conflict occurs in the bank B1, the packet processing circuit cannot access the bank B1 again within the predetermined period of the tRC after the preceding access. Accordingly, the packet processing circuit momentarily waits (waits for bank arbitration) in a FIFO (First In First Out) queue, and then accesses the bank B1 anew. This creates a delay equal to the tRC (=10 clk) between the packets. As a result, the performance of the packet processing apparatus is degraded to less than the above-stated performance requirement (for example, to about 1 pkt/10 clk).
The performance requirement of 5 pkt/10 clk could be fulfilled even at an access speed of 1 bank/10 clk if a DRAM operating at a frequency (for example, 1.5 GHz) about five times as high as the current frequency were mounted on the packet processing apparatus. However, this is not feasible. An alternative way to avoid the bank conflict is to mount, as a memory other than the DRAM, a static random access memory (SRAM) that is a single-array memory on the packet processing apparatus. However, the SRAM is smaller in capacity than the DRAM, and therefore it is difficult in actuality to cover the entire packet buffer with the SRAM. Mounting a plurality of SRAMs on the packet processing apparatus is not feasible either, because the costs, the power consumption, and the number of input-output (IO) pins increase accordingly.
Accordingly, the packet processing circuit writes the packets to the DRAM in the packet input order in a bank-sequential manner, so that the bank conflict at the time of writing is avoidable. FIG. 23 is a diagram illustrating packet processing executable without bank arbitration on the writing side. In FIG. 23, the numeric characters in the pointers and packets represent bank IDs. As described in the foregoing, the bank conflict can occur at the time of both packet writing and reading. As illustrated in FIG. 23, the bank conflict at the time of writing is avoidable if the packet processing circuit performs bank-sequential access simply in the input order without taking the flow type into consideration. Therefore, the bank arbitration on the writing side can be omitted.
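The write-side allocation described above can be sketched as a simple round-robin over the banks; because consecutive writes always target different banks, no two writes hit the same bank within the tRC. The function name is illustrative.

```python
# Sketch of write-side bank allocation: packets are written in arrival
# order, cycling through banks 0..7 regardless of flow, so consecutive
# writes never hit the same bank and write-side arbitration is omitted.
NUM_BANKS = 8

def assign_banks(packets):
    """Return (packet, bank) pairs in input order, bank-sequentially."""
    return [(pkt, i % NUM_BANKS) for i, pkt in enumerate(packets)]

writes = assign_banks(["A", "B", "C", "D", "E", "F", "G", "H", "I"])
print(writes[:2], writes[8])  # banks 0, 1, ... wrapping back to 0 at the 9th packet
```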
In contrast, the packet output order on the reading side depends on the QoS scheduling result. Therefore, bank-sequential access is not performed in some cases, and there is a high possibility that the bank conflict occurs. Accordingly, bank arbitration is executed on the reading side as necessary. For example, as illustrated in FIG. 23, when four packets of the flow F0 are stored in the same bank B0 without interruption, the scheduler SC reads these four packets from the flow F0 in succession. As a result, the bank conflict occurs three times, and a delay of the tRC (=10 clk) between read packets is generated three times. Consequently, the performance of the packet processing apparatus is degraded to less than the above-stated performance requirement (for example, to about 1 pkt/10 clk).
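The three conflicts in the FIG. 23 example can be counted with a small self-contained model, assuming tRC = 10 clk and one read issued per clk once the target bank is free; the model is a sketch of the timing constraint, not of any specific read unit.

```python
TRC = 10

# Read-side sketch: if the scheduler selects four packets that all live
# in bank B0 (bank ID 0), each read after the first must wait out tRC.
def read_stalls(bank_ids, trc=TRC):
    """Count the tRC stalls incurred when reading banks in the given order."""
    stalls, last, clk = 0, {}, 0
    for b in bank_ids:
        if b in last and clk - last[b] < trc:
            stalls += 1                  # bank conflict: wait out the tRC
            clk = last[b] + trc
        last[b] = clk
        clk += 1
    return stalls

print(read_stalls([0, 0, 0, 0]))  # four same-bank reads -> three tRC delays
```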