The invention relates to a method and an apparatus for efficiently storing data in a memory system of an apparatus having ports, and in particular in a memory system of a high-speed switching device.
High-speed, multi-port memory systems are required in many applications, in particular memory systems with a single read port and multiple write ports in which all ports can operate simultaneously. Examples of such applications include, but are not limited to, queuing engines forming part of switching systems in the networking area and central processing units (CPUs) in multi-core environments.
Conventional designs of memory systems mainly use single-port or dual-port memories. A single-port memory can be accessed at one address at a time. Accordingly, one read/write operation can be performed on one memory cell during each clock cycle. A dual-port memory has the ability to simultaneously read and write different memory cells at different addresses. However, a dual-port RAM typically consumes a silicon area that is on the order of a factor of two larger than the area occupied by a single-port RAM of the same memory size. Multi-port memories can be built by replicating data into several individual single-port or dual-port memories. However, replication of data in memories requires the use of bigger memories, hence consuming more silicon area and more power than a design in which the data is stored only once.
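The replication approach mentioned above can be illustrated with a minimal model (an illustrative Python sketch; the class and its names are hypothetical and not part of any conventional design): a memory with one write port and two read ports is built from two single-port banks holding identical copies of the data, which doubles the storage.

```python
# Illustrative model of building a 1-write / 2-read memory by
# replicating a single-port memory: every write is duplicated into
# both banks, so each read port can be served by its own bank in the
# same cycle, at the cost of storing the data twice.
class ReplicatedMemory:
    def __init__(self, size):
        # Two identical banks hold the same data -> roughly 2x silicon area.
        self.banks = [[0] * size, [0] * size]

    def write(self, addr, value):
        for bank in self.banks:          # the data is stored twice
            bank[addr] = value

    def read(self, port, addr):
        return self.banks[port][addr]    # each read port has a private bank

mem = ReplicatedMemory(8)
mem.write(3, 42)
assert mem.read(0, 3) == 42 and mem.read(1, 3) == 42
```

Both read ports can be exercised in the same cycle because neither bank is shared, which is precisely why the replicated design spends more area and power than storing the data once.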
FIG. 1 shows a conventional switch within a network system, wherein the switch has a predetermined number n of ports. Each port of the conventional switch shown in FIG. 1 is bidirectional, which means that each port has a dedicated interface to transfer data into the switch and a dedicated interface to transfer data out of the switch.
There are many different conventional architectures for building a switching system as shown in FIG. 1, as illustrated by FIGS. 2 to 4.
FIG. 2 shows a conventional time-division multiplexing (TDM) switching system with a shared memory. In this conventional arrangement, the data traffic from all input ports is broken into chunks of data. The data chunks are delivered from the input ports to a shared memory using a TDM method. Chunks of data are delivered from the shared memory to the output ports by a decision or scheduler block as illustrated in FIG. 2. This scheduler can employ various algorithms such as a round-robin, a weighted-round-robin or a priority algorithm.
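The TDM shared-memory arrangement of FIG. 2 can be sketched as follows (an illustrative Python model with hypothetical names; real systems implement this in hardware): each input port is visited in a fixed slot order once per TDM frame to write its chunk into the shared memory, and the scheduler block then drains chunks toward the output ports, here in simple FIFO order for brevity.

```python
# Illustrative model of the TDM shared-memory switch of FIG. 2.
from collections import deque

N_PORTS = 4
shared_memory = deque()                  # chunks queued in arrival order

def tdm_write_cycle(input_chunks):
    """One TDM frame: visit every input port in a fixed slot order."""
    for port in range(N_PORTS):
        chunk = input_chunks[port]
        if chunk is not None:            # port had traffic in this frame
            shared_memory.append((port, chunk))

def scheduler_read_cycle():
    """Scheduler block: deliver the oldest chunk to its destination."""
    if shared_memory:
        return shared_memory.popleft()
    return None

tdm_write_cycle(["a", None, "c", "d"])   # ports 0, 2 and 3 have traffic
assert scheduler_read_cycle() == (0, "a")
assert scheduler_read_cycle() == (2, "c")
```

A round-robin or priority scheduler as mentioned above would replace the simple FIFO drain with a policy that selects among per-output queues.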
A limitation of the conventional arrangement as shown in FIG. 2 is that the shared memory between the multiplexing unit and the de-multiplexing unit is required to support a bandwidth which is n times the port bandwidth, wherein n is the number of ports. This poses limitations on the implementation of the shared memory, as it needs to support a very high bandwidth and therefore usually needs to operate at a very high frequency relative to the port rate.
FIG. 3 shows a further conventional arrangement, wherein the switching system as shown in FIG. 1 uses a cross bar. In this conventional arrangement, a grid as illustrated in FIG. 3 is provided. By means of the cross bar or grid, each input can be connected to each output. The data traffic from all input ports is broken into chunks of data. These chunks are stored at each source port in input queues prior to delivery to the proper output ports. Once a scheduler decides to deliver traffic, such as a chunk of data, from a specific input port to a specific output port, it opens the corresponding point in the grid as shown in FIG. 3 to allow this data traffic. The disadvantage of the conventional memory system using a cross bar as illustrated in FIG. 3 is that the scheduler is relatively complex and needs to compute a match between the input ports and the output ports to maximize the delivery between inputs and outputs in each switching cycle.
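The matching problem that the cross-bar scheduler of FIG. 3 has to solve can be sketched as follows (an illustrative Python model; a simple greedy maximal matching is used here for brevity, whereas practical schedulers use iterative matching algorithms): at most one chunk may be granted per input and per output in each switching cycle, since each cross point connects one input to one output.

```python
# Illustrative greedy matching for the cross-bar arrangement of FIG. 3.
def crossbar_match(input_queues):
    """input_queues[i] lists (output_port, chunk) entries waiting at input i.

    Returns the cross points to open: a dict input -> (output, chunk),
    with at most one grant per input and at most one per output.
    """
    used_outputs = set()
    grants = {}
    for inp, queue in enumerate(input_queues):
        for out, chunk in queue:
            if out not in used_outputs:  # output still free this cycle
                grants[inp] = (out, chunk)
                used_outputs.add(out)
                break                    # at most one grant per input
    return grants

queues = [[(0, "x"), (1, "y")],          # input 0 wants outputs 0 and 1
          [(0, "z")],                    # input 1 also wants output 0
          [(1, "w")]]                    # input 2 wants output 1
grants = crossbar_match(queues)
assert grants == {0: (0, "x"), 2: (1, "w")}
```

Note that input 1 receives no grant in this cycle because output 0 is already taken, which illustrates why maximizing the match at every cycle makes the scheduler relatively complex.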
A further conventional arrangement is shown in FIG. 4. FIG. 4 shows a CLOS-based switching system which can contain three or more switching levels. Each switching level contains many small switches that can be implemented using any of the above conventional arrangements as shown in FIGS. 2 and 3. A conventional CLOS-based switching system as shown in FIG. 4 gives rise to other problems, such as out-of-order delivery of traffic to the destination ports. Further, the conventional CLOS-based switching system is quite complex and difficult to implement.
Most single switching devices employ a time-division multiplexing (TDM) scheme with a shared memory as illustrated in FIG. 2. However, this poses a limitation on the total bandwidth of a single device using this technology, as the shared memory is required to support at least double the bandwidth, since, in the worst case, all ports are writing to the memory and all ports are reading from the memory at the same time.
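The bandwidth requirement described above can be made concrete with a back-of-the-envelope calculation (an illustrative Python sketch with hypothetical numbers): with n ports all writing and all reading in the same interval, the shared memory must sustain 2 * n times the port bandwidth.

```python
# Illustrative worst-case bandwidth calculation for the shared-memory
# switch of FIG. 2: all n ports write and all n ports read at once.
def required_memory_bandwidth(n_ports, port_bandwidth_gbps):
    writes = n_ports * port_bandwidth_gbps   # all ports writing
    reads = n_ports * port_bandwidth_gbps    # all ports reading
    return writes + reads                    # 2 * n * port bandwidth

# Hypothetical example: a 16-port switch with 10 Gb/s ports requires
# a shared memory sustaining 320 Gb/s.
assert required_memory_bandwidth(16, 10) == 320
```

This is why, as the number of ports or the port rate grows, the shared memory quickly needs to operate at a frequency far above the port rate.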
As technology is reaching the high-frequency barrier in VLSI designs and even on printed circuit boards (PCBs), there is a need for memory systems and methods which make it possible to meet the bandwidth requirements and the number of required accesses of the memory system without increasing the frequency beyond the technological limit.