In the field of communications, voice services such as, but not limited to, the Plain Old Telephone Service (POTS), audio conferencing, facsimile, and video conferencing are being provisioned over a redundant circuit-switched infrastructure that provides dedicated, redundant end-to-end connections. The main benefit of circuit-switched technologies is high quality-of-service provisioning at a guaranteed bandwidth, made possible by the dedicated end-to-end connectivity. However, circuit-switched technologies use the available bandwidth inefficiently and incur high costs in the development, deployment, and maintenance of the redundant circuit-switched infrastructure.
Consider, for example, the provisioning of the ubiquitous POTS service, which delivers digitized human voice between end-stations over a circuit-switched communications network known as the Public Switched Telephone Network (PSTN). Human voice is sampled at 8 kHz, that is, once every 125 μs. Each voice sample has 8 bits, so a dedicated 64 kb/s connection is established end-to-end. However, talking human voice alternates between periods of sound and silent pauses. The activity factor of talking human voice is 0.4, and therefore 60% of the bandwidth guaranteed to each telephone session is unused.
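The figures above follow from simple arithmetic. The sketch below reproduces them using only the values stated in the text (8 kHz sampling, 8-bit samples, activity factor 0.4); the function names are illustrative and not part of any described apparatus.

```python
SAMPLE_RATE_HZ = 8_000   # one voice sample every 125 microseconds
BITS_PER_SAMPLE = 8
ACTIVITY_FACTOR = 0.4    # fraction of time talking voice actually carries sound


def channel_rate_bps(sample_rate_hz: int, bits_per_sample: int) -> int:
    """Bandwidth of a dedicated digitized voice channel, in bits per second."""
    return sample_rate_hz * bits_per_sample


def idle_rate_bps(rate_bps: float, activity_factor: float) -> float:
    """Average reserved bandwidth that carries no voice (silent pauses)."""
    return rate_bps * (1.0 - activity_factor)


rate = channel_rate_bps(SAMPLE_RATE_HZ, BITS_PER_SAMPLE)
sample_period_us = 1_000_000 / SAMPLE_RATE_HZ
idle = idle_rate_bps(rate, ACTIVITY_FACTOR)
print(f"sample period:   {sample_period_us:.0f} us")
print(f"channel rate:    {rate} b/s")
print(f"idle on average: {idle:.0f} b/s")
```

Roughly 38.4 kb/s of each 64 kb/s channel is, on average, reserved but idle.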
Solutions have been proposed and implemented in which multiple telephone conversations are multiplexed over the same transmission medium to take advantage of the unused 60% of the bandwidth. However, these solutions have had only limited success, because only talking human voice has an activity factor as low as 0.4; singing human voice, facsimile transmissions, video conferencing, etc. have higher activity factors.
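Why a higher activity factor limits these solutions can be seen with an idealized mean-rate model: if each source is active a fraction a of the time, a link sized for the average load could at best carry about 1/a times as many sources as a link that reserves full bandwidth per source. This is a simplified sketch that ignores overlapping bursts, queueing, and loss; 0.4 is the talking-voice figure from the text, while 0.8 and 1.0 are hypothetical values standing in for higher-activity traffic.

```python
def ideal_multiplex_gain(activity_factor: float) -> float:
    """Upper bound on the multiplexing gain under a mean-rate model.

    Idealized: assumes sources never need more than the average bandwidth
    at the same time, so real gains are lower.
    """
    if not 0.0 < activity_factor <= 1.0:
        raise ValueError("activity factor must be in (0, 1]")
    return 1.0 / activity_factor


# 0.4 comes from the text; 0.8 and 1.0 are hypothetical higher-activity cases.
for a in (0.4, 0.8, 1.0):
    print(f"activity {a:.1f}: gain <= {ideal_multiplex_gain(a):.2f}")
```

As the activity factor approaches 1, the gain approaches 1, meaning there is no spare bandwidth left to exploit.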
In the field of communications, data services have been provisioned over a packet-switched infrastructure that provides best-effort transport of packetized payloads. Packetized data payloads are transmitted only when generated, and each packet includes station addressing information. The communications network nodes making up the packet-switched infrastructure route the packets to their intended destinations at run-time. These run-time routing decisions depend on the operational status of the packet-switched infrastructure encountered in transit. The run-time routed transport of packetized payloads makes efficient use of bandwidth over an economical packet-switched infrastructure.
Given the above, there is pressure to provision voice services over a packet-switched infrastructure. Intensive research and development is currently underway toward this end, with exemplary solutions known as Circuit Emulation Services and Packet-Voice solutions. Voice over Internet Protocol (VoIP) Packet-Voice implementations address the generation and playback of voice sample payloads, voice sample payload encapsulation/decapsulation, etc.
The actual transport of VoIP packets in a packet-switched communications network is handled by the packet-switched infrastructure in accordance with a best-effort transport discipline. Best-effort packet transport reduces the need to deploy a fully redundant infrastructure and thereby reduces cost, while the run-time routing of packets introduces packet processing delays that reduce the quality of service delivered. Solutions are being sought to minimize the negative effects of the packet processing delays associated with best-effort packet transport, in order to achieve close to real-time conveyance of VoIP packets.
Whether to implement a packet processing function in hardware or in software is always a difficult design choice. Software packet processing implementations benefit from relatively easy development, fast deployment, and easy maintenance, but introduce uncertainty in the timeliness of the run-time response. Hardware packet processing implementations provide certainty in the timeliness of the response, but tend to be very specific to the particular problems they solve and thus lack generality. Considerable effort is being devoted to achieving real-time packet processing.
At packet-switching communications network nodes along the transport path of a conveyed packet, packet classification, switching, and routing decisions involve using extracted packet header field values as a query key into a look-up table in order to determine the corresponding switching/routing response. Servicing such a query in software can be a very involved procedure, typically spanning a large number of system clock cycles.
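A minimal software model of such a look-up is sketched below: a query key is built from extracted header fields and matched against a rule table by linear scan, so the work per query grows with the number of rules. The field names (source/destination address and port) and the response strings are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple, Optional


class HeaderKey(NamedTuple):
    """Hypothetical query key built from extracted header field values."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int


# Hypothetical look-up table: query key -> switching/routing response.
RULES: list[tuple[HeaderKey, str]] = [
    (HeaderKey("10.0.0.1", "10.0.0.2", 5004, 5006), "forward:port1"),
    (HeaderKey("10.0.0.3", "10.0.0.4", 5008, 5010), "forward:port2"),
]


def lookup(key: HeaderKey) -> Optional[str]:
    """Linear search: one key comparison per table entry, executed sequentially."""
    for rule_key, response in RULES:
        if rule_key == key:
            return response
    return None


print(lookup(HeaderKey("10.0.0.1", "10.0.0.2", 5004, 5006)))
```

A hash table (for example a Python `dict` keyed on `HeaderKey`) would lower the average cost, but software response time would still vary with cache behaviour and processor load.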
In provisioning high-density packet-voice solutions, there is pressure to migrate run-time packet classification, switching, and routing functionality, typically implemented in software executed by a packet-switching network node, to hardware implementations that offer predefined response times and ultimately enable real-time processing. Real-time packet processing refers to handling each received packet at a network node within a bounded maximum processing delay, such that the aggregate rate of outgoing processed packets at least equals the aggregate rate of incoming packets. This real-time packet processing requirement is referred to as “processing packets at wire-speed”, where wire-speed relates to the throughput supported on the links connected to a communications network node.
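Wire-speed can be translated into an average time budget per packet: to keep up with a fully loaded link, a single non-pipelined processing stage must finish each packet before the next one arrives. The sketch assumes a hypothetical 1 Gb/s link carrying minimum-size 64-byte Ethernet frames, each accompanied on the wire by an 8-byte preamble/start-of-frame delimiter and a 12-byte inter-frame gap, and a hypothetical 200 MHz processor clock.

```python
def packet_rate_pps(link_bps: float, bits_on_wire: int) -> float:
    """Packets per second arriving on a fully loaded link."""
    return link_bps / bits_on_wire


def time_budget_ns(link_bps: float, bits_on_wire: int) -> float:
    """Average per-packet processing time that still keeps up with the link."""
    return bits_on_wire / link_bps * 1e9


LINK_BPS = 1e9               # hypothetical 1 Gb/s link
FRAME_BYTES = 64 + 8 + 12    # min Ethernet frame + preamble/SFD + inter-frame gap
CLOCK_HZ = 200e6             # hypothetical processor clock

bits = FRAME_BYTES * 8
budget = time_budget_ns(LINK_BPS, bits)
print(f"{packet_rate_pps(LINK_BPS, bits):,.0f} packets/s")
print(f"{budget:.0f} ns per packet, about {budget * 1e-9 * CLOCK_HZ:.0f} clock cycles")
```

Under these assumptions, each look-up must complete in about 134 clock cycles on average, which is why a multi-step software table search is hard to sustain at wire-speed.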
A co-pending, commonly assigned U.S. patent application Ser. No. 10/033,498 entitled “Generic Header Parser Providing Support for Data Transport Protocol Independent Packet Voice Solutions” filed on Dec. 27, 2001, and incorporated herein by reference, describes methods and apparatus for configurable hardware extraction of packet header field values at wire-speed, in real-time, minimizing the introduction of packet processing latencies.
Using extracted packet header information, recent prior art hardware-assisted solutions make use of a Content Addressable Memory (CAM) 100, schematically shown in FIG. 1, to implement the look-up table for determining the packet treatment discipline to be used in processing each received packet. Generically, packet treatment includes, but is not limited to, packet traffic statistics generation, packet traffic shaping, billing, connection access control enforcement, etc., which control packet routing, packet switching, and packet forwarding.
With respect to the implementation of VoIP solutions, determining packet treatment depends on determining a context identifier (context ID) associated with each received packet. Depending on the particular implementation, a context may refer to: a single point-to-point telephone connection (an application-level concept) provisioned using VoIP technologies; a multipoint-to-multipoint audio/video conference provisioned using VoIP technologies; or a convergent service in which a multimedia connection/conference simultaneously conveys audio, video, slide shows, ticker data, etc.
As a result of processing a VoIP packet, at least one interface of the network node over which to forward at least the voice sample payload is determined based on the VoIP context. Forwarding details and disciplines are beyond the scope of the present description and are described elsewhere.
In general, the CAM 100 employs a table 110 storing, in its entries 112, matching rule bitmasks that are bit-compared against a matching key 114 generated from extracted packet header field values. The comparison operation 120 is implemented in hardware using one comparator 122 for each table entry 112. In performing the comparison operation 120, extracted packet header field values, such as, but not limited to, VoIP flow identifiers and station addressing information, are compared bit by bit with the table entries 112 to determine a matching rule, which in a VoIP implementation may include determining a communication session context ID.
Legacy content addressable memory, employed in data switching or routing applications handling Internet Protocol (IP) packets carried over Ethernet, compares the matching key 114 with all the entries 112 of the table 110 in parallel, using all comparators 122, and requires a complete match of every matching key 114 bit. A rule is typically codified in a corresponding rule entry 112 for each provisioned connection. The results 124 of the comparison 120 from each comparator 122 are provided to a rule decoder block 130, which typically provides two outputs: a match result output 132, which signals whether a matching rule/context ID was found, and a rule identifier/context ID output 134.
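The data flow of the binary CAM 100 of FIG. 1 can be modelled behaviourally in software. This is a sketch, not a hardware description: the list comprehension stands in for the N comparators 122 that operate in parallel in hardware, and the decoder returns the match result 132 and the rule/context ID 134. Reporting the lowest-indexed matching entry is an assumption (a common priority-encoder choice); the text does not state how multiple matches are resolved. The table contents are hypothetical.

```python
from typing import Optional, Sequence


def cam_compare(entries: Sequence[int], key: int, width: int) -> list[bool]:
    """One K-bit equality comparator per rule entry (all in parallel in hardware)."""
    mask = (1 << width) - 1
    return [(entry & mask) == (key & mask) for entry in entries]


def rule_decoder(results: Sequence[bool]) -> tuple[bool, Optional[int]]:
    """Return (match result, rule/context ID); lowest matching index wins (assumed)."""
    for rule_id, hit in enumerate(results):
        if hit:
            return True, rule_id
    return False, None


K = 8
table = [0b1010_0001, 0b1100_0011, 0b1111_0000]  # hypothetical K-bit rule entries
matched, rule_id = rule_decoder(cam_compare(table, 0b1100_0011, K))
print(matched, rule_id)
```

Every bit of the key must equal the stored entry for a comparator to report a hit, which is the complete-match behaviour described for legacy CAMs.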
A variation of content addressable memory, known as Ternary CAM (TCAM), is presented in FIG. 2. TCAMs 200 implement rule matching in which each bit of a rule entry 112 has one of three states: ‘0’, ‘1’ or ‘X’ (don't care). The ‘X’ bits are not taken into account during comparison. In the exemplary implementation presented, bit masking techniques 240 are used: for each rule entry 112, the results 224 of the bitwise comparisons 120 from the comparators 222 are provided to a corresponding masking block 242. Bitmasks are provided 244 from a bitmask table 250 having bitmask entries 252 corresponding to the rule entries 112. Only the unmasked subset of comparison results 246 is provided to the rule decoder block 230.
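The masking step of FIG. 2 can be sketched the same way: compare every bit, then discard the comparison results at ‘X’ positions before deciding whether an entry matches. The mask convention here (bit = 1 means "care", bit = 0 means ‘X’) is an assumption, as is the lowest-index-wins decoder; the entries and masks are hypothetical.

```python
from typing import Optional, Sequence


def tcam_compare(entries: Sequence[int], care_masks: Sequence[int],
                 key: int, width: int) -> list[bool]:
    """Bitwise-compare each entry with the key, then ignore 'X' bit positions."""
    full = (1 << width) - 1
    results = []
    for entry, care in zip(entries, care_masks):
        mismatches = (entry ^ key) & full  # 1 wherever a bit comparison fails
        unmasked = mismatches & care       # masking: keep only 'care' positions
        results.append(unmasked == 0)      # match if no unmasked bit differs
    return results


def rule_decoder(results: Sequence[bool]) -> tuple[bool, Optional[int]]:
    """Return (match result, rule/context ID); lowest matching index wins (assumed)."""
    for rule_id, hit in enumerate(results):
        if hit:
            return True, rule_id
    return False, None


K = 8
entries = [0b1010_0110, 0b1010_0000]  # hypothetical rule entries
care = [0b1111_1111, 0b1111_0000]     # entry 0 is exact; entry 1 is 1010XXXX
print(rule_decoder(tcam_compare(entries, care, 0b1010_1111, K)))
print(rule_decoder(tcam_compare(entries, care, 0b1010_0110, K)))
```

The first key matches only the wildcard entry; the second matches both, and the decoder reports the lower index, so more specific rules would be placed first under this convention.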
Processing delays introduced in determining a packet treatment discipline in hardware, such as determining a context ID, depend on the implementation of the CAM/TCAM 100/200 itself. The goal is to determine the match result 132 and the rule ID/context ID 134, preferably in one comparison clock cycle, and then to shorten that comparison clock cycle as much as possible for a given hardware implementation in order to achieve high processing speeds. For these reasons, N comparators 122/222 are used for N rule entries 112. Each rule entry 112, shown in FIG. 1 and FIG. 2, is K bits wide, and therefore the comparators 122/222 are also K bits wide. The K×N simultaneous bit comparisons require a large amount of hardware logic, which draws a lot of current when operating at high speeds.
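The hardware cost grows with the product K×N, and the switching activity grows further with the comparison clock rate. A quick count under hypothetical dimensions (65,536 rule entries, 144-bit keys, a 100 MHz comparison clock, one look-up per cycle) illustrates the scale; none of these numbers come from the text.

```python
def bit_comparisons_per_lookup(n_entries: int, key_bits: int) -> int:
    """N comparators, each K bits wide, all active in the same comparison cycle."""
    return n_entries * key_bits


def bit_comparisons_per_second(n_entries: int, key_bits: int, clock_hz: float) -> float:
    """Assumes one complete look-up per comparison clock cycle."""
    return bit_comparisons_per_lookup(n_entries, key_bits) * clock_hz


N, K = 65_536, 144   # hypothetical table: 64K entries, 144-bit keys
CLOCK_HZ = 100e6     # hypothetical 100 MHz comparison clock
print(bit_comparisons_per_lookup(N, K))
print(f"{bit_comparisons_per_second(N, K, CLOCK_HZ):.3e}")
```

Doubling either the number of entries or the entry width doubles the comparator logic, which is the scaling pressure the following paragraphs describe.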
Implementations include CAMs/TCAMs 100/200 custom-made for each application. As the number of rule entries 112 in the rule table 110 grows, custom-made CAMs/TCAMs 100/200 become very expensive because of their prohibitively large physical size (since CAMs/TCAMs are implemented from electronic components, the term “size” here refers to the number of electronic components), and their power consumption also becomes very high.
In accordance with a first typical approach to implementing a CAM/TCAM 100/200, a dedicated custom-made CAM/TCAM is implemented as a stand-alone integrated circuit or as a self-contained sub-block of an integrated circuit. The necessary electronic components are formed on the silicon substrate of the integrated circuit, and the implementation has a fixed number N of rule entries, each exactly K bits wide. This approach requires a new CAM/TCAM chip design for each implementation, which is very expensive and incurs long time-to-market delays.
When custom-made CAM/TCAM chips already available on the market are used, the design of a particular solution has to be adapted to fit the available chip, leading to inefficient solutions and/or high implementation costs, if such an adaptation is possible at all.
A second typical approach to implementing the functionality of CAMs/TCAMs is to build a CAM/TCAM 100/200 from discrete components such as compiled Random Access Memory (RAM) and standard logic cells. This avoids the high expense of designing custom-made integrated circuit CAMs/TCAMs, and to some extent keeps the CAM/TCAM design comparatively flexible without incurring long time-to-market delays. However, this second approach can only be applied to small- to medium-sized CAM/TCAM implementations, so as to keep the size practical and small, and the processing speed practical and high, for target applications. Employing discrete components in a CAM/TCAM implementation becomes troublesome because signal propagation timing and synchronization become increasingly hard to guarantee at high processing speeds as CAM/TCAM sizes increase.
Integrated-circuit component-level advances in CAM design include prior art United States Patent Application 2002/0039303 entitled “CAM Cell Circuit Having Design Circuit”, which was published on Apr. 4, 2002. Hayakawa et al. describe integrated-circuit component-level design techniques for connecting transistor components so as to reduce the number of integrated circuit components and therefore the integrated circuit area. A speed-up may also be obtained in the comparison operation. This proposed solution does not directly address the limitations imposed on the number N and the width K of rule table entries and on matching key lengths, although compact designs could presumably allow more parallel rule entry comparisons to be performed in the same area, assuming signal timing synchronization can be maintained.
Another integrated-circuit component-level advance in CAM design is prior art United States Patent Application 2002/0036912 entitled “Content Addressable Memory (CAM) for Data Lookup in Data Processing Unit”, which was published on Mar. 28, 2002. Helwig describes a transistor-level integrated circuit design that reduces the number of comparator output circuit nodes that switch states. Power is saved because fewer circuit nodes change potential. This proposed solution does not directly address the limitations imposed on the number N and the width K of rule table entries and on matching key lengths, although reduced state switching could presumably allow more parallel rule entry comparisons to be performed at a comparatively lower rate of power consumption.
A further integrated-circuit component-level advance in CAM design is prior art U.S. Pat. No. 6,373,738 entitled “Low Power CAM Match Line Circuit”, which issued on Apr. 16, 2002 to Towler et al. and addresses potential rise times when circuit nodes switch states. Towler describes a clever transistor-level circuit design technique for carefully timing the turn-on and turn-off of the match line. Leakage current is reduced when the match line goes through the turn-on-to-turn-off or the turn-off-to-turn-on state transition, achieving reduced power consumption. However, the proposed solution depends on very tightly controlled timing within the duration of a comparison clock cycle. This proposed solution does not directly address the limitations imposed on the number N and the width K of rule table entries and on matching key lengths, although reducing leakage currents could presumably allow more parallel rule entry comparisons to be performed at a comparatively lower rate of power consumption. It is not clear whether a speed-up can also be obtained, given the tight timing control.
A further integrated-circuit component-level advance in CAM design is prior art U.S. Pat. No. 6,438,674 entitled “Hash CAM Having Reduced Size Memory Array and its Application”, which issued on Aug. 20, 2002 to Perloff. Perloff describes an innovative hash algorithm that enables larger matching key and rule entry bit widths to be compared using a CAM of smaller size than would otherwise be needed. In implementing the proposed solution, the rule entries 112 and the matching key 114 must be such that a hash function exists for which the n-bit index generated for each of the 2^n m-bit inputs sharing m−n common bits is always unique. This proposed solution does directly address the limitations imposed on the number and the width of rule table entries and on matching key lengths; however, it only provides advantages if the packet header field values have a particular structure.
Yet another prior art reference, United States Patent Application 2002/0126672, entitled “Method and Apparatus for a Flexible and Reconfigurable Packet Classifier using Content Addressable Memory”, was published on Sep. 12, 2002. Chow et al. describe a ‘Reconfigurable Buffet Selector/Parser’ circuit used to dynamically form rules by removing the fields not required for comparison before the rules are stored in the rule entries 112. Discarding field values reduces the necessary width of the rule entries 112, reducing size and cost. This proposed solution does directly address the limitations imposed on the number N and the width K of rule table entries and on matching key lengths; however, it requires additional hardware logic to perform the field value filtering in real time, with a corresponding development cost. The proposed solution may provide benefits in environments where a large number of run-time changes to the rule table are necessary relative to the expected packet throughput. The additional hardware logic increases both the number of components and the latency, because rule determination, rule writing, and comparison must be performed sequentially.
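The width saving from discarding fields can be illustrated with a generic field-selection sketch: only the fields needed for comparison are concatenated into the stored rule or matching key. This is not Chow et al.'s circuit; the header layout, field names, and widths below are hypothetical.

```python
# Hypothetical header layout: field name -> width in bits.
FIELDS = {"src_ip": 32, "dst_ip": 32, "src_port": 16, "dst_port": 16, "protocol": 8}


def select_fields(header: dict[str, int], keep: list[str]) -> tuple[int, int]:
    """Concatenate only the fields needed for comparison; return (value, width)."""
    value, width = 0, 0
    for name in keep:
        bits = FIELDS[name]
        value = (value << bits) | (header[name] & ((1 << bits) - 1))
        width += bits
    return value, width


full_width = sum(FIELDS.values())
header = {"src_ip": 0x0A000001, "dst_ip": 0x0A000002,
          "src_port": 5004, "dst_port": 5006, "protocol": 17}
key, width = select_fields(header, ["dst_ip", "dst_port"])
print(full_width, width, hex(key))
```

Here a rule that only needs the destination address and port shrinks from 104 to 48 bits, at the cost of the extra selection logic that must run before each rule is written and each key is compared.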
There is therefore a need to solve the above-mentioned issues.