As is commonly understood, there is a continuing need to increase the data communication rate. Certain communication systems are equipped with multiple channels between stations, and each station communicates using multiple channels. As a result, the total aggregate throughput is greatly increased as compared to systems that utilize a single channel. A particularly noteworthy example is in current 1000BASE-T and future multiple gigabit Ethernet, where all four twisted copper pairs of the Category 5 unshielded twisted pair (UTP) cable are used for transmission of data.
While multi-channel communication links speed data transfer, such systems have drawbacks. One such drawback is coupling between channels. In wired communications systems, a major source of interference is reflections of the transmitted signal due to imperfect impedance matching, often due to connectors. In systems which use multiple pairs, such as Ethernet, interference is caused both on the pair a signal is transmitted on (‘echo’) and on the other pairs (near end crosstalk, known as ‘NEXT’). These signals degrade the performance of the receiver and inhibit operation, particularly when a full duplex link is established over long distances. This is particularly true when the received signal has much lower power than the reflections.
In an ideal communication link, each channel of a multi-channel link would be completely decoupled from the other channels. Thus, each received signal would consist of the desired far end (FE) signal and a small amount of random noise. However, an ideal environment rarely exists, and hence the interference of NEXT and echo invades the signal that is received. Thus, the received signal is largely a combination of the far end signal and unwanted NEXT and echo components. This undesirably limits the detection of the far end signal such that some form of active interference cancellation must be implemented.
Therefore, transceivers employ time domain echo and NEXT cancellers, which are adaptive filters. These structures use their knowledge of the transmitted signal to iteratively update a model of the transfer function so that they can accurately reproduce the echo and NEXT signals, and cancel them at the receiver. At high sampling rates, however, the echo response can be many taps long. Interference cancellation is complex, and implementing it in an electronic system requires a large degree of processing capability, which for an integrated circuit has implications for area and power requirements.
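The adaptive cancellation described above can be illustrated with a minimal sketch. It assumes a least-mean-squares (LMS) update, which is one common choice and is not named in the text; the function name `lms_echo_canceller`, the three-tap echo path, and the step size `mu` are hypothetical values chosen for illustration. The far end signal and noise are omitted so that the convergence of the taps is easy to see.

```python
import random


def lms_echo_canceller(tx, rx, num_taps, mu):
    """Subtract an adaptive estimate of the echo of `tx` from `rx`.

    Returns (residual, taps). The LMS update is an assumed choice of
    adaptation rule, used here only for illustration.
    """
    taps = [0.0] * num_taps
    history = [0.0] * num_taps  # most recent transmit sample first
    residual = []
    for x, r in zip(tx, rx):
        history = [x] + history[:-1]
        # Reproduce the echo from the known transmit samples.
        echo_estimate = sum(w * h for w, h in zip(taps, history))
        # Cancel it at the receiver; what remains drives the adaptation.
        e = r - echo_estimate
        taps = [w + mu * e * h for w, h in zip(taps, history)]
        residual.append(e)
    return residual, taps


if __name__ == "__main__":
    random.seed(0)
    echo_path = [0.5, -0.3, 0.1]  # hypothetical echo impulse response
    tx = [random.choice((-1.0, 1.0)) for _ in range(2000)]
    rx = [
        sum(echo_path[k] * tx[n - k] for k in range(len(echo_path)) if n - k >= 0)
        for n in range(len(tx))
    ]
    residual, taps = lms_echo_canceller(tx, rx, num_taps=3, mu=0.05)
    print([round(w, 3) for w in taps])
```

Each output sample costs one multiply per tap for the estimate and one per tap for the update, which is why a response that is many taps long becomes expensive at high sampling rates.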
In this regard, the prior art interference cancellation processing consumes an undesirably large amount of electrical power and generates an undesirable amount of heat. These factors lead to increased cost of ownership for products that incorporate prior art interference cancellation systems.
One approach to improve efficiency and reduce power consumption and complexity is to use a frequency domain canceller. Typically, a set of transmit samples is collected in a block, the block is transformed into the frequency domain, and then filtering is applied in the frequency domain. Finally, the data is transformed back into the time domain and used for cancellation. This reduces the number of multiplies needed, which can result in significant power reduction, particularly for a hardware implementation.
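The collect, transform, filter, and inverse-transform sequence described above can be sketched as follows. The sketch assumes the overlap-save method, which is one standard way to obtain linear filtering from block transforms and is not specified in the text. The names `fft`, `ifft`, `block_filter`, and `direct_filter` are hypothetical, the filter taps are fixed rather than adaptive, and a simple recursive radix-2 transform stands in for a hardware FFT engine.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(X):
    n = len(X)
    return [v / n for v in fft(X, inverse=True)]


def block_filter(tx, taps, block_len):
    """Overlap-save filtering that produces `block_len` outputs per block."""
    if len(taps) > block_len + 1:
        raise ValueError("filter too long for this block length")
    n_fft = 2 * block_len  # block_len must be a power of two
    H = fft(list(taps) + [0.0] * (n_fft - len(taps)))
    prev = [0.0] * block_len
    out = []
    for start in range(0, len(tx), block_len):
        # 1. Collect a block of transmit samples (zero-pad the last one).
        cur = list(tx[start:start + block_len])
        cur += [0.0] * (block_len - len(cur))
        # 2. Transform the previous and current blocks together.
        X = fft(prev + cur)
        # 3. Filter with one multiply per frequency bin.
        Y = [a * b for a, b in zip(X, H)]
        # 4. Transform back; only the second half is free of wrap-around.
        y = ifft(Y)
        out.extend(v.real for v in y[block_len:])
        prev = cur
    return out[:len(tx)]


def direct_filter(tx, taps):
    """Time domain reference: one multiply per tap per output sample."""
    return [
        sum(taps[k] * tx[n - k] for k in range(len(taps)) if n - k >= 0)
        for n in range(len(tx))
    ]
```

In this sketch no output for a block can be produced until the whole block has been collected, so the block length sets a lower bound on the delay through the filter.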
The disadvantage of this approach is that the block operations introduce substantial latency into the canceller. Excessive latency can limit the application of communications systems; end-users may prefer other technologies. For example, for applications such as scientific computing, excessive latency lowers the performance of ‘clusters’ of high performance computers.
A second issue with the application of the frequency domain approach is that well-known FFT structures are most efficient for complex signals. However, directly using complex Cooley-Tukey based transforms for real signals results in substantial inefficiencies. In a hardware implementation this translates into increased power consumption, which is undesirable. One way to efficiently use a complex transform engine for real signals is through the use of a ‘real adjust’ operation. This enables an N-point real FFT to be calculated using an N/2-point complex FFT and N additional complex multiplications, as well as some low-cost addition operations. This technique is described in 12.3 FFT of Real Functions, Sine and Cosine Transforms, pages 510-520, Numerical Recipes in C: The Art of Scientific Computing, Second Edition, William H. Press, Saul A. Teukolsky, William T. Vetterling, Brian P. Flannery, 1992. Techniques such as the real adjust are often used in software implementations, and they can also be used in hardware. Typically, in hardware systems the FFT is implemented using a pipelined architecture, to maximize power efficiency and minimize latency. Use of this structure places restrictions on the output ordering. These efficient implementations of FFT algorithms produce outputs in an order which does not match that required by the real adjust algorithm. This breaks the pipeline structure of the datapath, and requires that additional buffering be introduced to line up the data for the real adjust process. This adds significant latency and power to the canceller. Furthermore, during the inverse transform an inverse real adjust operation must be performed, which consumes further power and increases latency for the same reasons.
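The real adjust operation can be sketched in software as follows. This is a minimal illustration of the standard even/odd packing technique, not the hardware datapath discussed in the text; the names `fft` and `real_fft` are hypothetical, and a simple recursive radix-2 transform stands in for the complex transform engine.

```python
import cmath


def fft(x):
    """Recursive radix-2 complex FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def real_fft(x):
    """Bins 0..N/2 of the FFT of a real sequence of length N.

    Uses one N/2-point complex FFT followed by a real adjust step.
    The remaining bins follow from X[N - k] = conj(X[k]).
    """
    n = len(x)
    half = n // 2
    # Pack even samples into the real part, odd samples into the imaginary part.
    z = [complex(x[2 * i], x[2 * i + 1]) for i in range(half)]
    Z = fft(z)
    X = [0j] * (half + 1)
    for k in range(half + 1):
        zk = Z[k % half]
        zc = Z[(half - k) % half].conjugate()
        even = 0.5 * (zk + zc)    # transform of the even samples
        odd = -0.5j * (zk - zc)   # transform of the odd samples
        # Real adjust: recombine the two halves with a twiddle factor.
        X[k] = even + cmath.exp(-2j * cmath.pi * k / n) * odd
    return X
```

Each output bin combines Z[k] with Z[N/2 - k], so both values must be available at the same time. This pairing is one plausible reading of the ordering requirement that the text says conflicts with the output order of pipelined FFT structures.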
Other known operations or transforms used to implement real or complex FFTs include the Cooley-Tukey algorithm, split-radix transforms, real-split-radix transforms, Winograd transforms, Prime-Factor transforms, the Bruun algorithm, Rader's algorithm, and Bluestein's algorithm, but these operations do not overcome the drawbacks of the prior art.
Latency can be reduced by using a smaller block size in the frequency domain transform. In implementation, however, this increases power, as the block operation (transform-multiply-inverse transform) must then be performed more frequently. This is particularly a concern for very long filters. Another proposed solution to reduce latency is to ‘parallelize’ the transform and/or multiplications by using more physical circuitry to perform the calculation in parallel, but this introduces complications in the implementation and can lead to inefficiencies and increased power consumption. Furthermore, in a hardware implementation, parallelization increases area, which increases power loss through leakage even when the circuit is not active.
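The trade-off between block size and processing cost can be made concrete with a rough operation count. The sketch below is a simplified model under stated assumptions, not a measurement: it assumes a single unpartitioned filter, a power-of-two transform at least as long as the block plus the filter, and the textbook figure of (N/2)·log2(N) complex multiplies for a radix-2 FFT of size N. The function names are hypothetical.

```python
def next_pow2(n):
    """Smallest power of two that is >= n."""
    p = 1
    while p < n:
        p *= 2
    return p


def complex_mults_per_sample(filter_len, block_len):
    """Rough complex-multiply count per output sample (assumed model)."""
    # The transform must span the new block plus the filter history.
    n_fft = next_pow2(block_len + filter_len - 1)
    log2_n = n_fft.bit_length() - 1
    fft_mults = (n_fft // 2) * log2_n
    # One forward FFT, one inverse FFT, and one multiply per bin,
    # all amortized over only `block_len` new output samples.
    per_block = 2 * fft_mults + n_fft
    return per_block / block_len


if __name__ == "__main__":
    for block_len in (256, 128, 64):
        print(block_len, complex_mults_per_sample(256, block_len))
```

Under these assumptions, a 256-tap filter costs about 20 complex multiplies per sample with a block of 256 samples and about 80 with a block of 64 samples: the shorter block lowers the buffering delay, but the same transform work is spread over fewer outputs.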