A number of difficult issues may arise in the design of hardware-based Z-domain transfer functions for digital signal processing. Presently, there is a tight coupling between the higher-level transfer function design and the low-level (logic and physical) design. Specifically, the approach taken to implementing a transfer function has been strongly influenced by the target implementation technology and its associated function libraries.
A companion issue is the preparation of the “test bench”—the suite of simulation vectors and modules (frequently implemented in either VHDL or the C programming language) that apply stimuli, compare the simulated and expected results, and report differences. If there is a tight coupling between the transfer function design and the target implementation technology, then major portions of the test bench may need to be redone if the target technology changes. This can be a major undertaking.
Popular implementation technologies include Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), as well as semi-custom and custom approaches. Because of the economics of physical design, tooling costs, and manufacturing costs, the implementation technology is often chosen based on anticipated production volumes. As production volumes increase or revised production estimates are made, and for many other technical or business reasons, a shift to a different implementation technology may become desirable. However, the approach taken to the logic design of a desired transfer function is constrained in different ways by the vagaries of each particular technology. This can make retargeting the transfer functions from one technology to another a significant effort.
Consider the design of a feedback system 1000, canonically modeled in FIG. 1. R(z) 50 is the reference signal, A(z) 100 is the open loop transfer function, C(z) 60 is the control signal (the output in this case), and summer 200 compares R(z) and C(z) to generate the input 55 to A(z). T(z), the closed loop transfer function, is well known as:

$$T(z) \equiv \frac{C(z)}{R(z)} = \frac{A(z)}{1 + A(z)}$$
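The closed loop expression follows in a few steps from the loop of FIG. 1. The derivation below is a sketch that assumes summer 200 subtracts C(z) from R(z) (unity negative feedback, which is what the stated result implies) and uses E(z) as a hypothetical name for input 55; that symbol does not appear in the original.

```latex
\begin{align*}
E(z) &= R(z) - C(z) \\
C(z) &= A(z)\,E(z) = A(z)\bigl(R(z) - C(z)\bigr) \\
C(z)\bigl(1 + A(z)\bigr) &= A(z)\,R(z) \\
T(z) \equiv \frac{C(z)}{R(z)} &= \frac{A(z)}{1 + A(z)}
\end{align*}
```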
In general, the implementation of A(z) may be complex and require more than one hardware clock cycle to compute. In fact, in order to achieve system throughput requirements, A(z) may need to be very deeply pipelined. The hardware clock period may be chosen to be an integer sub-multiple of the sampling period (equivalently, the hardware clock rate is an integer multiple of the sampling rate). However, in pipelines designed for optimum performance on an individual operation basis, the latency of the pipeline is generally an implementation-specific number of hardware clock cycles. If such pipelines are used within A(z), samples emerging from the output of A(z) may not coincide with samples at the input. In such cases the “sampling clock,” a signal indicating which hardware clock cycles contain valid samples, must itself be stepped down the pipeline with the data.
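The effect of stepping the sampling clock down the pipeline with the data can be illustrated with a small behavioral model. The sketch below is not taken from the original: the function name `run_pipeline`, its parameters, and the choice of one pipeline stage per hardware clock cycle are all assumptions made for illustration. Each stage holds a (valid, data) pair, so the valid flag plays the role of the sampling clock.

```python
from collections import deque


def run_pipeline(samples, clocks_per_sample, depth):
    """Shift (valid, data) pairs through a fixed-depth pipeline.

    Hypothetical model: one new sample enters every `clocks_per_sample`
    hardware clock cycles, and the pipeline advances one stage per cycle.
    Returns a list of (cycle, data) pairs for cycles with a valid output.
    """
    # Every stage starts empty, i.e. its valid flag is False.
    stages = deque([(False, None)] * depth)
    outputs = []
    total_cycles = len(samples) * clocks_per_sample + depth
    for cycle in range(total_cycles):
        index, phase = divmod(cycle, clocks_per_sample)
        if phase == 0 and index < len(samples):
            incoming = (True, samples[index])   # cycle carries a valid sample
        else:
            incoming = (False, None)            # idle cycle between samples
        # The valid flag moves down the pipeline together with its data.
        stages.appendleft(incoming)
        valid, data = stages.pop()
        if valid:
            outputs.append((cycle, data))
    return outputs


# Inputs arrive on cycles 0, 4, 8; with a depth of 6 they emerge on
# cycles 6, 10, 14, which do not coincide with any input cycle.
print(run_pipeline([10, 20, 30], clocks_per_sample=4, depth=6))
```

In this model the output samples line up with input samples only when the depth happens to be a whole multiple of `clocks_per_sample`, which is why the valid flag has to accompany the data in the general case.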
Furthermore, performance-optimized pipelines generally process some samples using more hardware clock cycles than for other samples. Hence, the sampling rate at the input of A(z) may not be regular and the number of samples “in flight” down the pipeline may vary. Thus the order ($z^{-N}$) of A(z) may change dynamically, and if expressed in reduced form, the closed loop transfer function T(z) would be a highly non-linear, time-varying function. Such variable latency is not suitable for use in implementing signal processing transfer functions.
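A short numerical sketch shows how per-sample latency differences turn a regular input stream into an irregular output stream. The model is an assumption made for illustration, not something described in the original: the names `output_cycles` and `spacings` are hypothetical, results are assumed to leave the pipeline in order, and at most one result is assumed to leave per hardware clock cycle.

```python
def output_cycles(arrivals, latencies):
    """Return the cycle on which each sample leaves the pipeline.

    Hypothetical in-order model: a sample that arrives on cycle `a` and
    needs `lat` cycles leaves on cycle a + lat, unless it must wait one
    cycle behind the sample ahead of it.
    """
    done_cycles = []
    previous = -1
    for arrival, latency in zip(arrivals, latencies):
        done = max(arrival + latency, previous + 1)
        done_cycles.append(done)
        previous = done
    return done_cycles


def spacings(cycles):
    """Return the gaps, in hardware clock cycles, between successive outputs."""
    return [later - earlier for earlier, later in zip(cycles, cycles[1:])]


# Samples arrive regularly every 4 cycles, but the third needs 9 cycles
# instead of 6, so the output spacing is no longer uniform.
arrivals = [0, 4, 8, 12]
latencies = [6, 6, 9, 6]
done = output_cycles(arrivals, latencies)
print(done)            # [6, 10, 17, 18]
print(spacings(done))  # [4, 7, 1]
```

Because the delay through the pipeline differs from sample to sample, no single value of N describes it, which is the dynamic change in order that the paragraph above refers to.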
In order to guarantee an implementation of A(z) that produces output samples of constant latency and coincident with the input samples, the pipeline of A(z) could be specifically designed to shift in “lock-step” with the sampling clock at the input. Unfortunately, depending on the implementation specifics, this is often not a viable approach. For example, in an FPGA implementation the multipliers need to be deeply pipelined to achieve ASIC-like clock rates. Because of this, the time to compute an intermediate result may exceed a single sample period, so the result is not coincident with the input samples.
What are needed are hardware architectures and methods that permit the higher-level design of signal processing transfer functions to be completely decoupled from the specifics of the low-level circuitry associated with the target implementation technology.
What are needed are hardware architectures and methods that permit the test bench modules and vectors prepared for testing the transfer function to be completely decoupled from specifics of the low-level circuitry associated with the target implementation technology.
What are needed are hardware architectures and methods that permit an abstract generic “data processor” approach to the design of higher-level signal processing transfer functions while the design of the underlying low-level circuitry is driven solely by target implementation technology issues.
What are needed are hardware architectures and methods that permit the straightforward mapping of signal processing transfer functions onto any of multiple target implementation technologies.
What are needed are hardware architectures and methods that permit changes in an underlying arithmetic library to be made without requiring changes in the higher-level signal processing transfer function design.