Advances in digital signal processor (DSP) technology in the application areas of telephony, imaging, video, and voice are often results of years of intensive research and development. For example, algorithm standards for telephony have taken years to develop. The implementation of these DSP algorithms is often very different from one application system to another because systems have, for example, different memory management policies and I/O handling mechanisms. Because of the lack of consistent system integration or programming standards, it is generally not possible for a DSP implementation of an algorithm to be used in more than one system or application without significant reengineering, integration, and testing.
Digital Signal Processors (DSPs) are often programmed like “traditional” embedded microprocessors. That is, they are programmed in a mix of C and assembly language, directly access hardware peripherals, and, for performance reasons, almost always have little or no standard operating system support. Thus, like traditional microprocessors, there is very little use of Commercial Off-the-Shelf (COTS) software components for DSPs.
However, unlike general-purpose embedded microprocessors, DSPs are designed to run sophisticated signal processing algorithms and heuristics. For example, they may be used to detect DTMF digits in the presence of noise, to compress toll quality speech by a factor of 20, or for speech recognition in a noisy automobile traveling at 65 miles per hour.
Such algorithms are often the result of many years of doctoral research. However, because of the lack of consistent standards, it is not possible to use an algorithm in more than one system without significant reengineering. Since few companies can afford a team of DSP PhDs and the reuse of DSP algorithms is so labor intensive, the time-to-market for a new DSP-based product is measured in years rather than months.
Most modern DSP system architectures can be logically partitioned into algorithms, core run-time support and a framework that integrates the algorithms with the hardware and software comprising the system. The framework defines a component model with which the algorithms must comply. The framework includes a device independent I/O sub-system and specifies how algorithms interact with this sub-system. For example, does the algorithm call functions to request data or does the framework call the algorithm with data buffers to process? The framework also defines the degree of modularity within the application; i.e., what components can be replaced, added, removed, and when components can be replaced (compile time, link time, or real-time). Unfortunately, for performance reasons, many DSP system architectures do not enforce a clear line between algorithm code and the framework. Thus, it is not possible to easily use an algorithm in more than one system.
Even within the telephony application space, there are a number of different frameworks available and each is optimized for a particular application segment; e.g., large volume client-side products and low volume high-density server-side products. Given the large number of incompatibilities between these various frameworks and the fact that each framework has enjoyed success in the market, any model for algorithm reuse should ideally make few requirements on existing frameworks.
Careful inspection of the various frameworks in use reveals that, at some level, they all have “algorithm components”. While there are differences in each of the frameworks, the algorithm components share many common attributes: algorithms are C callable; algorithms are reentrant; algorithms are independent of any particular I/O peripheral; and, algorithms are characterized by their memory and instruction processing rate (MIPS) requirements. In approximately half of the known available frameworks, algorithms are also required to simply process data passed to the algorithm. The others assume that the algorithm will actively acquire data by calling framework-specific hardware independent I/O functions. Generally, algorithms are designed to be independent of the I/O peripherals in the system.
Given the similarities between the various frameworks, a need has arisen to create a model permitting simple reuse at the level of the algorithm. Moreover, there is real benefit to the framework vendors and system integrators from such a model: algorithm integration time will be reduced; it will be possible to easily comparison shop for the “best” algorithm; and more algorithms will be available.
A huge number of DSP algorithms are needed in today's marketplace, including modems, vocoders, speech recognizers, echo cancellation, and text-to-speech. It is not easy (or even possible) for a product developer who wants to leverage this rich set of algorithms to obtain all the necessary algorithms from a single source. On the other hand, integrating algorithms from multiple vendors is often impossible due to incompatibilities between the various implementations. To break this catch-22, algorithms from different vendors must inter-operate.
Dozens of distinct third-party DSP frameworks exist in the telephone vertical market alone. Each vendor has hundreds and sometimes thousands of customers. Yet, no one framework dominates the market. To achieve the goal of algorithm reuse, the same algorithm must be usable in all frameworks with minor impact on the framework implementation.
Marketplace fragmentation by various frameworks has a legitimate technical basis. Each framework optimizes performance for an intended class of systems. For example, client systems are designed as single-channel systems with limited memory, limited power, and lower-cost DSPS. As a result, they are quite sensitive to performance degradation. Server systems, on the other hand, use a single DSP to handle multiple channels, thus reducing the cost per channel. As a result, they must support a dynamic environment. Yet, both client-side and server-side systems may require exactly the same vocoders.
Algorithms must be deliverable in binary form both to protect the vendor's intellectual property and to improve the reusability of the algorithm. If source code were required, all frameworks would require re-compilation. This would destabilize the frameworks and version control for the algorithms would be close to impossible.
Each particular implementation of a system (such as a speech detector) represents a complex set of engineering trade-offs between code size, data size, MIPS, and quality. Moreover, depending on the system designed, the system integrator may prefer an algorithm with lower quality and a smaller footprint to one with higher quality detection and a larger footprint (such as an electronic toy doll vs. a corporate voice mail system). Thus, multiple implementations of exactly the same algorithm sometimes make sense. There is no single best implementation of many algorithms.
Unfortunately, the system integrator is often faced with choosing all algorithms from a single vendor to ensure compatibility between the algorithms and to minimize the overhead of managing disparate APIs. Moreover, no single algorithm vendor has all the algorithms for all their customers. The system integrator is, therefore, faced with selecting a vendor that has “most” of the required algorithms and negotiating with that vendor to implement the remaining algorithms.
Most modern DSP hardware architectures include both on-chip data memory and off-chip memory. The performance difference between these is so large that algorithm vendors design their code to operate within the on-chip memory as much as possible. Since the performance gap is expect to increase dramatically in the next 3–5 years, this trend will continue for the foreseeable future.
While the amount of on-chip data memory in a given DSP architecture may be adequate for each algorithm in isolation, the increased number of MIPS available on modern DSPs enables the creation of complex software systems that perform multiple algorithms concurrently on a single chip. There may not be enough on-chip memory to allow all algorithms to have their full, required complement of data memory resident in on-chip memory concurrently. There is a need for a method to permit efficient sharing of this resource among the algorithms. Generally, prior art methods for memory sharing in a complex system have required that an algorithm be partially or substantially rewritten each time it was used in a new framework. This situation makes it very costly, both in time and money, for a third party algorithm vendor to provide algorithms for multiple frameworks.