1. Field of the Invention
The present invention relates to speech recognition and, more particularly, to a system and method for transforming sampling rates without retraining.
2. Description of the Related Art
Speech recognition achieves its best performance when the test or operating conditions match the training conditions. In general, these matched conditions include the acoustic environment, the speakers, the application corpus, etc. A problem arises in conventional systems when the sampling frequency used in training differs from the sampling frequency used in testing or operation. This frequency mismatch inevitably leads to severe degradation in recognition performance.
A conventional speech recognition system is designed for a specific data sampling frequency when it is deployed. When another sampling rate is considered, it is customary to retrain the system for the new sampling rate. While it is straightforward to transform signals and retrain systems, doing so presents at least two major problems in many real-time applications. First, extra effort is needed to supply training data at the new sampling frequency, either by collecting new data or by transforming existing training data. Second, the training process must be repeated to generate new system parameters.
For systems that have undergone calibration processes such as speaker adaptation or acoustic adaptation, repeating those processes is even more tedious, and maintaining multiple prototypes adds further complication.
Therefore, a need exists for a system and method for accommodating a change in sampling frequency without the burden of retraining.