1. Field of the Invention
The present invention relates to a speech processing apparatus, and more particularly to a speech processing apparatus which is capable of discriminating between significant information and unnecessary information in a large amount of speech information, extracting the significant information, and processing it.
For example, the present invention relates to an apparatus which, when handling a large amount of speech data input from a plurality of talkers, is capable of extracting the speech information of a particular talker from the input information and processing that speech with respect to its vowels, consonants, accentuation and so on.
2. Description of the Related Art
There are now demands in a wide range of industrial fields for information processing systems which extract significant data from a large volume of data, such as speech input from a plurality of talkers, and which process the speech of a particular talker. Each conventional speech processing system of this type that has been put into practical use comprises a speech input unit 300, a processing unit 305 and an output unit 304, as shown in FIG. 9. The speech input unit 300 contains, for example, a microphone or the like, and serves to convert sound waves traveling through air into electrical signals which are input as aural signals. The processing unit 305 comprises a feature-extracting section 301 for extracting the features of the input aural signals, a standard pattern-storing section 303 in which the characteristic patterns of standard speech have been previously stored, and a recognition decision section 302 for recognizing the speech by collating the features extracted by the extracting section 301 with the standard patterns stored in the storing section 303.
Recently, digital computer systems have often been used as the processing unit 305. Such systems employ a method in which various types of features are arithmetically extracted from all the input speech data, and in which the intended speech is classified by searching the extracted features for features common to its aural signals.
Speech processing is performed by collating the overall feature, obtained by combining the plurality of extracted features described above (partial features), with the overall feature of the speech stored as the object of recognition in the storing section 303.
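By way of illustration only, the collation described above can be sketched as follows. The sketch assumes that the overall feature is formed by concatenating the partial features and that collation selects the stored pattern at the smallest Euclidean distance; neither assumption is stated in the conventional art described here, and the names combine_features, euclidean_distance and recognize are hypothetical.

```python
import math


def combine_features(partial_features):
    """Concatenate the partial features into one overall feature vector.

    Concatenation is an assumption; the text only says the features are combined.
    """
    overall = []
    for part in partial_features:
        overall.extend(part)
    return overall


def euclidean_distance(a, b):
    """Distance between two feature vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recognize(partial_features, standard_patterns):
    """Collate the overall feature with every stored standard pattern.

    standard_patterns maps a label to its stored overall feature, playing the
    role of the storing section 303. Returns the closest label and its distance.
    The nearest-pattern rule is an assumed decision criterion.
    """
    overall = combine_features(partial_features)
    best_label, best_distance = None, math.inf
    for label, pattern in standard_patterns.items():
        distance = euclidean_distance(overall, pattern)
        if distance < best_distance:
            best_label, best_distance = label, distance
    return best_label, best_distance
```

In this sketch every stored pattern is visited for every input, which reflects the exhaustive character of the conventional processing.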
The above-described processing is basically performed on the entire local data of the input aural signals. High speed processing of complicated and massive speech data is the first priority of industry. To meet this demand on the assumption that the above-described arrangement and method are used, such processing is generally conducted either by devising algorithms for the operational method, searching method and the like in each of the sections, or by specializing, i.e., specifying, the information regions to be handled. For example, the processing in the feature-extracting section 301 is based on digital filter processing, which presupposes the use of large-scale hardware or signal processing software.
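As a rough illustration of feature extraction by digital filter processing, the following sketch computes the signal power at a few band-center frequencies with the Goertzel recurrence, a second-order recursive digital filter. The conventional art described here does not state which filter is used, so the choice of Goertzel, the band frequencies, and the names goertzel_power and extract_features are all assumptions.

```python
import math


def goertzel_power(samples, freq_hz, sample_rate_hz):
    """Return the signal power near freq_hz using the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq_hz / sample_rate_hz)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        # Second-order recursive filter step applied to every input sample.
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2


def extract_features(samples, sample_rate_hz, band_centers_hz):
    """Return one power value per band, forming a simple feature vector."""
    return [goertzel_power(samples, f, sample_rate_hz) for f in band_centers_hz]


# Example input: a 1000 Hz tone sampled at 8000 Hz (assumed values).
sample_rate_hz = 8000
tone = [math.sin(2.0 * math.pi * 1000.0 * n / sample_rate_hz) for n in range(400)]
features = extract_features(tone, sample_rate_hz, [1000.0, 2000.0])
```

Each band requires one pass over all the samples, so the cost grows with both the number of bands and the amount of input data, which is consistent with the reliance on large-scale hardware or software noted above.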
In speech processing, and in particular in conventional talker recognition processing, in which the speech of a designated talker is recognized by extracting it from the speech input from a plurality of talkers, high speed processing and a reduction in the size of the processing apparatus are conflicting requirements.