The present invention relates to a method of processing received data in a distributed speech recognition process. The present invention also relates to an apparatus for processing received data in a distributed speech recognition process. The present invention is suitable for, but not limited to, processing received data relating to speech recognition parameters when it is transmitted over a radio communications link.
Speech recognition is a process for automatically recognising sounds, parts of words, words, or phrases from speech. Such a process can be used as an interface between man and machine, in addition to or instead of more commonly used tools such as switches, keyboards, mice and so on. A speech recognition process can also be used to retrieve information automatically from some spoken communication or message.
Various methods have evolved, and are still being improved, for providing automatic speech recognition. Some methods are based on extended knowledge with corresponding heuristic strategies, whereas others employ statistical models.
In typical speech recognition processes, the speech to be processed is sampled a number of times in the course of a sampling time-frame, for example 50 to 100 times per second. The sampled values are processed using algorithms to provide speech recognition parameters. For example, one type of speech recognition parameter consists of a coefficient known as a mel cepstral coefficient. Such speech recognition parameters are arranged in the form of vectors, also known as arrays, which can be considered as groups or sets of parameters arranged in some degree of order. The sampling process is repeated for further sampling time-frames. A typical format is for one vector to be produced for each sampling time-frame.
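The arrangement of one vector per sampling time-frame can be sketched as follows. This is a minimal illustration only: the function name, the frame length of 80 samples, and the two placeholder features (log energy and zero-crossing count) are assumptions made for the example, and stand in for real speech recognition parameters such as mel cepstral coefficients.

```python
import math


def frames_to_vectors(samples, samples_per_frame):
    """Cut a sample stream into frames and emit one parameter vector per frame."""
    vectors = []
    last_start = len(samples) - samples_per_frame
    for start in range(0, last_start + 1, samples_per_frame):
        frame = samples[start:start + samples_per_frame]
        # Placeholder parameter 1: log of the mean squared amplitude.
        energy = sum(s * s for s in frame) / samples_per_frame
        log_energy = math.log(energy + 1e-12)
        # Placeholder parameter 2: number of sign changes inside the frame.
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
        vectors.append([log_energy, float(crossings)])
    return vectors


# Hypothetical input: 8000 samples of a sine wave, cut into frames of 80 samples.
signal = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(8000)]
vectors = frames_to_vectors(signal, samples_per_frame=80)
```

Each entry of `vectors` corresponds to one sampling time-frame, in order, which is the form in which the front-end would hand parameters on for analysis.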
The above parameterisation and placing into vectors constitutes what can be referred to as the front-end operation of a speech recognition process. The above described speech recognition parameters arranged in vectors are then analysed according to speech recognition techniques in what can be referred to as the back-end operation of the speech recognition process. In a speech recognition process where the front-end process and the back-end process are carried out at the same location or in the same device, the likelihood of errors being introduced into the speech recognition parameters, on being passed from the front-end to the back-end, is minimal.
However, in a process known as a distributed speech recognition process, the front-end part of the speech recognition process is carried out remotely from the back-end part. The speech is sampled and parameterised, and the speech recognition parameters are arranged in vectors, at a first location. The speech recognition parameters are quantized and then transmitted, for example over a communications link of an established communications system, to a second location. Often the first location will be a remote terminal, and the second location will be a central processing station. The received speech recognition parameters are then analysed according to speech recognition techniques at the second location. The quantized speech recognition parameters, and their arrangement in vectors, constitute data that is transmitted from the first location and received at the second location. In order to facilitate transmission of this data, the data is typically arranged in a frame structure comprising a plurality of data frames, each preceded by a respective header frame comprising common header information. A header frame can additionally include header information specific only to that header frame or to the particular data frame corresponding to it.
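The frame structure described above can be pictured with the following minimal sketch. All class names, function names and header field names here are hypothetical and chosen only for illustration; the text does not specify field names or sizes.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class HeaderFrame:
    common: Dict[str, int]                      # information shared by every header frame
    specific: Optional[Dict[str, int]] = None   # information for this header/data frame only


@dataclass
class DataFrame:
    vectors: List[List[int]]                    # quantized speech recognition parameter vectors


def build_frame_structure(common, data_frames, specifics=None):
    """Precede each data frame with a header frame carrying the common information."""
    specifics = specifics or [None] * len(data_frames)
    stream = []
    for data, specific in zip(data_frames, specifics):
        stream.append(HeaderFrame(common=dict(common), specific=specific))
        stream.append(data)
    return stream


# Hypothetical header fields and quantized values.
common_info = {"sampling_rate_code": 1, "front_end_code": 0}
frames = [DataFrame(vectors=[[12, 40], [13, 38]]),
          DataFrame(vectors=[[15, 35], [14, 36]])]
stream = build_frame_structure(common_info, frames,
                               specifics=[None, {"frame_index": 1}])
```

Because the common header information is repeated in every header frame, a receiver sees the same values several times over the course of a message.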
Many types of communications links, in many types of communications systems, can be considered for use in a distributed speech recognition process. One example is a conventional wireline communications system, for example a public switched telephone network. Another example is a radio communications system, for example TETRA. Another example is a cellular radio communications system. One example of an applicable cellular communications system is a global system for mobile communications (GSM) system; another is a system such as the Universal Mobile Telecommunications System (UMTS), currently under standardisation.
To avoid confusion, it is pointed out that the data frames described above are distinct from the transmission frames that are then used to carry the data over the communications link of the communications system in which the data is transmitted from a first location to a second location, for example the time division multiple access (TDMA) time frames of a GSM cellular radio communications system.
The use of any communications link, in any communications system, gives rise to the possibility that errors will be introduced into the data, and also into the header information, transmitted from the first location to the second location over the communications link.
Due to the specialised speech recognition techniques to which the speech parameters are subjected, it is desirable to provide means for processing the received data that offer a degree of resilience to errors introduced in the header information, in a way that is particularly suited to the characteristics of distributed speech recognition processes.
Additionally, it is known to provide error detection techniques in communications systems such that the presence of an error in a given portion of transmitted information is detectable. One well known technique is cyclic redundancy coding. It is also known to provide automatic error correction techniques in communications systems such that an error in a given portion of transmitted information is corrected. One well known technique is Golay error correction. It is also known to employ error detection and error correction in combination.
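Error detection by cyclic redundancy coding can be illustrated with the short sketch below. It assumes the 32-bit CRC provided by Python's standard `zlib` module and a hypothetical four-byte header; the text above does not say which CRC polynomial or header layout would be used.

```python
import zlib


def attach_crc(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 check value to the payload."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def crc_ok(block: bytes) -> bool:
    """Recompute the CRC over the payload and compare with the received value."""
    payload, received = block[:-4], block[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == received


header_bytes = bytes([0x12, 0x34, 0x56, 0x78])     # hypothetical header contents
sent_block = attach_crc(header_bytes)
# Simulate a single bit error on the communications link.
corrupted_block = bytes([sent_block[0] ^ 0x01]) + sent_block[1:]
```

Here `crc_ok(sent_block)` is true and `crc_ok(corrupted_block)` is false: the check reveals that an error is present, but does not by itself say which bit is wrong or how to repair it.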
When automatic error correction is applied there is a risk that the corrected form of the overall portion of information being corrected will contain further discrepancies other than the original error part, since such methods tend to involve an approximation to a best overall assumed correct solution. This is the case for forward error correction techniques which employ encoding using a block-based coding scheme. One such example is Golay coding, which allows for example 12 bits of information to be sent in 24 bits whilst allowing for up to 3 errors to be corrected. The correction technique involves correction of a whole portion of information, for example a whole header frame, in a composite fashion. If, however, more than 3 errors occur in the 24 bits, then the correction technique will correct the whole header to a wrong corrected version. It is desirable to provide means for processing received data that alleviate problems associated with composite correction of a whole header frame to a wrong corrected version in a distributed speech recognition process.
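The behaviour described above, in which up to 3 errors are corrected but a larger number of errors yields a wrong corrected version, can be reproduced with the following sketch. It is an illustration under stated assumptions: it uses an extended (24,12) Golay code built from one commonly cited generator polynomial, and a brute-force nearest-codeword decoder chosen for clarity. The function names and the 12-bit header value are hypothetical.

```python
GOLAY_POLY = 0xC75  # x^11 + x^10 + x^6 + x^5 + x^4 + x^2 + 1 (assumed choice)


def golay_encode(msg12):
    """Encode 12 information bits as a 24-bit codeword (23 bits plus parity)."""
    reg = msg12 << 11
    # Polynomial long division over GF(2); the remainder gives 11 check bits.
    for shift in range(11, -1, -1):
        if reg & (1 << (shift + 11)):
            reg ^= GOLAY_POLY << shift
    word23 = (msg12 << 11) | reg
    parity = bin(word23).count("1") & 1
    return (word23 << 1) | parity


CODEBOOK = [golay_encode(m) for m in range(1 << 12)]


def golay_decode(word24):
    """Return the 12-bit message whose codeword is nearest in Hamming distance."""
    return min(range(1 << 12),
               key=lambda m: bin(CODEBOOK[m] ^ word24).count("1"))


def flip_bits(word, positions):
    """Invert the bits of word at the given bit positions."""
    for p in positions:
        word ^= 1 << p
    return word


header = 0xABC                                   # hypothetical 12-bit header value
sent = golay_encode(header)
three_errors = flip_bits(sent, [0, 7, 19])
five_errors = flip_bits(sent, [1, 4, 9, 15, 22])

recovered = golay_decode(three_errors)           # the original header is restored
miscorrected = golay_decode(five_errors)         # a valid-looking but wrong header
```

With three flipped bits the decoder restores the original 12 bits. With five flipped bits the received word lies closer to a different codeword, so the whole header is replaced by a wrong value without any indication that this has happened. The case of exactly four flipped bits is not exercised here, because in this extended code such a word is equidistant from several codewords.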
Also, techniques of automatic error correction that may not cause secondary problems when applied to other forms of information are not necessarily without problem when applied to errors in the above described header frames in a distributed speech recognition process, due in part to the way the data in the corresponding data frames is processed using respective header frame information. Hence it is desirable to provide means for processing received data in a distributed speech recognition process that alleviate secondary problems.
The present invention addresses some or all of the above aspects.
According to one aspect of the present invention, there is provided a method of processing received data in a distributed speech recognition process, as claimed in claim 1.
According to another aspect of the invention, there is provided an apparatus for processing received data in a distributed speech recognition process as claimed in claim 7.
Further aspects of the invention are as claimed in the dependent claims.
The present invention tends to provide means for processing received data which are particularly appropriate to the nature of the distributed speech recognition process, the form in which data is received therein when transmitted from a first location to a second location, and the way in which such data is processed after receipt at the second location in a distributed speech recognition process.
Particularly, the possibility of allowing latency in a speech recognition process is exploited in the method of the present invention. More particularly, the method exploits the fact that, in a distributed speech recognition process, latency towards the start of a message is often particularly acceptable when combined with low latency at the end of the message.