Speech recognition applications are widely used in enterprises to extract transcriptions of speech. Examples include dictation, voice requests, indexing of audio data (the ability to identify the speakers involved in a conversation), voice-to-text alignment (i.e., time alignment between speech signals and their corresponding text), and keyword spotting (the identification of keywords in utterances).
Automatic speech recognition (ASR) systems typically consist of the following components:
A Frontend (FE), which is responsible for feature extraction from the audio stream and includes a set of algorithms for Digital Signal Processing (DSP), Voice Activity Detection (VAD) and noise reduction.
A Language Knowledge Base (LKB), which typically consists of a language model (LM), an acoustic model (AM) and a dictionary.
A Decoder, which receives feature vectors from the FE and seeks the best path in a search space constructed from the LKB.
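The division of labor among these three components can be illustrated with a minimal sketch. Everything in it is a hypothetical simplification and not a description of any particular ASR system: the names `frontend`, `decode` and `KnowledgeBase` are invented for illustration, the feature is a single frame-energy value, the VAD is a plain energy threshold, and a greedy per-frame choice stands in for the path search that a real Decoder performs.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class KnowledgeBase:
    """Toy stand-in for the LKB (hypothetical structure)."""
    acoustic_model: Dict[str, float]  # word -> expected frame energy
    language_model: Dict[str, float]  # word -> prior probability, must be > 0


def frontend(samples: Sequence[float], frame_size: int,
             vad_threshold: float) -> List[float]:
    """Cut the audio into frames, compute the mean energy of each frame,
    and keep only frames at or above the VAD threshold."""
    features = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        energy = sum(x * x for x in frame) / frame_size
        if energy >= vad_threshold:  # crude voice activity detection
            features.append(energy)
    return features


def decode(features: Sequence[float], kb: KnowledgeBase) -> List[str]:
    """For each feature, pick the word with the best combined score:
    negative acoustic distance plus the log of the language-model prior."""
    words = []
    for feature in features:
        best_word = max(
            kb.acoustic_model,
            key=lambda w: -abs(feature - kb.acoustic_model[w])
            + math.log(kb.language_model[w]),
        )
        words.append(best_word)
    return words


# Usage: one silent frame (dropped by the VAD) followed by two active frames.
kb = KnowledgeBase(
    acoustic_model={"low": 1.0, "high": 4.0},
    language_model={"low": 0.5, "high": 0.5},
)
samples = [0.0] * 4 + [1.0] * 4 + [2.0] * 4
features = frontend(samples, frame_size=4, vad_threshold=0.5)
transcript = decode(features, kb)
```

The sketch keeps the FE and the Decoder as separate functions that exchange only feature vectors, which is the boundary at which such a pipeline could be split between two machines.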
At present, there is no standard for organizing a client-server architecture for ASR applications. A specification exists for Session Initiation Protocol (SIP, a signaling protocol for controlling communication sessions such as voice and video calls over IP) Interactive Voice Response (IVR, a technology that allows a computer to interact with humans through voice and DTMF tones entered via a keypad). However, this specification is dedicated to the field of telephony and is not adapted to the operating conditions of advanced IP-based ASR systems.
It is therefore an object of the present invention to provide a client-server platform for a variety of automatic speech recognition (ASR) applications that operate over Internet or intranet data networks.
It is another object of the present invention to provide a client-server platform for a variety of automatic speech recognition (ASR) applications that has a minimal response time for a given environment.
It is another object of the present invention to provide a client-server platform for a variety of automatic speech recognition (ASR) applications that is capable of distributing computations between clients and a server, depending on the client capabilities and security requirements.
It is another object of the present invention to provide a client-server platform for a variety of automatic speech recognition (ASR) applications that minimizes the network data traffic.
It is another object of the present invention to provide a client-server platform for a variety of automatic speech recognition (ASR) applications that has a scalable architecture, in which all the components are scalable.
Other objects and advantages of the invention will become apparent as the description proceeds.