The invention concerns a process for automatic control of one or several devices by speech commands or by speech dialog in real-time operation. The invention further concerns an apparatus for carrying out this process, in which a speech input/output unit is connected via a speech signal preprocessing unit with a speech recognition unit, which in turn is connected to a sequencing control, a dialog control, and an interface control.
Processes or apparatuses of this kind are generally used in so-called speech dialog systems or speech-operated systems, e.g. for vehicles, computer-controlled robots, machines, plants, etc.
In general, a speech dialog system (SDS) can be reduced to the following components:
    A speech recognition system that compares a spoken-in command ("speech command") with other allowed speech commands and decides which command in all probability was spoken in;
    A speech output, which issues the speech commands and signaling sounds necessary for user control and, if necessary, feeds back the results from the recognizer;
    A dialog control and sequencing control to make it clear to the user which type of input is expected, or to check whether the input that occurred is consistent with the query and the momentary status of the application, and to trigger the resulting action in the application (e.g. the device to be controlled);
    A control interface as application interface: concealed behind this are hardware and software modules for selecting various actuators or computers, which comprise the application;
    A speech-selected application: this can be an order system or an information system, for example, a CAE (computer-aided engineering) work station or a wheelchair suitable for a handicapped person.
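By way of illustration only, the interplay of the recognizer and the dialog control listed above can be sketched as follows. The sketch is not the process of the invention: the state names, the command vocabulary, the threshold of 0.6, and the function names `recognize` and `dialog_step` are all hypothetical, and plain string similarity (Python's `difflib.SequenceMatcher`) merely stands in for the acoustic comparison a real recognizer would perform.

```python
from difflib import SequenceMatcher

# Hypothetical vocabulary: dialog state -> speech commands allowed in that state.
ALLOWED = {
    "idle": ["radio on", "dial number"],
    "radio": ["radio off", "next station"],
}

# Hypothetical state transitions triggered by recognized commands.
TRANSITIONS = {"radio on": "radio", "radio off": "idle"}


def recognize(heard, vocabulary):
    """Return the allowed command most similar to the input, and its score.

    String similarity is an assumption standing in for acoustic scoring.
    """
    scored = [(SequenceMatcher(None, heard, cmd).ratio(), cmd) for cmd in vocabulary]
    score, best = max(scored)
    return best, score


def dialog_step(state, heard, threshold=0.6):
    """Check the input against the current state and select the resulting action."""
    command, score = recognize(heard, ALLOWED[state])
    if score < threshold:
        # Input is not consistent with what is expected: ask the user again.
        return state, "prompt: please repeat"
    next_state = TRANSITIONS.get(command, state)
    # The returned action would be passed on to the control interface.
    return next_state, f"execute: {command}"
```

In this sketch, only the commands allowed in the momentary dialog state are compared, which mirrors the consistency check performed by the dialog control; an input that matches none of them well enough leads to a renewed query instead of an action.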
Without limiting the general usability of the described processes, devices, and sequences, the present description focuses on speech recognition, the dialog structure, and a special application in motor vehicles.
The difficulties with the solutions known so far include:
    a) The necessity for involved training in order to adapt the system to the characteristics of the respective speaker or to a changing vocabulary. The systems are either completely speaker-independent, completely speaker-dependent, or speaker-adaptive, wherein the latter require a training session for each new user. This requires time and greatly reduces the operating comfort if the speakers change frequently. For this reason, the vocabulary range of traditional systems is small for applications where a frequent change in speakers and a lack of time for the individual speakers must be expected.
    b) The insufficient user comfort, which expresses itself in that
        the vocabulary is limited to a minimum to ensure a high recognition reliability;
        the individual words of a command are entered in isolation (meaning with pauses in between);
        individual words must be acknowledged to detect errors;
        multi-stage dialog hierarchies must be processed to control multiple functions;
        a microphone must be held in the hand or a headset (combination of earphones and lip microphone) must be worn.
    c) The lack of robustness
        to operating errors;
        to interfering environmental noises.
    d) The involved and expensive hardware realization, especially for medium and small production volumes.
It is the object of the invention to provide a process that allows the reliable control or operation of one or several devices by speech commands or by speech dialog in real-time operation and at the lowest possible expenditure. A further object is to provide a suitable apparatus for carrying out this process.