Modern operating rooms for performing surgery have seen several advancements over the past two decades. In the late 20th century, state-of-the-art operating rooms included several electronic surgical instruments (e.g. electrosurgical units, insufflators, endoscopes). These instruments were operated separately by the surgeon and members of the surgical team. The industry improved upon this type of operating room by integrating the various instruments into a unified system. With this configuration, the surgeon and/or members of the team use a central controller (or surgical control unit) to control all of the instruments through a single interface (often a graphical user interface). Generally speaking, these central control units were built using modified personal computers, and the operating rooms using them are commonly referred to as “digital operating rooms”.
The establishment of the digital operating room paved the way for the voice-controlled operating room. With this system, a member of the surgical team (usually the surgeon) wears a headset with a microphone. The surgeon issues spoken commands into the headset, and these commands are sent to the central controller, which controls the various instruments to perform desired tasks or make on-the-fly adjustments to operating parameters. The central controller runs software including a speech-to-text converter (i.e. speech recognition software) to interpret and execute the voice commands. Since computers often have difficulty understanding spoken language, typical systems provide audible confirmation feedback to the surgical team, notifying them that a command has been understood and executed by the controller. Since sterility is critically important in all surgical procedures, this touch-free control system represented a significant advancement.
The voice-controlled digital operating room was further improved by the introduction of the wireless voice-control headset. This gave the surgeon greater mobility and eliminated the microphone cable as a possible source of contamination or nuisance for the surgeon. Voice-controlled digital operating rooms with wireless headsets represent the modern state of the art in the field.
Although this type of system has worked well for the convenience and efficacy of the surgical team and the maintenance of sterility, it has introduced certain heretofore unknown safety issues. One such safety issue is the problem of surgeons issuing commands into wireless headsets and input devices that are mated with a nearby room's surgical control unit. In that situation, a surgeon may attempt to control a surgical control unit present in the room they are occupying, only to inadvertently control another surgical control unit in a nearby room where an unrelated procedure is being performed. This problem is exacerbated by the fact that a surgeon may repeat commands in a vain attempt to operate the surgical control unit in the room they are occupying. This can result in injury to the patient and surgical team and/or damage to the equipment in the nearby room.
Moreover, a surgical team must keep track of the headset and ensure that the surgeon is wearing it prior to the procedure. Although they are less intrusive and more convenient than prior systems, the wireless headsets are still a source of potential contamination and nuisance for the surgeon.
The problems associated with wireless headset microphones can be eliminated by replacing them with ambient microphones located inside the operating room to receive the surgeon's commands. By using ambient microphones, the wireless headset is eliminated as a potential source of contamination. Furthermore, issuing commands to the wrong operating room control unit is impossible. However, the use of ambient microphones introduces new problems. Ambient microphone voice control systems use speech recognition software similar to that used by headset voice control systems. Headsets receive relatively “clean” speech input with a high signal-to-noise ratio as a result of being very near the source of the speech commands. This advantage is not present with ambient microphones, and the software that interprets speech commands is poorly adapted to deal with the additional background noise and reverberations present in the audio data gathered by ambient microphones.
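The loss of signal-to-noise ratio with distance can be illustrated with a rough calculation. The following is a minimal sketch only: it assumes an idealized free-field point source (a 6 dB level drop per doubling of distance) and the same noise floor at both microphones, and it ignores the reverberation mentioned above. The distances and the function names snr_db and level_drop_db are hypothetical and chosen purely for illustration.

```python
import math


def snr_db(signal_power: float, noise_power: float) -> float:
    """Signal-to-noise ratio in decibels from two power values."""
    return 10.0 * math.log10(signal_power / noise_power)


def level_drop_db(near_m: float, far_m: float) -> float:
    """Drop in speech level (dB) when moving a microphone from near_m to far_m.

    Assumes an ideal free-field point source (inverse-square law);
    a real operating room with reflections will behave differently.
    """
    return 20.0 * math.log10(far_m / near_m)


# Hypothetical distances: headset boom microphone vs. ceiling-mounted microphone.
HEADSET_DISTANCE_M = 0.05
AMBIENT_DISTANCE_M = 2.0

drop = level_drop_db(HEADSET_DISTANCE_M, AMBIENT_DISTANCE_M)
print(f"Speech level drop at the ambient microphone: {drop:.1f} dB")
```

Under these assumptions, with an unchanged noise floor, the signal-to-noise ratio at the ambient microphone falls by the same number of decibels as the speech level.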
One way to improve the voice control software's ability to selectively analyze speech commands is to calibrate the voice control system after the surgical system and ambient microphone are installed in the operating room. A modern speech recognition system is typically trained on several hundred or even thousands of hours of speech data produced by a large number of speakers. Preferably, these speakers constitute a representative sample of the target users of the system. Such a speech recognition system will perform at its best when used in an environment that closely matches the noise conditions and type of microphone used for recording the training speech data. Most commonly, training speech data are recorded in relatively controlled and quiet environments, using high-quality close-talking microphones. When a speech recognition system trained on this data is used in a noisy operating room with a far-field microphone, the accuracy of recognition tends to degrade dramatically. Theoretically, such degradation could be corrected by recording the training data in a noisy operating room with the same far-field microphone. However, each operating room has its own noise characteristics and specified installation location for the far-field microphone, which means that in order to achieve the best possible performance with such a strategy, the speech data would have to be recorded in every individual room. Obviously, this would be extremely costly and impractical. It is desirable, then, to develop a technique that makes it possible to take a generic speech recognition system, trained on standard speech data (quiet environment, close-talking microphone), and quickly adapt it to the new noisy environment of a particular operating room and far-field microphone combination. The word “quickly” is used here to mean that only a small amount of new audio data needs to be recorded in the target environment.
Using a technician issuing commands in the operating environment to calibrate the system is currently not a viable alternative, because current calibration algorithms such as maximum-likelihood linear regression (MLLR) and maximum a posteriori (MAP) estimation would adapt the voice interpreting system to both the technician and the operating environment, substantially degrading performance for other users (i.e. the intended users of the system, such as the surgeons and nurses).
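The confounding of speaker and environment can be illustrated with a toy example. The following is a minimal sketch, not the adaptation procedure of any particular recognizer: it applies the standard MAP update for a single Gaussian mean to a one-dimensional feature, and it assumes, purely for illustration, that the room and the technician's voice each add a fixed offset to that feature. The offsets, the prior weight tau, and the function names are hypothetical.

```python
import random


def map_adapt_mean(prior_mean: float, frames: list[float], tau: float = 10.0) -> float:
    """MAP update of one Gaussian mean; tau weights the prior against the data."""
    return (tau * prior_mean + sum(frames)) / (tau + len(frames))


def simulate_frames(clean_mean, speaker_offset, room_offset, n, seed=0):
    """Toy adaptation data: clean feature plus speaker and room offsets plus noise."""
    rng = random.Random(seed)
    return [
        clean_mean + speaker_offset + room_offset + rng.gauss(0.0, 0.1)
        for _ in range(n)
    ]


PRIOR_MEAN = 0.0    # mean learned from clean, close-talking training data
ROOM_OFFSET = 1.0   # hypothetical shift caused by the room and far-field microphone
TECH_OFFSET = 0.5   # hypothetical shift specific to the technician's voice

frames = simulate_frames(PRIOR_MEAN, TECH_OFFSET, ROOM_OFFSET, n=200)
adapted_mean = map_adapt_mean(PRIOR_MEAN, frames)
shift = adapted_mean - PRIOR_MEAN

print(f"Shift wanted (room only): {ROOM_OFFSET:.2f}")
print(f"Shift learned by MAP:     {shift:.2f}")
```

In this sketch the learned shift exceeds the room offset because the update cannot tell the technician's contribution apart from the room's; the adapted model is therefore mismatched for any speaker whose offset differs from the technician's.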
There remains a need in the art for a calibration system for an ambient-microphone voice-controlled surgical system that calibrates the voice control system using limited audio data collected in the operating environment. The calibration system would be capable of calibrating the control system to the unique properties of an operating environment while preventing unique characteristics of a technician's voice or other audio sources used for calibration from affecting the final calibration.