1. Field
The technology of the present application relates generally to speech-to-text conversion for dictation systems, and more specifically, to methods and systems to train user profiles associated with speech recognition engines and test the trained profile to inhibit training a profile in a manner that may reduce accuracy.
2. Background
Many companies provide customers the ability to contact the company using a call center to field customer calls, correct customer problems, or direct the customer to the appropriate resource to solve the problems that initiated the call, real or imagined. Conventionally, a call center operates by directing a call from a customer to an available agent or representative. Along with the telephone call, the agent or representative typically has a customer relationship management screen that the company has authorized or specifically designed to facilitate assisting the customer.
Referring now to FIG. 1, a conceptual representation of the systems within a call center 100 is shown. The call center 100 includes both voice technologies, which use the signaling and audio path and terminate at the agent's phone (or headset), and IP-based technologies that support the customer relationship management (CRM) application, whose graphical user interface (GUI) runs on the agent's processor, such as, for example, a personal computer or the like. To support this, the call center 100 includes an automatic call distributor (ACD) 102 having an audio connection 104 to an agent phone 106. ACD 102 also has an audio connection 108 to an interactive voice response unit (IVR) 110. Audio connections 104 and 108 may be overlapping, completely separate, or a combination thereof. IVR 110 has a data connection 112 to a computer/telephony interface (CTI) 114. CTI 114 typically provides call control 116 to ACD 102 and data and application control 118 to an agent's computer 120. Thus, when a customer uses a telephone 122 or the like to call the call center over a conventional network 124, such as the public switched telephone network (PSTN) shown, the audio, data, and applications necessary for the agent to assist the caller are provided.
While FIG. 1 identifies a customer calling over a conventional PSTN as shown, calls from customers may originate from a computer or cable based voice over internet protocol (VoIP) network instead. The network 124 may be a conventional PSTN as shown, such as, for example, when the customer is using a conventional landline or cellular telephone. Alternatively, network 124 may be a computing network, such as, for example, a LAN, a WAN, a WLAN, a WWAN, a WiFi network, the Internet, an Ethernet, or other private area network. When network 124 is a computing network, the call from the customer may originate from a VoIP enabled device, such as, for example, a computer telephone. Notably, calls from VoIP telephones may be transferred to conventional PSTN networks using conventional technology. Moreover, conventional landlines, for example, may be connected to a computer network using a soft phone or media gateway.
Once the call between the customer and the customer service representative (CSR) is established, and the CRM application is running on the representative's user interface, the CSR solicits input from the customer. Such input may consist of information such as customer name, address, nature of the problem, and the like. Traditionally, the representative inputs this information by typing it into the respective input fields. At the end of the call, the CSR often fills out a field in the CRM application generically known as notes or end of call notes. The representative typically types into this field to record information such as, for example, the disposition of the customer complaint or the like.
To facilitate entering data into the CRM application, including entering data in the end of call notes field, it may be possible to use a dictation system having a speech-to-text engine or speech recognition engine to allow the data to be dictated or captured during normal conversation/dialogue between the agent and the customer. See, for example, U.S. Pat. No. 7,702,093, issued Apr. 20, 2010, and incorporated herein by reference as if set out in full, and U.S. patent application Ser. No. 12/694,115, filed Jan. 26, 2010, and incorporated by reference as if set out in full. The audio signal would be directed to the speech recognition engine, which would return textual data for input to the appropriate fields. Speech recognition engines may generally be classified as natural language recognizers, such as DRAGON® NaturallySpeaking® currently available from Nuance Communications, Inc., or as grammar-based or pattern matching recognizers, such as the Nuance Recognizer V9 also available from Nuance Communications, Inc.
Speech recognition engines, and in particular, natural language recognizers, may require training to function properly. Traditionally, training is accomplished by the user when the user edits the transcribed text. Thus, if the user speaks the word “potato” but the transcription returns the word “tomato,” the user can correct the transcription and this correction is used to train the system. Ideally, once trained, the next time the user speaks the word “potato”, the transcription would return the word “potato.”
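By way of a non-limiting illustration, the correction step described above may be sketched as follows. The sketch only extracts the word-level substitutions a user made when editing a transcription; the function name extract_corrections is hypothetical, and how any particular speech recognition engine consumes such corrections to update a user profile is engine specific and is not shown here.

```python
from difflib import SequenceMatcher


def extract_corrections(transcribed: str, corrected: str) -> list[tuple[str, str]]:
    """Return (recognized, corrected) phrase pairs where the user replaced text.

    Hypothetical helper for illustration only: it compares the two texts word
    by word and reports each replaced run of words.
    """
    recognized_words = transcribed.split()
    corrected_words = corrected.split()
    matcher = SequenceMatcher(None, recognized_words, corrected_words)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Only substitutions are collected; insertions and deletions are ignored.
        if tag == "replace":
            pairs.append(
                (" ".join(recognized_words[i1:i2]), " ".join(corrected_words[j1:j2]))
            )
    return pairs


if __name__ == "__main__":
    print(extract_corrections("I like tomato soup", "I like potato soup"))
```

In this sketch, the pair of the misrecognized word and its correction is the unit that would be submitted as training data.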
Using dictation as a tool to add information to fields in a CRM application, however, has to date been cumbersome and unwieldy. Speech recognition engines that require training to function properly are difficult and expensive to use, as the customer representative must take valuable time to train the speech recognition engine such that it functions properly. Moreover, it has been discovered that, in some cases, corrections to transcriptions, when used to train the grammar based system, in fact result in reduced accuracy of the overall system. Thus, against this background, it would be desirable to provide improved training and feedback for dictation based systems, as well as testing to confirm that the training increases, or at least does not decrease, system accuracy.
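As a non-limiting sketch of what testing for no decrease in accuracy might look like, the example below compares the word error rate of transcriptions produced before and after training against known reference texts. Word error rate is used here as an assumed accuracy measure; the present text does not specify a metric, and the function names are hypothetical.

```python
from typing import Sequence


def word_edit_distance(reference: Sequence[str], hypothesis: Sequence[str]) -> int:
    """Levenshtein distance counted in whole words."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        current = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def word_error_rate(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Total word errors divided by total reference words over a test set."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    errors = 0
    words = 0
    for reference, hypothesis in zip(references, hypotheses):
        reference_words = reference.split()
        errors += word_edit_distance(reference_words, hypothesis.split())
        words += len(reference_words)
    if words == 0:
        raise ValueError("reference texts contain no words")
    return errors / words


def training_preserves_accuracy(
    references: Sequence[str], before: Sequence[str], after: Sequence[str]
) -> bool:
    """True if the trained profile's error rate is no worse than the original's."""
    return word_error_rate(references, after) <= word_error_rate(references, before)
```

Under these assumptions, a trained profile would be retained only when training_preserves_accuracy returns True for a held-out set of test utterances; otherwise the prior profile would be kept.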