A variety of automatic speech recognition (ASR) systems exist for recognizing speech and for generating text and/or commands based on such speech. ASR systems that generate text based on speech are typically referred to as “dictation” or “transcription” systems.
Some dictation systems allow users to dictate freeform text that is entered directly into documents exactly as spoken, except possibly with spelling corrections or other minor corrections applied. A common example is the dictation systems used in conjunction with word processors to dictate memos, articles, and other prose documents. In contrast, some dictation systems are designed to generate and input data in a structured format based on the user's speech. For example, such an ASR system may be used to enter data into a database form, in which data in each field is constrained to have only certain values and to be represented in certain formats. A “Priority” field in such a form may, for instance, be associated with a dropdown list having only three values, such as “High,” “Medium,” and “Low,” in which case the speech of a user dictating into such a field may be constrained to produce only one of these three possibilities as speech recognition results.
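The constraint described for the “Priority” field can be illustrated with a minimal sketch. It assumes a hypothetical recognizer that returns a list of (text, score) hypotheses; the function name `constrain_to_field`, the hypothesis format, and the scores are illustrative assumptions and do not correspond to any particular ASR product.

```python
from typing import Optional, Sequence, Tuple

# The three values permitted by the "Priority" dropdown in the example above.
PRIORITY_VALUES = ("High", "Medium", "Low")


def constrain_to_field(
    hypotheses: Sequence[Tuple[str, float]],
    allowed: Sequence[str],
) -> Optional[str]:
    """Return the allowed value matched by the best-scoring hypothesis.

    `hypotheses` is an assumed format: (recognized text, confidence score).
    Hypotheses that do not match any allowed value are ignored, so the
    result is always one of `allowed`, or None if nothing matches.
    """
    canonical = {value.lower(): value for value in allowed}
    best_value: Optional[str] = None
    best_score = float("-inf")
    for text, score in hypotheses:
        value = canonical.get(text.strip().lower())
        if value is not None and score > best_score:
            best_value, best_score = value, score
    return best_value


# Hypothetical recognizer output: the top hypothesis ("hi") is not a
# permitted value, so the constrained result falls back to "high".
hypotheses = [("hi", 0.48), ("high", 0.45), ("hype", 0.07)]
result = constrain_to_field(hypotheses, PRIORITY_VALUES)
print(result)
```

In this sketch the unconstrained best guess would have been “hi,” whereas the constrained field can only ever yield one of the three permitted values.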
Whether the user provides input using speech input or other kinds of input (such as mouse or keyboard input), both freeform and structured input modalities have a variety of advantages and disadvantages. For example, one advantage of freeform input modalities is that they allow the user to provide a wide range of input and therefore to capture subtleties in the information provided. A corresponding disadvantage of structured input modalities is that by constraining the input options available to the user, they may fail to capture information that cannot accurately be represented using the available options.
One advantage of structured input modalities is that they require data to be stored in the form of discrete data elements that computers can act on easily and automatically. For example, if a patient has an allergy to allergen X and currently takes medication Y, and these two facts are input using structured input modalities (such as by using dropdown lists for selecting allergens and medications, respectively), then the resulting data can be encoded in discrete data elements representing allergen X and medication Y, respectively, and a computer can perform processing on such information easily and automatically, such as by comparing the data to contraindications on a predetermined list to determine whether allergen X is contraindicated with medication Y. Structured input modalities, in other words, enable data to be input in a form such that the meaning of the data is available to and unambiguously processable by a computer without the need for further interpretation. In contrast, data input using freeform input modalities (such as the text “the patient is allergic to X and is taking medication Y”) must be parsed and interpreted in an attempt to discern the meaning of the text before a computer can attempt to process the information represented by the text. Such parsing and interpretation are subject to errors, which can impede the ability to perform the kind of processing that can be performed easily and automatically using data obtained using structured input modalities.
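The contraindication check described above can be sketched as follows. The record layout, the identifiers `allergen_x`, `medication_y`, and `medication_z`, and the contents of the predetermined list are all hypothetical and chosen only to mirror the example in the text.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class PatientRecord:
    """Discrete data elements captured through structured input (assumed layout)."""
    allergens: FrozenSet[str]
    medications: FrozenSet[str]


# Hypothetical predetermined list of (allergen, medication) contraindications.
CONTRAINDICATIONS: FrozenSet[Tuple[str, str]] = frozenset(
    {("allergen_x", "medication_y")}
)


def find_contraindications(
    record: PatientRecord,
    contraindications: FrozenSet[Tuple[str, str]] = CONTRAINDICATIONS,
) -> List[Tuple[str, str]]:
    """Return every (allergen, medication) pair in the record that is listed."""
    return sorted(
        (allergen, medication)
        for allergen in record.allergens
        for medication in record.medications
        if (allergen, medication) in contraindications
    )


record = PatientRecord(
    allergens=frozenset({"allergen_x"}),
    medications=frozenset({"medication_y", "medication_z"}),
)
conflicts = find_contraindications(record)
print(conflicts)
```

Because the allergen and medication are already discrete elements, the check reduces to a set lookup. Performing the same check on the freeform sentence would first require extracting those two elements from the text, which is the error-prone step the paragraph describes.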
Another advantage of structured input modalities is that they can enable data to be input with fewer errors because the user is prevented, for example, from providing the wrong type of information (such as by inputting a name into a date field) or from providing information that is outside a permissible range of values (such as by entering −10 into a field representing the body temperature of a patient). A corresponding disadvantage of freeform input modalities is that they can allow input to contain a wide variety of errors because the user's input is not constrained.
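The two kinds of error prevention mentioned above, type checking and range checking, can be sketched with a pair of field validators. The function names, the ISO date format, and the 25 to 45 degree Celsius bounds are assumptions made for illustration only; they are not clinical guidance and are not taken from any particular system.

```python
from datetime import date


def parse_date_field(raw: str) -> date:
    """Accept only ISO-formatted dates (YYYY-MM-DD); raise ValueError otherwise."""
    return date.fromisoformat(raw)


def parse_body_temperature_c(raw: str, low: float = 25.0, high: float = 45.0) -> float:
    """Accept only numeric values inside an assumed permissible range.

    The default bounds are illustrative placeholders.
    """
    value = float(raw)  # raises ValueError for non-numeric input
    if not (low <= value <= high):
        raise ValueError(f"temperature {value} is outside {low}-{high} degrees C")
    return value


def is_rejected(parser, raw: str) -> bool:
    """Return True if the parser refuses the raw input."""
    try:
        parser(raw)
    except ValueError:
        return True
    return False


print(is_rejected(parse_date_field, "Jane Smith"))      # a name in a date field
print(is_rejected(parse_body_temperature_c, "-10"))     # an out-of-range value
print(is_rejected(parse_body_temperature_c, "37.2"))    # a plausible value
```

A freeform text field applies neither check, so both erroneous inputs would be stored as entered.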
Yet another advantage of freeform input modalities is that they do not predispose the user toward providing any one particular input over another. In contrast, structured input modalities can bias the input provided by the user towards the set or range of permitted inputs. This bias can be undesirable when the goal of the system is to faithfully capture and represent information. For example, a documentation system that only offers a checkbox as a means for inputting information about the presence or absence of a particular fact—such as whether a patient has hypertension—forces the user to pigeonhole the user's knowledge about the patient's condition into a binary value of “yes” or “no.” If the user's knowledge of the patient indicates, for example, that the patient currently has mild hypertension, that the patient may possibly have hypertension, or that the patient is in the process of developing hypertension, then requiring the user to provide an answer of “yes” or “no” to the question, “Does the patient have hypertension?,” will result in a misrepresentation of the true state of the patient in relation to hypertension.
Some input systems (such as EMR systems) attempt to address this problem by adding additional input choices, such as by enabling the user to provide not only binary answers to questions about facts, but also to provide information about additional qualities related to those facts, such as degree, likelihood, conditionality, and interdependency of such facts. For example, such a system might require the user to provide a “yes” or “no” answer to the question, “Does the patient have hypertension?,” but also ask the user, in connection with that question, to provide a degree to which the user's “yes” or “no” answer is correct, a likelihood that the user's “yes” or “no” answer is correct, and so on.
Although this kind of solution can address some of the problems with structured input modalities, adding such additional input choices can quickly make using the system unwieldy due to the large amount and variety of inputs that the user must provide. Furthermore, the meanings of the additional input choices may not be clear to the users responsible for selecting such choices. For example, if the user is provided with choices of “Low,” “Medium,” and “High” for rating the likelihood that the patient has hypertension, it is not clear whether these choices represent equally distributed probabilities such as 0-33.3%, 33.3-66.7%, and 66.7-100%, or some other ranges, such as 0-10%, 10-90%, and 90-100%. The user must therefore make some decision about how to interpret the available input choices, and the result of that decision may differ both from that intended by the designer of the system and from the decisions made by other users when faced with the same set of input choices. As a result, the inputs provided by users may result in inaccurate information being entered into the system.
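The ambiguity described above can be made concrete with a short sketch that maps the same estimated likelihood to a label under each of the two interpretations mentioned in the text. The function name `label_for`, the treatment of values falling exactly on a boundary, and the sample likelihood of 25% are illustrative assumptions.

```python
from bisect import bisect_right
from typing import Sequence

LABELS = ("Low", "Medium", "High")

# Two interpretations of the same three labels, expressed as cut points.
EQUAL_THIRDS = (1 / 3, 2 / 3)   # roughly 0-33.3%, 33.3-66.7%, 66.7-100%
WIDE_MIDDLE = (0.10, 0.90)      # 0-10%, 10-90%, 90-100%


def label_for(probability: float, cut_points: Sequence[float]) -> str:
    """Map a probability in [0, 1] to a label under the given interpretation.

    A value exactly on a cut point is assigned to the higher bucket
    (an arbitrary choice made for this sketch).
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return LABELS[bisect_right(cut_points, probability)]


estimate = 0.25  # hypothetical: the user believes the likelihood is about 25%
print(label_for(estimate, EQUAL_THIRDS))
print(label_for(estimate, WIDE_MIDDLE))
```

Under the first interpretation the 25% estimate is recorded as “Low,” and under the second as “Medium,” so two users holding the same belief can enter different values.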
In light of these various advantages and disadvantages of freeform and structured input modalities, what is needed are improved techniques for capturing data to maximize accuracy and minimize errors.