Various techniques exist for creating structured documents from speech. Examples of such techniques are disclosed in U.S. Pat. No. 7,584,103, issued on Sep. 1, 2009, entitled, “Automated Extraction of Semantic Content and Generation of a Structured Document From Speech”; and U.S. Pat. No. 7,716,040, issued on May 11, 2010, entitled, “Verification of Extracted Data”; both of which are assigned to Multimodal Technologies, Inc. of Pittsburgh, Pa. Similarly, the product AnyModal CDS Speech Understanding, available from Multimodal Technologies, Inc., may be used to create structured documents from speech.
For example, if a doctor dictates a report of a patient visit, the doctor's speech may be transcribed not merely into a verbatim transcript of the dictated report, but into a structured document in which the text representing the transcribed speech is organized into sections, sub-sections, paragraphs, and other structures corresponding to concepts represented by the speech. Such concepts may, for example, be represented in the structured document by marking up the text using XML tags, such as those defined by the HL7 CDA document format or another format.
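As a minimal sketch of such a structured document, the following Python example organizes transcribed text into sections and paragraphs and marks it up using XML tags. The element names ("document," "section," "paragraph"), the function name, and the sample text are assumptions made for illustration only; they are not taken from the HL7 CDA document format or from the above-referenced patents.

```python
import xml.etree.ElementTree as ET


def build_structured_document(sections):
    """Build an XML tree from (section title, list of paragraph texts) pairs.

    The tag names used here are hypothetical and do not follow HL7 CDA.
    """
    root = ET.Element("document")
    for title, paragraphs in sections:
        section = ET.SubElement(root, "section", {"title": title})
        for text in paragraphs:
            paragraph = ET.SubElement(section, "paragraph")
            paragraph.text = text
    return root


# Sample transcribed text, already grouped by the concept it expresses.
doc = build_structured_document([
    ("Current Medications", ["The patient takes lisinopril daily."]),
    ("Findings", ["Blood pressure is elevated.", "Lungs are clear."]),
])

# Serialize the tree to an XML string for storage or display.
xml_text = ET.tostring(doc, encoding="unicode")
print(xml_text)
```

In this sketch the grouping of text into sections is supplied by the caller; in practice that grouping would be produced by the speech understanding techniques referenced above.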
Sections, sub-sections, and other concepts may be annotated within the document using codes that indicate a semantic class of the concept, such as “CurrentMedications,” “Findings,” and “Discharge Instructions.” Furthermore, transcribed text may be annotated with codes representing the meaning of the text in a computer-processable form, such as an “RxNorm” code for medications mentioned in the text, a post-coordinated SNOMED CT term describing a problem of a patient, or a complex data structure describing an allergy using information about the allergen, severity, and adverse reaction associated with the allergy.
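The annotations described above may be sketched, under stated assumptions, as simple data structures that associate a span of transcribed text with a semantic class, a code, or a more complex structure such as an allergy. All class names, field names, and the placeholder code value below are hypothetical; in particular, "PLACEHOLDER-CODE" is not a real RxNorm code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Coding:
    system: str  # name of the coding system, e.g. "RxNorm" or "SNOMED CT"
    code: str    # code value within that system (placeholder in this sketch)


@dataclass
class Allergy:
    allergen: str
    severity: str
    adverse_reaction: str


@dataclass
class Annotation:
    start: int  # offset of the first annotated character
    end: int    # offset one past the last annotated character
    semantic_class: Optional[str] = None
    coding: Optional[Coding] = None
    allergy: Optional[Allergy] = None


def annotate(text, phrase, **fields):
    """Annotate the first occurrence of phrase in text (hypothetical helper)."""
    start = text.index(phrase)
    return Annotation(start=start, end=start + len(phrase), **fields)


text = "Patient takes lisinopril. Penicillin causes a severe rash."

# A medication mention annotated with a semantic class and a coded value.
medication = annotate(
    text,
    "lisinopril",
    semantic_class="CurrentMedications",
    coding=Coding(system="RxNorm", code="PLACEHOLDER-CODE"),
)

# An allergy described by its allergen, severity, and adverse reaction.
allergy = annotate(
    text,
    "Penicillin causes a severe rash",
    allergy=Allergy(allergen="penicillin", severity="severe", adverse_reaction="rash"),
)
```

Storing character offsets rather than copies of the text keeps each annotation tied to the exact span of the transcript it describes.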
Furthermore, the structured document may be annotated with header information that indicates the type of the document (such as “Discharge Summary” or “Progress Note”) and context information (e.g., information about the patient who is the subject of the document and information about the physician who dictated the document).
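As one possible illustration, header information of this kind may be attached to an XML document as follows. The element names, the function name, and the identifier values are assumptions made for this sketch and do not reflect any particular document format.

```python
import xml.etree.ElementTree as ET


def add_header(root, doc_type, patient_id, physician_id):
    """Insert a header with document type and context as the first child.

    Tag and attribute names are hypothetical.
    """
    header = ET.Element("header")
    ET.SubElement(header, "documentType").text = doc_type
    context = ET.SubElement(header, "context")
    ET.SubElement(context, "patient", {"id": patient_id})
    ET.SubElement(context, "physician", {"id": physician_id})
    root.insert(0, header)  # place the header before the document body
    return header


# A document body with one section, followed by the header annotation.
root = ET.Element("document")
ET.SubElement(root, "section", {"title": "Findings"})
add_header(root, "Progress Note", "patient-001", "physician-042")

print(ET.tostring(root, encoding="unicode"))
```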
These and other techniques for creating structured documents from speech are described in more detail in the two above-referenced U.S. Pat. Nos. 7,584,103 and 7,716,040. As indicated above, such structured documents include both text and codings (such as XML tags) associated with the text. The codings encode, in a computer-processable form, concepts represented by the corresponding text.
It is desirable to be able to search such structured documents to find relevant information as quickly, easily, and accurately as possible. Although some techniques for performing such searching exist, there is a need for improved techniques for searching structured documents, particularly when such structured documents are part of a dynamic corpus of structured documents that grows and changes over time.