Report preparation is highly inefficient: reports frequently contain similar content, yet each one still requires costly dictation and transcription. In the medical field, many reports contain repetitive information, especially reports of normal findings. Radiology, pathology, and surgery reports often include similar information, depending on the test, biopsy, or surgical procedure being performed. Because producing medical reports is expensive, at an estimated $6-7 billion per year in the United States, individuals and companies are actively seeking computerized document solutions that lower costs and improve efficiency.
One semi-automated documentation technique that has lowered reporting costs uses templates. Templates are outlines that structure text into blocks, paragraphs, or sentences. Typically, they contain delimiters that serve as placeholders for variable text regions, which are later completed (instantiated) to produce a document. Templates organize documents and provide “canned” text for phrases, sentences, or document elements that are used repetitively. Templates can improve data collection by reducing missing, incorrect, and inconsistent data. Template elements (e.g., sentences) can also be uniquely identified and stored in a relational database. If more professionals created documents with templates, those documents could also be processed more readily by data mining, decision support, and text summarization applications.
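As an illustration only, delimiter-based instantiation might look like the following Python sketch. The `[[name]]` delimiter syntax, the function names, and the head CT sentence are hypothetical choices made for this example and are not taken from any of the systems discussed here. Refusing to instantiate while a placeholder is still unfilled is one simple way a template can reduce missing data.

```python
import re

# Hypothetical delimiter syntax: [[field_name]] marks a variable text region.
PLACEHOLDER = re.compile(r"\[\[(\w+)\]\]")

def placeholders(template: str) -> list[str]:
    """Return the names of the variable regions, in order of appearance."""
    return PLACEHOLDER.findall(template)

def instantiate(template: str, values: dict[str, str]) -> str:
    """Fill every placeholder; raise KeyError if any region would be left empty."""
    missing = [name for name in placeholders(template) if name not in values]
    if missing:
        raise KeyError(f"unfilled placeholders: {missing}")
    return PLACEHOLDER.sub(lambda m: values[m.group(1)], template)

# Hypothetical "canned" text with two variable regions.
HEAD_CT = ("CT of the head [[contrast]] contrast. "
           "No intracranial hemorrhage. Ventricles are [[ventricle_size]].")

report = instantiate(HEAD_CT, {"contrast": "without", "ventricle_size": "normal in size"})
print(report)
# prints: CT of the head without contrast. No intracranial hemorrhage. Ventricles are normal in size.
```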
Some templates are professionally authored with the specific goal of serving as a base for newly created documents. Other templates may simply be old documents repurposed to serve as templates; for example, an author may retrieve an old letter to a business partner, make minor changes, and save it as a new document. Either type of template may be useful. The key is locating a closely related “neighbor” document.
Howes (USPTO application 20030101056) described a computerized system for completing normal medical reports. The author first selects a master template from a template repository and then completes the template with information unique to the case. However, this approach breaks down when there are many document types. For example, a radiology report for a normal head computed tomography (CT) scan is vastly different from a report documenting a left cerebral hemorrhage. In such complex cases, it may be impossible to create a report the way one fills out a form, using simple fill-in-the-blank fields or check boxes. Although a physician might wish to select sentences rather than write them from scratch, the information conveyed depends on the problem context (also called the document type or document context). Unfortunately, in medicine as in other complex domains, authors communicate a great deal of non-stereotyped knowledge that is difficult to encode in a few master templates. The broader a professional's domain knowledge, the more difficult it is to build a complete master template repository.
Automated document systems employing template elements, consisting of phrases or sentences, have been used to speed document generation in simple or narrow problem contexts. Dodge et al. (U.S. Pat. No. 5,655,130) described a method for producing a variety of documents from a common document database partitioned into a number of encapsulated data elements. As in other template systems, creating and selecting these elements was a manual process. Leymaster et al. (U.S. Pat. No. 6,182,095) developed an interactive computer system to display document structures used in report generation; selecting the correct template required explicit questioning of the user. Other document generation systems use a variety of computer interfaces, such as trees, nested menus, and check boxes: when the user selects an item, it is added to the document. Buchanan (U.S. Pat. No. 5,267,155) employed such a method to generate patient reports. As with other template management systems, the user must first build a master template database from scratch and then remember the association between each template's name and its contents. This may not be a serious problem when only a few templates are needed to cover a domain, but when hundreds or thousands of templates are needed to describe complex information, template building and selection become intractable. Current template systems therefore suffer an undesirable tradeoff: creating more templates increases the probability of finding one close to the new document the author wishes to produce, but it also increases the cost and effort of building templates and the burden of selecting among them.
Case-Based Reasoning (CBR) provides a related but potentially more advanced mechanism for building a document and template management system. A CBR system can recall relevant previous reports (cases) stored in memory, which can then serve as templates. The CBR user would (1) retrieve relevant reports, (2) reuse some of their information in the current report, (3) revise the report, and (4) retain the new report in the system. The essential step is finding one or more prior reports that are similar to the current report. It is unlikely that any single report will match the case being reported perfectly; only some of its information is likely to be used. A documentation system that could find relevant “neighbor” reports to serve as templates could speed the creation of new reports.
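A minimal sketch of the retrieve and retain steps of this cycle follows, assuming a case base of plain-text reports and a deliberately naive similarity measure (Jaccard overlap of word sets). The `CaseBase` class, its methods, and the sample reports are hypothetical. Word overlap is exactly the kind of keyword matching whose limitations are discussed below; it is used here only to make the cycle concrete. Note that the query can be a partially completed draft.

```python
import re
from dataclasses import dataclass, field

def tokens(text: str) -> set[str]:
    """Lower-cased word set; a naive stand-in for real semantic indexing."""
    return set(re.findall(r"[a-z]+", text.lower()))

def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by size of the union (0.0 if both empty)."""
    return len(a & b) / len(a | b) if a | b else 0.0

@dataclass
class CaseBase:
    reports: list[str] = field(default_factory=list)

    def retrieve(self, draft: str, k: int = 2) -> list[str]:
        """Step 1: rank stored reports by similarity to the (possibly partial) draft."""
        query = tokens(draft)
        ranked = sorted(self.reports, key=lambda r: jaccard(query, tokens(r)), reverse=True)
        return ranked[:k]

    def retain(self, final_report: str) -> None:
        """Step 4: store the finished report so it can serve as a future template."""
        self.reports.append(final_report)

# Hypothetical case base of three prior reports.
cases = CaseBase([
    "Normal head CT. No hemorrhage, mass, or midline shift.",
    "Left cerebral hemorrhage with mild midline shift.",
    "Normal chest radiograph. Lungs are clear.",
])
neighbors = cases.retrieve("Acute left cerebral hemorrhage", k=1)
# Steps 2-3 (reuse and revise) are performed by the author on `neighbors`.
cases.retain("Acute left cerebral hemorrhage without midline shift.")
```

Because the finished report is retained, repeating the same query afterwards ranks the retained report first, which is how a case base of this kind grows more useful with use.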
Purvis (USPTO application 20040019855) disclosed a CBR system that used previously created documents stored in a case base, together with methods for drawing on these documents to create a new document. However, her system was mainly intended to help authors select the correct formatting and layout of a report. It was not a true text case-based reasoning (TCBR) system, which is necessary for finding reports with similar content.
Gupta et al. (U.S. Pat. No. 5,822,743) disclosed a CBR system for solving problems within a selected knowledge domain. Users retrieved solved cases using one or more case attributes, and matching algorithms generated a list of potential solutions. This system did not attempt to extract meaning from free-text reports and was not used to facilitate document creation.
Rapidly finding relevant template documents is a major unsolved problem that TCBR systems must overcome to assist users effectively in document creation. Two measures are commonly used to judge the quality of case retrieval. Recall measures completeness: the fraction of all relevant cases that are retrieved. Precision measures specificity: the fraction of retrieved cases that are relevant. Users want the retrieved templates to contain content similar to what the author intends to write. Unfortunately, current TCBR systems based on conventional information retrieval technology have relatively poor recall and precision.
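For concreteness, both measures can be computed for a single query as below. The report identifiers are invented for the example, and the sketch assumes the retrieved list contains no duplicates.

```python
def precision_recall(retrieved: list[str], relevant: set[str]) -> tuple[float, float]:
    """Precision = hits / retrieved; recall = hits / relevant (0.0 when a denominator is empty)."""
    hits = sum(1 for doc in retrieved if doc in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical query: four templates returned, three relevant reports exist in the case base.
p, r = precision_recall(["r1", "r2", "r7", "r9"], {"r1", "r2", "r3"})
# Two of the four returned reports are relevant (precision 0.5),
# and two of the three relevant reports were found (recall 2/3).
```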
TCBR systems that index cases by metadata or keyword attributes have significant indexing problems because the semantics of language is complex. Indexing documents by keywords often results in poor precision, because the meaning of a sentence is determined not only by its individual words but also by their roles (nouns, verbs, clauses, modifiers, etc.) and interrelationships. Additionally, semantic information may be implicit, depending on the document context or the knowledge domain, which makes it very difficult to index a document semantically without a great deal of labor.
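The precision problem can be seen with a toy keyword index, assumed here to be nothing more than the set of lower-cased words in a sentence. The two example findings are hypothetical and report opposite laterality, yet they receive identical index entries, because the index discards which noun each word modifies.

```python
import re

def keyword_index(sentence: str) -> frozenset[str]:
    """Toy keyword index: the set of lower-cased words, with order and roles discarded."""
    return frozenset(re.findall(r"[a-z]+", sentence.lower()))

# Hypothetical findings with opposite meanings (mass on the left vs. on the right).
left_mass = "Mass in the left lung; no mass in the right lung."
right_mass = "Mass in the right lung; no mass in the left lung."

# Both sentences produce exactly the same index entry, so a keyword query
# for one retrieves the other just as readily (a loss of precision).
same_entry = keyword_index(left_mass) == keyword_index(right_mass)
print(same_entry)  # prints True
```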
Developing a high-precision TCBR system requires a deep understanding of the knowledge domain. To locate relevant reports that may serve as templates, the system must be able to identify linguistic expressions that are semantically equivalent. However, computational linguists have not yet developed tools that can analyze more than 30% of English sentences and transform them into structured forms [Rebholz-Schuhmann D, Kirsch H, Couto F (2005) Facts from text—Is text mining ready to deliver? PLoS Biol 3(2): e65]. Unless the TCBR system identifies most of the linguistic variations in the case base that represent the same concept, it will have low precision and recall, and thus limited utility for template selection.
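The converse problem, in which equivalent expressions share few or no keywords, is sketched below. The phrase-to-concept table and its concept codes are invented for illustration. A practical system would need to derive such equivalences from domain knowledge rather than from a hand-written dictionary, which is precisely the difficulty described above.

```python
import re
from typing import Optional

def normalize(text: str) -> str:
    """Lower-case and drop punctuation so phrases can be looked up consistently."""
    return " ".join(re.findall(r"[a-z]+", text.lower()))

def words(text: str) -> set[str]:
    return set(normalize(text).split())

# Hypothetical phrase-to-concept table; the concept codes are invented.
CONCEPTS = {
    "no intracranial hemorrhage": "NEG:intracranial_hemorrhage",
    "intracranial hemorrhage is absent": "NEG:intracranial_hemorrhage",
    "without evidence of bleeding in the brain": "NEG:intracranial_hemorrhage",
}

def concept(text: str) -> Optional[str]:
    """Return the concept a phrase expresses, or None if the table does not cover it."""
    return CONCEPTS.get(normalize(text))

v1 = "No intracranial hemorrhage."
v3 = "Without evidence of bleeding in the brain."
print(words(v1) & words(v3))       # prints set(): no shared keywords
print(concept(v1) == concept(v3))  # prints True: same underlying concept
```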
Another type of CBR system uses an expert system, a knowledge base, and a question/answer list designed so that the expert system can usually return the correct case. Building the question/answer section is labor-intensive and may require 70-80% of the total development time. A domain such as radiology has thousands of individual concepts. Creating templates and questions for all of these concepts would be a significant challenge, because no tools exist to extract them automatically. Even after the template database is initially created, there would be an ongoing need to incorporate new ideas, changes in medical terminology, and new procedures. Any CBR system proposed for document generation must deal with these issues.
A CBR system designed for medical reporting must work within the constraint of limited selection time. The longer the CBR system takes to find the correct template, the less valuable it is for producing a new document. At some point, when the burden of selection becomes too high, the author will find it more advantageous to create the material from scratch. Multiple-question CBR systems impose a significant time burden. Rapid, high-precision TCBR systems for selecting document templates do not exist.
Another problem with traditional CBR systems is that users may be unwilling to interrupt their workflow to retrieve template documents. A computer-assisted system would gain greater acceptance if sentences from a partially completed document could be used to retrieve templates without any user intervention. Such a system does not exist.
After the reports are retrieved, sentences from them must be transferred to the new report under construction. Since the user is likely to have very specific ideas about which sentences add the greatest value to her report, a method for selecting and transferring these sentences is needed that is considerably quicker than electronic cut and paste. None of the existing CBR systems that retrieve multiple document templates proposes a method for speeding the transfer of information into new documents.
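In its simplest form, the selection-and-transfer step might reduce to picking sentences by position from a retrieved report and appending them to the draft. The naive sentence splitter, the `transfer` function, and the sample text below are hypothetical. They stand in for whatever interface actually presents the sentences to the user.

```python
import re

def sentences(report: str) -> list[str]:
    """Naive splitter: break after '.', '?' or '!' when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.?!])\s+", report.strip()) if s]

def transfer(draft: list[str], template: str, picks: list[int]) -> list[str]:
    """Append the selected template sentences (by index) to the draft, skipping duplicates."""
    source = sentences(template)
    for i in picks:
        if source[i] not in draft:
            draft.append(source[i])
    return draft

# Hypothetical retrieved neighbor report and partially completed draft.
neighbor = "Left cerebral hemorrhage. Mild midline shift. Ventricles are normal in size."
draft = ["Acute left cerebral hemorrhage."]
transfer(draft, neighbor, [1, 2])
# draft now holds the original sentence followed by the two selected sentences.
```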
In summary, a semi-automated document system that selects templates, whether old reports or professionally authored templates, would be particularly advantageous for physicians, attorneys, and other individuals who prepare reports, provided there were good methods for finding a reasonably small but specific list of similar reports and for making their content easily accessible without undue burden on the user's memory or time. The present invention accomplishes this through a high-precision TCBR system. The invention facilitates template selection because it has a semantic understanding of the knowledge domain. Relevant templates are returned in near real time from the limited information available in partially completed reports. The invention employs a visual interface that permits information contained in template documents to be rapidly and selectively transferred into a new document. Prior art systems are unable to meet these demanding requirements.