The invention relates to a method for improving the skill of radiologists and other interpreters in interpreting radiologic images, whether film images viewed on a multiviewer or the like, or digital images viewed on a computer screen. The method includes the use of standardized terminology during interpretation of radiologic studies to characterize image findings. The method tracks an interpreter's use of this terminology, together with his or her assessments and conclusions concerning the presence or absence of the biological processes that cause the image findings, and tracks evidence for the presence or absence of these biological processes. The method further employs repetitive feedback to interpreters concerning their diagnostic accuracy and their use of standardized feature descriptors, refining their understanding of the appropriate use of the terminology.
Variability in the interpretation of radiological exams is a known phenomenon and has recently received considerable attention in breast imaging. For example, recent research has demonstrated marked variability in radiologists' interpretation of mammograms (Elmore, 1995 and 1998; Beam, 1996). The authors of these studies have noted the need for efforts to improve accuracy and reduce variability in the interpretation of mammograms. Unlike analytic tests (e.g., serum electrolyte measurements), the interpretation of radiologic tests involves a significant amount of subjective judgment. The resulting variability has been discussed in numerous publications (Elmore, 1995; Beam, 1996) and includes both (a) failures of detection (i.e., failure to identify an abnormality) and (b) failures of characterization (e.g., failure to properly classify an abnormality as benign or malignant once its features have been assessed). The method of the present invention addresses this second source of variability, focusing on improving diagnostic accuracy and reducing variability in interpreters' characterization of abnormalities seen on radiologic examinations.
Breast imaging is one of the first subspecialties of radiology to embrace standardized reporting. Standardized reporting uses defined terminology and report formats to improve consistency and reduce confusion in the reporting of image findings and abnormalities. Mammography is the first area of breast imaging in which standardized reporting is becoming common practice. This results, in part, from Federal regulations that went into effect on Apr. 28, 1999, requiring all mammography facilities in the United States to use standardized assessment and recommendation terminology at the end of mammographic reports. This assessment and recommendation language is nearly identical to that used in the American College of Radiology's (ACR's) Breast Imaging Reporting and Data System (BI-RADS). BI-RADS was developed for standardized mammography reporting and was first released in 1993 (Kopans, 1993). The ACR's promotion of BI-RADS helped influence the Food and Drug Administration's (FDA's) requirement that standardized assessment and recommendation terminology appear at the end of mammographic reports. The promotion of BI-RADS has also prompted development of standardized terminology for other imaging modalities. For example, standardized reporting terminology for breast ultrasound is being pursued by several groups (ACR Bulletin, 1999; Hawkins, 1998).
Standardized reporting formats include lexicons of feature descriptors used to categorize image findings and abnormalities (Kopans, 1993; D'Orsi, 1995 and 1998; Hawkins, 1998). In the case of BI-RADS, D'Orsi et al. (1993) have attempted to group BI-RADS lexicon features according to the probability of their association with malignancy. However, only recently has the association of BI-RADS features with benign and malignant breast disease been empirically evaluated (Lieberman, 1998). New or altered descriptors that better discriminate between benign and malignant breast disease are expected to be incorporated into BI-RADS as they are discovered. As these types of improvements are made, proper use of the feature descriptors will help guide radiologists to more accurate characterization of mammographic findings. The same is anticipated for the feature descriptors of other standardized reporting systems.
Use of standardized feature descriptors in the interpretation of radiologic studies is subject to variability (Baker, 1996; Shile, 1997; Berg, 1997; Orel, 1997). Training radiologists to appreciate the range of presentations of standardized features can reduce observer variability in the use of these descriptor terms. Such training does not, however, teach radiologists the relationship between standardized feature descriptors and the pathological entities seen on radiological exams, and a method for providing that training is needed. The current invention is directed to such a method.
Practice audits in breast imaging have been used for a number of years to improve the skills of interpreters. The Agency for Health Care Policy and Research (AHCPR) has strongly encouraged them (Basset, 1994), and the AHCPR audit recommendations became Federal law in 1999 (Federal Register, 1998). Breast imaging facilities are now required to track mammography assessments and recommendations according to structured assessment and recommendation categories. This aids practices in calculating performance profiles such as true-positive, true-negative, false-positive and false-negative rates, as well as sensitivity, specificity and positive predictive value. Audits containing this information have been shown to be a powerful educational tool for refining radiologists' interpretive skills (Bird, 1992; Sickles, 1990; Spring, 1991; Linver, 1992). However, this type of audit information provides radiologists with only a general overview of the strengths and weaknesses of their interpretive skills. For example, such an audit can reveal poor specificity in a radiologist's mammographic interpretations, but it gives the radiologist no mechanism for examining the relationship between the features of image findings and diagnostic decision making. The method of the current invention provides this type of mechanism and, like a practice audit, is a powerful educational tool.
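For concreteness, the conventional audit profile described above can be computed from four outcome-verified counts. The following Python sketch is illustrative only: `AuditCounts` and `audit_profile` are hypothetical names, and the counts are assumed to have already been tallied from structured assessments and confirmed outcomes.

```python
from dataclasses import dataclass


@dataclass
class AuditCounts:
    """Outcome-verified counts for one interpreter (hypothetical structure)."""
    tp: int  # positive assessment, malignancy confirmed
    fp: int  # positive assessment, benign outcome
    tn: int  # negative assessment, benign outcome
    fn: int  # negative assessment, malignancy confirmed


def audit_profile(c: AuditCounts) -> dict:
    """Return the summary rates a conventional practice audit reports."""
    def ratio(num: int, den: int) -> float:
        # Guard against empty denominators (e.g. no cancers in a small audit).
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(c.tp, c.tp + c.fn),
        "specificity": ratio(c.tn, c.tn + c.fp),
        "ppv": ratio(c.tp, c.tp + c.fp),
        "npv": ratio(c.tn, c.tn + c.fn),
        "false_positive_rate": ratio(c.fp, c.fp + c.tn),
        "false_negative_rate": ratio(c.fn, c.fn + c.tp),
    }
```

For example, 8 true positives, 40 false positives, 950 true negatives and 2 false negatives give a sensitivity of 0.8 and a positive predictive value of 8/48 (about 0.17). Nothing in this profile links a result to the features of individual findings, which is the limitation the method addresses.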
The primary object of the present invention is to provide a training method that improves the accuracy and reduces the variability of anyone who reads and interprets radiologic examinations. The method tracks the reader's (image interpreter's) diagnostic accuracy and use of standardized feature descriptors during interpretation of radiologic examinations. The method is useful not only for training readers to use the descriptors of a standardized reporting system appropriately, but also for building a detailed understanding of how findings are associated with specific types of pathology (e.g., benign and malignant disease). As described herein, the method utilizes repetitive feedback concerning an interpreter's diagnostic accuracy and use of standardized terminology during exam interpretation. It improves the accuracy of interpretations and reduces variability in the use of standardized terminology.
It is a further object of the invention to document a radiologist's diagnostic accuracy and use of standardized feature descriptors for review and training. To achieve this, the radiologist is asked to describe image findings using standardized terminology during exam interpretation. For each image finding, the radiologist is also asked to provide an assessment of whether a given biological process caused the finding. These assessments are used to calculate the radiologist's diagnostic accuracy, for example by Receiver Operating Characteristic (ROC) analysis (Metz, 1986), in which ROC curves are plotted and the areas under the curves are calculated. Other accuracy and performance measures include, for example, positive and negative predictive values, sensitivity and specificity, likelihood ratios, true-negative and false-negative rates, and true-positive and false-positive rates.
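A minimal Python sketch of the ROC step follows, assuming each assessment is recorded as an ordinal confidence rating (higher meaning more confident that the process is present) and that outcome data have confirmed each case as positive or negative. It computes the empirical ROC points and their trapezoidal area; this nonparametric approach is a swapped-in choice, and it does not perform the binormal curve fitting often used in the ROC methodology of Metz. The function names are hypothetical.

```python
def empirical_roc(ratings, truth):
    """Empirical ROC points from ordinal confidence ratings.

    ratings: confidence that the process (e.g. malignancy) is present.
    truth:   True if outcome data confirmed the process, else False.
    Returns (false-positive fraction, true-positive fraction) points
    running from (0, 0) to (1, 1).
    """
    pos = sum(1 for t in truth if t)
    neg = len(truth) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both confirmed-positive and confirmed-negative cases")
    points = [(0.0, 0.0)]
    # Sweep the decision threshold from the strictest rating to the most lenient;
    # at the lowest observed rating every case is called positive, giving (1, 1).
    for threshold in sorted(set(ratings), reverse=True):
        tp = sum(1 for r, t in zip(ratings, truth) if r >= threshold and t)
        fp = sum(1 for r, t in zip(ratings, truth) if r >= threshold and not t)
        points.append((fp / neg, tp / pos))
    return points


def trapezoidal_auc(points):
    """Area under the ROC points by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

With ratings of 5, 4 and 3 for three confirmed malignancies and 4, 2, 2, 1 and 1 for five confirmed benign findings, `trapezoidal_auc(empirical_roc(...))` gives 0.9. The trapezoidal area equals the Mann-Whitney estimate of the probability that a randomly chosen malignant finding is rated higher than a randomly chosen benign one, with ties counted as one half.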
An additional object of the invention is to track outcome data that establish the presence or absence of biological processes. These data are used in calculating the interpreter's diagnostic accuracy. In the case of a mass seen on a screening mammogram, for example, the outcome of interest is whether the finding results from a benign or a malignant process. There are several ways of determining this. A biopsy of the mass with histologic analysis would determine not only whether the mass is benign or malignant but also the specific histology of the process causing the radiologic finding. Alternatively, if the mass was considered benign and not biopsied, clinical and radiologic follow-up could establish benignancy. For example, if the mass remains unchanged on subsequent mammograms and there is no clinical evidence of change or malignancy, this would suitably confirm a benign lesion. While these examples involve patient outcomes in screening mammography, standards for confirming biological processes in other types of radiological examinations are known in the art.
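The outcome rules above can be sketched in Python as follows. This is a hypothetical illustration: the `FindingOutcome` fields and the `confirmed_outcome` function are assumptions, and because the text does not specify how long follow-up must last, the minimum stable interval (`min_stable_months`) is left for the user to supply rather than given a default.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FindingOutcome:
    """Hypothetical outcome record for one image finding."""
    biopsy_histology: Optional[str] = None   # kept for histology-level review; None if not biopsied
    biopsy_malignant: Optional[bool] = None  # pathology verdict when a biopsy was performed
    months_stable: float = 0.0               # months the finding stayed unchanged on follow-up imaging
    clinical_change: bool = False            # any clinical evidence of change or malignancy


def confirmed_outcome(o: FindingOutcome, min_stable_months: float) -> Optional[str]:
    """Return "malignant", "benign", or None if the outcome is not yet established."""
    # Histologic analysis of a biopsy takes precedence over imaging follow-up.
    if o.biopsy_malignant is not None:
        return "malignant" if o.biopsy_malignant else "benign"
    # An unbiopsied finding counts as benign only after stable follow-up
    # with no clinical evidence of change or malignancy.
    if not o.clinical_change and o.months_stable >= min_stable_months:
        return "benign"
    return None
```

Findings that return `None` would be held out of accuracy calculations until their outcome is established.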
A further object of the invention is to track patient demographic data that are important to the interpretation of a radiologic examination. In the interpretation of mammograms, for example, it is important to track patient age, since age influences the probability that a patient has breast cancer.
Another object of the invention is to provide feedback to trainees concerning their diagnostic accuracy and their use of standardized feature descriptors. During feedback, tracked data are reviewed with the associated exam images. Feedback sessions enable trainees to review the features of image findings, their descriptions of those features, and the biologic processes that caused the findings. Calculated values of diagnostic accuracy are presented to trainees and provide them with an assessment of their performance. With knowledge of their diagnostic accuracy and of the biologic process that caused each image finding, trainees review the findings and concentrate on describing them more accurately and on better predicting the biologic processes imaged in radiologic exams.
The method enables several types of feedback to the trainee, which differ in the manner in which the collected data are sorted for review. One of these involves sorting the data by individual feature descriptor. This type of sorting enables trainees to review each feature alongside their own descriptions of it in order to identify inconsistencies in their use of descriptors.
The method also enables data to be sorted according to the trainee's assessments of the presence of different biologic processes, which helps the trainee better appreciate which features are good and which are poor predictors of a biologic process (e.g., malignancy). When data are sorted for review in this manner, feedback sessions enable the trainee to examine cases in which he or she was highly confident about the presence of a biologic process and to look at the image features that contributed to this certainty. This enables trainees to predict disease processes more accurately.
Similarly, when data are sorted by patient outcome, feedback focuses on the types and range of features that characterize particular types of histology. This helps the trainee understand which features predict particular types of histology better than others. By seeing features that are present in multiple types of histologic processes, trainees are also able to better appreciate the subtle differences in features that will help them achieve greater accuracy in their interpretations.
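The three sort orders described above (by feature descriptor, by assessment, and by outcome) can be sketched over a single tracked record. In this hypothetical Python illustration, `ReadRecord` and its fields are assumptions, including the five-point confidence scale; the descriptor values shown in the usage note are BI-RADS margin terms.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ReadRecord:
    """Hypothetical tracked record for one finding read by one trainee."""
    case_id: str
    descriptors: dict  # feature -> descriptor chosen by the trainee, e.g. {"margin": "spiculated"}
    assessment: int    # confidence that malignancy is present (assumed 1-5 scale)
    histology: str     # outcome established by biopsy or follow-up


def by_descriptor(records, feature):
    """Group case IDs by the descriptor the trainee used for one feature."""
    groups = defaultdict(list)
    for r in records:
        groups[r.descriptors.get(feature, "(not described)")].append(r.case_id)
    return dict(groups)


def by_assessment(records):
    """Most confident assessments first, so high-certainty calls are reviewed together."""
    return sorted(records, key=lambda r: r.assessment, reverse=True)


def by_outcome(records, feature):
    """For each histology, count how often each descriptor was used for a feature."""
    table = defaultdict(lambda: defaultdict(int))
    for r in records:
        table[r.histology][r.descriptors.get(feature, "(not described)")] += 1
    return {h: dict(counts) for h, counts in table.items()}
```

For instance, `by_outcome(records, "margin")` might show "spiculated" used for both an invasive ductal carcinoma and a radial scar, exactly the kind of overlap the outcome-sorted review is meant to bring out.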
Another object of the invention is to provide feedback to trainees concerning the distribution of pathology and associated image findings in different patient populations. For example, the types of breast pathology and image findings present in a population of women undergoing screening mammography differ from those in a population of women undergoing image-guided breast biopsy. To enable the trainee to learn the range of image findings in different patient populations, the method employs standard sampling techniques, well known in the art, to create image review sets with appropriate compositions of findings and/or pathology for defined patient populations.
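Stratified random sampling is one standard technique that fits this description; the text does not name a specific one. The Python sketch below assumes the case library is already grouped by category (for example, by pathology) and that the target fractions for the population being simulated are supplied by the user. `build_review_set` is a hypothetical name, and the fractions in any call are inputs, not reference data.

```python
import math
import random


def build_review_set(pool, composition, size, seed=None):
    """Draw a review set whose mix of categories matches a target population.

    pool:        category -> list of available case IDs (e.g. by pathology)
    composition: category -> target fraction for the population being simulated
    size:        number of cases in the review set
    """
    if abs(sum(composition.values()) - 1.0) > 1e-9:
        raise ValueError("composition fractions must sum to 1")
    rng = random.Random(seed)
    # Largest-remainder allocation so the per-category counts sum to `size`.
    raw = {k: composition[k] * size for k in composition}
    counts = {k: math.floor(v) for k, v in raw.items()}
    leftover = size - sum(counts.values())
    for k in sorted(raw, key=lambda k: raw[k] - counts[k], reverse=True)[:leftover]:
        counts[k] += 1
    selected = []
    for k, n in counts.items():
        available = pool.get(k, [])
        if n > len(available):
            raise ValueError(f"not enough {k!r} cases: need {n}, have {len(available)}")
        # Sample without replacement within each category (stratum).
        selected.extend(rng.sample(available, n))
    rng.shuffle(selected)  # avoid presenting cases grouped by category
    return selected
```

A review set for an image-guided biopsy population would typically be given a larger malignant fraction than one for a screening population; the same case library can serve both by changing only `composition`.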
A further object of the invention is to provide repetitive feedback. After intervals long enough to accumulate sufficient data for additional feedback sessions, newly interpreted exams are reviewed with the tracked data to reinforce the appropriate use of standardized feature descriptors.
In accordance with the invention, generally stated, a training method is described by which a radiologist's or other exam interpreter's ability to interpret radiologic studies of a patient, whether presented on film or in a digital format, is measured. Initially, the radiologist or image interpreter views and interprets a set of radiologic exams. For each finding made on the viewed images of an examination, the radiologist or image interpreter describes the features of the finding using BI-RADS descriptors and provides an assessment of the presence of malignancy. The results are then reviewed to assess both the accuracy of the diagnosis and the use of the descriptors. After this initial image interpretation, the radiologist or image interpreter reviews his or her diagnostic accuracy and use of feature descriptors, together with patient outcomes. Subsequent image interpretation and case review help the radiologist or image interpreter improve his or her proficiency in diagnosis and in the use of feature descriptors. Other objects and features will be in part apparent and in part pointed out hereinafter.