The present invention relates to processing of medical images, and more specifically, to automatic ground truth generation for medical image collections.
The medical imaging community has traditionally lagged behind the general computing community in testing methods on large collections of data. This was due both to the difficulty of acquiring image collections and to the limited availability of clinical experts for ground truth labeling. With electronic health records (EHR) being rolled out in many large hospitals, it is now possible to obtain large-scale collections of DICOM imaging studies in integrated EHR systems. However, effectively assigning ground truth disease labels to such collections presents many challenges. It is tedious and impractical to expect clinical experts to label these images manually, one by one. In addition, manual data entry may be error-prone when consensus is lacking among experts. Unlike general imaging, many medical images require deep clinical interpretation expertise, which is difficult to obtain through conventional large-scale ground truthing mechanisms such as crowd-sourcing. Yet obtaining these ground truth labels is important for a number of applications, such as clinical decision support, computer-aided diagnosis, and precision measurement extraction.
In most cases, electronic health record systems include textual reports, such as clinical notes and radiology and cardiology reports, that document the findings in medical imaging studies. In general, these reports document many diseases and findings in the echocardiogram images, including both positive and negative findings.