Automation of medical imaging requires algorithms to learn how to perform a particular task, and these algorithms require “ground truth” data for training and validation. This ground truth data comes from human experts annotating the data, but such annotations are time-consuming and expensive to obtain. Key problems include how to obtain annotation data efficiently, with minimal effort from the human experts, and how to obtain the right amount of labeled data without paying for more than is actually needed. For machine learning algorithms, an additional challenge is knowing when a sufficiently accurate result has been achieved. Finally, the entire cycle of annotation, testing, and validation is slow, limiting the overall pace of innovation.
Many machine learning algorithms have been trained with data annotated by human experts. In a typical development cycle, researchers guess how much training data will be needed and then employ human experts to provide it. Prior research has focused on how best to train an algorithm given a set of annotated data.
Recently, Deep Learning has emerged as a popular and highly effective method for performing image segmentation. A segmentation of an image is produced by partitioning the image into different segments. For medical images, these segments may correspond to biologically relevant structures such as organs, blood vessels, pathologies, etc. However, one of the biggest limitations of Deep Learning is that large amounts of labeled data are necessary to get good results without overfitting.
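To make the notion of a segmentation concrete, the following minimal sketch assigns every pixel of a toy image to exactly one segment. The intensity values, the threshold values, and the function name `segment_by_threshold` are invented for illustration, and simple thresholding is used only as a stand-in for a learned segmentation model.

```python
# Toy 4x4 grayscale "image"; the intensities are invented for illustration.
image = [
    [10, 12, 200, 210],
    [11, 13, 205, 220],
    [9, 90, 95, 215],
    [8, 92, 97, 99],
]


def segment_by_threshold(img, thresholds):
    """Label each pixel with the index of the intensity band it falls into.

    A pixel's label is the number of thresholds it meets or exceeds, so the
    labels run from 0 to len(thresholds). Hypothetical helper, not a learned model.
    """
    return [[sum(pixel >= t for t in thresholds) for pixel in row] for row in img]


labels = segment_by_threshold(image, thresholds=[50, 150])

# Group pixel coordinates by label: each group is one segment of the partition.
segments = {}
for r, row in enumerate(labels):
    for c, label in enumerate(row):
        segments.setdefault(label, []).append((r, c))
```

The label map has the same shape as the image, and the segments are disjoint and together cover every pixel, which is what distinguishes a segmentation from, say, a set of overlapping bounding boxes.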
Medical images are difficult to annotate compared to ordinary photographs and videos. For example, different image modalities may introduce artifacts that are not readily identifiable by someone without medical training. Moreover, reliable detection of organs and other relevant anatomical structures, as well as identification of relevant diseases and abnormalities, will be difficult, if not impossible, unless the annotator has medical training. This makes medical image annotation more costly to obtain, because the number of people able to perform this task is limited.
Current practices involve a sequential approach in which the annotations are obtained first and algorithm development follows. As a result, any benefits from creating the algorithm do not enhance the annotation acquisition. In this disclosure, we describe how the twin needs for segmentation algorithm development and segmentation training data can be combined into a single process for a more efficient development cycle. Improvements in the algorithm development will speed up the annotation, while at the same time the actions of the annotators are used to synchronously drive the learning algorithm.
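The coupling described above can be pictured with a deliberately simplified sketch. It is not the disclosed method: a one-dimensional threshold classifier stands in for the segmentation algorithm, a hidden rule (`TRUE_BOUNDARY`) stands in for the human expert, and all names and values are hypothetical. The model proposes a label, the simulated expert corrects it only when it is wrong, and every confirmed label immediately updates the model.

```python
TRUE_BOUNDARY = 0.6  # assumption: hidden rule applied by the simulated expert


def expert_label(x):
    """Simulated annotator: returns the ground-truth class for sample x."""
    return int(x >= TRUE_BOUNDARY)


class ThresholdModel:
    """Toy learner that places its boundary between confirmed examples."""

    def __init__(self):
        self.neg = []        # samples confirmed as class 0
        self.pos = []        # samples confirmed as class 1
        self.boundary = 0.0  # initial guess: everything is class 1

    def predict(self, x):
        return int(x >= self.boundary)

    def update(self, x, y):
        (self.pos if y else self.neg).append(x)
        if self.pos and self.neg:
            # Midpoint between the largest class-0 and smallest class-1 sample.
            self.boundary = (max(self.neg) + min(self.pos)) / 2


# Deterministic stream of samples covering 0.00 .. 0.99 in scrambled order.
samples = [((i * 37) % 100) / 100 for i in range(100)]
ROUND_SIZE = 20

model = ThresholdModel()
corrections_per_round = []
for start in range(0, len(samples), ROUND_SIZE):
    corrections = 0
    for x in samples[start:start + ROUND_SIZE]:
        proposal = model.predict(x)   # the algorithm pre-annotates the sample
        truth = expert_label(x)       # the expert accepts or corrects it
        if proposal != truth:
            corrections += 1          # expert effort is spent only on mistakes
        model.update(x, truth)        # the annotation drives learning at once
    corrections_per_round.append(corrections)

print(corrections_per_round)
```

In this toy setting the expert's effort is measured as the number of corrections per round, so a better model directly means less annotation work, and each annotation is used for training as soon as it is made rather than after a separate collection phase.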