In order to train computer vision algorithms, researchers must acquire a large amount of manually labelled data sets. The labelled data sets serve as a ground truth during an algorithm training process and essentially teach the algorithm what to look for in imagery included in the dataset. Accurate labelling is vital to a trained model's inference ability on new, unseen data.
However, the acquisition of labelled data, especially for novel environments and contexts, is extremely time consuming. Researchers either manually label the data themselves (frame by frame) or outsource the work to third party contractors. In the latter case, if the data is confidential or sensitive, this is usually not an option.