Over the past decades, massive increases in the scale and variety of annotated data have accelerated progress across machine learning. This has enabled major advances in many areas of science and technology, as complex models of physical phenomena or user behavior, with millions or perhaps billions of parameters, can be fit to datasets of increasing size. However, when such physical phenomena or user behavior involve actions or dynamic movements (e.g., in automotive driving applications), annotated or labeled datasets can be scarce. Accordingly, service providers face significant technical challenges in obtaining labeled data to train machine learning models to detect or classify actions or dynamic movements from image data (e.g., videos or image sequences).