Machine learning programs enable robust computer vision applications because, once adequately trained, they can classify physical features of a subject despite adverse conditions such as variable subject positions, orientations, and lighting. However, many machine learning programs require large sets of validated data to be adequately trained. For example, in the context of real-time hand tracking, a machine learning algorithm may require hundreds of validated video frames for adequate training. In some instances, validating these frames involves manually tagging depth pixels in each frame along the height and width dimensions to specify the locations of fingertips and other important features captured in the frame. In these instances, the metadata generated by this manual tagging process is used in conjunction with the video frames to train a machine learning program to classify hand positions, orientations, and translations. Unfortunately, manually tagging video frames is labor-intensive, time-consuming, and prone to error. Furthermore, because classifiers depend on the specific camera used during training, new validated data must be created each time a new camera is introduced. For these reasons, creating validated data can be a bottleneck in the development of many computer vision algorithms.