An artificial neural network (ANN) is trained to perform a task, such as classifying an input, using training data. The training data may include positive and negative samples corresponding to the task. For example, when training an ANN to classify images of cats, the training data may include images of cats (positive samples) and images of other animals (negative samples). In some cases, it is difficult to obtain high-quality training data, including mis-detected images such as false positives and false negatives.
For example, when training an ANN used by an autonomous or semi-autonomous vehicle to detect driver behavior, the training data should include images of a driver in various poses. Each pose may be labeled to identify a behavior performed by the driver. In most cases, a driver and/or a passenger are limited in their movements while sitting in a vehicle. Therefore, it is difficult to obtain a wide range of training samples for the training data.
Extracting and sampling images from raw video files obtained by an on-board, in-cabin camera can be important for building an efficient dataset. However, if every frame, or even an indiscriminate sampling of frames, were extracted from a raw video feed, an inordinately high number of images would be generated, many of which would add little value beyond that of other images. Further, the labeling cost would be very high due to the number of images.
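To illustrate why indiscriminate extraction yields many near-duplicate images, the following is a minimal sketch of one possible way to skip redundant frames. It is not a method described in this document. The frame representation (a flattened list of grayscale values), the helper names `mean_abs_diff` and `select_distinct_frames`, and the `threshold` parameter are all hypothetical, chosen only for illustration.

```python
from typing import List, Sequence

# Hypothetical representation: a frame is a flattened list of grayscale values.
Frame = Sequence[float]


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Mean absolute per-pixel difference between two equally sized frames."""
    if not a or len(a) != len(b):
        raise ValueError("frames must be non-empty and the same size")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def select_distinct_frames(frames: Sequence[Frame], threshold: float) -> List[int]:
    """Return indices of frames that differ from the last kept frame
    by more than `threshold` (an assumed, tunable value)."""
    kept: List[int] = []
    for i, frame in enumerate(frames):
        # Always keep the first frame; afterwards keep only frames that
        # changed enough relative to the most recently kept frame.
        if not kept or mean_abs_diff(frame, frames[kept[-1]]) > threshold:
            kept.append(i)
    return kept


# Synthetic example: one pose held for 5 frames, then a second pose for 5 frames.
frames = [[0.0] * 4] * 5 + [[1.0] * 4] * 5
kept = select_distinct_frames(frames, threshold=0.5)
print(kept)  # [0, 5]
```

In this synthetic case, ten frames reduce to two, one per pose, so only two images would need labeling. A real in-cabin video would require decoding frames with a video library and choosing a difference measure and threshold suited to the camera and lighting conditions.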