Machine learning may be used to enable machines to automatically detect and process objects appearing in images. Machine learning typically involves processing a training data set in accordance with a machine-learning model and updating the model based on a training algorithm so that it progressively “learns” the features in the data set that are predictive of the desired outputs. One example of a machine-learning model is a neural network, which is a network of interconnected nodes. Groups of nodes may be arranged in layers. The first layer of the network, which takes in input data, may be referred to as the input layer, and the last layer, which outputs data from the network, may be referred to as the output layer. There may be any number of hidden layers between them that map the nodes in the input layer to the nodes in the output layer. In a feed-forward neural network, the outputs of the nodes in each layer, with the exception of the output layer, are configured to feed forward into the nodes in the subsequent layer.
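The feed-forward arrangement described above may be illustrated with a minimal sketch. The layer sizes, weights, biases, and the choice of tanh as the node function are hypothetical values chosen for illustration only, and the sketch shows only the forward pass, not the training algorithm.

```python
import math

def dense(inputs, weights, biases):
    """One layer: each node applies tanh to a weighted sum of the
    previous layer's outputs plus a bias (tanh is an assumed choice)."""
    return [
        math.tanh(sum(w * x for w, x in zip(node_weights, inputs)) + b)
        for node_weights, b in zip(weights, biases)
    ]

def feed_forward(inputs, layers):
    """Feed the input-layer values forward through each subsequent layer."""
    activations = list(inputs)
    for weights, biases in layers:
        activations = dense(activations, weights, biases)
    return activations

# Hypothetical network: 3 input nodes -> 2 hidden nodes -> 1 output node.
layers = [
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1]),  # hidden layer
    ([[1.0, -1.0]], [0.0]),                              # output layer
]
output = feed_forward([1.0, 2.0, 3.0], layers)
```

Each entry in `layers` holds one row of weights per node, so the outputs of one layer become the inputs of the next, and only the last layer's outputs are returned.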
Machine-learning models may be trained to recognize object features that have been captured in images. Such models, however, are typically large and require many computational operations. While large and complex models may perform adequately on high-end computers with fast processors (e.g., multiple central processing units (“CPUs”) and/or graphics processing units (“GPUs”)) and large memories (e.g., random access memory (“RAM”) and/or cache), such models may not be operable on computing devices with far less capable hardware. The problem is exacerbated by applications that require near real-time results from the model (e.g., at 10, 20, or 30 frames per second), such as augmented reality applications that dynamically adjust computer-generated components based on features detected in live video.
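The frame rates mentioned above may be translated into a per-frame time budget. The following sketch assumes that frames are processed one at a time and that the budget must cover all per-frame work, including running the model; the function name is hypothetical.

```python
def frame_budget_ms(frames_per_second):
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / frames_per_second

for fps in (10, 20, 30):
    print(f"{fps} fps -> {frame_budget_ms(fps):.1f} ms per frame")
# 10 fps -> 100.0 ms per frame
# 20 fps -> 50.0 ms per frame
# 30 fps -> 33.3 ms per frame
```

Under these assumptions, a model that takes longer than roughly 33 ms per frame could not sustain 30 frames per second on the device in question.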