Generally described, computing devices and communication networks can be utilized to exchange content and other information. In a common application, a server computing system can provide content to various client computing devices. The content may be textual content, image-based content, videos, animations, or some combination thereof. For example, a server computing system can host or provide access to videos that are viewable by client computing devices. A client computing device can transmit a request for a video to the server computing system, and in response the server computing system can transmit or “stream” the requested video to the client computing device. The client computing device can display the video and respond to various playback commands (e.g., pause or rewind).
Some systems process video content to identify people or objects that appear in the video content. For example, computer vision processing may be used to identify objects or faces of people appearing in videos. Some computer vision systems are configured or “trained” to identify features of objects in video content, and to generate scores indicating likelihoods that various objects are present in the video. Computer vision systems often perform this process on a frame-by-frame basis. From the perspective of a user of a computer vision system, the performance of the system can be measured in terms of the accuracy with which the computer vision system can recognize objects in video content, and the speed with which the computer vision system can complete such processing.
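As a minimal sketch of the frame-by-frame process and the two performance measures described above, the following Python example scores each frame independently and then reports accuracy and processing speed. Everything in it is an illustrative assumption rather than part of any particular system: the `Frame` representation (rows of grayscale pixel values), the `brightness_scorer` function (a trivial stand-in for a trained model), the 0.5 threshold, and the object labels are all hypothetical.

```python
import time
from typing import Callable, Dict, List, Sequence, Set

# Assumption: a frame is a grid of grayscale pixel values in [0, 255].
Frame = Sequence[Sequence[int]]
Scores = Dict[str, float]


def brightness_scorer(frame: Frame) -> Scores:
    """Hypothetical stand-in for a trained model.

    Returns a score per object label indicating the likelihood that the
    object is present; here the score is just the mean pixel brightness.
    """
    pixels = [p for row in frame for p in row]
    mean = sum(pixels) / len(pixels) / 255.0
    return {"bright_object": mean, "dark_object": 1.0 - mean}


def process_video(
    frames: Sequence[Frame],
    scorer: Callable[[Frame], Scores],
    threshold: float = 0.5,
) -> List[Set[str]]:
    """Score every frame independently and keep labels at or above the threshold."""
    detections: List[Set[str]] = []
    for frame in frames:
        scores = scorer(frame)
        detections.append({label for label, s in scores.items() if s >= threshold})
    return detections


def evaluate(
    frames: Sequence[Frame],
    truth: Sequence[Set[str]],
    scorer: Callable[[Frame], Scores],
    threshold: float = 0.5,
) -> Dict[str, float]:
    """Report accuracy (exact per-frame label match) and processing speed."""
    start = time.perf_counter()
    detections = process_video(frames, scorer, threshold)
    elapsed = time.perf_counter() - start
    correct = sum(1 for d, t in zip(detections, truth) if d == t)
    fps = len(frames) / elapsed if elapsed > 0 else float("inf")
    return {"accuracy": correct / len(frames), "frames_per_second": fps}


if __name__ == "__main__":
    # Two tiny synthetic frames: one fully bright, one fully dark.
    frames = [[[255, 255], [255, 255]], [[0, 0], [0, 0]]]
    truth = [{"bright_object"}, {"dark_object"}]
    print(evaluate(frames, truth, brightness_scorer))
```

Because each frame is scored in isolation, total processing time in this sketch grows with the number of frames, which is why speed is reported alongside accuracy.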