Many of the recent advances in computer vision are in the field of machine learning. For example, in the deep learning space, neural networks have been developed that can detect and classify objects and that, in some cases, can outperform a human performing the same task. Advances in tracking objects in motion video using machine learning, however, have lagged behind applications of neural networks to still-image problems such as detection and classification. One reason for that lag is that the difficulty associated with object tracking dramatically complicates the use, in video applications, of neural networks that were developed for still images. Generalized single-object trackers often rely on an input in the form of a bounding box of an object in a frame, which is then used to track the object in subsequent frames. Meanwhile, most multiple-object trackers specialize in tracking people and are tailored to that specific domain by relying on facial recognition, skeletal modeling, etc., and are thus unsuitable for use as generalized multiple-object trackers. Further, most multiple-object trackers operate offline on recorded video of a predefined duration and determine tracking paths of the objects within the video using references to both future and past frames.