Three-dimensional localization of moving objects in a video stream captured by a single camera can be difficult. In particular, some existing methods rely on sparse feature points, but this approach is ill-suited to objects such as cars, for which stable feature tracks are hard to establish. Other existing methods triangulate object bounding boxes against a fixed ground plane, which leads to high localization errors. These existing techniques may also involve computationally expensive inference mechanisms.