Camera images traditionally include two-dimensions of data. Therefore, even when object detection is conducted on an image of a scene, this detection provides no more than the coordinates of the image that correspond to the detected object (i.e. depth and/or scale is ambiguous). Solutions, such as using stereo cameras, have been introduced to recover the depth of a detected object from an image. However, stereo camera depth detection is error-prone and is often too slow for real-time applications, such as autonomous vehicle control, which could lead to reduced safety outcomes.