In autonomous driving systems, accurate perception and prediction of the surrounding driving environment and traffic participants are crucial for making correct and safe decisions for control of the autonomous or host vehicle. In the current literature and application of visual perception, techniques such as object recognition, two-dimensional (2D) object detection, and 2D scene understanding (or semantic segmentation) have been widely studied and used. With the assistance of fast-developing deep learning techniques and growing computational power (such as that of graphics processing units [GPUs]), these visual perception techniques have been successfully applied for use with autonomous or host vehicles. Compared with these 2D perception methods, however, full three-dimensional (3D) perception techniques are less studied because of the difficulty in obtaining robust ground truth data and the difficulty in properly training the 3D models. For example, correct annotation of the 3D bounding box for 3D object detection requires accurate measurement of the extrinsic and intrinsic camera parameters as well as the motion of the autonomous or host vehicle, which are usually difficult or impossible to obtain. Even if ground truth data can be obtained, the 3D model is difficult to train because of the limited amount of training data and inaccurate measurements. As a result, less expensive and much less functionally capable alternative solutions have been used in these visual perception applications.