A variety of techniques may be used to determine positions and/or poses of robots. Some robots perform simultaneous localization and mapping (“SLAM”). In other instances, techniques such as visual odometry and/or wheel odometry may be employed. Some robots may use an inertial measurement unit (“IMU”) to determine their positions. To perform robot navigation, some robots and/or robot control systems construct and/or maintain a complete three-dimensional (“3D”) model of an environment in which the robot operates. A robot may acquire data from a 3D laser scanner or other 3D vision sensor (e.g., a stereographic camera) viewing a portion of the robot's environment and map such data to the complete 3D model. Some 3D models are formed as so-called “voxel-based” 3D environments in which a 3D grid of voxels is allocated. Data points sensed by one or more 3D vision sensors (collectively referred to as a “point cloud”) are projected onto spatially-corresponding voxels. However, allocating memory for every voxel of the 3D grid, regardless of whether any data points sensed by the 3D vision sensor occupy that voxel, is inefficient in terms of both memory consumption and localization. In addition, standard ray-tracing techniques for determining whether dynamic objects remain in previously-detected locations may be computationally costly.
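The memory cost described above can be illustrated with a minimal sketch. It is not taken from any particular system: the function names, the workspace extent, the voxel size, and the three sample points are all hypothetical values chosen for illustration. The sketch projects a small point cloud onto voxel indices and compares the number of voxels a fully allocated grid would hold with the number of voxels that actually contain data points.

```python
import math


def voxel_index(point, origin, voxel_size):
    """Map a 3D point to the integer (i, j, k) index of the voxel containing it."""
    return tuple(math.floor((p - o) / voxel_size) for p, o in zip(point, origin))


def dense_voxel_count(extent, voxel_size):
    """Number of voxels a fully allocated grid needs to cover a box of the given extent."""
    nx, ny, nz = (math.ceil(e / voxel_size) for e in extent)
    return nx * ny * nz


def occupied_voxels(points, origin, voxel_size):
    """Group points by voxel index; only voxels that receive a point get an entry."""
    voxels = {}
    for point in points:
        voxels.setdefault(voxel_index(point, origin, voxel_size), []).append(point)
    return voxels


# Hypothetical values (assumptions for illustration only).
ORIGIN = (0.0, 0.0, 0.0)       # corner of the workspace, in metres
EXTENT = (10.0, 10.0, 3.0)     # workspace size, in metres
VOXEL_SIZE = 0.25              # voxel edge length, in metres
POINT_CLOUD = [
    (1.02, 2.51, 0.30),
    (1.04, 2.53, 0.31),        # falls in the same voxel as the first point
    (7.80, 4.15, 1.25),
]

occupied = occupied_voxels(POINT_CLOUD, ORIGIN, VOXEL_SIZE)
dense = dense_voxel_count(EXTENT, VOXEL_SIZE)

print(f"voxels in a fully allocated grid: {dense}")
print(f"voxels containing sensed points:  {len(occupied)}")
```

In this toy case the fully allocated grid holds thousands of voxels while only two contain sensed points, which is the kind of imbalance the paragraph above describes. A real point cloud is far denser, but sensed surfaces typically still occupy a small fraction of the grid volume.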