Many image capture devices have 3D or depth sensing cameras that can form a 3D space of a scene, measure the distance from the camera to an object in the scene, and/or provide dimensions of an object in the scene. This is typically performed by a stereoscopic system that has an array of cameras or sensors on a single device and that uses triangulation algorithms to determine 3D space coordinates for points in the scene, forming a depth map or depth image for the scene. Other methods of generating a depth image, such as from a single camera, are also known. Often it is useful to augment the captured image or scene by placing external images or virtual objects into the scene and positioning them in a realistic manner, such as placing a drawing on a picture of a wall, or placing furniture in a picture of a room. When this is performed correctly, the inserted objects have a perspective that matches the perspective in the picture, so that the scene with the inserted objects looks realistic to a person viewing the picture. To accomplish these functions, conventional systems search for planar surfaces and warp the shape of the virtual objects to place the virtual objects on, or relative to, the planar surfaces. This planar surface searching task, however, is often based on an iterative process that is extremely computationally heavy, resulting in a frame rate so low that such a process is impractical for many devices.
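
By way of illustration only, the cost of an iterative planar surface search can be seen in a minimal sketch of a RANSAC-style plane fit over 3D points taken from a depth image. RANSAC is assumed here as one widely used iterative technique; the text above does not identify which iterative process conventional systems use. The function names (`plane_from_points`, `ransac_plane`, `make_demo_points`), the iteration count, the inlier threshold, and the synthetic scene are all hypothetical choices made for the example.

```python
import random


def plane_from_points(p, q, r):
    """Return (unit normal n, offset d) with n . x + d = 0 through three points.

    Returns None when the points are (nearly) collinear.
    """
    ux, uy, uz = q[0] - p[0], q[1] - p[1], q[2] - p[2]
    vx, vy, vz = r[0] - p[0], r[1] - p[1], r[2] - p[2]
    # The cross product u x v is perpendicular to the plane.
    nx = uy * vz - uz * vy
    ny = uz * vx - ux * vz
    nz = ux * vy - uy * vx
    norm = (nx * nx + ny * ny + nz * nz) ** 0.5
    if norm < 1e-12:
        return None
    n = (nx / norm, ny / norm, nz / norm)
    d = -(n[0] * p[0] + n[1] * p[1] + n[2] * p[2])
    return n, d


def ransac_plane(points, iterations=200, threshold=0.01, seed=0):
    """Hypothetical RANSAC plane search: keep the candidate with the most inliers.

    Every iteration measures the distance of every point to the candidate
    plane, so the work grows as iterations * len(points).
    """
    rng = random.Random(seed)
    best_plane, best_inliers = None, []
    for _ in range(iterations):
        plane = plane_from_points(*rng.sample(points, 3))
        if plane is None:
            continue
        n, d = plane
        inliers = [
            pt for pt in points
            if abs(n[0] * pt[0] + n[1] * pt[1] + n[2] * pt[2] + d) <= threshold
        ]
        if len(inliers) > len(best_inliers):
            best_plane, best_inliers = plane, inliers
    return best_plane, best_inliers


def make_demo_points(seed=1, noise=0.0):
    """Synthetic scene (assumed): 300 points on the plane z = 2 plus 100 clutter points."""
    rng = random.Random(seed)
    wall = [
        (rng.uniform(-1, 1), rng.uniform(-1, 1), 2.0 + rng.uniform(-noise, noise))
        for _ in range(300)
    ]
    clutter = [
        (rng.uniform(-1, 1), rng.uniform(-1, 1), rng.uniform(0.5, 1.5))
        for _ in range(100)
    ]
    return wall + clutter


if __name__ == "__main__":
    plane, inliers = ransac_plane(make_demo_points())
    print("normal:", plane[0], "offset:", plane[1], "inliers:", len(inliers))
```

In this sketch, 200 iterations over only 400 points already require 80,000 point-to-plane distance evaluations to find a single plane. A depth image typically holds several hundred thousand points and a scene may contain several planes, which suggests why a search of this kind, repeated for every frame, can reduce the frame rate so severely.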