Composition of a three-dimensional (3D) object as part of a two-dimensional (2D) digital image is a common technique used for movie special effects, product mockups for digital marketing content, and so forth. A digital marketing professional, for instance, may interact with an image processing system to insert a product as a 3D object (e.g., a shoe) in a background image for use in an advertisement, e.g., for a banner ad. This functionality is made available through advances in image processing systems that support physics-based rendering and image-based lighting. These advances enable the image processing system to compose the 3D object in a visually pleasing manner as part of the 2D digital image due to realistic application of light and color effects to the 3D object based on an environment of the 2D digital image.
However, conventional techniques used by an image processing system to orient the 3D object in relation to the 2D digital image are inefficient and tedious for sophisticated users and difficult for novice users. This results in an inefficient use of computational resources by the image processing system that employs these conventional techniques, due to repeated corrections that are applied to the orientation, and in a result that lacks accuracy, e.g., does not appear realistic when viewed.
Conventional techniques, for instance, may be grouped into five categories including manual rotation based techniques, vanishing point based techniques, marker based techniques, techniques that rely on external data in addition to the digital image (e.g., depth field or gyroscope), and machine learning based techniques. In a conventional manual rotation technique, the 3D object is oriented with respect to the 2D digital image through use of a trackball. However, this technique in practice is often considered tedious by professional users and is prone to error for novice users because an incorrect center of rotation causes unexpected and unnatural results.
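The effect of an incorrect center of rotation may be illustrated with a minimal sketch. The function names `rotate_about_y` and `centroid`, the vertex coordinates, and the restriction to rotation about a vertical axis are assumptions made for illustration only and are not taken from any particular image processing system.

```python
import math

def rotate_about_y(point, pivot, angle_deg):
    """Rotate a 3D point about a vertical (y) axis passing through pivot."""
    a = math.radians(angle_deg)
    # Express the point relative to the pivot before rotating.
    x, y, z = (point[i] - pivot[i] for i in range(3))
    xr = x * math.cos(a) + z * math.sin(a)
    zr = -x * math.sin(a) + z * math.cos(a)
    # Translate back into the original coordinate frame.
    return (xr + pivot[0], y + pivot[1], zr + pivot[2])

def centroid(points):
    """Arithmetic mean of a list of 3D points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

# Hypothetical object: four vertices placed away from the world origin.
vertices = [(1.0, 0.0, 4.0), (3.0, 0.0, 4.0), (3.0, 0.0, 6.0), (1.0, 0.0, 6.0)]
center = centroid(vertices)

# Rotating about the object's own centroid turns the object in place.
in_place = [rotate_about_y(v, center, 90.0) for v in vertices]

# Rotating about the world origin instead swings the whole object away.
swung = [rotate_about_y(v, (0.0, 0.0, 0.0), 90.0) for v in vertices]

print("centroid before:", center)
print("centroid, pivot at centroid:", centroid(in_place))
print("centroid, pivot at origin:", centroid(swung))
```

In this sketch the same 90 degree rotation either leaves the object where it is or moves it to a different part of the scene, depending only on the choice of pivot, which is the kind of unexpected result described above.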
In a conventional vanishing point technique, orthogonal groups of parallel lines in the 2D digital image are used to determine vanishing points, which are sufficient to recover intrinsic camera parameters, e.g., to define a horizon in the image. However, in practice the 2D digital image may not contain orthogonal groups of parallel lines (e.g., for a “close up”) and/or the parallel lines result in vanishing points that lie so far from a boundary of the image that errors are introduced. In addition, orthogonality between different groups of parallel lines may not hold in some instances (e.g., different objects that define these lines are not orthogonal to each other), which also introduces errors. Further, conventional vanishing point techniques may rely on the user to trace the parallel lines, which is tedious and may introduce inaccuracies. Automated edge detection techniques can partially automate the tracing process but also introduce errors as a result of foreground textures and noise in the 2D digital image.
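The vanishing point computation described above may be sketched as follows. This is a minimal illustration, assuming a pinhole camera with square pixels, zero skew, and a known principal point; the function names and the synthetic line segments are hypothetical and do not describe a specific conventional system.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors (homogeneous points or lines)."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def line_through(p, q):
    """Homogeneous line through two image points given as (x, y)."""
    return cross((p[0], p[1], 1.0), (q[0], q[1], 1.0))

def vanishing_point(seg_a, seg_b, eps=1e-9):
    """Intersect two traced segments; None if they are parallel in the image."""
    x, y, w = cross(line_through(*seg_a), line_through(*seg_b))
    if abs(w) < eps:
        return None  # vanishing point at infinity, cannot be used directly
    return (x / w, y / w)

def focal_length(vp1, vp2, principal_point):
    """Focal length in pixels from vanishing points of two orthogonal directions.

    Uses the orthogonality constraint (vp1 - c) . (vp2 - c) = -f^2.
    Returns None when the points are inconsistent with that constraint.
    """
    cx, cy = principal_point
    dot = (vp1[0] - cx) * (vp2[0] - cx) + (vp1[1] - cy) * (vp2[1] - cy)
    if dot >= 0:
        return None
    return math.sqrt(-dot)

# Synthetic segments traced along two orthogonal groups of parallel scene lines.
group_1 = (((320, 640), (720, 440)), ((320, -160), (720, 40)))
group_2 = (((320, 640), (-80, 440)), ((320, -160), (-80, 40)))

vp1 = vanishing_point(*group_1)
vp2 = vanishing_point(*group_2)
f = focal_length(vp1, vp2, principal_point=(320, 240))
horizon = line_through(vp1, vp2)  # homogeneous line joining both vanishing points
print(vp1, vp2, f)
```

The sketch also shows where the failure modes noted above enter: segments that are nearly parallel in the image yield a value of `w` close to zero and a distant, unstable vanishing point, and groups of lines that are not truly orthogonal violate the constraint used by `focal_length`.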
In a conventional marker based technique, a marker of known dimension is included as part of the 2D digital image. Intrinsic and extrinsic camera parameters are then extracted from the 2D digital image by the image processing system based on the marker, such as for camera calibration, visual effects, and augmented reality. In practice, however, these markers are typically not available.
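As a simplified illustration of how a marker of known dimension constrains camera parameters, the pinhole relation between physical size, apparent size, focal length, and distance may be sketched as follows. The sketch assumes a marker that faces the camera directly and a pinhole model without lens distortion; the function names and numeric values are hypothetical, and practical marker based systems typically estimate a full pose rather than a single distance.

```python
def marker_distance(focal_px, marker_width, observed_width_px):
    """Distance to a fronto-parallel marker, in the units of marker_width."""
    if observed_width_px <= 0:
        raise ValueError("observed width must be positive")
    return focal_px * marker_width / observed_width_px

def focal_from_marker(distance, marker_width, observed_width_px):
    """Focal length in pixels when the distance to the marker is known instead."""
    if marker_width <= 0:
        raise ValueError("marker width must be positive")
    return observed_width_px * distance / marker_width

# Hypothetical values: a 0.1 m wide marker that spans 40 pixels in the image.
z = marker_distance(focal_px=800.0, marker_width=0.1, observed_width_px=40.0)
f = focal_from_marker(distance=2.0, marker_width=0.1, observed_width_px=40.0)
print(z, f)
```

Both functions depend on the marker being present in the image, which is the limitation noted above.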
In a conventional external data based technique, data obtained from sensors external to an image sensor of a digital image device (e.g., depth sensors, time-of-flight cameras, structured grid techniques, and so forth) is used to provide additional information. Although this data may improve accuracy, these techniques also introduce additional challenges. A gyroscope, for instance, may determine an orientation of the capturing digital image device but not of arbitrary planes in the image scene. An output of a depth sensor is typically considered noisy and has low resolution and thus may also introduce errors. Consequently, these challenges may lead to inaccuracies and unrealistic results.
Conventional machine learning based techniques that are applicable to a single digital image often rely on strict assumptions about characteristics of the digital image that, if not met, result in errors. Examples of these assumptions include type of digital image (e.g., indoor versus outdoor), type of planes to be recovered from the digital image (e.g., ground plane or camera axis aligned planes), and so forth. Thus, these conventional techniques may fail due to a variety of challenges and result in inefficient consumption of computational resources, e.g., due to repeated application of these conventional techniques.