1. Field of the Invention
The present invention is directed generally to digital image processing, and more particularly to generation of a depth map from a single image.
2. Description of the Related Art
Digital images may include raster graphics, vector graphics, or a combination thereof. Raster graphics data (also referred to herein as bitmaps) may be stored and manipulated as a grid of individual picture elements called pixels. A bitmap may be characterized by its width and height in pixels and also by the number of bits per pixel. Commonly, a color bitmap defined in the RGB (red, green, blue) color space may comprise between one and eight bits per pixel for each of the red, green, and blue channels. An alpha channel may be used to store additional data such as per-pixel transparency values. Vector graphics data may be stored and manipulated as one or more geometric objects built with geometric primitives. The geometric primitives (e.g., points, lines, polygons, Bézier curves, and text characters) may be based upon mathematical equations to represent parts of digital images.
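The bitmap representation described above can be sketched as follows. This is an illustrative example only, not part of the described invention; the dimensions, channel layout, and names are assumptions chosen for clarity.

```python
# Illustrative sketch (assumed layout, not from the source): a tiny RGBA
# bitmap stored as a grid of pixels with 8 bits per channel, where the
# alpha channel holds per-pixel transparency values.
WIDTH, HEIGHT = 4, 2
BITS_PER_CHANNEL = 8
MAX_VALUE = (1 << BITS_PER_CHANNEL) - 1  # 255 for 8-bit channels

# Each pixel is an (R, G, B, A) tuple; A == MAX_VALUE means fully opaque.
bitmap = [[(0, 0, 0, MAX_VALUE) for _ in range(WIDTH)]
          for _ in range(HEIGHT)]

# Set the top-left pixel to pure red at roughly half transparency.
bitmap[0][0] = (MAX_VALUE, 0, 0, MAX_VALUE // 2)

# The bitmap is characterized by its width and height in pixels.
print(len(bitmap[0]), len(bitmap))  # → 4 2
```

In practice such data would be held in a packed array (e.g., a NumPy array or an image library's buffer) rather than nested lists; the nested-list form is used here only to make the grid-of-pixels structure explicit.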
Digital image processing is the process of analyzing and/or modifying digital images using a computing device, e.g., a computer system. Using specialized software programs, digital images may be manipulated and transformed in a variety of ways.
There are many digital image applications that require determination of a depth map for an image, e.g., determining relative depth values of a foreground object or region of interest and a background. More specifically, a depth map is a channel of pixel data associated with an image, where each value represents the distance from the camera to the scene along the corresponding ray. Depth map data makes many image manipulation tasks significantly easier, such as proper compositing (including occlusion), rendering depth (or height) of field and focus (or defocus) effects, adding image haze, relighting, foreground/background filters, image editing, e.g., wrapping a texture onto a surface, post-processing effects for static scenes, and novel view synthesis, e.g., changing the "camera location" after a picture has been taken, among others. Automatic creation of depth maps from images often requires special hardware or multiple images of a single scene, or is severely restricted in the types of scenes that can be handled. For single images of scenes (the overwhelming majority of cases), robust depth map creation requires user input; generally this is done in the form of 3D modeling of the scene in an application such as Autodesk's Maya™ or 3D Studio Max™. However, modeling 3D geometry is a laborious task that requires inferring significant amounts of scene content that may not be visible, or even physically possible (e.g., in the case of a painting with errors in perspective). Conversely, specifying the depth of each portion of an image is a relatively easy task for a user. Prior art approaches to taking a user's sparse input and creating dense, smooth depth values over an entire image have generally required powerful and expensive tools.
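One of the uses named above, compositing with occlusion, can be sketched briefly. This is a hypothetical illustration, not the described invention: the function name and the one-dimensional pixel rows are assumptions made for brevity. At each pixel, the layer whose depth value is smaller (nearer the camera) wins.

```python
# Hypothetical sketch: a depth map as a per-pixel channel of camera-to-scene
# distances, used for depth-based compositing with occlusion.
def composite(color_a, depth_a, color_b, depth_b):
    """Merge two layers pixel by pixel, keeping the nearer surface."""
    out_color, out_depth = [], []
    for ca, da, cb, db in zip(color_a, depth_a, color_b, depth_b):
        if da <= db:  # layer A is closer to the camera at this pixel
            out_color.append(ca)
            out_depth.append(da)
        else:         # layer B occludes layer A here
            out_color.append(cb)
            out_depth.append(db)
    return out_color, out_depth

# A 1-D row of two pixels for brevity; a real image would be a 2-D grid.
colors, depths = composite(["red", "red"], [1.0, 5.0],
                           ["blue", "blue"], [3.0, 2.0])
print(colors)  # → ['red', 'blue']
```

The same per-pixel depth comparison underlies several of the other effects listed (depth-of-field blur, haze, and foreground/background filtering all weight or select pixels by their depth values).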