A digital camera is a component often included in commercial electronic media device platforms, as well as vehicles. Digital cameras are now available in wearable form factors (e.g., video capture earpieces, video capture headsets, video capture eyeglasses, etc.), and are also embedded within smartphones, tablet computers, and notebook computers. Three-dimensional (3D) cameras are becoming more common, and can now be found on many mobile devices/platforms, including vehicles. These devices provide enhanced entertainment and utility experiences to an end user. For example, photography and vehicle control systems may be enhanced by depth information output from the 3D camera.
The integration of digital cameras and powerful computing platforms has accelerated advancement of computer vision and computational photography. For such systems there are many use cases where a label is to be assigned to pixels in a frame of image data. Such labeling problems arise in scene segmentation, image restoration, motion analysis, and texture synthesis, for example. In segmentation, a digital camera user or machine vision control algorithm may need to segment an image frame into visually distinct objects. The definition of an “object” can vary from a single instance to a whole class of objects. Once objects have been selected, special effects may be applied to one or more of them, objects from multiple photos may be mixed into one, objects may be removed from photos, etc. Such object-based image processing may be performed on-line (in real time with image capture) or during post-processing.
Labeling problems are often addressed by optimizing an energy function with a graph cut algorithm. The image pixels are modeled as a first-order Markov Random Field (MRF), which may be solved using alpha-expansion (also known as ‘graph cut’). The objective is then to minimize the following energy formulation using graph cut/alpha-expansion:

$$M(f) = \sum_{p \in P} D(f_p) + \sum_{(p,q) \in N} V(p, q, f_p, f_q), \tag{1}$$

where the first summation is the ‘data cost’ and the second is the ‘neighborhood cost’ or ‘smoothness cost’. Here, $P$ is the set of pixels in an input image, $N$ is a neighborhood of a pixel $p$ that includes a pixel $q$, and $L$ is the set of labels $\{L_1, L_2, \ldots, L_{K_{preS}}\}$. The function $f$ is a labeling/mapping function that assigns a label $f_p \in L$ to each pixel $p \in P$. A single iteration of graph cut is binary: it weighs a given label $L_i$ against all other labels from set $L$, effectively assigning one of two possible values to each node of the graph at that iteration. Cuts are iteratively performed until convergence of Eq. (1). Thus, K alpha-expansions or graph cuts are performed for K labels.
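To make the two terms of Eq. (1) concrete, the following minimal sketch evaluates the energy of one candidate labeling on a small 4-connected grid. It only evaluates the objective; it does not perform alpha-expansion. The function names, the absolute-difference data cost, and the Potts-style smoothness cost are illustrative assumptions, not taken from the text above.

```python
from typing import Callable, Iterator, List, Tuple

Pixel = Tuple[int, int]  # (row, column)


def neighbor_pairs_4(height: int, width: int) -> Iterator[Tuple[Pixel, Pixel]]:
    """Yield every 4-connected neighbor pair (p, q) exactly once."""
    for r in range(height):
        for c in range(width):
            if c + 1 < width:   # right-hand neighbor
                yield (r, c), (r, c + 1)
            if r + 1 < height:  # neighbor below
                yield (r, c), (r + 1, c)


def mrf_energy(
    labels: List[List[int]],
    data_cost: Callable[[Pixel, int], float],
    smooth_cost: Callable[[Pixel, Pixel, int, int], float],
) -> float:
    """Evaluate M(f) from Eq. (1) for the labeling f stored in `labels`."""
    height, width = len(labels), len(labels[0])
    # First sum: data cost D(f_p) over every pixel p in P.
    data_term = sum(
        data_cost((r, c), labels[r][c])
        for r in range(height)
        for c in range(width)
    )
    # Second sum: smoothness cost V(p, q, f_p, f_q) over neighbor pairs in N.
    smooth_term = sum(
        smooth_cost(p, q, labels[p[0]][p[1]], labels[q[0]][q[1]])
        for p, q in neighbor_pairs_4(height, width)
    )
    return data_term + smooth_term


# Hypothetical toy input: a 2x3 binary observation (assumption for illustration).
observed = [[0, 0, 1],
            [0, 1, 1]]


def data_cost(p: Pixel, label: int) -> float:
    """Assumed data cost: absolute difference between observation and label."""
    return abs(observed[p[0]][p[1]] - label)


def potts_cost(p: Pixel, q: Pixel, label_p: int, label_q: int) -> float:
    """Assumed smoothness cost: penalty of 1 when neighboring labels differ."""
    return 0 if label_p == label_q else 1


candidate = [[0, 0, 1],
             [0, 0, 1]]
energy = mrf_energy(candidate, data_cost, potts_cost)
print(energy)  # 3: data cost 1 (one mismatched pixel) + smoothness cost 2
```

An optimizer such as alpha-expansion would search over labelings for one that lowers this value; the sketch only shows what is being minimized.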
The complexity of the optimization problem (and hence the runtime and memory usage) is dependent on four main factors: the number of nodes in the graph; the connectivity or edges between nodes; the number of labels; and the formulation of the energy function using the costs. Conventional expansions process every pixel of an image as a node or vertex in the directed graph, and typically utilize a connectivity of 4 to 8 neighbors. Thus, the computation required to arrive at a minimum energy scales with both the image size and the number of labels. For example, a 720p (1280×720) image with 4-connectivity results in a graph of approximately 1 million nodes and 2 million edges. This large formulation (K iterations for a graph of millions of nodes and edges) results in long runtimes and high memory consumption. The conventional techniques also suffer from poor scalability, making their limitations more imposing as commercial devices incorporate cameras of greater resolution.
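The 720p figures above can be checked with a short calculation. The sketch below counts nodes and neighbor links for a pixel grid, counting each neighboring pair once; the function name `grid_graph_size` is hypothetical and introduced only for this illustration.

```python
from typing import Tuple


def grid_graph_size(width: int, height: int, connectivity: int = 4) -> Tuple[int, int]:
    """Return (nodes, edges) for a pixel grid, counting each neighbor pair once."""
    if connectivity not in (4, 8):
        raise ValueError("connectivity must be 4 or 8")
    nodes = width * height
    # Horizontal links plus vertical links.
    edges = (width - 1) * height + width * (height - 1)
    if connectivity == 8:
        # Two diagonal links per 2x2 block of pixels.
        edges += 2 * (width - 1) * (height - 1)
    return nodes, edges


nodes_4, edges_4 = grid_graph_size(1280, 720, connectivity=4)
nodes_8, edges_8 = grid_graph_size(1280, 720, connectivity=8)
print(nodes_4, edges_4)  # 921600 1841200
print(nodes_8, edges_8)  # 921600 3680402
```

With 4-connectivity this gives roughly 0.92 million nodes and 1.84 million links, consistent with the approximate figures quoted above. Moving to 8-connectivity about doubles the link count, and a representation that stores each link as two directed edges would double it again.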
Thus, there is a need for an MRF optimization framework that significantly reduces the complexity of graph cut labeling, improves scalability of the technique, and yields high quality results with speed and efficient memory usage.