When using graphics applications, users often desire to manipulate an image by compositing objects into the images or performing scene reconstruction or modeling. To be effective and provide realistic results, these processes depend on recovering the illumination that contributes to the original image. Traditional methods of recovering illumination have been limited to illumination directly visible from the actual image. Such methods do not always provide a realistic illumination recovery because images often represent only a portion of the environment in which the scene resides, and the illumination of a scene may be formed by light sources that are not directly visible within an image itself. For example, many standard images provide a limited view, such as approximately a 60-degree view to a 90-degree view, while the entire 360-degree environment may include light sources illuminating the scene but that are outside that limited view shown in the image.
There are existing methods for estimating illumination from an entire panoramic environment, but they lack the robustness to be generally applicable to many indoor scenes. For example, current methods for recovery of outdoor scenes infer sky illumination and do not work for images of scenes with other types of illumination, such as indoor scenes. Additionally, there are data-driven approaches for indoor scenes that compare the illumination of an input image to known environment maps and determine the closest estimation between the known environment maps and the image. Such an approach, however, assumes that there will be a close estimate to the actual environment of the input image that is stored in the database of known environments. Considering the wide variety of illumination that occurs in indoor scenes, this assumption may not always hold true.
Further, other methods utilize deep learning methods to recover reflectance maps, but these methods are based on a small amount of data and many assumptions that limit applicability of these methods. For instance, current deep learning methods for estimating illumination can be used only for images having a predefined object of a certain class. The predefined object must be at or near the center of the captured scene and must be segmented from the background. Such strong assumptions limit the use of these methods to a very narrow class of images.
More recently, neural networks, including deep neural networks, have been used in estimating illumination from a single image. These neural network-based techniques have been used to generate light masks, which correspond to a probability that a light source exists at a particular location. These techniques do not estimate an intensity of light and, thus, present challenges to the generation of high dynamic range (HDR) images. Accordingly, current systems and techniques do not provide accurate or robust methods for estimating illumination intensity of a panoramic environment for a single image of an indoor scene.
In general, neural networks, especially deep neural networks have been very successful in modeling high-level abstractions in data. Neural networks are computational models used in machine learning made up of nodes organized in layers. The nodes are also referred to as artificial neurons, or just neurons, and perform a function on provided input to produce some output value. A neural network requires a training period to learn the parameters, i.e., weights, used to map the input to a desired output. The mapping occurs via the function. Thus, the weights are weights for the mapping function of the neural network.
Each neural network is trained for a specific task, e.g., image upscaling, prediction, classification, encoding, etc. The task performed by the neural network is determined by the inputs provided, the mapping function, and the desired output. Training can either be supervised or unsupervised. In supervised training, training examples are provided to the neural network. A training example includes the inputs and a desired output. Training examples are also referred to as labeled data because the input is labeled with the desired output. The network learns the values for the weights used in the mapping function that most often result in the desired output when given the inputs. In unsupervised training, the network learns to identify a structure or pattern in the provided input. In other words, the network identifies implicit relationships in the data. Unsupervised training is used in deep neural networks as well as other neural networks and typically requires a large set of unlabeled data and a longer training period. Once the training period completes, the neural network can be used to perform the task it was trained for.
In a neural network, the neurons are organized into layers. A neuron in an input layer receives the input from an external source. A neuron in a hidden layer receives input from one or more neurons in a previous layer and provides output to one or more neurons in a subsequent layer. A neuron in an output layer provides the output value. What the output value represents depends on what task the network is trained to perform. Some neural networks predict a value given in the input. Some neural networks provide a classification given the input. When the nodes of a neural network provide their output to every node in the next layer, the neural network is said to be fully connected. When the neurons of a neural network provide their output to only some of the neurons in the next layer, the network is said to be convolutional. In general, the number of hidden layers in a neural network varies between one and the number of inputs.
To provide the output given the input, the neural network must be trained, which involves learning the proper value for a large number (e.g., millions) of parameters for the mapping function. The parameters are also commonly referred to as weights as they are used to weight terms in the mapping function. This training is an iterative process, with the values of the weights being tweaked over thousands of rounds of training until arriving at the optimal, or most accurate, values. In the context of neural networks, the parameters are initialized, often with random values, and a training optimizer iteratively updates the parameters, also referred to as weights, of the network to minimize error in the mapping function. In other words, during each round, or step, of iterative training the network updates the values of the parameters so that the values of the parameters eventually converge on the optimal values.