In augmented reality (AR) systems, a pair of images may be combined so as to create an augmented reality image in which the content from one image appears to be included in the other image. In some arrangements, an image of a virtual object and an image of a real scene are combined so as to generate an augmented reality image in which it appears to the viewer that the virtual object has been included in the real scene. The augmented reality image may be generated by rendering the virtual object within a portion of the captured real scene. When rendering the virtual object in the scene, the depth of the virtual object relative to the depth of the scene is considered, to ensure that portions of the virtual object and/or the scene are correctly occluded with respect to one another. By handling occlusion in this way, a realistic portrayal of the virtual object within the scene can be achieved.
Techniques for generating an augmented reality image of a scene typically require an accurate model of the real scene, which is generated by accurately determining depth values for the objects within the real scene from a specified viewpoint. With an accurate model, it is possible to compare depth values and determine which portions of the two images are to be occluded. The correct occlusion in an augmented reality image may be determined by comparing corresponding depth values for the image of the virtual object and the image of the real scene. For each pixel of the scene, a pixel is then rendered using the colour at that pixel in either the image of the virtual object or the image of the real scene, whichever has the smaller depth value with respect to the specified viewpoint, i.e. whichever is closer to the specified viewpoint.
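By way of illustration only, the per-pixel depth comparison described above can be sketched as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: the function name `composite_by_depth`, the use of nested lists for images, the use of an infinite depth value to mark pixels where no virtual object is present, and the choice to favour the real scene when depths are equal are all assumptions made for the example.

```python
import math
from typing import List, Tuple

Colour = Tuple[int, int, int]  # (red, green, blue), assumed 8-bit per channel


def composite_by_depth(
    real_colour: List[List[Colour]],
    real_depth: List[List[float]],
    virtual_colour: List[List[Colour]],
    virtual_depth: List[List[float]],
) -> List[List[Colour]]:
    """Select, per pixel, the colour from whichever image is closer to the viewpoint.

    All four inputs are assumed to have the same dimensions and to be
    expressed with respect to the same viewpoint.
    """
    output: List[List[Colour]] = []
    for y, real_row in enumerate(real_colour):
        out_row: List[Colour] = []
        for x, real_pixel in enumerate(real_row):
            # A smaller depth value means the surface is closer to the viewpoint.
            # Assumption: on equal depths the real scene is kept.
            if virtual_depth[y][x] < real_depth[y][x]:
                out_row.append(virtual_colour[y][x])
            else:
                out_row.append(real_pixel)
        output.append(out_row)
    return output


# Example: a 1 x 3 image with a grey real scene and a red virtual object.
GREY: Colour = (128, 128, 128)
RED: Colour = (255, 0, 0)
NO_OBJECT = math.inf  # assumed marker: no virtual object at this pixel

result = composite_by_depth(
    real_colour=[[GREY, GREY, GREY]],
    real_depth=[[2.0, 2.0, 1.0]],
    virtual_colour=[[RED, RED, RED]],
    virtual_depth=[[NO_OBJECT, 1.5, 1.5]],
)
print(result)
```

In this example the virtual object is absent at the first pixel, lies in front of the real scene at the second pixel, and lies behind the real scene at the third pixel, so only the second output pixel takes the colour of the virtual object. The result is only as reliable as the depth values supplied for the real scene, which is why an accurate scene model matters.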
To avoid potential errors with depth measurements, a scene can be scanned from a number of positions to generate an accurate map of the scene. For example, camera tracking may be performed whilst moving a camera around a scene and capturing a number of different scans or images of the scene. However, such processing is time-consuming and processor-intensive and is not suited to real-time applications, where the position of objects in the scene may vary or where it may be necessary to update the model of the real scene regularly. For example, in video applications where a constant frame rate is required, there may be insufficient time between frames to update a scene model.