Conventional image creation technologies, such as those using still or moving picture cameras or computer animation, are generally inadequate for capturing and representing the full scope and nature of the human visual experience. The reasons are many. Among them is the fact that the image is usually based on the geometry of linear perspective, which projects 3-dimensional space onto a 2-dimensional plane in a way that does not appear natural to a human viewer. Artists have known since the time of Leonardo da Vinci that linear perspective produces excessive distortions when the peripheral areas of the visual field are projected onto a 2-dimensional surface. As a result, conventional linear perspective images tend to present a constricted view of the world that is cropped, usually in the form of a rectangle, and thereby excludes much of the full field of view, including the peripheral field. The excluded matter includes parts of the viewer's body and objects in close proximity to the viewer, even though these are visible in natural vision. The resulting images are normally presented as flat objects, i.e. printed on paper or displayed on a flat screen, which contradicts the spatial cues in the image and thereby reduces the sense of perceived depth. Objects of interest usually appear smaller in the created image than they do in real life, both because of the method of projection used in linear perspective and because no regard is given to the psychological effect of attention on perceived object size, which tends to enlarge objects of interest. Moreover, such images generally do not record or represent other features of human vision that enhance our visual experience of space, such as binocular depth information from double vision, or ‘physiological diplopia’.
In addition, conventional imaging systems fail to record other features of human vision, such as heightened edge contrast, the relative indistinctness of objects in the peripheral and non-fixated parts of the visual field, the subjective curvature of perceived space, the gradual fading and darkening of vision at the extreme periphery, the change in apparent size of objects relative to the fixation point, and the image of the viewer's body as seen from the first person perspective. The present invention combines all these effects to produce a synthetic emulation of human vision.
One advantage of this concept as a means of representing the entire human visual field is that excessively wide image formats, such as those produced by panoramas, are avoided, because the peripheral areas of the scene are diminished in size and emphasis, as they are in real human vision. Excessively distorted images, such as those produced by fisheye lenses, which often diminish the size of the object of interest, or by wide-angle lenses, which expand the peripheral areas, are also avoided. Greater emphasis is instead placed on the part of the image that corresponds to central vision and is most subject to viewer attention, causing it to appear larger and so mimicking the way the scene would be perceived by the human visual system.
A further advantage of this concept is that it gives the areas of the image corresponding to the fixation point and the area of attention greater prominence and saliency than they have in a conventional linear perspective image. For applications such as advertising, this has the benefit of directing the viewer's gaze to the area or object in the image that the advertiser wishes to emphasise.
A further advantage is that, by applying the methods described herein, the resulting images can appear to have a substantially enhanced illusion of depth compared with images produced by conventional linear perspective or other methods, and can include within the image frame a much wider field of view without diminishing the relative size of the object of interest or attention in the scene.
As with other “foveated” imaging systems, there is also potential for savings in data storage and transmission, because the areas of the image corresponding to the periphery of the visual field can be more heavily compressed and represented at lower resolution.
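To give a rough sense of the scale of such savings, the sketch below assumes a simple, hypothetical resolution rule: full resolution within about 2.5° of the fixation point, with the block of pixels that shares a single stored sample growing linearly with eccentricity beyond that. The function names, the 2.5° foveal radius, the slope and the 16-pixel cap are illustrative assumptions, not values specified by the invention.

```python
import math


def block_size(ecc_deg: float, fovea_deg: float = 2.5,
               slope: float = 0.5, max_block: int = 16) -> int:
    """Edge length (pixels) of the block that shares one stored sample.

    Hypothetical rule: full resolution (block of 1) inside the fovea, then the
    block edge grows linearly with eccentricity, capped at max_block.
    """
    if ecc_deg <= fovea_deg:
        return 1
    return min(max_block, 1 + int(slope * (ecc_deg - fovea_deg)))


def foveated_sample_fraction(width: int, height: int, fx: float, fy: float,
                             deg_per_px: float) -> float:
    """Approximate fraction of samples kept relative to a uniform full-resolution image.

    Each pixel is charged 1 / block_size**2 of a sample, so a b x b block of
    pixels at similar eccentricity contributes roughly one stored sample.
    """
    total = 0.0
    for y in range(height):
        for x in range(width):
            # Angular distance of this pixel from the fixation point (fx, fy).
            ecc = math.hypot(x - fx, y - fy) * deg_per_px
            b = block_size(ecc)
            total += 1.0 / (b * b)
    return total / (width * height)


if __name__ == "__main__":
    # A 200 x 100 px image spanning roughly 100 x 50 degrees, fixated at its centre.
    frac = foveated_sample_fraction(200, 100, 100, 50, 0.5)
    print(f"samples kept: {frac:.1%}")
```

The fraction returned depends entirely on the assumed falloff; a practical system would derive it from acuity data and the codec actually used.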
A further advantage of the method set out in this invention is that the viewer's body is presented from the first person perspective with perceptual accuracy, which enhances the effectiveness of images that use this method to convey the first person perspective.
There are a number of known solutions that address some of the foregoing problems. These include wide-angle lenses, such as fisheye lenses, which capture a very large angle of view but with excessive optical distortion at the edges of the image. Although such distortion can be corrected by suitable software processing, this is not a perfect solution, because the resulting image is still projected according to linear perspective and lacks many of the features associated with real human vision, as described herein.
Another technique is to stitch together multiple images to capture a very wide panoramic field of view, but with the disadvantage that there is a practical limit to the use of such very wide picture formats. Moreover, such panoramas also lack the geometric structure and other features associated with real human vision, as described herein.
Other imaging technologies, such as QuickTime VR and Condition One (http://www.conditionone.com/), allow the viewer to scan a wide field of view by scrolling through a virtual space, seeing the scene from multiple angles as directed by the viewer, but these methods are still subject to some of the inadequacies noted above.
In some forms of computer animation, such as those used in video game engines, there have been attempts to emulate the viewer's point of view by including parts of the viewer's body, but these views are generally rendered according to conventional linear perspective and are confined to a limited rectangular viewing area in the centre of the visual field being depicted. As a consequence, they exclude parts of the body, such as the nose or shoulders, that can often be seen in natural vision. There have also been some attempts in commercial imagery to emulate the relative indistinctness of the peripheral visual field by selectively blurring the outer edges of the image, but such techniques still do not compensate for the other inadequacies noted above.
Some lens-based and computer-based systems capture a wide field of view while showing the area of interest at greater size or resolution, thus emulating certain properties of the peripheral and foveal areas of human vision. For example, foveated and wide-angle foveated lens systems are designed to improve data compression and enhance the central area of the image while capturing a wide field of view (e.g. WO 2008/077132). Such systems, however, generally rely on capturing a monoscopic 2-dimensional linear perspective view of the scene (via a camera and lens) and do not produce a projection of the 3-dimensional world corresponding to the geometric structure of the full field of view as seen by a human being, as specified in the present invention. Nor do they compensate for the effects of a moving fixation point within the image or for changes in the locus or range of attention within the image. Wide-angle foveated images are typically circular rather than elliptical in shape, and they do not include other features of natural vision, such as double vision (deliberate blurring of the image in front of and behind the object in focus), changes in the apparent size of objects due to changes in fixation at different depth planes, heightened edge contrast of the fixated object, and the other methods identified in the present invention.
Other methods use arrangements of multiple cameras to capture a wide field of view while recording greater detail in an area within the represented visual field (e.g. US 2004/0061787). The images from the cameras are stitched together and warped to form a wide-angle foveated output image for viewing. Such systems also claim to avoid the excessive distortions associated with conventional linear perspective projections by means of the curved arrangement of the cameras. They also aim to provide higher acuity in the area of the image corresponding to the human fovea. However, they fail to record or represent the geometric structure or features of human vision, such as modifications in the image due to changes in 3-dimensional depth, increased saliency of the area of attention (not just of the fixation point), binocular disparity and peripheral indistinctness.
Other methods for enhancing the area of the image corresponding to the foveal region of the eye include ‘digital zooming’, in which areas of an image being fixated upon are enlarged or enhanced relative to the areas corresponding to the periphery of the image (e.g. US 2009/0245600). Such methods, however, rely on capturing a 2-dimensional linear perspective image (via a camera and lens) and do not represent the entire 3-dimensional field of view according to the geometric structure perceived in natural human vision. The images resulting from these methods are typically rectangular rather than elliptical in shape and do not represent additional features of human vision such as double vision, subjective curvature in the visual field and the effects of local attention.
Other methods of generating a foveated image (e.g. EP2362641, U.S. Pat. No. 7,965,314 and GB2400259) are also based on the geometry of linear perspective and lack the capacity to capture and represent key features of human vision.
It has been known for several centuries that viewing a flat picture through a peephole or aperture can enhance the illusion of depth. Screen viewing devices have been designed that enhance the 3-dimensional depth experience of the viewer by covering flat screens with frames that obscure part of the underlying screen (see WO 2010094269; U.S. Pat. No. 6,144,417). However, the aperture in such frames is generally rectangular, not elliptical. Moreover, the elliptical frames taught by the present invention are integrated components of the presentation system, to be used in conjunction with the images and display supports specified in the invention, and not stand-alone devices to be used with any other images.
Current methods of imaging 3-dimensional space tend to rely on the rules of linear perspective, which are based on the behaviour of light and the optical properties of the devices used to capture it, such as lenses and photosensitive plates. However, such rules and devices fail to account for a number of features of the human visual system that are known to affect how we perceive the world, such as the structure of the human eye, the consequences of seeing with two eyes and the psychological effects of attention and memory. The present invention derives from the realisation that there is a need for a process for making images that more closely approximate the actual experience of human vision than those produced by currently available imaging techniques.
According to a first aspect of the invention there is provided a method of making a 2-dimensional image of a 3-dimensional scene generally corresponding to that perceived by the human brain via the human eyes, the method including the steps, in any suitable order, of:
capturing, recording or generating image data representative of a 3-dimensional scene, or otherwise representing a 3-dimensional scene, consisting of the entire field of view or part thereof, visible to a human observer from a given ‘Viewing Point’ (VP) when fixating on a given depth region within the 3-dimensional scene, such as the foveal field of vision of the human eye, or some other fixation point; and
processing the image data to progressively compress the depth region of the 3-dimensional scene, corresponding to a peripheral field of vision, relative to a fixation point within the depth region, to thereby produce modified data representative of a modified 2-dimensional image of the 3-dimensional scene, generally corresponding to how the 3-dimensional scene would appear to the human perceiver.
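One way to picture the compression step is as a radial remapping about the fixation point. The Python sketch below is a minimal illustration under stated assumptions, not the claimed method: it measures each point's angular eccentricity from the visual axis through the fixation point, normalises it against a hypothetical elliptical field boundary (±100° horizontally, ±65° vertically), and applies a logarithmic mapping that enlarges the central region and progressively compresses the periphery. The names `project`, `compress` and `field_boundary_deg`, the parameter `k` and all default values are hypothetical.

```python
import math
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


def field_boundary_deg(phi: float, half_h_deg: float, half_v_deg: float) -> float:
    """Eccentricity (degrees) of an assumed elliptical visual-field boundary in direction phi."""
    c = math.cos(phi) / half_h_deg
    s = math.sin(phi) / half_v_deg
    return 1.0 / math.sqrt(c * c + s * s)


def compress(u: float, k: float = 8.0) -> float:
    """Concave mapping of normalised eccentricity [0, 1] onto image radius [0, 1].

    Slope is k / ln(1 + k) > 1 at the centre (central region enlarged) and
    k / ((1 + k) ln(1 + k)) < 1 at the edge (periphery progressively compressed).
    """
    return math.log1p(k * u) / math.log1p(k)


def project(p: Vec3, half_h_deg: float = 100.0, half_v_deg: float = 65.0,
            k: float = 8.0, rx: float = 1.0, ry: float = 0.65
            ) -> Optional[Tuple[float, float]]:
    """Map a 3-D point in viewer coordinates (+z passes through the fixation point)
    to 2-D image coordinates inside an ellipse with semi-axes rx, ry.

    Returns None for points outside the assumed visual field.
    """
    x, y, z = p
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0:
        raise ValueError("point coincides with the viewing point")
    # Angle between the point's direction and the visual axis.
    ecc = math.degrees(math.acos(max(-1.0, min(1.0, z / r))))
    # Direction of the point around the fixation point in the image.
    phi = math.atan2(y, x)
    u = ecc / field_boundary_deg(phi, half_h_deg, half_v_deg)
    if u > 1.0 + 1e-9:  # small tolerance for floating-point round-off
        return None
    s = compress(min(u, 1.0), k)
    return (rx * s * math.cos(phi), ry * s * math.sin(phi))
```

By contrast, a linear perspective projection maps eccentricity θ to an image radius proportional to tan θ, which stretches the periphery without limit as θ approaches 90°; the concave mapping does the opposite and places the assumed field boundary on the edge of the elliptical image.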
In a second aspect, the invention also includes the steps, in any suitable order, of selectively including in the image, where required, additional features that mimic natural human vision, including double images, heightened contrast, non-rectangular image formats, selective indistinctness and peripheral indistinctness.
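Several of these optional features can be expressed as per-pixel or per-object parameters. The sketch below assumes normalised image coordinates with an elliptical boundary and uses hypothetical functions and thresholds: blur that increases with distance from the fixation point (selective and peripheral indistinctness), brightness that fades to zero near the boundary, a mask outside the ellipse (a non-rectangular format), and an offset for double images taken from the standard small-angle approximation of relative binocular disparity, I(1/d - 1/D), where I is the interocular distance, d the object distance and D the fixation distance. All default values are assumptions for illustration.

```python
import math
from typing import Tuple


def smoothstep(edge0: float, edge1: float, x: float) -> float:
    """Hermite ramp from 0 (x <= edge0) to 1 (x >= edge1)."""
    t = min(1.0, max(0.0, (x - edge0) / (edge1 - edge0)))
    return t * t * (3.0 - 2.0 * t)


def elliptical_radius(x: float, y: float, ax: float = 1.0, ay: float = 0.65) -> float:
    """Normalised radius: 0 at the image centre, 1 on the elliptical image boundary."""
    return math.hypot(x / ax, y / ay)


def peripheral_params(x: float, y: float, fx: float = 0.0, fy: float = 0.0,
                      max_blur_px: float = 6.0, blur_start: float = 0.3,
                      fade_start: float = 0.85) -> Tuple[float, float]:
    """Return (blur radius in pixels, brightness gain) for a pixel at (x, y).

    Blur grows with distance from the fixation point (fx, fy); brightness fades
    to zero on the elliptical boundary; pixels outside the ellipse are masked out.
    """
    r_edge = elliptical_radius(x, y)
    if r_edge > 1.0:
        return 0.0, 0.0  # outside the non-rectangular image format
    r_fix = elliptical_radius(x - fx, y - fy)
    blur = max_blur_px * smoothstep(blur_start, 1.0, r_fix)
    gain = 1.0 - smoothstep(fade_start, 1.0, r_edge)
    return blur, gain


def diplopia_offset_deg(depth_m: float, fixation_depth_m: float,
                        ipd_m: float = 0.063) -> float:
    """Approximate relative binocular disparity (degrees) of a midline object:
    ipd * (1/d - 1/D). Positive means nearer than fixation (crossed disparity)."""
    return math.degrees(ipd_m * (1.0 / depth_m - 1.0 / fixation_depth_m))
```

A renderer could use the disparity value as the separation between the two copies of an object drawn for double images, and the blur and gain values to filter and darken each pixel.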
According to a third aspect of the invention there is provided a method of making an image in which the output image is displayed on a surface, or through a medium, that increases the perceived sense of depth in the image by using a bulged and/or recessed support or screen, in which the location of the bulge or indentation coincides with the fixation point and region of attention being represented in the image, and the outer edge of the image coincides with the boundary of the visual field being represented, the boundary being raised relative to the major plane of the display surface. Conveniently, an aperture or vignette of suitable size and shape is positioned in front of the planar or non-planar image, through which the viewer may look at the final image and thereby experience a greater sense of depth.
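The surface geometry described in the third aspect can be sketched as a height map over the elliptical image. The function below is a hypothetical illustration: a Gaussian bulge (or, with a negative amplitude, a recess) centred on the fixation point, plus a smooth rise towards the outer edge so that the boundary sits above the major plane of the display. The function name, the amplitudes, the widths and the millimetre units are assumed values chosen only for illustration.

```python
import math
from typing import Optional


def smoothstep(edge0: float, edge1: float, x: float) -> float:
    """Hermite ramp from 0 (x <= edge0) to 1 (x >= edge1)."""
    t = min(1.0, max(0.0, (x - edge0) / (edge1 - edge0)))
    return t * t * (3.0 - 2.0 * t)


def surface_height_mm(x: float, y: float, fx: float = 0.0, fy: float = 0.0,
                      ax: float = 1.0, ay: float = 0.65,
                      bulge_mm: float = 8.0, bulge_width: float = 0.25,
                      rim_mm: float = 12.0, rim_start: float = 0.8
                      ) -> Optional[float]:
    """Height of the display surface above its major plane at (x, y).

    Coordinates are normalised so the elliptical image boundary has semi-axes
    ax, ay. A positive bulge_mm gives a bulge centred on the fixation point
    (fx, fy); a negative value gives a recess. The rim term raises the outer
    edge. Returns None outside the image boundary.
    """
    r_edge = math.hypot(x / ax, y / ay)
    if r_edge > 1.0:
        return None
    # Distance from the fixation point, in the same normalised units.
    d_fix = math.hypot((x - fx) / ax, (y - fy) / ay)
    bulge = bulge_mm * math.exp(-d_fix ** 2 / (2.0 * bulge_width ** 2))
    rim = rim_mm * smoothstep(rim_start, 1.0, r_edge)
    return bulge + rim
```

Sampling this function on a grid gives a target shape for a moulded support or for the actuators of a deformable screen.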
According to a fourth aspect of the invention, the user or viewer of the Field of Vision Image (FoVI) is able to modify its properties by using a suitable input or control device, such as a computer mouse, a touch-sensitive screen, a head-tracking or eye-tracking system, a joystick or games console controller, or a depth- or motion-tracking device, such that the fixation point (FP) and the region of attention (RA) in the FoVI correspond to the point at which the viewer or user is fixating (FIG. 16). Moreover, the user or viewer may selectively be able to modify the physical shape of the surface containing the FoVI by setting the FP to different points in the FoVI. In one embodiment, the user is able to set the RA within the FoVI such that all the other properties of the image as specified herein, including apparent object size and position, rendering resolution, degree of focus and degree of doubling, are modified relative to the updated FP and to changes in the position of the viewer relative to the image. In another embodiment of the invention, the physical shape of the display surface on which the FoVI is presented is modified in response to the input generated by the viewer, such that the part of the surface that is either protruding (bulging) or indented (depressed), corresponding to the RA, moves with the RA across the surface of the FoVI (FIG. 17).
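The interactive behaviour can be sketched as a function that is re-run whenever the input device reports a new fixation point. The Python below is a minimal sketch under assumptions not found in the text: the fixated depth is taken from the object nearest the reported FP, rendering resolution falls off as 1/(1 + 4e) with normalised eccentricity e, defocus is the dioptric difference from the fixated depth, and doubling uses the small-angle disparity approximation. The classes `SceneObject` and `RenderParams` and the function `update_for_fixation` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class SceneObject:
    name: str
    x: float        # normalised image coordinates (elliptical format, semi-axes 1 x ay)
    y: float
    depth_m: float  # distance from the viewing point


@dataclass
class RenderParams:
    resolution_scale: float  # 1.0 = full resolution
    defocus_dioptres: float  # |1/d - 1/D|, a simple proxy for blur
    doubling_deg: float      # signed relative disparity; 0 at the fixated depth


def update_for_fixation(objects: List[SceneObject], fp: Tuple[float, float],
                        ay: float = 0.65, ipd_m: float = 0.063
                        ) -> Tuple[str, Dict[str, RenderParams]]:
    """Recompute per-object rendering parameters relative to a new fixation point."""
    fx, fy = fp

    def ecc(o: SceneObject) -> float:
        # Normalised elliptical distance of the object from the FP.
        return math.hypot(o.x - fx, (o.y - fy) / ay)

    # Hypothetical rule: the fixated depth is that of the object nearest the FP.
    fixated = min(objects, key=ecc)
    inv_d_fix = 1.0 / fixated.depth_m
    params: Dict[str, RenderParams] = {}
    for o in objects:
        inv_d = 1.0 / o.depth_m
        params[o.name] = RenderParams(
            resolution_scale=1.0 / (1.0 + 4.0 * ecc(o)),
            defocus_dioptres=abs(inv_d - inv_d_fix),
            doubling_deg=math.degrees(ipd_m * (inv_d - inv_d_fix)),
        )
    return fixated.name, params


if __name__ == "__main__":
    scene = [SceneObject("cup", 0.0, 0.0, 0.5), SceneObject("tree", 0.6, 0.1, 10.0)]
    # e.g. a new gaze sample from an eye tracker, landing near the tree
    name, p = update_for_fixation(scene, (0.58, 0.1))
    print(name, p["cup"])
```

In an interactive system, `update_for_fixation` would be called from the handler for mouse, touch or eye-tracker events, and the same FP could drive the actuators that reshape the display surface.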