Personal computers have evolved in both form factor and user interface. In terms of form factor, the evolution path runs from desktop to laptop to tablet to pocket; smartphones are pocket computers. The user interface started with the command line, which was followed by the graphical user interface. Voice interfaces became widely available with the introduction of Siri as a digital personal assistant. Siri is the first major step towards personal computers with a natural interface. However, Siri is a blind personal assistant: it can hear and talk, but it cannot see, even though every iPhone and iPad has at least one camera. A blind digital personal assistant is of very limited use because humans are visual beings. A personal assistant can see if and only if it can see exactly what the user of the device sees. In other words, the personal assistant has to be able to see through the eyes of the user to become a true personal assistant. This applies to personal computers with a natural user interface as well. Several unsuccessful attempts at computers with a natural user interface can be traced back to a lack of awareness of this requirement. Microsoft's SenseCam is an example.
With a personal computer that has a graphical user interface, the user has to go to the computer each time to get things done. In other words, those computers are reactive. In contrast, a computer with a natural user interface can be proactive; it can anticipate a user's need and offer help just in time, as a human personal assistant would. The wearable computer disclosed in this invention relies heavily on a camera to capture what a user sees and utilizes image processing to make sense of what is seen. The user can interact with the computer via eye gestures, hand gestures, and voice, as well as a touch screen interface. With access to what a user sees, the computer lets the user take pictures or record videos of what he sees without having to hold a camera in his hand and continuously monitor a screen to ensure the camera is pointed properly. When a user tries to capture a moment carefully with a handheld camera, he has to split his attention between recording the event and enjoying the experience. In other words, there is a contradiction between focusing on the recording process and enjoying the experience fully. Resolving this contradiction is another objective of this invention.
Human vision and how it works have been well documented. Generally, a point-and-shoot camera tries to capture a human's binocular field of view, which is defined as the overlap of the fields of view of the two eyes. The human brain merges the two images that it receives. The high-resolution part of human vision is referred to as foveal vision or foveal view. This area subtends a very narrow field of view. The devices discussed in this disclosure capture a subset of the field of view as small as the foveal view and as wide as the whole visual field of view, which is made up of the foveal and peripheral views.
The retina in the human eye is a hybrid image sensor that has two types of image-sensing cells: cones and rods. Cones produce images with much higher resolution than rods do. Cones are concentrated in a very small area of the retina called the fovea, and in this disclosure foveal vision or foveal view is defined as the image formed on the fovea. The image formed on the rest of the retina is called peripheral view or peripheral vision. The common field of view between the left and the right eyes is called binocular view. Binocular view includes foveal view. Foveal view subtends a very small angle, typically around a few degrees. Binocular view has a field of view between 30 and 60 degrees.
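The angular definitions above can be related to camera images with a short sketch. It is illustrative only and is not part of the disclosed device: it assumes an ideal pinhole camera aligned with the user's gaze, and the function names, the 3-degree foveal angle (standing in for "a few degrees"), and the 60-degree binocular angle (the upper end of the stated range) are hypothetical choices made for the example.

```python
import math


def crop_width_px(image_width_px: int, camera_fov_deg: float, target_fov_deg: float) -> int:
    """Width in pixels of a centered crop that subtends target_fov_deg.

    Assumes an ideal pinhole camera (no lens distortion); this model is an
    assumption of the sketch, not something stated in the disclosure.
    """
    if not 0 < target_fov_deg <= camera_fov_deg < 180:
        raise ValueError("need 0 < target_fov_deg <= camera_fov_deg < 180")
    # Focal length expressed in pixels, derived from the camera's full field of view.
    focal_px = (image_width_px / 2) / math.tan(math.radians(camera_fov_deg) / 2)
    return round(2 * focal_px * math.tan(math.radians(target_fov_deg) / 2))


def classify_view(angle_from_gaze_deg: float,
                  foveal_fov_deg: float = 3.0,       # hypothetical "few degrees"
                  binocular_fov_deg: float = 60.0) -> str:  # upper end of 30 to 60
    """Name the narrowest view that contains a direction at the given angle from the gaze."""
    offset = abs(angle_from_gaze_deg)
    if offset <= foveal_fov_deg / 2:
        return "foveal"      # foveal view is itself part of binocular view
    if offset <= binocular_fov_deg / 2:
        return "binocular"
    return "peripheral"


if __name__ == "__main__":
    # Hypothetical scene camera: 1920 pixels wide with a 90-degree horizontal field of view.
    print(crop_width_px(1920, 90.0, 60.0))  # crop approximating binocular view
    print(crop_width_px(1920, 90.0, 3.0))   # crop approximating foveal view
    print(classify_view(20.0))
```

Under these assumptions the foveal crop covers only a small fraction of the frame width, which illustrates how narrow the foveal view is compared with the binocular view.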
When people talk about what they see, the word “see” generally refers to the binocular field of view. To allow people to capture what they see, standard point-and-shoot cameras have for decades had a field of view approximately equal to the binocular field of view of the human eyes.