Host devices or “personal computing and/or communication devices” (such as smartphones) having two back cameras (also referred to as “dual-camera” or “dual-aperture camera”) are known, see e.g. U.S. Pat. No. 9,185,291. The two back cameras have two image sensors (or simply “sensors”) operated simultaneously to capture an image, and have lenses with different focal lengths. Although both lens/sensor combinations are aligned to look in the same direction, each captures an image of the same scene with a different field of view (FOV).
Dual-aperture zoom cameras in which one camera has a “Wide” FOV (FOVW) and the other has a narrow or “Tele” FOV (FOVT) are also known, see e.g. U.S. Pat. No. 9,185,291. The cameras are referred to respectively as Wide and Tele cameras that include respective Wide and Tele sensors. These sensors provide respectively separate Wide and Tele images. The Wide image captures FOVW and has lower spatial resolution than the Tele image that captures FOVT. As used herein, “FOV” is defined by the tangent of the angle between a line crossing the lens and parallel to the lens optical axis and a line between the lens and any object that is captured on the respective image corner. The images may be merged (fused) together to form a composite image. In the composite image, the central portion is formed by combining the relatively higher spatial resolution image taken by the lens/sensor combination with the longer focal length, and the peripheral portion is formed by a peripheral portion of the relatively lower spatial resolution image taken by the lens/sensor combination with the shorter focal length. The user selects a desired amount of zoom and the composite image is used to interpolate values from the chosen amount of zoom to provide a respective zoom image. Hereinafter, the use of “resolution” in this description refers to image spatial resolution, which is indicative of the resolving power of a camera as determined by the lens focal length, its aperture diameter and the sensor pixel size.
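As a rough illustration of the FOV definition above, the following sketch computes the tangent for a Wide and a Tele camera under a simple pinhole model focused at infinity, in which the tangent equals the sensor half-diagonal divided by the lens focal length. The pinhole model, the function name `fov_tangent`, and all sensor and focal-length values are assumptions chosen for illustration; none of them is taken from the cited references.

```python
import math


def fov_tangent(sensor_width_mm, sensor_height_mm, focal_length_mm):
    """Tangent of the angle between the optical axis and the ray
    that lands on an image corner (pinhole model, assumed)."""
    half_diagonal_mm = 0.5 * math.hypot(sensor_width_mm, sensor_height_mm)
    return half_diagonal_mm / focal_length_mm


# Hypothetical cameras: identical sensors, Tele focal length twice the Wide one.
fov_w = fov_tangent(4.8, 3.6, 4.0)   # Wide camera
fov_t = fov_tangent(4.8, 3.6, 8.0)   # Tele camera

# Ratio of the two tangents; with identical sensors it equals the
# ratio of the focal lengths.
fov_ratio = fov_w / fov_t

print(f"FOV_W tangent: {fov_w:.3f}")
print(f"FOV_T tangent: {fov_t:.3f}")
print(f"FOV_W / FOV_T: {fov_ratio:.2f}")
```

With identical sensors, doubling the focal length halves the tangent, which is why the Tele image covers a narrower part of the scene at higher spatial resolution.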
Dual-aperture cameras in which one image (normally the Tele image) is obtained through a folded optical path are known, see e.g. co-invented and co-owned U.S. patent application Ser. No. 14/455,906, which teaches zoom digital cameras comprising an “upright” (with a direct optical axis to an object or scene) Wide camera and a “folded” Tele camera, see also FIG. 2B below. The folded camera has an optical axis substantially perpendicular (orthogonal) to an optical axis of the upright camera. The folded Tele camera may be auto-focused and optically stabilized either by moving its lens or by tilting an optical path folding (reflecting) element (e.g. a prism or mirror and referred to also as “OPFE”) inserted in an optical path between its lens and a respective sensor. For simplicity, the optical path folding element is referred to hereinafter in this description generically as “prism”, with the understanding that the term may refer to any other optical path folding (reflecting) element that can perform the function of folding an optical path as described herein.
For example, PCT patent application PCT/IB2016/056060 titled “Dual-aperture zoom digital camera user interface” discloses a user interface for operating a dual-aperture digital camera included in a host device, the dual-aperture digital camera including a Wide camera and a Tele camera, the user interface comprising a screen configured to display at least one icon and an image of a scene acquired with at least one of the Tele and Wide cameras, a frame defining FOVT superposed on a Wide image defined by FOVW, and means to switch the screen from displaying the Wide image to displaying the Tele image. The user interface further comprises means to switch the screen from displaying the Tele image to displaying the Wide image. The user interface may further comprise means to acquire the Tele image, means to store and display the acquired Tele image, means to acquire simultaneously the Wide image and the Tele image, means to store and display separately the Wide and Tele images, a focus indicator for the Tele image and a focus indicator for the Wide image.
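The frame defining FOVT superposed on the Wide image can be sketched geometrically. The example below assumes that both cameras share the same aspect ratio and optical axis (parallax caused by the distance between the two cameras is ignored), that the lenses are free of distortion, and that each FOV is expressed as a tangent. The function name `tele_frame_in_wide` and the numeric values are hypothetical and do not come from the cited application.

```python
def tele_frame_in_wide(wide_width_px, wide_height_px, fov_w_tan, fov_t_tan):
    """Return (left, top, width, height) of the Tele frame in Wide-image pixels.

    Assumes a shared optical axis and aspect ratio, so the frame is centered
    and scaled by the ratio of the two FOV tangents.
    """
    if not 0 < fov_t_tan <= fov_w_tan:
        raise ValueError("expected 0 < FOV_T tangent <= FOV_W tangent")
    scale = fov_t_tan / fov_w_tan
    frame_width = wide_width_px * scale
    frame_height = wide_height_px * scale
    left = (wide_width_px - frame_width) / 2.0
    top = (wide_height_px - frame_height) / 2.0
    return left, top, frame_width, frame_height


# Hypothetical 4000 x 3000 Wide image, Tele tangent half the Wide tangent.
frame = tele_frame_in_wide(4000, 3000, fov_w_tan=0.75, fov_t_tan=0.375)
print(frame)  # (1000.0, 750.0, 2000.0, 1500.0)
```

In a real dual-camera module the frame would also shift with object distance because the two apertures are separated, so a practical user interface would need calibration data that this sketch omits.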
Object recognition is known and describes the task of finding and identifying objects in an image or video sequence. Many approaches have been implemented for accomplishing this task in computer vision systems. Such approaches may rely on appearance-based methods that use example images under varying conditions and large model-bases, and/or on feature-based methods comprising a search to find feasible matches between object features and image features, e.g., by detecting and matching surface patches, corners and edges. Recognized objects can be tracked in preview or video feeds using an algorithm that analyzes sequential frames and outputs the movement of targets between the frames.
The problem of motion-based object tracking can be divided into two parts:
(1) detecting moving objects in each frame. This can be done either by incorporating an object recognition algorithm for recognizing and tracking specific objects (e.g., a human face) or, for example, by detecting any moving object in a scene. The latter may incorporate a background subtraction algorithm based on Gaussian mixture models, with morphological operations applied to the resulting foreground mask to eliminate noise. Blob analysis can later detect groups of connected pixels, which are likely to correspond to moving objects.
(2) associating the detections corresponding to the same object over time, e.g., using motion estimation filters such as the Kalman filter.
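The two parts above can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the method of any cited reference: a fixed-threshold difference against a static background stands in for the Gaussian mixture model, a minimum-area test stands in for morphological noise removal, and a single object is tracked so that the association step is trivial (a multi-object tracker would need an explicit assignment step). The names `detect_blobs`, `Kalman1D` and `make_frame`, and all numeric parameters, are hypothetical.

```python
def detect_blobs(frame, background, threshold=30, min_area=2):
    """Part (1): return centroids (x, y) of connected foreground regions."""
    h, w = len(frame), len(frame[0])
    # Foreground mask: pixels that differ enough from the background.
    mask = [[abs(frame[y][x] - background[y][x]) > threshold
             for x in range(w)] for y in range(h)]
    seen = [[False] * w for _ in range(h)]
    centroids = []
    for y0 in range(h):
        for x0 in range(w):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            # Blob analysis: flood fill with 4-connectivity.
            stack, pixels = [(x0, y0)], []
            seen[y0][x0] = True
            while stack:
                x, y = stack.pop()
                pixels.append((x, y))
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= nx < w and 0 <= ny < h
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        stack.append((nx, ny))
            if len(pixels) >= min_area:  # crude noise rejection
                centroids.append((sum(p[0] for p in pixels) / len(pixels),
                                  sum(p[1] for p in pixels) / len(pixels)))
    return centroids


class Kalman1D:
    """Part (2): constant-velocity Kalman filter for one coordinate."""

    def __init__(self, position, q=0.01, r=1.0):
        self.p, self.v = position, 0.0          # state: position, velocity
        self.P = [[1.0, 0.0], [0.0, 100.0]]     # state covariance
        self.q, self.r = q, r                   # process / measurement noise

    def predict(self, dt=1.0):
        (a, b), (c, d) = self.P
        self.p += dt * self.v
        # P = F P F^T + Q with F = [[1, dt], [0, 1]] and Q = q * I
        self.P = [[a + dt * (b + c) + dt * dt * d + self.q, b + dt * d],
                  [c + dt * d, d + self.q]]

    def update(self, z):
        (a, b), (c, d) = self.P
        innovation = z - self.p
        s = a + self.r                          # innovation variance
        k0, k1 = a / s, c / s                   # Kalman gain
        self.p += k0 * innovation
        self.v += k1 * innovation
        self.P = [[(1 - k0) * a, (1 - k0) * b],
                  [c - k1 * a, d - k1 * b]]


def make_frame(x_left, width=12, height=6):
    """Synthetic frame: a 2x2 bright object plus one isolated noise pixel."""
    frame = [[0] * width for _ in range(height)]
    for y in (2, 3):
        for x in (x_left, x_left + 1):
            frame[y][x] = 255
    frame[0][width - 1] = 255
    return frame


background = [[0] * 12 for _ in range(6)]
kx = ky = None
for t in range(8):                              # object moves 1 pixel per frame
    (cx, cy), = detect_blobs(make_frame(t), background)
    if kx is None:
        kx, ky = Kalman1D(cx), Kalman1D(cy)
    else:
        kx.predict(); ky.predict()
        kx.update(cx); ky.update(cy)

print(f"estimated velocity: ({kx.v:.2f}, {ky.v:.2f}) pixels/frame")
```

On this synthetic sequence the estimated horizontal velocity approaches one pixel per frame and the vertical velocity stays at zero. Each coordinate is filtered independently to keep the sketch short; a joint four-state filter is a common alternative.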