A stereoscopic camera arrangement is an element made of two image capturing units assembled in a stereoscopic module. Stereoscopy (also referred to as “stereoscopics” or “3D imaging”) is a technique for creating or enhancing the illusion of depth in an image by means of stereopsis, that is, the impression of depth perceived when a scene is viewed with both eyes by someone with normal binocular vision. Because the two eyes (or the two cameras) are at different locations, each receives a slightly different image of the scene.
Deriving 3D information from stereoscopic images, particularly for video streams, requires searching and comparing a large number of pixels for each pair of images, where each image of the pair is derived from a different image capturing device.
Stereo matching algorithms are used to solve the correspondence problem in stereo images, using feature-, phase-, or area-based matching to calculate the disparities in the captured images.
Feature-based matching searches for characteristics in the images, such as edges or curves, and calculates the best matches according to their similarities. Phase-based algorithms band-pass filter the images and extract their phase. Area-based algorithms operate on blocks (patches) of pixels from both images and calculate their matching level; this may be done in parallel for all analyzed pixels. When a constant block size is used over the whole image, called box filtering, these algorithms are especially amenable to parallel and hardware-based solutions.
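By way of illustration only, area-based matching with a constant block size may be sketched as follows. The sketch assumes rectified grayscale images held as lists of rows, uses the sum of absolute differences as the matching level, and assumes that a pixel at column x in the left image corresponds to column x - d in the right image. The names `sad` and `block_match` and the parameters `max_disp` and `half` are hypothetical and are not taken from any particular implementation.

```python
def sad(left, right, y, x, d, half):
    """Sum of absolute differences between the block centred at (y, x) in the
    left image and the block centred at (y, x - d) in the right image."""
    total = 0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            total += abs(left[y + dy][x + dx] - right[y + dy][x + dx - d])
    return total


def block_match(left, right, max_disp, half=1):
    """Return a disparity map as a list of rows; border pixels are left at 0."""
    h, w = len(left), len(left[0])
    disp = [[0] * w for _ in range(h)]
    for y in range(half, h - half):
        for x in range(half, w - half):
            best_d, best_cost = 0, None
            # Consider only disparities that keep the right-image block
            # inside the image.
            for d in range(0, min(max_disp, x - half) + 1):
                cost = sad(left, right, y, x, d, half)
                # Strict comparison: ties resolve to the smaller disparity.
                if best_cost is None or cost < best_cost:
                    best_d, best_cost = d, cost
            disp[y][x] = best_d
    return disp
```

Each pixel is evaluated independently of its neighbours and every block has the same size, which is the property that makes such a search straightforward to parallelize or to map onto hardware.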
When determining depth from stereoscopic sources, different resolutions lead to different results. Analyzing an image at a plurality of resolutions and merging the outcomes of these different resolutions gives notably better results than using a single resolution. However, when such a solution is implemented in an FPGA or ASIC, the local storage and the access to external memory need to be optimized. In such a setup, several resolutions may be analyzed line by line in parallel pipelines with different parameters, and the analysis results merged using several merge setups.
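The multi-resolution idea may be sketched, under stated assumptions, as follows. The merge rule shown here (keep the full-resolution disparity where one was found, otherwise fall back to the upsampled half-resolution disparity) is a hypothetical example chosen for simplicity and is not the only possible merge setup. Invalid disparities are represented by `None`, and all function names are illustrative.

```python
def downsample2(img):
    """Halve the resolution by averaging each 2x2 block (integer division)."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [
        [
            (img[2 * y][2 * x] + img[2 * y][2 * x + 1]
             + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) // 4
            for x in range(w)
        ]
        for y in range(h)
    ]


def upsample_disparity2(disp):
    """Replicate each half-resolution disparity into a 2x2 block and double
    it, since a shift of d pixels at half resolution spans 2*d pixels at
    full resolution. None (invalid) is propagated unchanged."""
    out = []
    for row in disp:
        wide = []
        for d in row:
            v = None if d is None else 2 * d
            wide.extend([v, v])
        out.append(wide)
        out.append(list(wide))
    return out


def merge(fine, coarse_up):
    """Hypothetical merge rule: prefer the fine result, fill gaps from coarse."""
    return [
        [f if f is not None else c for f, c in zip(fine_row, coarse_row)]
        for fine_row, coarse_row in zip(fine, coarse_up)
    ]
```

In a line-based hardware pipeline, each resolution would be processed in its own pipeline and only a few lines of each would need to be held in local storage at a time; the list-based sketch above does not model that constraint.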
Currently, the typical solution applied in the art to overcome the above problem is the use of a hardware chip that determines depth from stereoscopic images, typically utilizing a number of aggregation machines that is proportional to the number of disparity levels to be considered.
Aggregation machines calculate distance measures, including SAD (Sum of Absolute Differences) and Census (information distance), between patches derived from the left and right captured images for each given disparity level. In most cases, separate sets of aggregation machines are used for left-to-right disparity computation and for right-to-left disparity computation. Consequently, twice the nominal number of aggregation machines is used and, accordingly, twice the amount of power is consumed. Each aggregation is calculated over a weighted window, where the weight mask is determined by the YUV values of the pixels in the window.
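As a minimal sketch of the Census measure mentioned above, the following assumes the common formulation in which each pixel is described by a bit string recording which neighbours are darker than the centre pixel, and two such bit strings are compared by their Hamming distance. The function names and the choice of a 3x3 window (`half=1`) are assumptions made for illustration; the weighted-window aggregation is not modelled here.

```python
def census(img, y, x, half=1):
    """Census signature of pixel (y, x): one bit per neighbour in the window,
    set when the neighbour is darker than the centre pixel."""
    centre = img[y][x]
    bits = 0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            if dy == 0 and dx == 0:
                continue  # the centre pixel is not compared with itself
            bits = (bits << 1) | (1 if img[y + dy][x + dx] < centre else 0)
    return bits


def hamming(a, b):
    """Number of bit positions in which two signatures differ."""
    return bin(a ^ b).count("1")
```

Because the signature depends only on the ordering of intensities within the window, adding a constant brightness offset to one of the images leaves it unchanged, whereas a SAD cost over the same window would grow with the offset.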
Since aggregation machines require extensive processing power, reducing the number of aggregation machines used would be advantageous in reducing both the energy requirements of the system and the silicon area of the hardware chip. It would therefore be beneficial to find a solution that reduces the number of aggregation machines used while still retaining, or even improving, the stereoscopic detection results.