A natural and harmonious human-machine interaction approach, in which a machine understands commands delivered by a person acting naturally, is a long-standing objective in the operation of machines. Depth perception, as a core technology for natural human-machine interaction, has broad application prospects in fields such as machine vision, intelligent monitoring, 3D reconstruction, somatosensory interaction, 3D printing, unmanned aerial vehicles, and the like. An active vision mode based on structured-light encoding can obtain the depth information of an image relatively accurately: an infrared laser projects a fixed-pattern image onto the object surface for encoding; an image sensor acquires the infrared-encoded image; and the depth information of the object is then obtained through depth perception computation. The resulting depth information can be used for real-time three-dimensional image recognition and motion capture, making it possible for a person to interact with a terminal in a natural manner through expressions, gestures, somatosensory actions, and the like. Compared with ToF (Time of Flight), the structured-light-encoding-based three-dimensional depth perception technology has certain advantages in cost and performance.
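The depth perception computation mentioned above rests on classical projector-camera triangulation. As a rough illustrative sketch only (the function name, focal length, baseline, and disparity values below are assumptions for illustration, not values from this document), the depth of a point follows from the measured shift of the projected pattern:

```python
def depth_from_disparity(f_px: float, baseline_m: float, disparity_px: float) -> float:
    """Classical triangulation for a projector-camera pair: z = f * b / d.

    f_px         -- focal length in pixels (illustrative assumption)
    baseline_m   -- projector-camera baseline in meters (illustrative assumption)
    disparity_px -- measured shift of the encoded pattern in pixels
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return f_px * baseline_m / disparity_px

# Example with assumed values: f = 580 px, b = 0.075 m, d = 29 px
z = depth_from_disparity(580.0, 0.075, 29.0)  # about 1.5 m
```

Because depth is inversely proportional to disparity, small disparity errors translate into large depth errors at long range, which is one reason baseline geometry matters in such apparatuses.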
Existing three-dimensional depth perception apparatuses, whether based on structured-light encoding or on ToF, are all designed as horizontal apparatuses, i.e., the baseline between the laser projector and the receiving camera is horizontal, and their central optical axes are parallel. Examples include Microsoft Kinect I (based on the PrimeSense structured-light module) and Kinect II (based on a ToF module), Asus Xtion, the Intel RealSense 3D depth camera (ToF module), and the structured-light-based apparatus for obtaining three-dimensional depth information in real time provided in the patent "Image Depth Perception Apparatus" (Patent No. ZL201210490225.0). However, with the laser projector and the receiving camera placed horizontally, no matter whether the receiving camera is disposed at the left or the right side of the laser projector, a cavitation (occlusion hole) phenomenon always occurs at one left or right edge of the measured object, which is blocked during generation of the depth map in the depth perception computation; that is, an unidentifiable area of a certain width exists along that edge, and the cavitation becomes more apparent the closer the object is to the depth perception apparatus. During human-machine interaction with an existing three-dimensional depth perception apparatus, the measured object or person generally moves horizontally, so the edge cavitation seriously affects identification, segmentation, and matting of the measured object, making it hard to extract the measured object consistently from the depth map and the RGB video.
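The claim that the cavitation worsens at closer range can be checked with similar-triangle geometry: next to a foreground edge, the camera sees a strip of background that the projector cannot illuminate, so no pattern (and hence no depth) is available there. The sketch below is an illustrative derivation under assumed geometry (flat background, pinhole model; all names and numbers are assumptions, not from this document):

```python
def occlusion_width_m(baseline_m: float, z_fg_m: float, z_bg_m: float) -> float:
    """Width, measured on the background plane, of the strip beside a
    foreground object's edge that the camera sees but the projector cannot
    reach.  By similar triangles: w = b * (z_bg - z_fg) / z_fg.
    All parameters are illustrative assumptions.
    """
    if not 0.0 < z_fg_m < z_bg_m:
        raise ValueError("foreground must lie in front of the background")
    return baseline_m * (z_bg_m - z_fg_m) / z_fg_m

# With an assumed 0.075 m horizontal baseline and background at 3 m,
# the unmeasurable strip widens as the object approaches the sensor:
far_hole = occlusion_width_m(0.075, 2.0, 3.0)   # 0.0375 m
near_hole = occlusion_width_m(0.075, 0.5, 3.0)  # 0.375 m, ten times wider
```

Since the hole width scales with the baseline and with 1/z of the foreground object, a horizontally moving person close to the sensor always drags a depth-less band along one vertical edge, which is the cavitation issue described above.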