The present invention relates generally to natural Human-Computer Interaction (HCI), such as in Augmented Reality (AR) or Virtual Reality (VR) systems. More specifically, hand gesture recognition is provided using a millimeter-wave (MM) radar (e.g., a phased array transceiver) that feeds data into a trainable recurrent three-dimensional (3D) convolutional neural network (CNN).
Natural hands-free HCI has been a technological challenge for years. It has become more crucial than ever because of recent advances in AR and VR. Hand gesture recognition remains an active research domain. Much of the research in this area has focused on recently available off-the-shelf sensing modalities. The most mature methods use, for example, stereo RGB cameras or infrared-based proximity sensors. Ultrasound imaging has also enabled a hand pose detection method that uses wearable devices to capture muscle movements and classifies the motion flow of the muscles using optical flow. Such ultrasonic depth imaging approaches for hand gestures suggest using separate 2D CNN pipelines for intensity and depth before an LSTM (long short-term memory unit, as commonly used in recurrent neural networks to remember a cell value over arbitrary time intervals).
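By way of illustration only, the way an LSTM retains a cell value can be sketched with a single scalar cell. This is a minimal sketch of the standard LSTM gate equations, not the network of any cited work; the function name `lstm_step`, the weight layout, and the hand-picked weights `HOLD` are hypothetical and chosen only to make the gating visible.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, w):
    """One time step of a scalar LSTM cell.

    w maps a gate name to (input weight, recurrent weight, bias).
    Returns the new hidden value h and the new cell value c.
    """
    def pre(name):
        w_x, w_h, b = w[name]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("forget"))        # how much of the old cell value to keep
    i = sigmoid(pre("input"))         # how much of the new candidate to admit
    o = sigmoid(pre("output"))        # how much of the cell to expose as h
    g = math.tanh(pre("candidate"))   # candidate value computed from the input
    c = f * c_prev + i * g
    h = o * math.tanh(c)
    return h, c


# Hypothetical hand-set weights: forget gate saturated open, input gate shut,
# so the cell value is carried forward almost unchanged whatever the input is.
HOLD = {
    "forget": (0.0, 0.0, 20.0),
    "input": (0.0, 0.0, -20.0),
    "output": (0.0, 0.0, 0.0),
    "candidate": (1.0, 0.0, 0.0),
}

h, c = 0.0, 0.7
for _ in range(50):
    h, c = lstm_step(1.0, h, c, HOLD)
print(round(c, 6))
```

In a trained network the gate weights are learned rather than set by hand, so the retention interval depends on the data rather than on a fixed setting.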
Microsoft has shown a promising solution in its HoloLens system, which provides a stereoscopic head-mounted display receiving holographic signals from a computer. However, this solution allows only 2.5D interaction and, while it is hands-free, it does not allow natural interaction in 3D space. This is due to a limitation of the sensing technique used, which combines structured light and the visual domain. This combination gives access only to front-facing surfaces and cannot see anything beyond them. This means that a two-handed operation, where one hand occludes the other, is not possible.
Google has developed Soli, a very short-range, hyper-wide-bandwidth (7 GHz) dedicated chip that can detect fine movements close to the sensor and can be used to virtually control a mobile device. The technical pipeline includes extracting range-Doppler images (RDI), applying advanced preprocessing to improve the signal and extract basic features, and then feeding the result into a machine learning pipeline. Two RDIs are processed using CNNs, similar to the technique used in image recognition systems. Four basic gestures with clearly distinct signatures were demonstrated. However, this solution is less practical for more natural AR/VR interaction, due to its wide-angle response. Orientation is also a challenge with Soli, due to the small number of elements. Thus, although Soli consumes little power and works well in very close proximity applications such as controlling a mobile device, its limited effective range makes it unsuitable for AR/VR applications.
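For orientation, the range-Doppler extraction step mentioned above can be sketched as a two-dimensional Fourier transform over one radar frame. This is a minimal sketch under stated assumptions, not Soli's actual processing: it assumes a chirped radar whose frame is already available as complex samples indexed `frame[chirp][sample]`, it omits windowing, clutter removal, and Doppler-axis centering, and the names `range_doppler_map`, `synth_frame`, and `peak_bin` are hypothetical. A naive transform is used so that the example needs only the standard library.

```python
import cmath
import math


def dft(seq):
    """Naive discrete Fourier transform (O(n^2)); adequate for a tiny demo."""
    n = len(seq)
    return [
        sum(seq[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def range_doppler_map(frame):
    """Turn frame[chirp][sample] into magnitudes rdm[doppler_bin][range_bin]."""
    n_chirps, n_samples = len(frame), len(frame[0])
    # Transform along fast time (samples within a chirp): one range profile per chirp.
    profiles = [dft(chirp) for chirp in frame]
    # Transform along slow time (across chirps) separately for every range bin.
    per_range = [
        dft([profiles[c][r] for c in range(n_chirps)]) for r in range(n_samples)
    ]
    return [
        [abs(per_range[r][d]) for r in range(n_samples)] for d in range(n_chirps)
    ]


def synth_frame(n_chirps, n_samples, range_bin, doppler_bin):
    """Hypothetical noise-free echo of a single point target, for illustration only."""
    return [
        [
            cmath.exp(
                2j * math.pi
                * (range_bin * s / n_samples + doppler_bin * c / n_chirps)
            )
            for s in range(n_samples)
        ]
        for c in range(n_chirps)
    ]


def peak_bin(rdm):
    """Return (doppler_bin, range_bin) of the strongest cell in the map."""
    cells = ((d, r) for d in range(len(rdm)) for r in range(len(rdm[0])))
    return max(cells, key=lambda dr: rdm[dr[0]][dr[1]])


# 8 chirps of 16 samples each, with one synthetic target.
rdm = range_doppler_map(synth_frame(8, 16, range_bin=5, doppler_bin=2))
print(peak_bin(rdm))
```

With the synthetic target above, the strongest cell falls at Doppler bin 2 and range bin 5, matching the values used to generate the frame. A sequence of such maps is what a downstream classifier would consume.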