In image processing a convolution matrix, referred to as a kernel or mask, is a small matrix used to process image data for computer vision and related tasks. Convolution considers the local neighborhood of the image data as weighted by the kernel, enabling computer vision applications to make predictions about features of the image, such as in semantic segmentation or classification applications for scene analysis, object detection, coloration, searching and the like.
Computer vision applications have successfully used convolutional networks on two-dimensional (2D) image data. Because 2D convolution is defined on a regular grid it supports extremely efficient implementation using powerful deep architectures for processing large datasets at high resolution.
Data captured by 3D sensors, such as RGB-D (red, green, blue, depth) sensors in cameras and Li-DAR (Light Detection and Ranging) remote sensors, provide depth and other 3D data that is not captured in 2D images. However, using convolutional networks on 3D data is significantly more complex and computationally intense, especially for unstructured point clouds and other noisy real-world data. As a result, 3D data can present performance challenges when applying convolution to computer vision tasks such as scene analysis, object detection and the like.
Other features of the described embodiments will be apparent from the accompanying drawings and from the detailed description that follows.