Vision processing has traditionally been done using a central processing unit (CPU), a graphics processing unit (GPU), or a combination of both, integrated into a computing device such as a personal computer or server computing device. In some cases, a field-programmable gate array (FPGA) has been used in conjunction with the CPU and/or GPU to assist with the vision processing, especially when the processing is only needed for a short timeframe. Some specific processing functions, such as red-eye removal or color correction, have been implemented in custom image processing units, but such units are typically limited to one or two specific functions.
A traditional solution is for a camera to capture image data and transmit the data to vision processing software (e.g., OpenCV) stored on a computing device (e.g., a computer). The vision processing software executes certain vision processing algorithms (e.g., the Canny edge detection algorithm) on the data using the CPU and/or GPU in the computer.
These traditional approaches have worked fairly well for two-dimensional (2D) processing. However, new vision processing applications such as augmented reality, measurement, and gesture recognition can work with popular 3D sensors (e.g., those from PrimeSense or Leap Motion), and they create a need for real-time 3D processing, which traditional platforms have not realized in a satisfactory manner.
As an example, MICROSOFT® KINECT®, available from Microsoft Corp. of Redmond, Wash., and similar motion-sensing input and image capture devices use a custom hardware chip, programmed with specific vision processing algorithms, in order to process 3D data in real time. Without such silicon-based algorithm processing, it would not be possible to provide the real-time 3D processing that can then be used in a large number of applications.
The problem becomes even more acute when trying to implement vision processing applications in mobile or embedded devices, such as smartphones, tablet computers, small Linux devices, and the like. Generally, these devices have limited battery life, processing capability, and memory capacity. Hence, it is not practical to expect mobile and embedded devices to execute 3D vision algorithms in any satisfactory manner when real-time processing is required.
For example, power consumption is high for vision processing because most vision processing algorithms are computationally intensive and require a large number of floating-point operations. Also, a large amount of memory is needed to store data for vision processing, especially if the data comes directly from a camera and is stored in system memory. The raw data stream from such a camera can be as much as 200 Mbits/sec when converted to 3D data points. Further, most of these devices have processor cores based on the ARM architecture (developed by ARM Holdings, plc of Cambridge, England), or something similar, which have a fraction of the processing capability of, e.g., the high-end processors available from Intel Corp. of Santa Clara, Calif. that are used in laptops and desktops. High-end processors, however, consume considerable power and would shorten the battery life of mobile or embedded devices.
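To give a sense of scale for the data rates discussed above, the following is a minimal sketch of the arithmetic behind an uncompressed sensor stream. The sensor parameters (a 640x480 depth map, 30 frames per second, 16 bits per sample) and the function names are assumptions chosen purely for illustration; they are not taken from any particular camera and are not the source of the 200 Mbits/sec figure.

```python
def raw_stream_mbits_per_sec(width, height, fps, bits_per_sample):
    """Uncompressed data rate in megabits per second (1 Mbit = 10**6 bits)."""
    bits_per_frame = width * height * bits_per_sample
    return bits_per_frame * fps / 1_000_000


def buffer_megabytes(mbits_per_sec, seconds):
    """Memory needed to hold `seconds` of the stream, in megabytes (10**6 bytes)."""
    return mbits_per_sec * seconds / 8


# Hypothetical sensor parameters, used only as an example.
rate = raw_stream_mbits_per_sec(width=640, height=480, fps=30, bits_per_sample=16)
one_second = buffer_megabytes(rate, seconds=1.0)

print(f"raw stream: {rate:.1f} Mbit/s")
print(f"one second buffered: {one_second:.1f} MB")
```

Under these assumed parameters the stream comes to roughly 147 Mbit/s, so buffering even one second of it occupies about 18 MB of system memory. That is the same order of magnitude as the figure cited above, and it is a significant load for a device with limited memory.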