A smart home environment is created at a venue by integrating a plurality of smart devices, including intelligent, multi-sensing, network-connected devices, seamlessly with each other in a local area network and/or with a central server or a cloud-computing system to provide a variety of useful smart home functions. In some cases, the smart home environment includes one or more network-connected cameras that are configured to provide video monitoring and security in the smart home environment. These cameras are often dedicated image capturing and processing devices that include two-dimensional image sensing arrays configured to provide detailed image information (e.g., object locations and motions, user gestures, and depth mapping) related to a region of interest in the smart home environment. The detailed image information can also be uploaded to the central server and shared with the other smart devices in the smart home environment to control operations of the other smart devices (e.g., a specific hand gesture is detected from a video clip captured by a camera and used to unlock a smart door lock). However, although the cameras can provide full-resolution two-dimensional images and videos, in many circumstances they do not operate well when the ambient light level is low, and they are not available in many regions of interest in a smart home environment. It would be beneficial to have an accurate, low-power, compact, and cost-efficient image capturing device that can work with smart devices installed at different regions of interest in a smart home environment.