The ability to “count things” is important for many business and government reasons. One especially important reason is to acquire the underlying data needed to take actions that smooth the flow of goods and services, expedite emergency vehicles and, in general, improve the quality of human life. Counts of human activity such as traffic volume, traffic conditions and vehicle type are beginning to feed rudimentary traffic or information management systems that attempt to process massive amounts of sensor and raw video data to answer questions such as: Who is using the road? How much capacity do we need? Where are travelers going? Do travelers and goods arrive on time? To keep data volumes manageable, most information is gleaned from scalar sensors or intermittently transmitted images, which provide limited information and usually miss the big picture. Accurate information is essential for policymakers, traffic managers and drivers alike. Entire systems are therefore starting to be created to mine the valuable information embedded in real-time traffic video and associated data (counts, occupancy, vehicle type, road conditions, etc.), but these systems correlate data neither among sensors nor among images. They are severely constrained by the massive amounts of data needed for a complete picture of traffic activity, which imposes huge storage and communications requirements. Today's systems are effectively one-pass data collection entities, further hobbled by energy limitations, cost, the inability to effectively search video, the inability to correlate multiple data inputs and the restricted bandwidth of multi-hop communications in sensor network deployments.
The ongoing problem with deploying sensors is that the appetite for new types of data (and new types of sensors) is always growing. This appetite is driving technology to evolve from simple scalar traffic counting to more complex traffic classification, which in turn feeds rudimentary smart traffic guidance and management systems for smart cities as part of the expanding role of IoT in everyday life. Counting vehicles, people and things (products on assembly lines, pallets of goods, people in vehicles, vehicle types, speeds, etc.) is emerging as another key class of “killer applications” that will greatly assist in driving IoT deployments.
Today, it is remarkable what a single camera can do with sophisticated traffic-analysis software. Thanks to Moore's Law, processing has become very inexpensive compared to the cost of the massive video and sensor transmission infrastructure needed to support a large and growing number of video feeds mixed with sensor data. The key problem is to flexibly mine the information embedded in the video data and combine it with other sensor data. Doing so allows the generation of far more efficient and compact metadata (data that describes and gives information about other data) rather than needlessly moving raw video around an energy- and capacity-constrained network. Because of these economic trends, it is feasible to attach programmable processors to the data collection points to reduce data volumes and to efficiently manage the energy that powers the network's sensor and video nodes, providing the highest monitoring coverage.
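The reduction described above, collapsing raw frames into compact metadata at the collection point, can be sketched as follows. This is a minimal illustration, assuming a hypothetical on-node classifier has already produced per-frame vehicle labels; the record fields and interval scheme are illustrative, not any specific system's format.

```python
import json
from collections import Counter

# Hypothetical per-frame detections produced by an on-node vehicle
# classifier; in a real deployment these would come from a vision model
# running on the camera's attached processor.
frame_detections = [
    ["car", "car", "truck"],
    ["car", "bus"],
    ["truck", "car", "car"],
]

def summarize_interval(detections_per_frame, interval_id):
    """Collapse raw per-frame detections into one compact metadata record."""
    counts = Counter()
    for frame in detections_per_frame:
        counts.update(frame)
    return {
        "interval": interval_id,
        "vehicle_counts": dict(counts),
        "total": sum(counts.values()),
    }

record = summarize_interval(frame_detections, interval_id=1)
# The transmitted payload is a few hundred bytes, versus megabytes of
# raw video for the same interval.
payload = json.dumps(record)
```

Only `payload` need cross the constrained network; the raw frames stay at the node, available for later re-analysis.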
Correlating video and sensor data at the network edge is also an unsolved problem. Data collected by today's rudimentary sensor networks take simple scalar forms carrying little information, which makes processing simple (typically additions, subtractions, divisions, sums and averages). It is difficult to form a comprehensive understanding of an environment from scalar information alone. Videos and images collected by a video sensor network, on the other hand, are rich in information but have complicated forms. They are often compressed (frequently with lossy compression) and sent to back-end servers to be processed (formatted, integrated, analyzed, etc.) to meet diverse application requirements. Increasingly, new applications cannot be supported by typical scalar sensor networks because they require vast amounts of information that can be obtained only from an image or video. In addition, image and video data can be “mined” repeatedly: the same footage can be re-analyzed as often as necessary by applying different algorithms, and the choice of algorithm can be driven by information obtained from other sensors through collaboration within the IoT environment. Scalar data, by contrast, is a single value or group of values representing limited information, insufficient for applications such as video surveillance and traffic monitoring. Camera sensors collect video data that is rich in information and offers tremendous potential when analysis is event-driven and coupled with existing wireless or power-line-connected sensor networks; this combination yields the most complete information in an efficient form. Owing to the lack of flexible sensors, the energy requirements, and the limitations of network connectivity and video platforms, there has been little progress on a whole-system approach for flexible and collaborative IoT sensor environments.
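The event-driven coupling of scalar and video data described above can be sketched simply: a cheap scalar reading gates whether an expensive video pass runs at all. The loop-detector threshold, the clip identifiers and the analysis function below are illustrative assumptions, not any particular system's API.

```python
# Assumed trigger level from a hypothetical inductive-loop sensor.
CONGESTION_THRESHOLD = 40  # vehicles per minute

def analyze_video(clip_id, algorithm):
    # Placeholder for a heavier vision pass that runs only when triggered;
    # a real node would invoke an on-board analysis pipeline here.
    return {"clip": clip_id, "algorithm": algorithm, "status": "queued"}

def on_scalar_reading(vehicles_per_minute, clip_id):
    # Inspect the cheap scalar reading first; launch the expensive
    # video analysis only when the scalar event suggests it is worthwhile.
    if vehicles_per_minute > CONGESTION_THRESHOLD:
        return analyze_video(clip_id, algorithm="queue_length_estimation")
    return None  # nothing unusual, so the video pass is skipped

quiet = on_scalar_reading(12, "cam7-0900")  # below threshold: no video work
busy = on_scalar_reading(55, "cam7-0905")   # spike: video analysis queued
```

Most readings cost almost nothing to process; only the rare event incurs the energy and bandwidth of touching the video.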
Among the needed improvements, systems must adjust camera aim, coordinate data from multiple cameras (including 2D-to-3D reconstruction), manage many camera angles, correlate the video data with additional sensors, and perform multi-pass analysis to mine for additional information when needed. Collaborative processing has further advantages, such as increasing system robustness and improving the performance of IoT environments.
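The multi-pass mining idea can be sketched as a registry of interchangeable analysis passes applied to the same stored clip as new questions arise. The algorithm names, the clip record and the stand-in results below are all illustrative assumptions, not a definitive implementation.

```python
# A stored clip record; in practice this would reference footage kept
# at the edge node rather than transmitted over the network.
stored_clip = {"id": "intersection-4", "frames": 1800}

# Registry of analysis passes. The lambdas return stand-in results;
# real passes would run vision algorithms over the clip's frames.
ALGORITHMS = {
    "count_vehicles": lambda clip: {"vehicles": 123},
    "classify_types": lambda clip: {"car": 100, "truck": 23},
    "estimate_speed": lambda clip: {"mean_kph": 52.4},
}

def mine(clip, passes):
    """Run each requested analysis pass over the same stored clip."""
    return {name: ALGORITHMS[name](clip) for name in passes}

# A first pass answers "how many?"; a later pass answers "what kind?",
# all without re-capturing or re-transmitting any video.
report = mine(stored_clip, ["count_vehicles", "classify_types"])
```

Because the footage is retained, a third pass (for example `estimate_speed`) can be requested later, possibly triggered by data from a collaborating sensor.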