Visual understanding is a demanding cognitive task. It draws on a set of forms, methodologies, tools, and approaches that turn data made up of many separable, discrete elements into information, which can then be used to reason about the world. Computer vision technologies make “things” much more intelligent and responsive. High-performance computers have become widely available at relatively low cost, which makes it possible to use them to detect, track, and recognize objects of interest with a variety of cameras. The collected data can subsequently be used to derive actionable insights that define business value, drive changes, measure business impact, etc. All of this can happen automatically, without human intervention.
Cameras, e.g., analog and digital video surveillance cameras, are everywhere. They are seen on street corners, at road intersections, in parking lots, in chain stores, surrounding private properties, etc. However, these cameras are underused. The volume of data they produce overloads both the network and the computational capability of the systems to which they connect. Moreover, the data comes from different types of sensors, including video cameras and conventional sensors such as light sensors, accelerometers, etc., so actionable insights cannot be generated without semantically meaningful annotations.