Developments in the field of multimedia processing, such as image processing and video processing, have led to tremendous growth in the field of interactive three-dimensional (3D) virtual environments. 3D virtual environments have applications in fields such as virtual games, medical surgery, autonomous robotics, and video surveillance. Video surveillance may be performed using video surveillance systems. Such systems capture videos in real time using multiple cameras and process the videos to detect human activities at various locations, such as party get-togethers and street movement, and for threat detection, including criminal activities and incidents such as kidnapping, robbery, riots, and road accidents. Most video surveillance systems rely on manual detection of human activities, using only the observational skills of a human operator. However, with the increasing number of surveillance cameras on streets and at other public places, manually monitoring the activity in every camera of a video surveillance system has become highly difficult. Video surveillance systems in offices, markets, and universities require a robust recognition method that performs precisely and in an automated manner. Automatic recognition of activities and gestures in video is therefore vital for video surveillance applications.
In the era of digital technology, emphasis has been growing on automatic detection of objects in images and videos, and a number of image processing techniques are known for this purpose. Image processing techniques for object detection in a video start by segmenting individual objects in each frame of the video. Because each frame is depicted in a two-dimensional (2D) plane, it is difficult to segment connected or overlapping objects in the frame, since depth information associated with each object is unavailable. Without depth information, segmentation of objects is based solely upon the color data of the pixels in each frame. Further, the segmentation of objects in 2D images largely depends upon the resolution of the captured video.
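The difficulty described above can be illustrated with a minimal sketch. It is only a toy example under stated assumptions, not a method taken from any particular surveillance system: the frame is a hypothetical 4x6 grid, each foreground pixel is an assumed (color, depth) pair, and the thresholds and the helper name `count_components` are chosen purely for illustration. Two touching objects with the same color merge into one segment when only color is compared, and separate again once a depth value is available.

```python
from collections import deque

def count_components(grid, same):
    """Count 4-connected regions of foreground pixels.

    `grid` holds None for background; `same(p, q)` decides whether two
    neighbouring foreground pixels belong to the same region.
    """
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] is None or seen[r][c]:
                continue
            count += 1                      # new region found, flood-fill it
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if not (0 <= ny < rows and 0 <= nx < cols):
                        continue
                    if seen[ny][nx] or grid[ny][nx] is None:
                        continue
                    if same(grid[y][x], grid[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count

# Hypothetical pixels: (gray level, depth in metres). Both objects share a color.
A = (200, 1.0)   # object close to the camera
B = (200, 3.0)   # object farther away, touching A in the image plane
frame = [
    [None, None, None, None, None, None],
    [None, A,    A,    B,    B,    None],
    [None, A,    A,    B,    B,    None],
    [None, None, None, None, None, None],
]

# Assumed similarity rules and thresholds, for illustration only.
color_only = lambda p, q: abs(p[0] - q[0]) <= 10
color_and_depth = lambda p, q: color_only(p, q) and abs(p[1] - q[1]) <= 0.5

print(count_components(frame, color_only))       # 1: the two objects merge
print(count_components(frame, color_and_depth))  # 2: depth separates them
```

In a real 2D video the depth component is simply not present, so only the first rule is available, which is why connected or overlapping objects are hard to separate.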
Because parameters such as luminance and the motion of the surveillance cameras change frequently, such image processing techniques are unreliable for segmenting objects. Given the increasing number of surveillance cameras, the amount of raw information accumulated in the form of live-streamed video from the surveillance cameras is very high. These live-streamed videos need to be processed in real time in order to generate relevant alerts and detect different human activities. The main challenge remains the segmentation of individual overlapping components, which requires substantial processing.