A video system may be configured to identify a behavior or a pattern in video data. The video system may treat the identification of the behavior or pattern as a learning problem. This aspect of the video system, which may be referred to as a learner, can be provided with image pairs and then informed whether the images (or image sequences), and/or one or more patterns in the images, match. The system can then determine which image patches (local descriptors) are most consistent across matching images and which image patches (local descriptors) are most discriminative for non-matching images, and can also recognize patterns of activities of interest. The activities of interest are atomic, short-duration activities, such as walking, jumping, falling, entering, and exiting.
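The patch-selection step described above may be illustrated by the following minimal Python sketch. It is not the learner itself, only one possible reading under stated assumptions: each image is assumed to be already reduced to a fixed-length list of patch descriptors (one numeric vector per patch position), descriptors are compared by Euclidean distance, and the names `patch_distances` and `rank_patches` are hypothetical. Under these assumptions, a patch is treated as "consistent" when its mean distance over matching pairs is low, and as "discriminative" when its mean distance over non-matching pairs is high.

```python
from math import dist
from statistics import mean


def patch_distances(pairs, labels):
    """Return (match_mean, nonmatch_mean), one mean distance per patch index.

    pairs  -- list of (image_a, image_b); each image is a list of patch
              descriptors (numeric vectors), with the same patch layout
              in every image (an assumption of this sketch).
    labels -- list of booleans; True if the pair was labeled as matching.

    Requires at least one matching and one non-matching pair.
    """
    n_patches = len(pairs[0][0])
    match = [[] for _ in range(n_patches)]
    nonmatch = [[] for _ in range(n_patches)]
    for (image_a, image_b), is_match in zip(pairs, labels):
        bucket = match if is_match else nonmatch
        for i in range(n_patches):
            # Euclidean distance between the two descriptors at patch i.
            bucket[i].append(dist(image_a[i], image_b[i]))
    return [mean(m) for m in match], [mean(n) for n in nonmatch]


def rank_patches(pairs, labels):
    """Return (most_consistent, most_discriminative) patch indices."""
    match_mean, nonmatch_mean = patch_distances(pairs, labels)
    indices = range(len(match_mean))
    # Consistent: smallest mean distance over matching pairs.
    most_consistent = min(indices, key=lambda i: match_mean[i])
    # Discriminative: largest mean distance over non-matching pairs.
    most_discriminative = max(indices, key=lambda i: nonmatch_mean[i])
    return most_consistent, most_discriminative
```

In this sketch the labels play the role of the "matching or not" information given to the learner; a practical system would likely replace the simple mean-distance ranking with a trained model.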