Computing systems may model motion in a visual scene (e.g., a sequence of video frames or temporal images) to detect object movement. For example, computing systems may calculate optical flow as the displacement of a pixel from one video frame to the next video frame. The optical flow calculated for each pixel in the video frames may be expressed as a vector having a component for x-direction movement and a component for y-direction movement. Moreover, by considering the optical flow of multiple pixels, the computing systems may model or estimate a pattern of motion for objects, surfaces, and/or edges in the visual scene.
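As one way to make the per-pixel flow vector concrete, the following minimal sketch estimates the (x, y) displacement of a single pixel between two small grayscale frames. It assumes simple block matching (an exhaustive search that minimizes the sum of absolute differences over a small patch), which is only an illustrative stand-in for whichever optical flow method a given system uses. The names `patch_cost`, `pixel_flow`, and `make_frame`, and the parameters `r` and `search`, are hypothetical.

```python
# Illustrative sketch only: block matching is an assumed stand-in for
# whatever optical flow method a real system would use.

def patch_cost(prev, curr, x, y, dx, dy, r):
    """Sum of absolute differences between the patch centered at (x, y) in
    prev and the patch centered at (x + dx, y + dy) in curr."""
    cost = 0
    for j in range(-r, r + 1):
        for i in range(-r, r + 1):
            cost += abs(prev[y + j][x + i] - curr[y + dy + j][x + dx + i])
    return cost


def pixel_flow(prev, curr, x, y, r=1, search=2):
    """Return the (dx, dy) that best maps the patch around (x, y) into curr.

    Assumes (x, y) lies at least r pixels away from every frame border.
    """
    height, width = len(prev), len(prev[0])
    best, best_cost = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates whose patch would fall outside the frame.
            if not (r <= x + dx < width - r and r <= y + dy < height - r):
                continue
            cost = patch_cost(prev, curr, x, y, dx, dy, r)
            if best_cost is None or cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best


def make_frame(width, height, cx, cy):
    """Synthetic frame: a 3x3 textured blob centered at (cx, cy) on black."""
    frame = [[0] * width for _ in range(height)]
    value = 10
    for j in range(-1, 2):
        for i in range(-1, 2):
            frame[cy + j][cx + i] = value
            value += 10
    return frame


# The blob moves 2 pixels in x and 1 pixel in y between the two frames.
prev_frame = make_frame(8, 8, 3, 3)
curr_frame = make_frame(8, 8, 5, 4)
flow = pixel_flow(prev_frame, curr_frame, 3, 3)
print(flow)  # (2, 1)
```

Repeating the same estimate for many pixels would yield a flow field from which a pattern of motion for objects, surfaces, and/or edges may be estimated.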
Motion, or the movement of pixels, within the visual scene may come from several sources. First, motion within a visual scene may be attributed to movement of a camera with respect to a stationary world coordinate system, and thus, may be classified as “camera-centric” motion. Second, motion within a visual scene may be attributed to general movement of an object (e.g., a person who is walking) with respect to the stationary world coordinate system, and thus, may be classified as “object-centric” motion. Finally, motion within a visual scene may be attributed to more detailed movement of one or more parts (e.g., arm and leg movement) of a larger object (e.g., the person walking), and thus, may be classified as “part-centric” motion.
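The three sources of motion can be illustrated with a small synthetic flow field. The sketch below rests on several assumptions that are not taken from the description above: that the three motions add together per pixel, that the set of pixels belonging to the object is already known, and that a per-component median is an adequate estimate of the shared camera and object motion. The names `median_flow`, `split_flow`, `object_pixels`, and `arm_pixel` are hypothetical.

```python
from statistics import median

# Illustrative sketch only: assumes an additive model,
#   pixel flow = camera motion + object motion + part motion,
# and a known set of object pixels. Neither is stated in the source text.

def median_flow(vectors):
    """Per-component median of a list of (dx, dy) flow vectors."""
    return (median(v[0] for v in vectors), median(v[1] for v in vectors))


def split_flow(flow, object_pixels):
    """Split a flow field into camera, object, and per-pixel part motion.

    flow: dict mapping (x, y) -> (dx, dy)
    object_pixels: set of (x, y) positions that lie on the object
    """
    background = [v for p, v in flow.items() if p not in object_pixels]
    on_object = [flow[p] for p in object_pixels]

    # Camera-centric motion: what the background pixels have in common.
    camera = median_flow(background)

    # Object-centric motion: what the object pixels share beyond the camera.
    total = median_flow(on_object)
    obj = (total[0] - camera[0], total[1] - camera[1])

    # Part-centric motion: the residual left at each object pixel.
    part = {
        p: (flow[p][0] - camera[0] - obj[0], flow[p][1] - camera[1] - obj[1])
        for p in object_pixels
    }
    return camera, obj, part


# Synthetic 4x4 scene: the camera pans by (1, 0), the object moves by an
# additional (2, 1), and one "arm" pixel moves by a further (0, 3).
object_pixels = {(1, 1), (2, 1), (1, 2), (2, 2), (3, 2)}
arm_pixel = (3, 2)
flow = {}
for y in range(4):
    for x in range(4):
        dx, dy = 1, 0
        if (x, y) in object_pixels:
            dx, dy = dx + 2, dy + 1
        if (x, y) == arm_pixel:
            dy += 3
        flow[(x, y)] = (dx, dy)

camera, obj, part = split_flow(flow, object_pixels)
print(camera, obj, part[arm_pixel])  # (1, 0) (2, 1) (0, 3)
```

In this toy scene only the arm pixel keeps a nonzero residual once the shared camera and object motion are subtracted, which mirrors the distinction drawn above between general movement and the more detailed movement of parts.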
When modeling motion based on optical flow, the camera-centric motion and/or the object-centric motion may introduce noise that makes it difficult for systems to detect a particular type of object. In other words, the camera-centric motion and/or the object-centric motion are often associated with more general movement that may not be useful for characterizing the particular type of object and that may not aid a system in distinguishing the particular type of object from other objects or from a background setting. Thus, the noisy movement introduced by the camera-centric motion and/or the object-centric motion may be detrimental to object detection systems.