This application relates to segmenting spatiotemporal data, including video data or lidar data, using gaze data obtained through monitoring a user's gaze of the spatiotemporal data.
The “background” description provided herein is for the purpose of generally presenting the context of the disclosure. Work described herein, to the extent it is described in this background section, as well as aspects of the description which may not otherwise qualify as prior art at the time of filing, are neither expressly nor impliedly admitted as prior art.
Segmentation of video data generally involves segmenting frames of the video data individually, i.e., on a frame-by-frame basis. This process involves dividing the pixels of an individual frame into segments in order to identify objects within the frame and boundaries of those objects. Generally, this process involves steps performed by a computer and a user.
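As a minimal illustration of what dividing pixels into segments yields, the sketch below assumes a frame has already been segmented into a grid of integer labels (0 for background, a positive integer per object) and marks as boundary pixels those that touch a differently labeled neighbor. The function name `boundary_pixels` and the label-grid representation are assumptions made for this example and are not taken from the disclosure.

```python
def boundary_pixels(labels: list[list[int]]) -> set[tuple[int, int]]:
    """Return (row, col) positions that touch a pixel with a different label.

    `labels` is a hypothetical per-frame segmentation: one integer label
    per pixel, with 4-connected neighbors compared to find boundaries.
    """
    rows, cols = len(labels), len(labels[0])
    boundary: set[tuple[int, int]] = set()
    for r in range(rows):
        for c in range(cols):
            # Compare each pixel with its lower and right neighbors only;
            # every adjacent pair is still visited exactly once.
            for dr, dc in ((1, 0), (0, 1)):
                nr, nc = r + dr, c + dc
                if nr < rows and nc < cols and labels[nr][nc] != labels[r][c]:
                    boundary.add((r, c))
                    boundary.add((nr, nc))
    return boundary


# A 4x4 frame containing one 2x2 object (label 1) on background (label 0).
frame_labels = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
edges = boundary_pixels(frame_labels)
```

In this example the four object pixels and the eight background pixels directly adjacent to them are reported as boundary pixels, while the four corner pixels of the frame are not.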
In one process, for example, an individual frame is input into an object detector of a computer and analyzed extensively to identify all objects within the frame. Thereafter, a user reviews the objects identified by the object detector and corrects any misidentifications. This process is repeated for each frame of the video data.
In another process of segmentation, a user is required to manually identify objects in key frames of a video sequence. In this process, the user views an individual frame of the video and creates control points along the boundaries of an object. The user then inputs parameters which identify the object as static or moving and which are used to adjust a location of the defined boundaries in subsequent frames. Thereafter, the user may edit key frames of the video to adjust the location and size of the boundaries. A computer algorithm then uses an interpolation process to identify the position and shape of the object within the frames that have not been directly viewed and edited.
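The interpolation step in such a key-frame process can be sketched as follows. This is a minimal sketch that assumes simple linear interpolation and assumes every key frame holds the same number of control points in the same order; the disclosure does not specify the interpolation method, and the name `interpolate_boundary` and the dictionary layout are hypothetical.

```python
from bisect import bisect_right

Point = tuple[float, float]


def interpolate_boundary(key_frames: dict[int, list[Point]], frame: int) -> list[Point]:
    """Estimate boundary control points for `frame` from user-edited key frames.

    `key_frames` maps a frame index to its control points (assumed layout).
    Frames outside the key-frame range reuse the nearest key frame.
    """
    indices = sorted(key_frames)
    if frame <= indices[0]:
        return list(key_frames[indices[0]])
    if frame >= indices[-1]:
        return list(key_frames[indices[-1]])

    # Locate the key frames immediately before and after the target frame.
    pos = bisect_right(indices, frame)
    prev_idx, next_idx = indices[pos - 1], indices[pos]
    t = (frame - prev_idx) / (next_idx - prev_idx)

    # Move each control point a fraction t of the way toward its counterpart.
    return [
        (x0 + t * (x1 - x0), y0 + t * (y1 - y0))
        for (x0, y0), (x1, y1) in zip(key_frames[prev_idx], key_frames[next_idx])
    ]


# Two key frames, ten frames apart, each with two control points.
keys = {
    0: [(0.0, 0.0), (10.0, 0.0)],
    10: [(10.0, 20.0), (20.0, 20.0)],
}
midway = interpolate_boundary(keys, 5)
```

Here `midway` holds the control points halfway between the two key frames. A linear scheme like this one tracks an object poorly when it accelerates or changes shape between key frames, which is one reason the user may need to add or edit further key frames.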