Information may be collected about the entities that appear in a movie at a particular time reference. For example, a user may pause a video and query which actor is present in the paused frame by clicking on that actor's face. Other information about the actor or other entities in the scene may also be available. To link such information to the objects on the screen, a video may be subjected to a tagging phase in which entities in a scene are identified and recognized. The tagged entities may be stored. Each tag may include the identity of the entity and a time reference (or video frame) in which the entity appears. In some cases, the position of the entity in the frame (e.g., the coordinates of a person's face) may be stored as well.
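As an illustration only, a stored tag and a click query of the kind described above might be sketched as follows. The names `Tag`, `entity_id`, `start_s`, `end_s`, `bbox`, and `entities_at` are hypothetical, and representing the time reference as an interval in seconds and the position as a pixel bounding box are assumptions rather than requirements.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Tag:
    """One tagged entity (hypothetical layout, for illustration)."""
    entity_id: str                 # identity of the entity, e.g. an actor
    start_s: float                 # first time (seconds) the entity appears
    end_s: float                   # time (seconds) at which it stops appearing
    # Optional position in the frame as (x, y, width, height) in pixels.
    bbox: Optional[Tuple[int, int, int, int]] = None

    def contains(self, t: float, point: Optional[Tuple[int, int]] = None) -> bool:
        """True if the entity is present at time t and, when both a click
        point and a bounding box are given, the point lies inside the box."""
        if not (self.start_s <= t < self.end_s):
            return False
        if point is None or self.bbox is None:
            return True
        x, y, w, h = self.bbox
        px, py = point
        return x <= px < x + w and y <= py < y + h


def entities_at(tags: List[Tag], t: float,
                point: Optional[Tuple[int, int]] = None) -> List[str]:
    """Return the identities of all tagged entities matching the query."""
    return [tag.entity_id for tag in tags if tag.contains(t, point)]


# Made-up example data: two actors with overlapping screen time.
tags = [
    Tag("actor_a", 10.0, 20.0, (100, 50, 80, 80)),
    Tag("actor_b", 15.0, 30.0, (400, 60, 80, 80)),
]
print(entities_at(tags, 16.0))              # both actors are on screen
print(entities_at(tags, 16.0, (120, 70)))   # click lands on actor_a's face
```

A linear scan is used here for brevity; a larger tag store might instead be indexed by time.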
Tags may be synchronized to the video during playback using the elapsed playing time or the current frame number. For example, if information about objects at a time t is requested, then the tag data at time t may be searched. But elapsed playing time and frame number are not reliable indicators for synchronization if the video being played is a transformed version of the video that was tagged. Transformations that commonly occur include, for example, changing the encoding format or editing the video. In both examples, frames may be added or removed, so a given elapsed time or frame number may no longer refer to the same content. Transformations may occur, for example, where a user desires content in a format different from the one on which the tagging was performed or where bandwidth is limited.
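The synchronization problem can be made concrete with a small, purely illustrative sketch. It assumes tags are keyed by frame number in the originally tagged video and that the transformation simply drops some frames; the names `lookup_by_frame` and `mismatched_frames` and the ten-frame example are hypothetical.

```python
from typing import Dict, List, Set


def lookup_by_frame(tags_by_frame: Dict[int, List[str]], frame_number: int) -> List[str]:
    """Naive synchronization: use the playback frame number directly as the
    key into tag data that was produced from the original video."""
    return tags_by_frame.get(frame_number, [])


def mismatched_frames(tags_by_frame: Dict[int, List[str]],
                      num_frames: int,
                      removed: Set[int]) -> List[int]:
    """Return the frame numbers of the transformed video at which the naive
    lookup disagrees with the entities actually shown.

    `removed` holds the original frame numbers dropped by the transformation.
    """
    # Transformed frame i actually shows original frame surviving[i].
    surviving = [f for f in range(num_frames) if f not in removed]
    return [
        i for i, original_frame in enumerate(surviving)
        if lookup_by_frame(tags_by_frame, i) != tags_by_frame.get(original_frame, [])
    ]


# Original video: frames 0-4 show actor_a, frames 5-9 show actor_b.
original_tags = {f: ["actor_a"] if f < 5 else ["actor_b"] for f in range(10)}

# A transformation (e.g. an edit) removes original frames 2 and 3.
print(mismatched_frames(original_tags, 10, removed={2, 3}))  # prints [3, 4]
```

In this example, frames 3 and 4 of the transformed video show actor_b, but a lookup by frame number still returns actor_a, because the tag data refers to the frame numbering of the original video.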