In an audio-video sequence, a time-indexed video sequence is synchronized with a corresponding time-indexed audio sequence. For example, in an audio-video sequence capturing a conversation between two participants, the constituent audio sequence contains the words spoken by the participants, while the constituent video sequence shows the two participants and their visual behavior. Further, during playback of the audio-video sequence, each spoken word in the audio sequence is heard at the same time as the corresponding facial movement of the participant speaking it is shown.
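As an illustration of what shared time indexing can look like in practice, the following minimal sketch maps a video frame index to the range of audio samples covering the same time interval. The function name `audio_span_for_frame`, the 30 frames-per-second rate, and the 48 kHz sample rate are assumptions chosen for the example; they do not come from the text above.

```python
from fractions import Fraction


def audio_span_for_frame(frame_index, fps=Fraction(30), sample_rate=48_000):
    """Return the half-open range [start, end) of audio sample indices
    that share a time interval with the given video frame.

    Assumes both sequences start at time zero and run at constant rates
    (hypothetical defaults: 30 frames/s video, 48 kHz audio).
    """
    if frame_index < 0:
        raise ValueError("frame_index must be non-negative")
    # Time interval covered by the frame, kept exact with rational arithmetic.
    start_time = Fraction(frame_index) / fps
    end_time = Fraction(frame_index + 1) / fps
    # Convert times to sample indices, rounding down to whole samples.
    start = int(start_time * sample_rate)
    end = int(end_time * sample_rate)
    return start, end


# Frame 30 begins exactly one second into the sequence.
print(audio_span_for_frame(30))
```

Rational arithmetic is used here so that non-integer frame rates, such as 30000/1001, do not accumulate floating-point drift over long sequences.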