The present disclosure relates to processing visual and audio components of video sequences.
Collected video sequences may contain a variety of content. For example, both visual and auditory information of an object (e.g., a person such as an actor, a musical instrument, or a vehicle) may be represented and mixed with visual and auditory information associated with other objects (e.g., a set, which can include furniture and other props, or scenery, such as trees or buildings). Visual information and audio information are typically processed separately.