Sound is a mechanical wave that is an oscillation of pressure transmitted through a solid, liquid, or gas, composed of frequencies within a range that can be heard and/or felt (typically from 20 Hz to 20 kHz). Audio technology relates to the electronic representation of sound. When using an analog recording technique, audio signals are stored as a continuous wave in or on a recording medium. When using a digital recording technique, audio signals may be stored as, for example, representative data points or samples.
In digital recording, the accuracy of the digital signal's representation of the wave (i.e., the underlying sound) is dependent on, among other things, the sampling rate, which specifies how many samples are taken per unit of time. For example, the sampling rate typically used for “CD-quality” audio is 44.1 kHz, meaning 44,100 samples are taken per second to represent the wave. A sample refers to a value at a point in time. A plurality of samples representing a portion of audio is referred to as a sample set. In the case of a wave, a sample represents the amplitude at an associated location on the wave. As more samples of the wave are taken, the wave is more accurately represented by the digital signal, thereby allowing for improved quality in the playback of the originally recorded sound.
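To illustrate how the sampling rate affects accuracy, the following Python sketch samples a pure sine tone and measures how far a straight-line reconstruction between consecutive samples strays from the true wave. Linear interpolation is an assumption chosen for simplicity. Practical digital-to-analog conversion uses band-limited reconstruction filters, so the error figures apply only to this simplified model. The function names sample_sine and max_interp_error are hypothetical.

```python
import math


def sample_sine(freq_hz, sample_rate_hz, duration_s, amplitude=1.0):
    """Return the samples of a sine wave taken at a fixed sampling rate."""
    count = round(sample_rate_hz * duration_s)  # e.g., 44,100 samples per second
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate_hz)
        for i in range(count)
    ]


def max_interp_error(freq_hz, sample_rate_hz, duration_s, probes=10_000):
    """Largest deviation between the true (unit-amplitude) wave and a
    straight-line reconstruction drawn between consecutive samples."""
    samples = sample_sine(freq_hz, sample_rate_hz, duration_s)
    worst = 0.0
    for k in range(probes):
        t = k * duration_s / probes          # probe time in seconds
        pos = t * sample_rate_hz             # fractional sample position
        i = int(pos)
        if i + 1 >= len(samples):            # no following sample to interpolate toward
            break
        frac = pos - i
        approx = samples[i] + frac * (samples[i + 1] - samples[i])
        exact = math.sin(2 * math.pi * freq_hz * t)
        worst = max(worst, abs(exact - approx))
    return worst


print(len(sample_sine(1_000, 44_100, 1.0)))  # 44100
print(max_interp_error(1_000, 8_000, 0.01))
print(max_interp_error(1_000, 44_100, 0.01))
```

Under this model, one second of a 1 kHz tone at the CD-quality rate yields 44,100 samples, and the worst-case interpolation error at 44.1 kHz is substantially smaller than at 8 kHz, consistent with the observation that taking more samples represents the wave more accurately.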
Audio has many properties, one example being loudness. Loudness, sometimes referred to as volume, is a level of auditory sensation having a value on a scale extending from quiet to loud. The loudness level is largely determined by the amplitude at a given location of the wave. When the loudness level is at or near the quiet end of the scale (e.g., a low amplitude), the audio is often characterized as being silent or “mute.” Because static and noise are present in audio recordings, the audio may still be considered mute even when a slight level of loudness is present. As such, the mute characterization is typically used when audio has no (or only an insignificant amount of) loudness and would therefore reasonably be perceived (e.g., by a listener) as mute. When the loudness level is not at or near the quiet end of the scale (such that it has a relatively high amplitude), the audio is often characterized as having “sound.”
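The mute/sound distinction can be made concrete by comparing a sample set's level against a threshold. The Python sketch below assumes samples normalized to the range -1.0 to 1.0, measures level as RMS expressed in dBFS, and treats anything below a hypothetical -60 dBFS threshold as mute, so that a slight noise floor still counts as mute. The threshold value, the choice of RMS (rather than, say, peak amplitude), and the function names are illustrative assumptions, not values taken from this document.

```python
import math

# Hypothetical threshold: sample sets quieter than this are treated as mute.
MUTE_THRESHOLD_DBFS = -60.0


def rms(samples):
    """Root-mean-square amplitude of a sample set (0.0 for an empty set)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def level_dbfs(samples):
    """RMS level in dB relative to full scale (1.0); -inf for pure silence."""
    r = rms(samples)
    return float("-inf") if r == 0.0 else 20.0 * math.log10(r)


def is_mute(samples, threshold_dbfs=MUTE_THRESHOLD_DBFS):
    """Classify a sample set as mute (True) or as having sound (False)."""
    return level_dbfs(samples) < threshold_dbfs


print(is_mute([0.0] * 100))                # True  (digital silence)
print(is_mute([0.0001, -0.0001] * 50))     # True  (faint noise, about -80 dBFS)
print(is_mute([0.5, -0.5] * 50))           # False (about -6 dBFS)
```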
Audio is commonly organized as a single channel (i.e., mono) or as multi-channel (i.e., with two or more associated channels), where multi-channel is often used to create a sense of space and/or direction of sound. For example, a stereo-channel configuration includes two separate (but associated) audio signals, typically identified as left and right channels. Multi-channel configurations having more than two channels are commonly used to simulate surround sound, but still typically include the left and right channels.
Audio technology is also closely related to video technology, which relates to electronically capturing, processing, recording, and reconstructing a sequence of still images referred to as frames, so as to represent motion. Video includes a number of frames based on a predefined frame rate. For example, in the U.S., the Advanced Television Systems Committee (“ATSC”) establishes a standard frame rate of 29.97 frames/second for video used for commercial over-the-air television broadcasting. Video may also be transmitted via a digital video signal (e.g., based on the high definition serial digital interface (HD-SDI) standard). Once captured and processed, video is typically encoded and recorded as a digital file. Thereafter, the file is retrieved and the video is reconstructed by decoding the file.
Audio is often embedded in and/or otherwise associated with video. As an example, a signal based on the HD-SDI standard represents not only video but also up to 16 channels of 20- or 24-bit audio sampled at 48 kHz. As such, each frame of video may have an associated audio sample set (i.e., representing the sound associated with that frame).
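At 48 kHz and the 29.97 frames/second rate noted above (exactly 30000/1001 frames per second), each video frame spans 48000 × 1001 / 30000 = 1601.6 audio samples, so the per-frame sample sets cannot all be the same size. The Python sketch below assigns samples to frames by rounding cumulative frame boundaries down, which yields a repeating five-frame cycle totaling 8,008 samples. This boundary rule is an assumption for illustration: a given standard may specify a different ordering of 1,601- and 1,602-sample frames, and the sketch handles a single channel without modeling how HD-SDI actually embeds audio. The names frame_start and split_into_frames are hypothetical.

```python
from fractions import Fraction

AUDIO_RATE_HZ = 48_000
FRAME_RATE_FPS = Fraction(30_000, 1_001)  # the nominal "29.97" frames/second


def frame_start(frame_index, audio_rate=AUDIO_RATE_HZ, frame_rate=FRAME_RATE_FPS):
    """Index of the first audio sample belonging to `frame_index`.

    Computes floor(frame_index * audio_rate / frame_rate) with exact integer math.
    """
    return (frame_index * audio_rate * frame_rate.denominator) // frame_rate.numerator


def split_into_frames(samples, audio_rate=AUDIO_RATE_HZ, frame_rate=FRAME_RATE_FPS):
    """Divide one channel's samples into per-frame sample sets.

    A trailing partial frame, if any, is dropped.
    """
    sample_sets = []
    k = 0
    while True:
        start = frame_start(k, audio_rate, frame_rate)
        end = frame_start(k + 1, audio_rate, frame_rate)
        if end > len(samples):
            break
        sample_sets.append(samples[start:end])
        k += 1
    return sample_sets


print(float(AUDIO_RATE_HZ / FRAME_RATE_FPS))                    # 1601.6
print([len(s) for s in split_into_frames(list(range(8_008)))])  # [1601, 1602, 1601, 1602, 1602]
```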
Because the HD-SDI standard and other standards may provide more audio channels than were used when a given set of audio was recorded, the audio from one or more of the recorded channels is often duplicated and used for the additional channels. For example, when mono audio is played out from a first source and then recorded by a second source (e.g., using the HD-SDI standard), the same (or substantially the same) audio from the single channel may effectively be copied and used as both the left and right channels. This process is often referred to as audio “up conversion.” Thereafter, when the recorded audio is played back, the same (or substantially the same) audio is heard on both the left and right channels. Notably, noise and static often prevent the channels in this instance from being identical, but in most instances they are at least substantially similar.
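A minimal Python sketch of mono-to-stereo up conversion, together with one possible way to test whether two channels are “substantially similar” despite added noise, follows. The similarity test (RMS of the channel difference relative to the RMS of the louder channel, compared against a hypothetical 1% tolerance) and the function names are assumptions for illustration, not a method described in this document.

```python
import math
import random


def up_convert_mono(mono, channels=2):
    """Copy one channel into `channels` separate, identical channels."""
    return [list(mono) for _ in range(channels)]


def _rms(x):
    """Root-mean-square amplitude (0.0 for an empty sequence)."""
    return math.sqrt(sum(v * v for v in x) / len(x)) if x else 0.0


def substantially_similar(a, b, tolerance=0.01):
    """True if the channels differ by at most `tolerance`, measured as the RMS
    of their difference relative to the RMS of the louder channel."""
    if len(a) != len(b):
        raise ValueError("channels must have the same number of samples")
    diff = _rms([x - y for x, y in zip(a, b)])
    ref = max(_rms(a), _rms(b))
    if ref == 0.0:  # both channels are exactly silent, hence identical
        return True
    return diff / ref <= tolerance


# Simulate up conversion of a 440 Hz mono tone, then add a little noise to
# the right channel to mimic static picked up after duplication.
rng = random.Random(0)
mono = [0.5 * math.sin(2 * math.pi * 440 * n / 48_000) for n in range(4_800)]
left, right = up_convert_mono(mono)
right = [s + rng.gauss(0.0, 0.001) for s in right]

print(substantially_similar(left, right))               # True
print(substantially_similar(left, [-s for s in left]))  # False
```

Measuring the difference relative to the channel level, rather than as an absolute value, keeps the same tolerance usable for both quiet and loud program material.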
For a variety of reasons, such as analyzing systems, identifying commercials, and assisting with video and/or audio editing, there is a desire to analyze sample sets and to identify select attributes in them, such as the mute/sound attribute described above.