Many conventional media systems are connected with analogue cabling. Usually, the wiring radiates out from a small number of centralised pieces of equipment. Propagation delay along the cables is so small as to be negligible. In these implementations, compensation for input and output latency can be carried out manually because there are only a few centralised pieces of equipment.
However, several factors can cause play out between different pieces of equipment to be out of sync. For example, different receivers (amplifiers and/or speakers) can take different amounts of time to play the signal out, and the presence or absence of intermediate processing devices will cause differing delays.
A simple analogue audio system (short cable runs, identical speakers) usually uses a fully synchronous master-slave system. An example is an amplifier and speakers, wherein the amplifier puts a signal on the wire and the speakers slave to this signal and play it out without regard to timing. This typically works adequately if the speakers take roughly the same amount of time to process the audio, which is normal if they are the same sort of speaker. In most situations, the time on the wire is so small as to have no effect. But in a big system, with multiple amplifiers and speakers in diffuse locations, the delays might not be negligible. Additionally, a mixed-media situation (audio+video) typically will not have identical play-out devices.
However, large-scale analogue systems are being replaced by distributed, networked systems because of the benefits of networking for the distribution of signals and the wholesale digitalisation of media equipment. Digital systems typically resolve many of the problems of analogue audio, but may create new problems with regard to timing.
Digital audio systems may have timing issues even for a single transmitter and receiver. The transmission path in a digital system typically involves buffers and digital samples, which means that data must be clocked into and out of buffers. The clock rate typically needs to be synchronised with the sample rate of the digital audio. If the rate is not synchronised, the receiver will consume samples either faster than the transmitter sends them (and run out of samples) or slower than the transmitter sends them (and over-fill its buffer). Thus the transmitter and all the receivers must run (on average) at exactly the same rate to avoid the creation of audio artefacts.
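The effect of a rate mismatch on a receive buffer can be illustrated with a minimal sketch. The function name, its parameters, and the example figures (48 kHz, a 96-sample buffer) are assumptions chosen for illustration and do not describe any particular product.

```python
def time_to_buffer_fault(nominal_rate_hz, tx_ppm, rx_ppm, buffer_samples, start_fill):
    """Estimate when a receive buffer faults under a constant rate mismatch.

    tx_ppm and rx_ppm are the clock rate errors of the transmitter and
    receiver, in parts per million relative to the nominal sample rate.
    Returns a (fault, seconds) tuple; fault is "overrun", "underrun" or "none".
    """
    # Net samples per second accumulating in (or draining from) the buffer.
    net = nominal_rate_hz * (tx_ppm - rx_ppm) * 1e-6
    if net == 0:
        return ("none", float("inf"))
    if net > 0:
        # Transmitter is faster: the buffer fills until it overflows.
        return ("overrun", (buffer_samples - start_fill) / net)
    # Receiver is faster: the buffer drains until it is empty.
    return ("underrun", start_fill / -net)


if __name__ == "__main__":
    # Hypothetical case: receiver clock 50 PPM fast, buffer half full at start.
    print(time_to_buffer_fault(48_000, 0.0, 50.0, 96, 48))
```

Under these assumed figures the buffer drains at about 2.4 samples per second, so even a modest mismatch exhausts a small buffer within tens of seconds.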
Differences in clocks can be described in terms of rate and offset. Rate applies to all clocks and refers to how fast the clock runs. If two clocks run at different rates, then buffer overrun and underrun will occur. Offset applies only to clocks that maintain a time value, and measures the difference between the current value of each clock. Simple digital timing mechanisms synchronise rate only. A mechanism is used to allow either the transmitter or a particular device on the network to dictate and discipline the clock rate for every device on the network.
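The distinction between rate and offset can be sketched with a simple clock model. The `Clock` class and `difference` helper below are hypothetical names introduced only for this illustration; the model assumes a constant rate error.

```python
from dataclasses import dataclass


@dataclass
class Clock:
    rate_ppm: float   # rate error relative to nominal, in parts per million
    offset_s: float   # clock value at true time zero, in seconds

    def read(self, true_time_s: float) -> float:
        """Value shown by this clock after true_time_s seconds of real time."""
        return self.offset_s + true_time_s * (1.0 + self.rate_ppm * 1e-6)


def difference(a: Clock, b: Clock, true_time_s: float) -> float:
    """Difference between the two clock values at the same real instant."""
    return a.read(true_time_s) - b.read(true_time_s)


if __name__ == "__main__":
    reference = Clock(rate_ppm=0.0, offset_s=0.0)
    rate_synced = Clock(rate_ppm=0.0, offset_s=0.002)  # same rate, 2 ms offset
    free_running = Clock(rate_ppm=50.0, offset_s=0.0)  # same start, 50 PPM fast

    for t in (0.0, 600.0):
        print(t, difference(rate_synced, reference, t),
              difference(free_running, reference, t))
```

In this sketch the rate-synchronised clock keeps a constant 2 ms difference from the reference, which is what a rate-only mechanism leaves uncorrected, while the free-running clock starts aligned and diverges steadily.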
Some architectures (for example “AES”) use very small buffers, in the range of 1-4 samples. An AES transmitter sends multiplexed digital audio down a dedicated line. Receivers in these architectures slave directly to the transmitter and read data per-sample. Rate is typically managed by the arrival of samples while offset is typically ignored.
In contrast, packet-based architectures may be used. These architectures typically need much bigger buffers, as the buffer must contain a full “packet” worth of samples plus enough space to allow for however long the packet medium takes to transmit a packet. One common way to achieve this is for the system to define a fixed delay across the entire system and then use a clocking mechanism in the transmission protocol to achieve timing consistency.
For example, a technology might use clocking information in packets for rate control and timing, and define that all nodes must allow 3 ms latency before playout. This typically works adequately in systems where all components have near-identical specifications and the network is tightly controlled (e.g. all I/O is done via a single manufacturer's hardware boards), but may be problematic in systems with significantly different components.
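The buffer sizing and fixed-latency arithmetic described above can be sketched as follows. The function names and the example figures (48 kHz, 48 samples per packet, 1 ms worst-case transit) are assumptions for illustration; only the 3 ms latency figure comes from the example in the text.

```python
def samples_for_duration(sample_rate_hz: int, duration_us: int) -> int:
    """Number of samples covering duration_us microseconds, rounded up."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-sample_rate_hz * duration_us // 1_000_000)


def min_buffer_samples(sample_rate_hz: int, samples_per_packet: int,
                       max_transit_us: int) -> int:
    """One full packet of samples plus enough to cover the transit time."""
    return samples_per_packet + samples_for_duration(sample_rate_hz, max_transit_us)


def packetisation_delay_us(sample_rate_hz: int, samples_per_packet: int) -> float:
    """Time taken to collect one packet's worth of samples, in microseconds."""
    return samples_per_packet * 1_000_000 / sample_rate_hz


if __name__ == "__main__":
    rate = 48_000
    print(min_buffer_samples(rate, samples_per_packet=48, max_transit_us=1_000))
    print(packetisation_delay_us(rate, samples_per_packet=48))
    # Samples every node must hold back under a fixed 3 ms playout latency.
    print(samples_for_duration(rate, 3_000))
```

With these assumed figures, a receiver needs roughly a hundred samples of buffering, far more than the 1-4 samples of a per-sample architecture, and the fixed 3 ms latency applies to every node regardless of how quickly its packets actually arrive.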
Some such systems (e.g. “CobraNet”) are rate-controlled only, using regular conductor packets to control timing and discipline clocks. One drawback of conductor packets is that every network hop adds delay and thus clocks are not synchronised with regard to offset. Variation in output times in such a system can be in the hundreds of microseconds or more.
Typically, less sophisticated packet-based architectures do not attempt to enforce ongoing synchronisation. Clocks are roughly synchronised by the initial transmission and then free-run. Buffering issues and synchronisation are dealt with by regularly resynchronising the audio, such as after each song. This works satisfactorily for a low-precision system with frequent breaks (e.g., streaming home audio), but not for professional audio. Clocks on personal computers (for example) can easily drift 50 ms over the course of 5-10 minutes, which is more than enough to cause noticeable audio artefacts. For example, an error of 50-100 parts per million (PPM) from “nominal” (30-60 ms over 10 minutes) is not unusual for an undisciplined oscillator crystal. Disciplined crystals that are allowed to run freely can maintain 1-2 PPM with respect to each other, but that requires a synchronisation mechanism to discipline them in the first place.
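The drift figures quoted above follow directly from the definition of parts per million, as this short sketch shows. The function name `drift_ms` is an assumption introduced for the illustration.

```python
def drift_ms(ppm: float, elapsed_s: float) -> float:
    """Accumulated time error in milliseconds for a constant rate error.

    An error of 1 PPM means 1 microsecond of drift per second of elapsed time,
    so drift in milliseconds is ppm * elapsed_s / 1000.
    """
    return ppm * elapsed_s / 1000.0


if __name__ == "__main__":
    ten_minutes = 600.0
    for ppm in (50.0, 100.0, 2.0):
        print(f"{ppm:g} PPM over 10 minutes: {drift_ms(ppm, ten_minutes):g} ms")
```

At 50 and 100 PPM the result is 30 ms and 60 ms over 10 minutes, matching the range given in the text, while clocks held within 2 PPM of each other drift only about 1.2 ms over the same period.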
A master-slave system may also experience difficulties when there is more than one transmitter. Different transmitters may have different clock rates compared to each other. To handle this, clocks could be synchronised from a single source; for example, a master on the transmission network or an external word clock. Alternatively, the receivers may operate a different clocking and buffering domain for each input, and then post-process to adjust the inputs to the receiver's internal clock rate (which might be slaved to an input). A consequence of this latter strategy is that receivers post-processing to different clocking domains may not be properly synchronised with one another, nor may transmissions from different clocking domains.
Conventional digital audio networking may involve distributed, digital media systems being built using heterogeneous, off-the-shelf networking equipment and cabling, although end-nodes are usually proprietary equipment. The result is distributed media systems containing many boxes for passing media signals over the network.
Existing digital media networking technologies typically have a number of problems including, for example, but not limited to:
- Fixed packet sizes and sample rates for the whole network
- Fixed or limited topologies, e.g., fixed upper limits on latency
- Transmission delay sufficiently large that it can no longer be ignored
- Packetisation delay (the time taken to collect samples and put them into a packet), which is inherent in TCP/IP or Ethernet based audio networks, that is sufficiently large that it cannot be ignored
In addition, existing audio networking technologies typically:
- Do not account for latency introduced hop by hop in the network
- Treat all senders and receivers as having the same characteristics (e.g. input latency, packetisation delay, variability in transmission timing)
- Run at the latency of the slowest node in the network, since all senders and receivers are assigned the same latency
- Manage latency manually (i.e., computed with a pen and a piece of paper given a network topology), or set the latency at the worst-case maximum value for the entire network
These types of limitations have restricted the utility of existing media networking technologies.