The present invention relates generally to data communication. More particularly, the present invention relates to delay profiling in a communication system.
Voice conferencing systems and videoconferencing systems are becoming increasingly popular forms of communication. In such systems, the quality of the audio channel is paramount. Thus, if these systems are to provide a positive participant experience, the delay between audio recording and audio playback must be small. If the delay is too large, participants will “talk over” each other, and a natural pace and flow of conversation cannot be maintained.
The audio channels of these conferencing systems are generally composed of many individual modules, each of which contributes to the overall delay of an audio frame as it travels through the system. Some modules may intentionally contribute large amounts of delay, for example, to compensate for network variability that would otherwise produce unacceptable audio quality. Alternatively, a module might be computationally expensive, requiring substantial time to produce output from its input and thereby contributing significantly to the overall delay. In some cases, excessive delay in modules is caused by software bugs and can be corrected at design time. In other cases, the system acts correctly, but uses a configuration that is overly cautious for the actual installation environment.
Several conventional approaches exist for profiling an audio content channel. One conventional approach is roundtrip timing via loopback. In a typical voice conferencing system, the speech recorded from one user is played for the other conference participants but is not played for the person speaking, because doing so would produce a sound similar to a loud echo, which would be distracting. As a diagnostic, however, many voice conferencing systems have a loopback mode, where the speaking user's audio is played back to her after otherwise ordinary progression through the system. This provides a simple roundtrip delay metric that can be used by a single developer. The implementation can be as simple as speaking into a microphone and using a stopwatch to mark when the input is heard back. A more accurate implementation can have the conferencing client use its own timer to calculate the roundtrip delay of a uniquely identified audio packet. However, this approach has several disadvantages. First, only roundtrip time is measured, providing no indication of which individual system modules could be the source of problematic delay. Second, a received audio frame must be matched with a frame that was sent, which might require costly management and lookup of state information for every frame.
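By way of illustration only, the client-side variant of loopback timing described above might be sketched as follows. The class name LoopbackTimer, its methods, and the integer frame identifiers are hypothetical and are not taken from any particular conferencing system. The dictionary of pending frames is the per-frame state whose management and lookup is noted above as a disadvantage.

```python
import time
from typing import Callable, Dict, Optional


class LoopbackTimer:
    """Hypothetical helper: tracks send times of uniquely identified frames."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock                    # injectable clock (assumption, eases testing)
        self._pending: Dict[int, float] = {}   # frame id -> time the frame was sent
        self._next_id = 0

    def mark_sent(self) -> int:
        """Record the send time of a new frame and return its unique id."""
        frame_id = self._next_id
        self._next_id += 1
        self._pending[frame_id] = self._clock()
        return frame_id

    def mark_received(self, frame_id: int) -> Optional[float]:
        """Return the roundtrip delay in seconds, or None for an unknown id."""
        sent_at = self._pending.pop(frame_id, None)
        if sent_at is None:
            return None
        return self._clock() - sent_at
```

Such a sketch yields a single roundtrip figure per frame and, consistent with the first disadvantage noted above, gives no indication of how that delay is distributed among the individual modules.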
Another conventional approach is profiling using an alternate status channel. According to this approach, timing information is transmitted outside of the audio content channel. For example, a buffer might write information about the number of frames it contains to a file, or transmit that information to a central monitoring service. This approach also has several disadvantages. First, audio frames must be matched with frames that were sent, which might require costly management and lookup of state information for every frame. Second, the status channel can consume needed resources, affecting system behavior and invalidating the delay profiling information collected. For example, if a conferencing client and monitoring service use the same network interface, incoming status information might take up bandwidth needed for incoming audio. Third, it can be difficult to synthesize profiling information from multiple devices. A centralized clock may be required, or multiple clocks may need to be synchronized.
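As a minimal sketch of the status channel approach, a buffer might write its occupancy to a separate text stream each time a frame enters or leaves. The class name ReportingBuffer, the JSON record layout, and the field names "t" and "frames" are assumptions made for illustration. The stream stands in for the file or monitoring service mentioned above.

```python
import io
import json
import time
from collections import deque
from typing import Callable, Deque, TextIO


class ReportingBuffer:
    """Hypothetical frame buffer that reports its occupancy out of band."""

    def __init__(self, status_out: TextIO,
                 clock: Callable[[], float] = time.time) -> None:
        self._frames: Deque[bytes] = deque()
        self._status_out = status_out   # the status channel (file, socket wrapper, etc.)
        self._clock = clock             # local clock; other devices may disagree with it

    def _report(self) -> None:
        # One status record per buffer operation, written outside the audio path.
        record = {"t": self._clock(), "frames": len(self._frames)}
        self._status_out.write(json.dumps(record) + "\n")

    def push(self, frame: bytes) -> None:
        self._frames.append(frame)
        self._report()

    def pop(self) -> bytes:
        frame = self._frames.popleft()
        self._report()
        return frame
```

Every buffer operation in this sketch incurs an additional write, which illustrates how the status channel can consume resources. The timestamps also come from a local clock, so records from multiple devices cannot be combined without a centralized or synchronized clock.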
Another conventional approach is profiling by function timing, a generic tool for profiling a program's run-time behavior. In many programming languages, sequences of instructions are encapsulated in reusable functions. (Other terms for the same or a similar mechanism include subroutine, method, and procedure.) When used, a function timing tool typically creates a special version of the target executable code, in which the times at which each function is entered and left are recorded. Additional information such as the number of times each function is called can also be recorded. Function timing can help the programmer identify specific portions of the program's source files that are responsible for poor performance. However, this approach has several disadvantages. First, the divisions of statements into functions often do not correspond to the logical stages of audio frame manipulation. It can be difficult to extract information about individual or average frame performance, particularly when multiple threads of execution are involved. Second, system performance can be greatly affected, depending on the number and type of functions profiled. This can affect both objective and subjective assessments of delay. Third, it can be difficult to synthesize profiling information from multiple devices.
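The idea of function timing can be approximated, for illustration, with a wrapper that records entry and exit times and call counts. Conventional function timing tools instrument the executable itself. The decorator below is a simplified stand-in for that technique. The names timed, call_counts, total_seconds, and decode_frame are hypothetical.

```python
import functools
import time
from collections import defaultdict

# Per-function statistics, keyed by function name (illustrative globals).
call_counts = defaultdict(int)
total_seconds = defaultdict(float)


def timed(func):
    """Wrap func so that each call's duration and count are recorded."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()          # time the function is entered
        try:
            return func(*args, **kwargs)
        finally:
            # Time the function is left, recorded even if it raises.
            call_counts[func.__name__] += 1
            total_seconds[func.__name__] += time.perf_counter() - start
    return wrapper


@timed
def decode_frame(frame: bytes) -> bytes:
    """Placeholder for a processing step; it merely reverses the bytes."""
    return frame[::-1]
```

The statistics in this sketch are aggregated per function rather than per audio frame, which reflects the first disadvantage noted above: the recorded figures do not readily map onto the stages that an individual frame passes through.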