Mixed media streams are generated in communication networks, e.g., when audio signals and video signals are mixed during a video conference. Here, it is important that the mixed audio signals are matched to the related mixed video signal, because otherwise the speech will not be lip-synchronous with the video stream. The same problem also arises with streams other than audio or video streams, e.g., text streams carrying alphanumeric characters when subtitles are used.
Currently, matching such video and audio streams or, more generally, mixed media streams requires a complicated procedure. Normally, time stamps are attached to the different signals to enable the matching of the related media streams. However, while this at least provides some mechanism for matching different media streams, there currently is no solution to the problem of how the generation of mixed media streams of different types, e.g., a mixed video stream and a mixed audio stream, may be coordinated.
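The time-stamp matching mentioned above may be illustrated by the following minimal sketch. It is not taken from any particular system: the function name `match_by_timestamp`, the millisecond unit, and the tolerance value are assumptions chosen for illustration only. The sketch pairs each video time stamp with the nearest audio time stamp and shows that such matching only aligns signals that already exist; it does not coordinate how the mixed streams themselves are generated.

```python
from bisect import bisect_left


def match_by_timestamp(video_ts, audio_ts, tolerance_ms=20):
    """Pair each video time stamp with the nearest audio time stamp.

    Both lists hold time stamps in milliseconds, sorted in ascending order
    (an assumption of this sketch). The audio entry of a pair is None when
    no audio time stamp lies within the tolerance.
    """
    pairs = []
    for v in video_ts:
        # Position of the first audio time stamp that is >= v.
        i = bisect_left(audio_ts, v)
        # The nearest neighbour is either just before or at that position.
        candidates = audio_ts[max(i - 1, 0):i + 1]
        best = min(candidates, key=lambda a: abs(a - v), default=None)
        if best is not None and abs(best - v) > tolerance_ms:
            best = None  # too far apart to be considered related
        pairs.append((v, best))
    return pairs
```

For example, `match_by_timestamp([0, 40, 200], [0, 20, 40, 60, 100])` pairs the video time stamps 0 and 40 with the identical audio time stamps and leaves 200 unmatched, since the closest audio time stamp is 100 ms away.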