In a typical computer network-based conferencing system, a Multipoint Control Unit (MCU) receives audio/video data streams from multiple participants of a conference, and each data stream received from a participant is rebroadcast to the other participants. MCUs are also typically able to separately provide metadata describing real-time events that occur during a conference, such as the start and end of the conference, when a participant joins or leaves the conference, and which participant is currently speaking. Recording the separate data and metadata streams after they are rebroadcast by an MCU presents the challenge of synchronizing those streams to each other, particularly where the streams themselves do not include explicit synchronization data.