Many types of events are held every day that generate, or are capable of generating, different types of multimedia data. For example, consider the typical sporting event or music concert. Such events may be the subject of live broadcasting, filming, or streaming over the internet. The video of the event may be recorded from multiple camera angles and focused on many different subjects in different parts of the music stage or sports field. In addition, for the concert example, sound recordings may be taken from many different locations, performers, or instruments. Still photographs are yet another type of media that may be captured at the event from many locations to obtain photographs of many different scenes.
As is evident, any event may be associated with multiple sources of data that are created or recorded for that event. The data may be of different types and formats, e.g., sound, video, photographs, etc. While there are many devices that capture data relating to the exact same event, conventionally these capture devices are completely independent of one another. The conventional media used to capture these events, e.g., film, MPEG4, MPEG3, etc., inherently include only information specific to each individual recording device and medium. Therefore, while an MPEG4 video recording may provide an accurate video of what is being recorded from a very specific camera angle at a very specific recording subject, there is no inherent way to correlate or relate that recording with any other video recording of the exact same event that may have been made from another camera angle, with an audio or still photo recording of the same subject, or with recordings in multiple media that are directed at another recording subject.
Solutions to this problem may be highly manual in nature, high in cost, and generally imprecise. For example, the broadcast of a sporting event may involve the strategic positioning of video cameras at different locations within the sporting arena. A production crew is charged with the task of knowing the locations of these cameras and the subjects that are being recorded with these cameras. During either a live broadcast or later production of an aggregated film clip, the production/editing crew must manually review the video recordings to determine the exact subject being recorded, and must essentially estimate or re-generate the relations between the different recordings. Therefore, any attempt to integrate the data from the multiple sources is essentially done in an ad hoc manner using highly manual techniques that generally “guess” at the recording parameters of each recording.
This problem is further complicated by the modern trend of audience members bringing portable electronic devices that are capable of capturing and recording the live event. For example, audience members may bring mobile phones that have image, video, or sound capture capability, and use those mobile devices to capture data relating to the event. Those mobile devices may be recording videos, images, or sound from different angles and of different subjects at the event. Even though these portable recording devices are not “officially” recording the event on behalf of the event promoters, those recordings may still be of great interest to those who wish to provide a live broadcast or later production of a film for the event. This is because the mobile devices may be capturing videos or photographs that were not captured by the “official” recording devices, and which would be useful or desirable to include in the live broadcast or later production. For example, the mobile device may have captured the scene of a disputed referee call at a sporting event from a very useful angle, or captured the recording of a musical performance from a unique angle or recording posture.
The disclosure of U.S. application Ser. No. 13/102,794 provides an improved approach for capturing multimedia information in a coherent manner to inherently permit aggregation or synchronization with other coherently captured multimedia information. That disclosure also provides a rich semantic model for relating and aggregating the captured data.
However, this type of media data is typically very large, and hence transferring the media data across a network consumes a large amount of bandwidth. For a well-attended event where many users are capturing media at the same time, it is quite likely that many of those users will attempt to transfer the captured media data simultaneously. The problem is that conventional networking and data management techniques cannot adequately handle this large volume of simultaneous data traffic over a limited-bandwidth network.