Educational multimedia, for example a lecture video recorded in a live classroom, is captured from a plurality of fixed and moving cameras located within the classroom. Usually, such a lecture video is rich in fairly still video images and audio data and contains little moving video content. These video frames are those captured by the fixed cameras that continuously face the blackboard, the slideshow screen, or the instructor. The final lecture video is created by a time-based merger of the video and audio data captured from each of the moving and fixed cameras, while keeping the instructional value intact. Effecting such a merger is a labor-intensive manual process. Such a lecture video also contains redundant and unstructured data streams that span the time sequence, and it consumes a large amount of memory when stored. Where bandwidth and storage are limited, storing, browsing, and streaming such a large lecture video is difficult and costly.