The generation of an accurate transcript of a finished, edited dialog-intensive media program is currently a labor-intensive process. Even if a script is available to serve as a starting point, it will generally not conform accurately to what is said in the post-edit, finished program, owing to improvised departures from the script and to material that has been cut or reordered during editing. Keying a transcript to the times when the corresponding audio dialog occurs in the media requires a human transcriber to play back the edited media, transcribe the dialog, note the start time of each phrase or sentence, and identify the character who is speaking. For an average feature-length film, this process takes about 5 to 10 days. Such an annotated transcript, often referred to as a spot dialog master, has a number of uses in the program distribution workflow. For example, it serves as the primary document that drives dubbing and subtitling for domestic and foreign distribution. It also serves to indicate spans of the various sources that are utilized in the final program, which need to be compiled for reporting purposes in connection with broadcasting rights, clearance, payment of fees, and music rights. The increasing complexity of program production and the need to move rapidly through the production workflow place a premium on efficiency and speed in the process of generating a transcript and a spot dialog master. There is also an increasing need to track and retain more metadata associated with the various stages of program production and distribution. Furthermore, for time-sensitive productions that have required delivery dates around the world, it is desirable both to speed up the editing process and to reduce or eliminate the time taken to produce the spot dialog master.