Automatic time alignment of audio has many applications, including synchronizing high-quality speech to a low-quality reference recording of the same utterance, aligning utterances in different languages to aid foreign-language overdubbing, and synchronizing recorded instrument tracks. In noisy environments, traditional speech features, such as Mel-frequency cepstral coefficients (“MFCC”), perform poorly in template matching systems, such as those based on dynamic time warping and hidden Markov models. Noise may distort MFCC values so far from their nominal values that they become indistinguishable from the feature sets of different sounds. Such noisy environments are frequently encountered on a video shoot (e.g., unwanted noise on the set or poor microphone placement), requiring actors to overdub the exact dialogue from the shoot. The process of re-recording actors in the studio is known as automatic dialogue replacement (“ADR”). If an auto-alignment system is not used, the actors must painstakingly re-record their lines until the timing is perfect, or a studio engineer must manually fix the timing, which can be a time-consuming and difficult task.