In the transmission of digitally encoded voice, it is often important to be able to time-align the originally transmitted encoded voice information (also referred to as the transmitted signal) with the received encoded voice information (also referred to as the received signal) after the voice information has been transported through a switching network. The switching network can be a traditional circuit-switched network or a packet-switched network such as an ATM, frame relay, or Internet (IP) network.

One use for time alignment of the originally transmitted digitally encoded voice with the received digitally encoded voice is speech quality assessment. Until recently, the only way to measure users' perception of the quality of voice transmission systems was to conduct subjective tests in which human listeners make the quality judgments. However, subjective tests are expensive and slow, and they cannot be used in certain applications such as in-service monitoring. Various objective models, based on human perception, were therefore developed with the aim of predicting the results of human subjective tests, and various algorithms have been proposed to assess the perceived quality of transmitted digital voice. The most promising of these algorithms is the perceptual evaluation of speech quality (PESQ), which has become the basis for the International Telecommunication Union (ITU-T) standard P.862. This new standard requires the time alignment of a received digitally encoded voice signal with the transmitted digitally encoded voice signal. The time alignment method proposed in the standard relies on a complicated splitting of speech utterances within the overall speech signal in order to re-align incorrectly aligned samples. Such a technique results in a complex and expensive alignment algorithm.
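
To make the notion of time alignment concrete, the following is a minimal illustrative sketch of estimating a single constant delay between a transmitted signal and a received signal by brute-force cross-correlation, and then trimming both signals to their aligned overlap. It is not the utterance-splitting method of P.862, nor any particular alignment method described in this document. The function names `estimate_delay` and `align`, the parameter `max_lag`, and the assumptions that the delay is constant, non-negative, and expressed in whole samples are hypothetical and chosen only for illustration.

```python
def estimate_delay(transmitted, received, max_lag):
    """Return the lag (in samples) at which `received` best matches `transmitted`.

    Assumes (for illustration only) one constant, non-negative delay of at
    most `max_lag` samples. Scores each candidate lag by the raw correlation
    of the overlapping samples and keeps the first lag with the highest score.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        # Number of samples that overlap when `received` is shifted by `lag`.
        n = min(len(transmitted), len(received) - lag)
        if n <= 0:
            break
        score = sum(transmitted[i] * received[i + lag] for i in range(n))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def align(transmitted, received, max_lag):
    """Drop the estimated leading delay from `received` and trim both signals
    to a common length, so that sample i of one corresponds to sample i of
    the other."""
    lag = estimate_delay(transmitted, received, max_lag)
    shifted = received[lag:]
    n = min(len(transmitted), len(shifted))
    return lag, transmitted[:n], shifted[:n]


if __name__ == "__main__":
    # Hypothetical test data: the received signal is the transmitted signal
    # preceded by three samples of silence.
    tx = [0, 3, -2, 5, 1, -4, 2, 0]
    rx = [0, 0, 0] + tx
    lag, ref, deg = align(tx, rx, max_lag=5)
    print("estimated delay in samples:", lag)
```

A single global delay of this kind is only a starting point. In a packet-switched network the delay may change during a call, which is why practical alignment schemes must also handle delay that varies over the course of the signal.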