In related art systems, a speaker's voice in dubbed speech is typically adjusted to sound like the actor speaking in the media asset, based purely on the actor's original speech in the media asset. To do this, typical systems generate a voice profile for the original speech that includes information regarding pitch, temporal structure, and other qualities of the original speech. The generated voice profile is then used to modify the dubbed speech so that it sounds like the actor who produced the original speech. These typical systems, however, fail to consider the context of the actor's speech when making the adjustments, which results in unnatural-sounding, undesirable audio for the media asset.
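
By way of illustration only, the profile-based adjustment described above might be sketched as follows. The names VoiceProfile, build_profile, and adjustment_factors are hypothetical, and reducing a voice profile to a mean pitch and a speaking rate is a simplifying assumption rather than the method of any particular related art system. The sketch derives its adjustments solely from the original speech and takes no context as input, which is the limitation noted above.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class VoiceProfile:
    """Hypothetical, simplified voice profile (assumption: two features only)."""
    mean_pitch_hz: float          # average pitch over voiced frames
    syllables_per_second: float   # crude stand-in for temporal structure


def build_profile(pitch_track_hz, syllable_count, duration_s):
    """Build a profile from a per-frame pitch track (0 marks an unvoiced frame)."""
    voiced = [f for f in pitch_track_hz if f > 0]
    return VoiceProfile(
        mean_pitch_hz=mean(voiced),
        syllables_per_second=syllable_count / duration_s,
    )


def adjustment_factors(original, dubbed):
    """Ratios by which the dubbed speech would be scaled to match the original."""
    return {
        "pitch_ratio": original.mean_pitch_hz / dubbed.mean_pitch_hz,
        "tempo_ratio": original.syllables_per_second / dubbed.syllables_per_second,
    }


# Made-up example values, not measurements from any real media asset.
original = build_profile([0, 100, 120, 0, 140], syllable_count=12, duration_s=4.0)
dubbed = build_profile([200, 0, 240, 280], syllable_count=8, duration_s=4.0)
factors = adjustment_factors(original, dubbed)
```

In this sketch the same factors would be applied to the dubbed speech regardless of the scene in which the actor is speaking, since nothing other than the original speech is taken into account.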