Hitherto, in a content receiving apparatus, content received from a server provided on the encoder side is divided into video packets and audio packets, which are then decoded. The apparatus outputs video frames based on the video time-stamps added to the video packets and audio frames based on the audio time-stamps added to the audio packets. This makes the video and the audio agree in output timing, thereby achieving lip-sync (see, for example, Patent Document 1 and Patent Document 2).
Patent Document 1: Jpn. Pat. Appln. Laid-Open Publication No. 8-280008.
Patent Document 2: Jpn. Pat. Appln. Laid-Open Publication No. 2004-15553.
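By way of illustration only, the time-stamp-based output described above may be sketched as follows. This is a minimal sketch and not the implementation of either cited document. The names Packet, demultiplex, and presentation_schedule are hypothetical, and it is assumed that the video and audio time-stamps are integer ticks counted on one shared clock, so that equal time-stamps call for output at the same instant.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Packet:
    """Hypothetical packet model: stream kind, time-stamp, and payload."""
    kind: str        # "video" or "audio"
    timestamp: int   # assumed: ticks of a clock shared by both streams
    payload: bytes


def demultiplex(packets: Iterable[Packet]) -> Tuple[List[Packet], List[Packet]]:
    """Divide received content into video packets and audio packets."""
    items = list(packets)
    video = [p for p in items if p.kind == "video"]
    audio = [p for p in items if p.kind == "audio"]
    return video, audio


def presentation_schedule(
    packets: Iterable[Packet], start_tick: int = 0
) -> List[Tuple[int, str, bytes]]:
    """Return (output_tick, kind, payload) entries ordered by output time.

    Each frame is scheduled at its own time-stamp, measured from the
    earliest time-stamp in the content. Because both streams use the same
    clock, a video frame and an audio frame carrying the same time-stamp
    receive the same output tick, which is what keeps them in lip-sync.
    """
    video, audio = demultiplex(packets)
    frames = video + audio
    if not frames:
        return []
    base = min(p.timestamp for p in frames)
    schedule = [
        (start_tick + p.timestamp - base, p.kind, p.payload) for p in frames
    ]
    # sorted() is stable, so at equal ticks video stays ahead of audio.
    return sorted(schedule, key=lambda entry: entry[0])


if __name__ == "__main__":
    received = [
        Packet("audio", 3000, b"a0"),
        Packet("video", 3000, b"v0"),
        Packet("video", 6003, b"v1"),
        Packet("audio", 4920, b"a1"),
    ]
    for tick, kind, payload in presentation_schedule(received):
        print(tick, kind, payload)
```

In this sketch the output order depends only on the time-stamps and not on the order in which the packets arrive. An actual apparatus would additionally pace each output against a running system clock, which is omitted here.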