To get a rough idea of a long content, namely, a content recorded over a long time in a conventional VTR (video tape recorder) or disk recording/playback apparatus, in a time shorter than the time taken for the recording, the content is played back at a speed about 1.5 to 2 times higher than the recording speed. This range is chosen with consideration given to the speed at which a viewer can still understand voice information.
Even if summary or digest playback of the content is attempted in a still shorter time, the voice output played back at such a high speed will not be easy to understand. Normally, therefore, only the image information in the content is played back, without sound.
On this account, summary (digest) playback of a recorded broadcast program is made, in some cases, in a predetermined time shorter than the recording time of the original broadcast program. This is done by extracting predetermined feature data on the basis of features appearing in the image/voice data (image/voice information signal, image/voice signal or image/voice information) of the recorded broadcast program, detecting key frame sections, each of which appears to be a key frame (important frame), with the use of the predetermined feature data, and sequentially selecting the key frame sections under a predetermined rule and playing them back.
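The selection of key frame sections under a predetermined rule can be illustrated with a minimal sketch. The text does not specify the rule, so everything here is an assumption made for illustration: the `Section` type, the `score` field (a hypothetical importance value derived from the feature data), and the greedy rule that takes the highest-scoring sections that still fit within the target digest time and then plays them in chronological order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Section:
    """A detected key frame section (hypothetical representation)."""
    start: float  # seconds from the start of the recording
    end: float    # seconds from the start of the recording
    score: float  # assumed importance score derived from feature data

    @property
    def duration(self) -> float:
        return self.end - self.start


def select_digest(sections, target_seconds):
    """Pick sections for digest playback within target_seconds.

    Assumed rule: visit sections from highest to lowest score, keep each
    one that still fits in the remaining time, then return the kept
    sections sorted by start time so they play back in program order.
    """
    chosen = []
    remaining = target_seconds
    for section in sorted(sections, key=lambda s: s.score, reverse=True):
        if section.duration <= remaining:
            chosen.append(section)
            remaining -= section.duration
    return sorted(chosen, key=lambda s: s.start)


if __name__ == "__main__":
    detected = [
        Section(0, 60, 0.2),
        Section(100, 130, 0.9),
        Section(200, 260, 0.5),
        Section(300, 320, 0.7),
    ]
    for s in select_digest(detected, target_seconds=60):
        print(f"play {s.start:.0f}s to {s.end:.0f}s")
```

A greedy rule like this does not guarantee the best possible use of the target time; it is shown only because it is short and makes the roles of the feature data, the key frame sections and the target playback time easy to see.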
Also, in a predetermined section of recorded image data, positional information indicating playback points is automatically generated at fixed time intervals such as 3 min, 5 min or 10 min, or is manually generated at desired positions by the user. This is generally called “chapter data generation”. Chapter data is generated so that skip playback, editing and thumbnail display can be made with the use of the positional information (chapter data).
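Automatic chapter data generation at fixed intervals, and its use for skip playback, might look like the following sketch. The function names, the use of whole seconds as the positional unit, and the choice to place the first chapter point at the start of the section are assumptions for illustration, not details given in the text.

```python
import bisect


def generate_chapter_points(section_start, section_end, interval_seconds):
    """Return playback positions (in seconds) at fixed intervals.

    Assumption: the first point sits at section_start and no point is
    placed at or beyond section_end.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    points = []
    position = section_start
    while position < section_end:
        points.append(position)
        position += interval_seconds
    return points


def next_chapter(points, current_position):
    """Skip playback: return the first chapter point after the current
    position, or None when there is no later chapter point.

    points must be sorted in ascending order.
    """
    index = bisect.bisect_right(points, current_position)
    return points[index] if index < len(points) else None


if __name__ == "__main__":
    # A 15 min section with chapter points every 5 min.
    chapters = generate_chapter_points(0, 15 * 60, 5 * 60)
    print(chapters)
    print(next_chapter(chapters, 310))
```

Manually generated chapter data would fit the same representation: positions chosen by the user can be inserted into the sorted list, and skip playback then works unchanged.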