There is a technique for analyzing the image composition of a two-dimensional video image and converting the two-dimensional video image into a three-dimensional video image. In such a technique, each picture of the encoded two-dimensional video image is decoded and then analyzed to estimate a depth (depth level) for each pixel of the picture, and a parallax image for three-dimensional display is generated by using the depth estimation result and the decoded video image.
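The flow described here (decode a picture, estimate a depth level per pixel, then generate parallax images from both) can be illustrated with a minimal sketch. It is not the method of any particular implementation: the brightness-based `estimate_depth`, the shift-by-depth-level rule, the left-neighbor hole filling, and all names are hypothetical stand-ins chosen only to make the data flow concrete.

```python
from typing import List, Optional

Image = List[List[int]]  # one grayscale picture as rows of pixel values (hypothetical layout)


def estimate_depth(picture: Image) -> Image:
    """Toy stand-in for depth estimation: brighter pixels get a higher depth level (0..3)."""
    return [[pixel // 64 for pixel in row] for row in picture]


def make_parallax_view(picture: Image, depth: Image, direction: int) -> Image:
    """Shift each pixel horizontally by its depth level; direction is +1 or -1."""
    view: Image = []
    for y, row in enumerate(picture):
        width = len(row)
        shifted: List[Optional[int]] = [None] * width
        # Paint low depth levels first so higher levels overwrite them.
        for x in sorted(range(width), key=lambda col: depth[y][col]):
            target = x + direction * depth[y][x]
            if 0 <= target < width:
                shifted[target] = row[x]
        # Crude hole filling (assumption): copy the pixel to the left of each uncovered position.
        filled: List[int] = []
        for x, value in enumerate(shifted):
            if value is None:
                value = filled[x - 1] if x > 0 else row[0]
            filled.append(value)
        view.append(filled)
    return view


decoded = [[0, 0, 100, 0, 0, 0]]           # stands in for one decoded picture
depth = estimate_depth(decoded)            # [[0, 0, 1, 0, 0, 0]]
view_shifted_right = make_parallax_view(decoded, depth, +1)  # [[0, 0, 0, 100, 0, 0]]
view_shifted_left = make_parallax_view(decoded, depth, -1)   # [[0, 100, 100, 0, 0, 0]]
```

The two views differ only where the depth level is non-zero, which is the horizontal parallax a three-dimensional display would present to each eye. Which shift direction corresponds to which eye is left open here.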
In this technique, depth estimation may run at a different processing speed from two-dimensional video image decoding, so the depth estimation result and the decoded image may be generated at different times. In that case, if the depth estimation result and the decoded image associated with a single picture are strictly synchronized to generate a parallax image, a time lag or the like may occur in generation and/or display of the parallax image for each picture, which may make it impossible to perform a suitable three-dimensional conversion process.
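The timing problem can be made concrete with a small simulation. This is only a sketch under assumed numbers: the per-picture decode time, depth estimation time, and frame period are hypothetical, and the model simply assumes that depth estimation for a picture starts once that picture is decoded and the estimator is free.

```python
from typing import List


def strict_sync_ready_times(num_pictures: int, decode_ms: float, depth_ms: float) -> List[float]:
    """Time at which each picture's parallax image can be generated under strict synchronization."""
    decode_done = 0.0
    depth_done = 0.0
    ready: List[float] = []
    for _ in range(num_pictures):
        # The decoder works through the pictures back to back.
        decode_done += decode_ms
        # The estimator needs the decoded picture and must finish the previous picture first.
        depth_done = max(depth_done, decode_done) + depth_ms
        # Strict synchronization waits for both results of the same picture.
        ready.append(max(decode_done, depth_done))
    return ready


frame_period_ms = 33.0  # hypothetical display interval
ready = strict_sync_ready_times(4, decode_ms=10.0, depth_ms=40.0)
# Lag of each picture behind its hypothetical display deadline.
lags = [t - (i + 1) * frame_period_ms for i, t in enumerate(ready)]
print(ready)  # [50.0, 90.0, 130.0, 170.0]
print(lags)   # [17.0, 24.0, 31.0, 38.0]
```

With these assumed values the slower depth estimation gates every parallax image, and the lag behind the display deadline grows with each picture.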