1. Field of the Invention
The present invention relates to an image processing device, an image processing method, a program and a recording medium for combining a frame image extracted from a moving image and a character image of a character string corresponding to a voice of a person who is present in the frame image to generate a composite image.
2. Description of the Related Art
In recent years, portable terminals such as smartphones and tablet terminals have rapidly come into widespread use, and the number of still images (photographs) captured with these portable terminals has increased. Opportunities to capture moving images have increased accordingly. Recently, as a service that uses a moving image, a system has been proposed that images (captures) printed matter such as a photograph using a portable terminal and then reproduces (AR-reproduces) a moving image related to the printed matter on a screen of the portable terminal using an augmented reality (AR) technique, as disclosed in “Moving Image Photo! Service”, [online], Fujifilm Corporation, [Retrieved on Feb. 9, 2015], Internet <URL: http://fujifilm.jp/personal/print/photo/dogaphoto/>.
In such a system, the AR reproduction of the moving image related to the printed matter is performed according to the following steps (1) to (6).
(1) When a user selects a moving image to be printed from among a plurality of moving images using a dedicated application running on a portable terminal, the selected moving image is uploaded to a server.
(2) The server extracts representative frame images from the moving image uploaded from the portable terminal.
(3) The representative frame images extracted by the server are downloaded to the portable terminal.
(4) The user selects a frame image to be printed from among the representative frame images, which are displayed as a list on a screen of the portable terminal, and places a print order.
(5) The server generates printed matter (hereinafter referred to as a moving image print) of the frame image ordered by the user, and performs image processing on the moving image associated with the frame image to prepare it for AR reproduction.
(6) When the user images (captures) the delivered printed matter using the portable terminal, the moving image for AR reproduction associated with the printed matter is downloaded from the server and reproduced on the screen of the portable terminal using the AR technique.
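For illustration only, steps (1) to (6) above can be sketched as a small in-memory simulation. This is a minimal sketch and not the implementation of the cited service: the class name `Server`, its method names, the string frame identifiers, and the rule of taking every second frame as a representative frame are all assumptions introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical in-memory stand-in for the server in steps (1) to (6)."""
    movies: dict = field(default_factory=dict)    # movie id -> list of frame ids
    ar_index: dict = field(default_factory=dict)  # printed frame id -> movie id

    def upload(self, movie_id, frames):
        # Step (1): the selected moving image is uploaded to the server.
        self.movies[movie_id] = list(frames)

    def representative_frames(self, movie_id, step=2):
        # Steps (2) and (3): extract representative frames and return them
        # to the terminal. Assumption: every `step`-th frame is used as a
        # placeholder for whatever selection rule a real system applies.
        return self.movies[movie_id][::step]

    def order_print(self, movie_id, frame):
        # Step (5): record which moving image the printed frame belongs to.
        if frame not in self.movies[movie_id]:
            raise ValueError("frame does not belong to this moving image")
        self.ar_index[frame] = movie_id
        return f"print:{frame}"

    def movie_for_capture(self, captured_frame):
        # Step (6): return the moving image associated with the captured print.
        return self.movies[self.ar_index[captured_frame]]


server = Server()
server.upload("m1", ["f0", "f1", "f2", "f3"])      # step (1)
candidates = server.representative_frames("m1")   # steps (2) and (3)
chosen = candidates[1]                             # step (4): user's choice
printed = server.order_print("m1", chosen)        # step (5)
movie = server.movie_for_capture(chosen)           # step (6)
```

The sketch keeps the association between a printed frame and its moving image in a plain dictionary; recognizing the captured print in a camera image, which a real AR system would need, is outside its scope.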
As described above, a system that prints a frame image extracted from a moving image generates printed matter of the frame image exactly as it was extracted from the moving image.
In this regard, in the techniques disclosed in JP2003-85572A, Japanese Patent No. 4226237, and JP2012-249211A, a frame image and a voice are extracted from a moving image, a person is extracted from the frame image, the voice is converted into a character string, and the frame image and the character string corresponding to the voice of a person who is present in the frame image are combined in a text balloon form to generate a composite image. Further, JP2014-95753A and Japanese Patent No. 4881980 disclose techniques that determine gender and age from a voice, and Japanese Patent No. 4881980 discloses techniques that further determine gender and age from a video image.
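As a rough illustration of the text balloon composition described above, the following sketch pairs each person found in a frame image with the character string of that person's voice and computes where a balloon would be placed. It is a sketch under stated assumptions, not the method of any cited reference: person extraction and voice-to-text conversion are assumed to have been done elsewhere, and the names `Person`, `Balloon`, `compose`, and the `offset` parameter are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str  # identifier of a person extracted from the frame image
    x: int     # horizontal position of the person in the frame, in pixels
    y: int     # vertical position of the person in the frame, in pixels


@dataclass
class Balloon:
    text: str  # character string converted from the person's voice
    x: int
    y: int


def compose(persons, utterances, offset=20):
    """Return one balloon per person who has a transcribed utterance.

    `utterances` maps a person's name to a character string. Each balloon
    is placed `offset` pixels above the person and clamped to the top edge
    of the frame (y = 0).
    """
    balloons = []
    for person in persons:
        text = utterances.get(person.name)
        if text:
            balloons.append(Balloon(text, person.x, max(0, person.y - offset)))
    return balloons


persons = [Person("A", 100, 50), Person("B", 200, 10), Person("C", 300, 80)]
utterances = {"A": "Hello", "B": "Hi"}  # person C did not speak
balloons = compose(persons, utterances)
```

Drawing the balloons onto the frame image is left out; the sketch covers only the association between persons and character strings and the placement of each balloon.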