1. Field of the Invention
The invention relates to an image and sound reproduction system which, when a predetermined portion of a background image is to be combined with a part or the whole of another arbitrary still picture, can easily perform various kinds of reproduction in accordance with graphic interactive data (hereinafter referred to as GI data). Examples include sequentially reproducing a part or the whole of the arbitrary still picture while changing its contents by synthesis or replacement, and repeatedly reproducing a combination of parts or the whole of the arbitrary still picture while changing the combination.
GI data are control data indicating information such as the kind and the synthesizing position of a still picture or an object in a still picture; its size, horizontal inversion, vertical inversion, longitudinal inversion, angle, rotation, and distortion on two-dimensional or three-dimensional coordinates; and a motion in which these are combined with each other.
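By way of illustration only, the GI data described above could be held as one record per background frame. The following Python sketch is hypothetical: the class name GIRecord and all of its field names are assumptions introduced for this example and do not appear in the specification, and only a subset of the listed deformations is modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GIRecord:
    """Hypothetical per-frame GI data record (all field names are assumptions)."""
    frame_id: int                  # identification number of the background still picture
    picture_id: int                # kind of still picture (or object) to be combined
    x: int                         # synthesizing position, left edge in pixels
    y: int                         # synthesizing position, top edge in pixels
    scale: float = 1.0             # size factor
    flip_horizontal: bool = False  # horizontal inversion
    flip_vertical: bool = False    # vertical inversion
    angle_deg: float = 0.0         # rotation angle in degrees


# One record per background frame: picture 0 drifts to the right and is
# mirrored on the last frame.
gi_track = [
    GIRecord(frame_id=0, picture_id=0, x=10, y=20),
    GIRecord(frame_id=1, picture_id=0, x=12, y=20),
    GIRecord(frame_id=2, picture_id=0, x=14, y=20, flip_horizontal=True),
]
```

Keeping the record immutable and keyed by frame identification number is one possible design choice; it lets the same background be paired with a different track without altering the background data.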
2. Description of Prior Art
Conventionally, a part of a background image is replaced with another image by the so-called chromakey technique, in which an image of an object photographed against a blue background is combined with the background image. This technique uses no control data relating to a change in the motion of the object. The technique therefore has a drawback in that, after an image of an object is taken or after an object is combined with a background picture, it is difficult to produce a synthetic image in which the shape, the state, the illumination, the sound, or the like of the imaged object is changed.
In a computer graphics (hereinafter referred to as CG) technique, control using so-called motion capture is performed, in which motions of predetermined portions of a subject are measured directly or indirectly by sensors and the subject in the CG is controlled by the motion information of those portions.
However, such a CG is not provided with means for producing a control signal relating to a motion of an object which changes in linking relationships with frames of a background image that has been already taken, means for selecting and changing a still picture to be combined, controlling means for correlating identification numbers of continuous still pictures serving as a background when the still pictures are to be combined with another still picture or an object which is externally captured, with position information or corrected position information which describes the place where an arbitrary user image other than the background is to be combined, or means for storing control data relating to such information. Therefore, the CG has a drawback in that, after an image of an object is taken or after a synthetic image of an object and a background image is obtained by synthesis or the like, it is difficult to produce a synthetic image by replacing the object of the background image with another object image in which the kind, the size, the brightness, or the voice or sound of the object is selectively changed in accordance with the motion of the object, or superimposing the images.
When a three-dimensional CG based on polygon data is used as a background image, the number of polygon coordinates is so large that the data processing is complicated and requires a prolonged time period. In order to render such a three-dimensional CG as a two-dimensional image, therefore, dedicated hardware which functions as high-speed image calculating and generating means is necessary. This produces problems in that the cost is increased, that the image reproduction rate, the number of polygons, the image quality, and the like are restricted, and that the development period is prolonged. When control data relating to the motion of an object in a moving picture image which has been once reproduced is stored and output, there arise drawbacks in that the image reproduction rate, the number of polygons, the image quality, and the like are further restricted. In this case, neither means for selectively inputting a user object which is to be combined nor information for selection exists, and hence a synthetic moving picture cannot be replaced with another one merely by replacing a photo supplied by the user.
As described above, in the prior art, means for realizing partial replacement of a background image file requires a very sophisticated technique and dedicated hardware, and is expensive. That is, a moving picture file of the prior art does not have, for each frame, GI data, which is motion selection information correlating information for controlling a still picture that is a subject of synthesis or replacement in a background image with an identification number of a still picture to be combined. Therefore, it is not easy to combine a still picture that is a subject of synthesis or replacement with another arbitrary still picture, or to replace it with one. In a model in which a still picture serving as a material of synthesis or replacement, or an object segmented from a part of a still picture, is represented by three-dimensional coordinates, there is a problem in that the amount of data is so large that the system is very complicated.
In other words, in a background image in the prior art, each frame of the image consists of data which have already been processed as frame data. Consequently, there is no means for changing data in each frame, and post-processing of such an image can hardly be performed.
This will be described more plainly. In movie software, only actors and actresses appear. As long as a movie is shot from actual images, an ordinary person cannot appear in it unless the person is present at the filming location. Furthermore, it is impossible to change the cast after a movie is shot. A person can enjoy freely deforming an image of a face synthesized by CG or the like. However, there is no means for automatically adding such a deformed image to images which have already been completed, such as a movie. Therefore, a still picture must be separately produced for each frame, and the still pictures must be manually combined with a background moving picture, such as a movie. As a result, the user cannot easily enjoy such synthesis.
It is an object of the invention to provide an image and sound reproduction system in which a predetermined portion of a background image can be replaced with another still picture, and a still picture can be controlled in accordance with GI data indicating the position, motion of the predetermined portion of the background image, and the like, and in which the replacement and control of the still picture can be realized easily and economically.
Specifically, the object of the invention is to provide easily and economically an image and sound reproduction system in which a new moving picture using the same background can be easily obtained by replacing only the face or the body of the user with that of a person appearing in a background image, a new moving picture can be easily produced by replacing only a background image with another one even when the same user image is used, and image reproduction is enabled while easily selecting the story in accordance with the kind and the number of user images.
In order to solve the problems, the image and sound reproduction system of the invention comprises the following means:
(1) still picture synthesizing means for combining an arbitrary first still picture with plural second still pictures serving as a background; and synthesis and reproduction controlling means for, in accordance with position information for combining the first still picture with the second still pictures or selection information for selecting one from the one or more arbitrary first still pictures, controlling the still picture synthesizing means so as to combine the first still picture with the second still pictures, and sequentially reproduce synthetic images;
(2) means for performing reproduction with, by using the position information or the selection information, combinedly using means for capturing the first still picture and an arbitrary n-th (n is an arbitrary integer) still picture other than the first still picture;
(3) means for performing reproduction with, by using the position information or the selection information, combinedly using means for capturing plural arbitrary m-th (m is an arbitrary integer) still pictures other than the second still pictures;
(4) synthesis and reproduction controlling means for setting plural dummy objects in the second or m-th still pictures serving as a background, and controlling the still picture synthesizing means in accordance with position information assigned to the dummy objects, so as to combine the first still picture or the n-th still picture with plural still pictures or the plural m-th still pictures, and sequentially reproduce synthetic images;
(5) user object extracting means for setting a partial region (hereinafter, referred to as dummy object) of a dummy still picture serving as a reference, and for extracting a part (hereinafter, referred to as user object) of the first still picture corresponding to arbitrary position coordinates of the dummy object; edge detecting means for detecting an edge of the user object extracted by the user object extracting means; and correcting means for calculating position correction information which correlates arbitrary pixel coordinates of the edge detected by the edge detecting means with pixel coordinates of the first dummy object corresponding to the arbitrary pixel coordinates, and for, on the basis of the position correction information, correcting the position information for combining the first still picture with the second still pictures or the n-th still pictures;
(6) synthesizing means for setting a partial region (hereinafter, referred to as unit object) of the first dummy object or an arbitrary L-th dummy object other than the user object, and for combining the unit object with an image of the user object;
(7) correlating means for assigning an identification number to the first still picture or an object in the first still picture, and for correlating deformation information of the first still picture or the object in the first still picture with the position information describing a place where synthesis is to be performed, the deformation information including a size, horizontal inversion, vertical inversion, longitudinal inversion, an angle, rotation, and distortion on two-dimensional coordinates or three-dimensional coordinates, and combinations of the above;
(8) means for decomposing a moving picture file into a continuous still picture file, and for respectively assigning identification numbers to still pictures as plural m-th still pictures other than the first still picture;
(9) moving picture/sound separating means for separating sound information from a moving picture in a moving picture file containing sound; and moving picture/sound synthesizing means for combining images which are synthesized and reproduced by image processing with the sound information separated from the moving picture file into one moving picture and sound file;
(10) means for capturing an external sound; and moving picture/sound synthesizing means for combining the captured external sound information, images which are synthesized and reproduced by image processing, and sound information separated from the moving picture file into one moving picture and sound file;
(11) means for capturing the first still picture, and means for masking a part of the captured first still picture;
(12) means for setting plural shapes of a still picture mask;
(13) means for displaying plural masked still pictures which are masked by the means of (12); and means for selecting an arbitrary masked still picture from the plural masked still pictures;
(14) means for displaying a masked still picture; and means for, while a user observes the masked still picture, adequately adjusting and determining a capturing position, an expression, or the like of the user;
(15) model converting means for correlating the first still picture with a standard model configured by three-dimensional coordinates; two-dimensional image generating means for generating a two-dimensional image which is obtained by observing a model at an arbitrary angle, the model being obtained as a result of model conversion by the model converting means; and synthesizing means for combining the two-dimensional image as the plural first still pictures with plural background still pictures;
(16) displaying means for displaying the image taken by imaging means, the displaying means being separate from the imaging means; and means for moving the displaying means together with an object;
(17) means for selecting motion data, i.e., position information or selection information to be correlated, in accordance with the number and the kind of the first still pictures;
(18) arbitrary reproduction enabling means; and means for, in response to enabling information by the reproduction enabling means, starting sequential reproduction of arbitrary synthetic images;
(19) means for selecting arbitrary plural sets of position information correlated with an arbitrary n-th still picture, and for reproducing the n-th still picture and the plural m-th background still pictures, with combining a sequence of the reproduction, or means for repeating the reproduction;
(20) recording means for recording a background image into a first layer of a disk, and for classifying and recording the position information or the deformation information of the still picture or an object in the still picture, into a second layer, the deformation information including a size, horizontal inversion, vertical inversion, longitudinal inversion, an angle, rotation, and distortion on two-dimensional coordinates or three-dimensional coordinates, and combinations of the above, or reproducing means for sequentially reproducing the recorded first layer, on the basis of information recorded in the second layer;
(21) means for printing out an arbitrary still picture in the obtained synthetic images;
(22) means for handling a synthesized still picture as one of the second or m-th still pictures; and
(23) means for handling a synthesized moving picture as one moving picture file.
The synthesizing means may use any one of methods such as that in which a new object to be replaced is superimposed on an original object to be replaced in a moving picture, that in which no original object to be replaced in a moving picture that originally serves as a background exists and synthesis is performed by superimposing a new object to be replaced thereon, and that in which, even when an original object to be replaced in a moving picture that originally serves as a background exists, the object is replaced with a new object to be replaced.
Plural original objects to be replaced in a background image may be used, and plural new objects to be replaced may be used.
By using position information and deformation information of a new object to be replaced, an original object may be arbitrarily replaced with an n-th new object other than the new object to be replaced. The expression "n-th" means an n-th (n is a positive integer) new object of a number of new objects. Similarly, the expression "m-th" means an m-th (m is a positive integer) image group of image groups of different kinds in a series of background image groups.
Next, the functions of the configurations of (1) to (23) will be described.
According to the configuration of (1), the synthesizing position of an arbitrary still picture in a background still picture can be freely set in accordance with position information. When the synthetic images are sequentially reproduced, therefore, the arbitrary still picture is moved, and hence the pictures can be reproduced just as a synthetic moving picture in which the arbitrary still picture moves along with the movement of the plural still pictures of the background. With respect to position information of P first still pictures which are sequentially reproduced, the (P+1)-th position information is a correction corresponding to a movement of the object in the first still picture. Therefore, the configuration has an effect that position information can be easily set. For portrait photos of the same user, when a photo of the face directed to the front and a photo of the obliquely directed face are selected, the photos can be instantly reproduced as a synthetic moving picture in which the face of the user appears to move.
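The function of configuration (1) may be sketched as follows, purely as an illustration. Images are represented as lists of rows of pixel values, and each track entry carries the selection information (an index into the first still pictures) and the position information (x, y). The function names composite and reproduce, and the track layout, are assumptions made for this sketch and are not taken from the specification.

```python
def composite(background, picture, x, y):
    """Return a copy of `background` with `picture` pasted at column x, row y.

    Pixels of `picture` that fall outside the background are clipped.
    """
    out = [row[:] for row in background]
    for dy, src_row in enumerate(picture):
        for dx, pixel in enumerate(src_row):
            ty, tx = y + dy, x + dx
            if 0 <= ty < len(out) and 0 <= tx < len(out[ty]):
                out[ty][tx] = pixel
    return out


def reproduce(backgrounds, pictures, track):
    """Yield one synthetic frame per (picture index, x, y) entry in `track`."""
    for background, (pic_idx, x, y) in zip(backgrounds, track):
        yield composite(background, pictures[pic_idx], x, y)


# Three blank background frames, each 4 pixels wide and 2 pixels high.
backgrounds = [[[0] * 4 for _ in range(2)] for _ in range(3)]
# Two 1x1 first still pictures, e.g. a frontal and an oblique face.
pictures = [[[7]], [[9]]]
# Selection information and position information for each background frame.
track = [(0, 0, 0), (0, 1, 0), (1, 2, 1)]

frames = list(reproduce(backgrounds, pictures, track))
```

Because only the track changes from frame to frame, the same backgrounds can be reproduced with another set of first still pictures without touching the position information, which corresponds to the effect described for configuration (2).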
According to the configuration of (2), even for the same position information, a synthetic moving picture of the face of a different user can be instantly reproduced only by replacement with a portrait photo of a second user other than the face of a first user, and without changing position information.
According to the configuration of (3), synthetic moving pictures of a different background can be instantly reproduced only by replacing a background still picture with a different still picture, and without replacement of a series of face directions and expressions of the same user. Furthermore, an entirely different synthetic moving picture of a photo of the face of the same user or another kind, for example, an automobile can be instantly reproduced without changing position information.
According to the configuration of (4), for example, a photo of Mr. A which is a user object can be subjected to synthesis while selecting the position of an arbitrary dummy object in the background.
According to the configuration of (5), in the case where an image of the face of a user is to be taken, even when the position of the face of the user deviates from the position of a predetermined dummy object because of camera shake or the like, it is possible to correct the synthesizing position so that no deviation occurs.
According to the configuration of (6), the mouth portion of a dummy object, or that of an arbitrary user object, may be used as a unit object. Even when there is only one photo of a certain user, the mouth portion of the dummy object can be used as the mouth portion of the user, thereby providing motion to only one image of the mouth portion of the user so that the user appears to be speaking. Therefore, the labor of taking plural photos can be eliminated.
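The correction described for configuration (5) may be illustrated by the following simplified sketch. It assumes that the dummy object and the extracted user object are both given as binary masks of the same captured frame, and it substitutes the top-left corner of each object's bounding box for the edge pixel coordinates; real edge detection is outside the scope of the sketch. All function names are hypothetical.

```python
def top_left_of(mask):
    """Return (x, y) of the top-left corner of the object's bounding box."""
    xs = [x for row in mask for x, value in enumerate(row) if value]
    ys = [y for y, row in enumerate(mask) if any(row)]
    if not xs:
        raise ValueError("mask contains no object pixels")
    return min(xs), min(ys)


def position_correction(dummy_mask, user_mask):
    """Offset that moves the user object onto the dummy object."""
    dummy_x, dummy_y = top_left_of(dummy_mask)
    user_x, user_y = top_left_of(user_mask)
    return dummy_x - user_x, dummy_y - user_y


def correct_position(position, correction):
    """Apply the correction to the stored synthesizing position."""
    return position[0] + correction[0], position[1] + correction[1]


dummy_mask = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
# The user's face was captured one pixel to the right of and one pixel below
# the dummy object, for example because of camera shake.
user_mask = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
]

correction = position_correction(dummy_mask, user_mask)
corrected = correct_position((10, 20), correction)
```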
According to the configuration of (7), even when there is only one photo of a user, still pictures which are previously prepared by deforming the image of the photo by the size, horizontal inversion, vertical inversion, longitudinal inversion, an angle, rotation, and distortion on two-dimensional coordinates or three-dimensional coordinates, and combinations of the above can be sequentially reproduced. Therefore, the motion and expression of the user can be provided with abundant changes.
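As a hedged illustration of configuration (7), the sketch below prepares several deformed variants from a single still picture. Only horizontal inversion, vertical inversion, and a 90-degree rotation on two-dimensional coordinates are shown; the dictionary DEFORMATIONS and the function names are assumptions introduced for this example.

```python
def flip_horizontal(img):
    """Horizontal inversion: mirror each row left to right."""
    return [row[::-1] for row in img]


def flip_vertical(img):
    """Vertical inversion: reverse the order of the rows."""
    return [row[:] for row in img[::-1]]


def rotate_90_clockwise(img):
    """Rotate the picture by 90 degrees clockwise."""
    return [list(column) for column in zip(*img[::-1])]


# Hypothetical lookup from deformation information to an operation.
DEFORMATIONS = {
    "horizontal_inversion": flip_horizontal,
    "vertical_inversion": flip_vertical,
    "rotation_90": rotate_90_clockwise,
}


def prepare_variants(img, names):
    """Prepare one deformed still picture per requested deformation name."""
    return [DEFORMATIONS[name](img) for name in names]


photo = [[1, 2],
         [3, 4]]
variants = prepare_variants(
    photo, ["horizontal_inversion", "vertical_inversion", "rotation_90"]
)
```

The variants can then be reproduced in sequence in place of separately photographed pictures.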
According to the configuration of (8), since identification numbers are assigned to still pictures of a moving picture, position information of synthesis can be correlated with all of arbitrary still pictures, so that one scene of a moving picture of the same background can be repeatedly used or an arbitrary background can be used in different scenarios.
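A minimal sketch of the idea in configuration (8) follows, assuming that the moving picture file has already been decomposed into a list of still pictures (represented here by placeholder strings). The function names and the scenario format, a list of identification numbers, are hypothetical.

```python
def assign_ids(frames):
    """Assign consecutive identification numbers to decomposed still pictures."""
    return dict(enumerate(frames))


def build_sequence(indexed, scenario, positions):
    """Resolve a scenario (a list of frame IDs) into (frame, position) pairs."""
    return [(indexed[frame_id], positions[frame_id]) for frame_id in scenario]


frames = ["bg_a", "bg_b", "bg_c"]  # stand-ins for decoded still pictures
indexed = assign_ids(frames)
# Position information of synthesis, correlated with each identification number.
positions = {0: (10, 20), 1: (12, 20), 2: (14, 22)}

forward = build_sequence(indexed, [0, 1, 2], positions)
# The scene made of frames 0 and 1 is used twice in a different scenario.
looped = build_sequence(indexed, [0, 1, 0, 1, 2], positions)
```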
According to the configuration of (9), sound information contained in a moving picture file serving as a background can be used as it is after synthesis, or can be rewritten at one time for each file, together with other sound information. Since the processing can be performed in the unit of one file, instructions for sequentially continuously reproducing information can be given by one reproduction command, so that labor of repeatedly inputting a reproduction command can be eliminated.
According to the configuration of (10), arbitrary sound which is captured by the user can be reproduced in place of sound contained in the original moving picture file, together with an image.
According to the configuration of (11), when an image of the face of the user is to be taken, an image of the face only can be taken while hiding the body, or an image can be combined or replaced by using a favorite portion only.
According to the configuration of (12), even in the case of shapes to be combined which are largely different from each other, such as images of the face of the user at various angles or an arbitrary automobile, the same effect as (11) above can be attained by preparing plural masks.
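The masking of configurations (11) and (12) may be sketched as follows. The mask shapes, their names, and the use of None for a hidden pixel are assumptions chosen only to keep the example small.

```python
# Hypothetical mask shapes: 1 keeps a pixel, 0 hides it.
MASKS = {
    "face": [[0, 1, 1, 0],
             [1, 1, 1, 1],
             [0, 1, 1, 0]],
    "car":  [[0, 0, 0, 0],
             [1, 1, 1, 1],
             [1, 1, 1, 1]],
}


def apply_mask(picture, mask, hidden=None):
    """Keep the pixels where the mask is set and hide all other pixels."""
    same_shape = len(picture) == len(mask) and all(
        len(p_row) == len(m_row) for p_row, m_row in zip(picture, mask)
    )
    if not same_shape:
        raise ValueError("picture and mask must have the same dimensions")
    return [
        [pixel if keep else hidden for pixel, keep in zip(p_row, m_row)]
        for p_row, m_row in zip(picture, mask)
    ]


picture = [[1, 2, 3, 4],
           [5, 6, 7, 8],
           [9, 10, 11, 12]]

# One masked candidate per prepared mask shape.
candidates = {name: apply_mask(picture, mask) for name, mask in MASKS.items()}
```

Presenting the entries of candidates side by side and letting the user pick one corresponds to the selection described for configuration (13).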
According to the configuration of (13), synthesis using the most favorite image of the face among the plural images taken by using the plural masks in (12) above is enabled, or synthetic moving pictures in which different masking states or expressions of the user are used in the same background can be instantly viewed by switching between the pictures. When an image of the face of the user is to be taken, the imaging can be performed while checking the displayed masking state. Therefore, the imaging can be performed at a favorite position or without deviating from a preset position of a dummy object, and the labor of correcting the position after the imaging can be omitted.
According to the configuration of (14), the user can press the shutter at an arbitrary instant, so that a still picture taken at the instant or position at which a favorite expression of the user is obtained, or a still picture of an automobile or the like taken at the instant at which the most favorite illumination or reflection state is attained, can be used in synthesis.
According to the configuration of (15), the labor of taking plural photos of the user can be omitted. Furthermore, complex deformation can be made by moving coordinates of a three-dimensional model, so that variations of a synthetic moving picture are increased. A method may be employed in which a three-dimensional model of a shape of a face or mouth portion of an object to be replaced is converted into a two-dimensional image, the information of the obtained two-dimensional image is output as a shape of the mouth portion of a new object to be replaced, and the information of the two-dimensional image is selectively displayed as a selected signal of the new object to be replaced. This method has the advantages that the object can be selectively displayed at plural angles, and that it is not required to perform the imaging at plural angles. The conversion from a three-dimensional model to a two-dimensional image is realized by a process called rendering, in which a three-dimensional model having coordinates in three axes, namely the depth, the height, and the width, is seen from a camera point at the coordinate origin, and the image on the camera is converted into a model having coordinate data in two axes, namely the height and the width.
According to the configuration of (16), even when the user faces the front, the user can check a change of his or her own expression and position in profile. Therefore, imaging of a favorite profile, or imaging at an appropriate synthesizing position, is enabled, and there arises no case where unsatisfactory photos are taken and the imaging must be repeated.
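The rendering step described above can be illustrated with a simple pinhole projection, given here only as a sketch. The camera is placed at the coordinate origin looking along the depth axis, as in the description; the focal length, the rotation about the vertical axis used to obtain different viewing angles, and all function names are assumptions of this example rather than details of the specification.

```python
import math


def rotate_about_vertical(point, angle_deg):
    """Rotate a (width, height, depth) point about the vertical (height) axis."""
    x, y, z = point
    a = math.radians(angle_deg)
    return (x * math.cos(a) + z * math.sin(a), y, -x * math.sin(a) + z * math.cos(a))


def project(point, focal_length=1.0):
    """Project a 3D point onto the 2D image plane of a camera at the origin."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    return (focal_length * x / z, focal_length * y / z)


def render(model, angle_deg, depth, focal_length=1.0):
    """Turn the model by `angle_deg`, place it `depth` away, and project it."""
    rendered = []
    for point in model:
        x, y, z = rotate_about_vertical(point, angle_deg)
        rendered.append(project((x, y, z + depth), focal_length))
    return rendered


# Three points of a hypothetical standard model centered on its own origin.
model = [(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]

front_view = render(model, angle_deg=0.0, depth=4.0)
oblique_view = render(model, angle_deg=30.0, depth=4.0)
```

Calling render with several angles yields two-dimensional images of the same model at plural angles from a single captured picture, which is the advantage noted above.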
According to the configuration of (17), when an angry or crying face among the expressions of the user is set as the kind of the first still picture, position information or selection information of a story corresponding to the selected face is selected, so that, in the case of an angry still picture, a synthetic moving picture of an angry story is reproduced, and, in the case of a crying still picture, a synthetic moving picture of a crying story is reproduced. Furthermore, it is possible to determine whether deformation processing is necessary or not. When the first still pictures are small in number and the number of variations of the expressions of the face is small, for example, it may become necessary to produce various expressions of the face by using deformation information. By contrast, when the first still pictures are large in number, deformation information may be unnecessary. When deformation processing is not required, the processing rate can be increased.
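The selection described for configuration (17) can be sketched as a small lookup, shown below for illustration only. The story names, the dictionary STORIES, and the threshold min_pictures below which deformation processing is switched on are all assumptions; the specification does not state a particular threshold.

```python
# Hypothetical mapping from the kind of first still picture to motion data.
STORIES = {
    "angry": "angry_story",
    "crying": "crying_story",
}


def select_motion_data(kind, picture_count, min_pictures=3):
    """Choose motion data by kind, and deformation use by picture count.

    `min_pictures` is an assumed threshold: with fewer pictures than this,
    deformation is used to create additional expressions.
    """
    if kind not in STORIES:
        raise KeyError(f"no story registered for kind {kind!r}")
    return {
        "story": STORIES[kind],
        "use_deformation": picture_count < min_pictures,
    }


few_photos = select_motion_data("angry", picture_count=1)
many_photos = select_motion_data("crying", picture_count=5)
```

Skipping deformation when enough pictures are available reflects the statement that the processing rate can be increased when deformation processing is not required.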
According to the configuration of (18), the system may be provided with a function such as that, when a certain point is attained in a game or the like, reproduction is started.
According to the configuration of (19), plural different movies can be synthesized by using portraits of the same user, or a laughing story using a crying face and a crying story using a laughing face may be synthesized so that amusement arising from the mismatch can be created.
According to the configuration of (20), a player which reproduces only the first layer can be differentiated from a player which can reproduce also the second layer. Even when the first and second layers are recorded on the same disk, therefore, a user who uses the second layer can be charged in a manner different from a user who uses only the first layer. Furthermore, it is not required to prepare plural recording disks, and hence labor in operations by the user can be eliminated, and the kinds of disks are not increased, thereby reducing the cost.
According to the configuration of (21), in the case where an obtained synthetic moving picture is to be provided as a souvenir, when the user does not have a reproducing machine, one or plural favorite scenes or plural continuous scenes can be provided as a souvenir.
According to the configuration of (22) or (23), plural first still pictures can be used in one moving picture file.
As described above, according to the image and sound reproduction system of the invention, position information, deformation information, and information of motion and the like of a predetermined portion of a moving picture which is to be combined or replaced with another still picture are reproduced or captured from the outside. Therefore, even an ordinary user who has no special knowledge can reproduce an arbitrary synthetic image in which an arbitrary still picture is controlled, merely by selecting a photo. In an imaging process for capturing a still picture required for the reproduction, position correction information, mask selecting means, or a dedicated jig can be used, thereby enabling the user to capture a still picture easily, accurately, and economically. Furthermore, a substantial reproduction system for an arbitrary synthetic image can be configured which can be enjoyed by using an external sound, or by combining plural synthetic images with one another without limitation in linking relationships with a quiz, a game, or the like.