1. Field of the Invention
The present invention relates to a technology for encoding images such as mobile pictures, and in particular, to an image object managing method of managing objects constituting images by use of identifiers, an image processing apparatus using the method, an image stream produced by the method and a recording medium for the stream, and a recording medium on which a program achieving the method is recorded.
2. Description of the Related Art
Heretofore, there have been proposed a large number of methods of encoding or coding mobile pictures. One of these methods, which is standardized by the Moving Picture Experts Group (MPEG), has been broadly employed in the present stage of art. The MPEG standard specifications include MPEG1 and MPEG2 at present. While MPEG1 is adopted for video compact disks (CD) and MPEG cameras, MPEG2 is utilized for digital video disks (DVD) and digital satellite broadcasting. A general feature of the MPEG standards is data compression based on correlation with respect to time, which leads to high coding efficiency. In this compression method, differences between a plurality of frames (screen images constituting a mobile picture) which are continuous with respect to time are recorded to compress data. In general, adjacent frames of a mobile picture are quite similar in image features to each other and hence a high compression ratio can be obtained by coding the difference therebetween. This is why the compression efficiency is improved.
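The principle of coding differences between successive frames may be illustrated by the following minimal sketch. It is a simplification under stated assumptions: each frame is a flat list of pixel values, and the function names are hypothetical. It omits the motion compensation and DCT stages of the actual MPEG standards.

```python
def encode_differences(frames):
    """Keep the first frame as is; replace each later frame by its
    pixel-wise difference from the preceding frame."""
    if not frames:
        return []
    encoded = [list(frames[0])]
    for previous, current in zip(frames, frames[1:]):
        encoded.append([c - p for p, c in zip(previous, current)])
    return encoded


def decode_differences(encoded):
    """Rebuild the original frames by adding each difference to the
    previously reconstructed frame."""
    if not encoded:
        return []
    frames = [list(encoded[0])]
    for difference in encoded[1:]:
        frames.append([p + d for p, d in zip(frames[-1], difference)])
    return frames


def count_nonzero(frame):
    """Number of values that actually carry information."""
    return sum(1 for value in frame if value != 0)
```

When adjacent frames are similar, most entries of each difference frame are zero, which is what makes the differences cheaper to code than the frames themselves.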
New specifications of MPEG4 are being prepared or discussed at present. MPEG4 is developed primarily for communication and is different in features from MPEG1 and MPEG2. Description will be given, however, of only those sections of MPEG4 which relate to the present invention. MPEG4 is conspicuously different from MPEG1 and MPEG2 in its object encoding operation. According to MPEG4, each object appearing on a screen can be encoded. Consequently, it is possible that a person (an assassin) 1 and a background 2 of one film scene are separately encoded and separately transmitted, and it is possible, on the side having received the separately transmitted signals, to combine the images into one scene. In this operation, each object is encoded, for example, as follows. Although only a little change appears between successive scenes in actual mobile pictures, a large change takes place between scenes in this example for ease of understanding in the description of this application.
First, an image of a person is obtained before a single-color background in a studio using, for example, a method called “blue back”. Thereafter, a rectangle in which the person's image is completely contained is defined in the overall image and is trimmed as shown in FIG. 8. Using as a key the single color of the background, a mask is produced to separate the person from the background (FIG. 9). FIG. 8 is compressed by a discrete cosine transform (DCT) which is similar to ordinary MPEG2. The mask, i.e., FIG. 9, is similarly compressed. Compressed data of FIGS. 8 and 9 is transmitted to a partner (on the receiving side). The receiver decodes the data of FIGS. 8 and 9 and then recognizes that a white area of FIG. 9 is “transparent” to set the background area of the decoded FIG. 8 to “transparent” in accordance with the mask of FIG. 9 (reference is to be made to FIG. 10). Thereafter, the obtained image is combined with a background image separately prepared.
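The mask generation and the combination on the receiving side may be sketched as follows. This is an illustration only: the key color value, the exact-match test, and the function names are assumptions, images are nested lists of RGB tuples, and the DCT compression and transmission steps are omitted.

```python
KEY_COLOR = (0, 0, 255)  # assumed single background color ("blue back")


def make_mask(image, key=KEY_COLOR):
    """Return a mask of the same shape as the image; True marks a pixel
    that shows the key color and is therefore treated as transparent."""
    return [[pixel == key for pixel in row] for row in image]


def composite(foreground, mask, background):
    """Show the background wherever the mask is transparent and the
    foreground (the person) everywhere else."""
    return [
        [bg if transparent else fg
         for fg, transparent, bg in zip(fg_row, mask_row, bg_row)]
        for fg_row, mask_row, bg_row in zip(foreground, mask, background)
    ]
```

A practical keyer would accept a tolerance around the key color rather than an exact match; the exact comparison is used here only to keep the sketch short.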
This method has a feature of higher encoding efficiency when compared with a method in which the overall screen is encoded. This is because the background is almost still, i.e., it changes little, and hence the quantity of data to be processed is small; namely, only the moving sections of the person are to be encoded.
The present invention provides, in consideration of image compression technologies such as MPEG4 having the feature above, novel characteristics to achieve an object-based encoding operation.
A video image stream of MPEG4 includes a stream of backgrounds and streams of respective objects. FIG. 4 shows an example of MPEG4 streams corresponding to FIGS. 1 to 3.
A stream 10 includes individual object streams 11 to 19 and a control stream 20 describing a composite rule of these object streams (the description stipulates, for example, positions of images with respect to depth in the screen image and timing of appearance of images).
In FIG. 1, an assassin 1 and a background 2 are related to object streams 11 and 12, respectively. In FIG. 2, a target 3, a background 4, and a hindrance 5 are associated with object streams 13 to 15, respectively. In FIG. 3, a target 6, a background 7, an assassin 8, and a hindrance 9 are related to object streams 16 to 19, respectively. Object streams 13 and 16, 15 and 19, and 14 and 17 are not interrupted therebetween and are hence respectively continuous object streams, i.e., each combination forms one object stream.
Stream 10 is subdivided in a time division procedure into small packets to be transmitted. Consequently, for the receiving side of these object streams (and the control stream) to restore the original streams, there are required identifiers to identify the respective object streams. According to the stipulation of MPEG4, only the number of bits is determined for the assignment of identifiers and no other rules are stipulated for the identifiers. Therefore, for the identifiers, serial numbers are ordinarily assigned in an order of appearance of objects. To guarantee the time sequence of the objects, a time stamp is assigned to each object stream.
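The role of the identifiers in restoring the original streams may be illustrated by the following sketch. It is not the MPEG4 packet format: the tuple layout, the per-packet sequence number used as a time stamp, the round-robin interleaving, and the function names are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import zip_longest


def multiplex(streams, packet_size):
    """Cut each object stream (identifier -> bytes) into small packets and
    interleave them. Each packet is (identifier, time stamp, payload)."""
    per_stream = []
    for identifier, data in streams.items():
        chunks = [data[i:i + packet_size]
                  for i in range(0, len(data), packet_size)]
        per_stream.append(
            [(identifier, stamp, chunk) for stamp, chunk in enumerate(chunks)]
        )
    packets = []
    for group in zip_longest(*per_stream):
        packets.extend(packet for packet in group if packet is not None)
    return packets


def demultiplex(packets):
    """Group packets by identifier and order them by time stamp to
    restore each original stream."""
    grouped = defaultdict(list)
    for identifier, stamp, payload in packets:
        grouped[identifier].append((stamp, payload))
    return {
        identifier: b"".join(
            payload for _, payload in sorted(parts, key=lambda part: part[0])
        )
        for identifier, parts in grouped.items()
    }
```

Without the identifier in each packet, the receiving side would have no way to tell which interleaved packet belongs to which object stream.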
Even in a case in which objects which are regarded as the same object by viewers, for example, objects corresponding to an identical person, appear in different scenes, if the pertinent stream is once interrupted, a subsequent stream is ordinarily assigned a new serial number. Namely, another number is assigned to the subsequent stream. In consequence, at least when the stream is simply analyzed by a computer, a correlation, e.g., the fact that the same person appears in two or more scenes, cannot be appropriately identified.
Consequently, the prior art is attended with problems as follows.
An amateur can produce copies of video images merely out of curiosity or for entertainment and distribute the copies via a network. Moreover, it may also be possible that a malicious person who aims at damaging the dignity of a particular person copies images of the person appearing in video images and combines the copied images with another background and other objects to produce new video images. There exists fear that such an act infringes the right of portraits of the person. Additionally, the produced screen images do not reflect the intention of the producers of programs or films. Namely, there also exists fear of an infringement of a copyright. Particularly, when the user of the right is to be charged, a monetary problem may also arise.
When the video images are processed in the analog format or in MPEG2 and preceding digital formats, these actions do not easily occur in general because of the difficulty in separating a person from the background. However, since object streams can be separated in accordance with MPEG4 under discussion, the problem above may frequently arise. Therefore, a producer who provides video images in a format in which the encoding of images is carried out for objects as in MPEG4 is required to manage the copyright more strictly than in the case in which video images are encoded in MPEG2 or any preceding format. For this purpose, it is necessary to obtain streams of video images available in the market so as to determine whether or not particular objects exist therein. In the prior art, only the object identifiers can be used as information to identify associated objects. The identifiers are serial numbers assigned in the order of appearance in the video images and are not associated with the contents of the objects. Namely, there is no measure for the computer to easily identify objects and hence it is impossible for the computer to retrieve particular objects. In consequence, there exists only one available method, namely, the video images are required to be visually checked by humans.
The situation above will be more specifically described by referring to FIGS. 1 to 3. These images show a scene of a film in which an assassin with a machine gun attacks his target. FIG. 1 shows a close-up image of the assassin. In FIG. 2, the attacked person is running away. FIG. 3 shows the running person and the assassin viewed from his rear side. It is assumed that a cut takes place between the images respectively of FIGS. 1 and 2 and that the camera pans (moves in a horizontal direction) from FIG. 2 to FIG. 3 so that the assassin comes into the image. In FIG. 1, assassin 1 and background 2 are related to respective objects and are therefore encoded separately. In FIG. 2, objects are produced for target 3, background 4, and hindrance 5, respectively. In FIG. 2, target 3 runs away and recognizes hindrance 5 to consequently change the running direction. In FIG. 3, assassin 8, target 6, background 7, and hindrance 9 are associated with objects.
Since no cut exists between FIGS. 2 and 3, the same identifier is assigned to targets 3 and 6 (reference numerals 13 and 16 of FIG. 4). This is also the case for hindrances 5 and 9 as well as backgrounds 4 and 7 (reference numerals 15 and 19 as well as 14 and 17 of FIG. 4). However, since a cut exists between FIGS. 1 and 2, assassins 1 and 8 are assigned different identifiers (reference numerals 11 and 18 of FIG. 4). This is also the case for backgrounds 2 and 4, i.e., different identifiers are assigned (reference numeral 12 versus reference numerals 14 and 17 of FIG. 4). For the producer having the copyright, assassin 1 in a scene is the same as assassin 8 in another scene. Likewise, for the actor playing the assassin, these video images are protected in accordance with the right of his portraits. In the object-based encoding, mobile pictures of objects can be easily separated; namely, it is easy to combine assassin 1 with background 7 to create a new image. The obtained image, however, does not match the intention of the film producer.
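The effect of assigning serial numbers in order of appearance may be sketched as follows for the scenes of FIGS. 1 to 3. The scene representation (a flag indicating a preceding cut plus a list of object names), the starting number, and the function name are assumptions for illustration; the numbers produced are not the reference numerals of FIG. 4.

```python
def assign_serial_ids(scenes):
    """scenes: list of (cut_before, [object names]).
    Returns one dict per scene mapping object name -> serial identifier.
    A stream keeps its number only while it is not interrupted."""
    next_id = 1
    active = {}
    result = []
    for cut_before, names in scenes:
        if cut_before:
            active = {}  # a cut interrupts every object stream
        # streams of objects that left the picture are interrupted as well
        active = {name: ident for name, ident in active.items()
                  if name in names}
        for name in names:
            if name not in active:
                active[name] = next_id  # new stream, new serial number
                next_id += 1
        result.append(dict(active))
    return result


SCENES = [
    (True, ["assassin", "background A"]),                         # FIG. 1
    (True, ["target", "background B", "hindrance"]),              # FIG. 2
    (False, ["target", "background B", "assassin", "hindrance"]), # FIG. 3
]
```

Under these assumptions the target keeps one number across FIGS. 2 and 3, while the assassin receives one number in FIG. 1 and a different one in FIG. 3, so a program comparing identifiers alone cannot tell that both streams show the same person.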
Assume now that the drawback above is removed by some measure for the computer to identify objects. A malicious person may then attempt to defeat the measure. In accordance with the present invention, identifiers of objects are employed as the measure, which will be described later. In this case, a malicious person may possibly change an identifier of an object. In MPEG4, no particular rule is stipulated to assign an identifier. Therefore, even if the identifier is changed, no problem occurs in the decoding phase. Namely, it is possible for the malicious person to change the object identifier. In this situation, however, the object of which the identifier is changed cannot be retrieved by the computer. Consequently, there is required a measure to detect the act of changing the identifier. This has been impossible in the prior art.
Description will now be given of a case in which a producer of programs or a user having a stream desires to create a digest or an index of the stream. For example, the user may produce the digest or index as he or she likes, a digest may be produced for a program guide, or an index may be created for a media title for DVD video images.
In this situation, even if it is desired to make a search or to create a database for each person in the images, since identifiers assigned to objects are not associated with such persons, it is not possible to simply search for each person in a mechanical manner. In addition to the creation of a database, the identifiers also make it difficult to select only scenes in which a favorite actor appears or to generate an index of such scenes. These jobs have been conventionally carried out by humans. However, in the present situation in which the number of video sources is remarkably increased due to, for example, multi-channels of satellite broadcasting, the method above is limited in usability and hence it has been desired to conduct the operations by machines. Particularly, when a user desires to conduct the job, it is necessary in the conventional method for the user to view the contents many times to manually mark necessary points. This requires a conspicuously large amount of human labor.
Description will now be given of simplification of video images. In a scene, for example, a mob scene (in which many people are moving), when a particular person or object is not easily identified in the scene, it may be desired to thin out other objects to some extent to attain video images in which the particular object is clearly shown. The user may desire to view a target item concealed by an object existing in front of the item or to extract, in a weather forecast, a desired region or a desired item (e.g., only the temperature or the height of waves). Specification for deletion of a particular object, for extraction of a particular object, or for deletion of all objects which conceal a particular object is to be manually conducted in the prior art. As a result, when objects move frequently or when many cutbacks occur in video images, the desired operation is required to be conducted for each movement or each cutback. It is therefore actually not possible to achieve the operation in a realtime fashion.
According to the circumstances above, it is not possible in the prior art to appropriately manage the copyright and the right of portraits for mobile pictures in films, television programs, and DVD images by a simple method.
It is therefore an object of the present invention to remove the drawbacks of the prior art and to provide an image object managing method capable of simply achieving management of the copyright and the right of portraits in a sequence of images such as mobile pictures, an image processing apparatus employing the managing method, image streams created in the method and a recording medium for the image streams, and a recording medium on which a program to implement the method is recorded.
Another object of the present invention is to provide an image object managing method suitable for creating a digest and/or an index of a sequence of images such as mobile pictures, an image processing apparatus employing the managing method, image streams created in the method and a recording medium for the image streams, and a recording medium on which a program to implement the method is recorded.
Still another object of the present invention is to provide an image object managing method suitable for extracting a particular object from a sequence of images such as mobile pictures, an image processing apparatus employing the managing method, image streams created in the method and a recording medium for the image streams, and a recording medium on which a program to implement the method is recorded.
To achieve the objects above in accordance with the present invention, when a producer creates a stream of video images, the producer is allowed to intentionally assign an identical identifier, throughout all video images, to object streams of objects which are regarded as identical by the producer. Consequently, when a particular object is desired to be detected in the video stream, the identifier assigned to the particular object can be used as a retrieval key. Namely, the particular object can be retrieved by a computer.
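Retrieval with a producer-assigned identifier as the key may be sketched as follows. The record layout, the field names, and the identifier values are hypothetical and serve only to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectStream:
    identifier: int   # assigned by the producer, not by order of appearance
    scene: str
    payload: bytes


def find_by_identifier(streams, identifier):
    """Return every object stream carrying the given identifier."""
    return [stream for stream in streams if stream.identifier == identifier]


ASSASSIN = 100  # hypothetical identifier chosen by the producer

STREAMS = [
    ObjectStream(ASSASSIN, "FIG. 1", b"close-up"),
    ObjectStream(200, "FIG. 2", b"target"),
    ObjectStream(ASSASSIN, "FIG. 3", b"rear view"),
]
```

Because the assassin carries the same identifier before and after the cut, a single lookup returns his streams from both scenes.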
Moreover, in accordance with the present invention, there is provided an image object managing method of mobile pictures and the like in which respective images appearing in backgrounds and screen images are independently encoded such that in a decoding phase, the background and screen images are decoded to be combined with each other for presentation thereof. In the method, there is provided a database which establishes a correspondence between a required condition for a data stream and an identifier of the data stream or a value obtained by conducting a mathematical operation for the identifier. When an entire stream is to be created, each object satisfying a desired condition associated with the database is assigned with an identifier corresponding to a mathematical operation satisfying a condition of the data stream, the identifier being an identifier of an object stream associated with the object. The other object streams are assigned with an identifier other than that of the object stream. When extracting from the overall stream each object stream satisfying a desired condition appearing in one scene or in a plurality of scenes, the identifier corresponding to the desired condition is used as a key for the retrieval of the object stream.
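The correspondence held in the database may be illustrated by the following sketch. The mathematical operation is not specified in this passage; integer division by a fixed group size is assumed here purely as an example, and the condition names, the group codes, and the function names are likewise hypothetical.

```python
GROUP_SIZE = 1000  # assumed: identifiers sharing identifier // 1000 form a group


def group_code(identifier):
    """Example mathematical operation conducted on an identifier."""
    return identifier // GROUP_SIZE


# Hypothetical database: required condition -> value of the operation.
CONDITION_DATABASE = {"assassin": 1, "target": 2}
OTHER_GROUP = 0  # used for object streams satisfying no listed condition


def make_identifier(serial, condition=None, database=CONDITION_DATABASE):
    """Build an identifier whose group code matches the condition."""
    if not 0 <= serial < GROUP_SIZE:
        raise ValueError("serial out of range")
    code = database[condition] if condition is not None else OTHER_GROUP
    return code * GROUP_SIZE + serial


def retrieve(identifiers, condition, database=CONDITION_DATABASE):
    """Use the value stored for the condition as the retrieval key."""
    wanted = database[condition]
    return [ident for ident in identifiers if group_code(ident) == wanted]
```

With such a scheme, several distinct object streams can satisfy the same condition and still be found with one key, while streams outside every condition fall into a separate group.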
The image object managing method further includes the steps of disposing an encryption field in a subordinate field of the object stream, encrypting all or part of the data of the object stream in the areas other than the encryption field, including the identifier field, using as a seed an encryption key known only by the person who encrypts the data, writing the encrypted data in the encryption field, and extracting an identifier from data decoded using the encryption field and the encryption key. Alternatively, the method includes the steps of creating an electronic watermark using the identifier as a seed, writing the watermark in a data field of the object stream, and extracting an identifier from the electronic watermark buried in the data field. The method further includes the step of comparing the extracted identifier with an identifier in the object stream, thereby detecting modification of the identifier.
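Detection of a modified identifier may be illustrated by the following sketch. Note the substitution: instead of encrypting the data and later decoding it to extract the identifier, the sketch stores a keyed digest (HMAC-SHA256 from the Python standard library) of the identifier and data in the field, and detects modification by recomputing it. The 4-byte identifier width and the function names are assumptions.

```python
import hashlib
import hmac


def make_check_field(key: bytes, identifier: int, data: bytes) -> bytes:
    """Compute the value written to the encryption field: a keyed digest
    over the identifier field and the data field."""
    message = identifier.to_bytes(4, "big") + data
    return hmac.new(key, message, hashlib.sha256).digest()


def identifier_is_intact(key: bytes, identifier: int, data: bytes,
                         check_field: bytes) -> bool:
    """Recompute the digest from the stream as received and compare it
    with the stored field; a mismatch indicates modification."""
    expected = make_check_field(key, identifier, data)
    return hmac.compare_digest(expected, check_field)
```

Only a holder of the key can produce a matching field, so a person who changes the identifier without the key cannot also update the field consistently. Unlike the encryption described above, a digest does not allow the original identifier to be recovered; it only reveals that a change occurred.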
In accordance with the present invention, there are provided recording media containing a plurality of object streams recorded thereon, in which each object stream includes an object identifier field, an encryption field, and a data field, and data obtained by encrypting all or part of the data of the object stream in the areas other than the encryption field is recorded in the encryption field.