Patent Literature 1 describes an example of a conventional identity determination system.
The technique of Patent Literature 1 groups objects detected from video frames based on the similarity between the objects, and creates a list of the appearing objects based on the number of appearances of the objects constituting each group. For example, human faces extracted from a video are grouped, and a list of performers is created based on the number of faces appearing in each group.
FIG. 1 shows the configuration for that purpose, which includes: a still image extracting unit 700 for extracting a plurality of still images from an input video; an image dividing unit 701 for dividing the still images into shots, each of which consists of an arbitrary number of still images; a predetermined video determining unit 702 for determining whether or not each shot includes a predetermined video; a predetermined video precedent determining unit 703 for determining whether or not predetermined videos have been previously included in the input video; a predetermined video classifying and measuring unit 704 for performing grouping based on the similarity of the predetermined videos, and measuring the numbers of appearances of the predetermined videos; and a video appearance list creating unit 705 for creating a video appearance list based on the numbers of appearances.
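The grouping and counting performed by units 704 and 705 can be illustrated with a minimal sketch. It is not the implementation of Patent Literature 1. It assumes that each detected object (for example, a face) has already been reduced to a numeric feature vector, that similarity is measured by cosine similarity, and that an object joins the first group whose earliest member is sufficiently similar. The function names `group_by_similarity` and `appearance_list` and the threshold value are hypothetical and chosen only for illustration.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_by_similarity(
    features: Sequence[Sequence[float]], threshold: float = 0.9
) -> List[List[int]]:
    """Assign each object to the first group whose earliest member is similar enough.

    Assumption: the earliest member of a group serves as its representative.
    Returns groups as lists of indices into `features`.
    """
    groups: List[List[int]] = []
    for index, feature in enumerate(features):
        for group in groups:
            representative = features[group[0]]
            if cosine_similarity(representative, feature) >= threshold:
                group.append(index)
                break
        else:
            # No existing group is similar enough, so start a new one.
            groups.append([index])
    return groups


def appearance_list(groups: List[List[int]]) -> List[Tuple[int, int]]:
    """Return (representative index, number of appearances), most frequent first."""
    return sorted(((g[0], len(g)) for g in groups), key=lambda item: -item[1])


if __name__ == "__main__":
    # Hypothetical 2-D feature vectors for five detected faces.
    faces = [(1.0, 0.0), (0.99, 0.05), (0.0, 1.0), (1.0, 0.01), (0.02, 1.0)]
    print(appearance_list(group_by_similarity(faces)))
```

In this sketch the appearance list is ordered by group size, which corresponds to listing performers by how often their faces appear.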
Patent Literature 2 describes another example of an identity determination system.
The technique of Patent Literature 2 groups segments that constitute an input signal, such as video and sound, into combinations of segments having the same semantic signal structure if the features extracted from the segments are highly similar to each other and the segments are at a temporal distance smaller than or equal to a predetermined threshold. For example, in a conversation scene with two speakers, segments that appear alternately for the respective speakers are grouped by speaker.
FIG. 2 shows the structure for that purpose, which includes: a video feature extraction unit 801 that extracts video features from segments consisting of certain consecutive frames; a sound feature extraction unit 802 that extracts sound features; a feature similarity measuring unit 805 that measures the similarity of a pair of segments; and a scene detection unit 806 that detects a scene by detecting and collecting pairs of segments that have a mutual temporal distance within a predetermined time threshold and have a dissimilarity smaller than or equal to a predetermined dissimilarity threshold.
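The pairing condition applied by the scene detection unit 806 can be illustrated with a minimal sketch. It is not the implementation of Patent Literature 2. It assumes that each segment is represented by a start time in seconds and a single feature vector, that dissimilarity is the Euclidean distance between feature vectors, and that qualifying pairs are collected into groups with a union-find structure. The function name `link_segments` and both threshold parameters are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Assumed representation: (start time in seconds, feature vector).
Segment = Tuple[float, Sequence[float]]


def dissimilarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors (an assumed measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def link_segments(
    segments: Sequence[Segment], max_time_gap: float, max_dissimilarity: float
) -> List[List[int]]:
    """Group segments that are close in time and have similar features.

    Two segments are linked when their temporal distance is at most
    `max_time_gap` and their dissimilarity is at most `max_dissimilarity`.
    Linked pairs are merged transitively. Returns sorted lists of indices.
    """
    parent = list(range(len(segments)))

    def find(i: int) -> int:
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(segments)):
        for j in range(i + 1, len(segments)):
            time_i, feature_i = segments[i]
            time_j, feature_j = segments[j]
            close_in_time = abs(time_j - time_i) <= max_time_gap
            similar = dissimilarity(feature_i, feature_j) <= max_dissimilarity
            if close_in_time and similar:
                parent[find(j)] = find(i)

    groups: Dict[int, List[int]] = {}
    for i in range(len(segments)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    # Hypothetical conversation: speakers alternate, then a distant segment follows.
    conversation = [
        (0.0, (1.0, 0.0)),
        (5.0, (0.0, 1.0)),
        (10.0, (0.9, 0.1)),
        (15.0, (0.1, 0.9)),
        (100.0, (1.0, 0.0)),
    ]
    print(link_segments(conversation, max_time_gap=20.0, max_dissimilarity=0.3))
```

In this hypothetical input, the alternating segments are grouped by speaker, while the last segment stays separate because it exceeds the time threshold even though its features match those of the first speaker.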