The present invention relates to segmenting multimedia data, and in particular, segmenting image data streams.
Segmenting multimedia data is a fundamental problem. Segmentation requires breaking up a data stream into meaningful segments or parts. Properly segmented data streams can be better organized and reused. Segmented data streams provide points of access that facilitate browsing or retrieval. As more and more multimedia data are created and made available, segmentation methods can serve the important function of helping summarize this mass of material.
An image document stream having a plurality of image documents in a sequential order is one type of multimedia data stream. For example, a video includes an image document stream consisting of frames. The video may have meaningful parts or segments defined by camera changes, or scenes defined by semantically related shots. This is a difficult segmentation problem for machines (even for humans), because extracting semantics is hard.
A method may simply partition the image data stream at fixed time intervals, as described by Mills, M., Cohen, J., and Wong, Y. Y., 1992, A magnifier tool for video data. Proceedings of ACM Multimedia '96. ACM Press, pp. 163-174.
Other methods use color histograms and clustering, as described in Boreczky, J. S. and Rowe, L. A., 1996, Comparison of video shot boundary detection techniques. Storage and Retrieval for Still Images and Video Databases IV, Proceedings of SPIE 2670, pp. 170-179 ("Boreczky"); Girgensohn, A., and Boreczky, J., 1999, Time-constrained keyframe selection technique. Proceedings of the 1999 IEEE International Conference on Multimedia Computing and Systems. IEEE Computer Society, vol. 1, pp. 756-761; Uchihashi, S. and Foote, J., 1999, Summarizing video using a shot importance measure and frame-packing algorithm ("Uchihashi"). Proceedings of ICASSP '99, vol. 6, pp. 3041-3044; Yeung, M. M. and Yeo, B-L, 1997, Video visualization for compact presentation and fast browsing of pictorial content. IEEE Trans. Circuits and Systems for Video Technology, vol. 7, no. 5, pp. 771-785; Zhang, H. J., Low, C. Y., Smoliar, S. W., and Wu, J. H., 1995, Video parsing, retrieval and browsing: An integrated and content-based solution. Proceedings of Multimedia '95. ACM Press, pp. 15-24.
Image data streams may be of many types and forms. Video types include produced video and raw video. Examples of produced video are news, movies and training videos. Examples of raw video are video of meetings, surveillance records, and video from wearable personal video cameras as described in Mann, S., 1996, 'Smart Clothing': Wearable multimedia computing and 'personal imaging' to restore the technological balance between people and their environments. Proceedings of ACM Multimedia '96. ACM Press, pp. 163-174.
Slide shows, including presentation slides, are also image data streams at a coarser level of granularity. Examples of presentation slides include PowerPoint slides and note pages with images such as produced by the NoteLook system as described by Chiu, P., Kapuskar, A., Reitmeier, S., and Wilcox, L. NoteLook: Taking notes in meetings with digital video and ink. Proceedings of Multimedia '99. ACM, New York, pp. 149-158.
Therefore, there is a desire to provide a segmentation method, information system and computer-readable medium for segmenting data into meaningful sections for retrieval and browsing. For example, there is a desire to provide a method for segmenting image data streams so that summary image documents may be provided to summarize a lengthy image data stream.
A method, information system and computer-readable medium are provided for segmenting a plurality of data, such as multimedia data, and in particular, an image document stream.
According to an aspect of the present invention, a method segments a plurality of data into k segments. A first and a second series of numbers ("first and second strings"), each representing the plurality of data, are created. The first and second strings each have k−1 boundaries. First and second fitness function values are calculated for the data represented by the first and second strings, respectively. A third and a fourth series of numbers ("third and fourth strings"), each representing the plurality of data, are created based on the first and second fitness function values. A partition point ("crossover") in the third and fourth strings is selected. The crossover partitions the third string into a head series of numbers and a tail series of numbers. The fourth string is likewise partitioned into a head series of numbers and a tail series of numbers. The third string tail series of numbers is interchanged with the fourth string tail series of numbers to form a fifth and a sixth series of numbers ("fifth and sixth strings"). The numbers in the fifth and sixth strings are adjusted so that each string again has k−1 boundaries. Third and fourth fitness function values are then calculated for the data represented by the fifth and sixth strings. The k segments are provided responsive to a comparison of the first, second, third and fourth fitness function values.
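The crossover and adjustment steps can be illustrated with a minimal Python sketch. It assumes each string is a list of 0s and 1s in which every 1 marks a boundary. The names `crossover` and `repair`, the fixed crossover point, and the random choice of which bits to flip during adjustment are illustrative assumptions; the text above requires only that each adjusted string ends up with k−1 boundaries.

```python
import random

def crossover(a, b, point):
    """Swap the tails of two equal-length strings at index `point`."""
    return a[:point] + b[point:], b[:point] + a[point:]

def repair(s, k, rng):
    """Flip randomly chosen bits until the string has exactly k-1 ones.

    Random flipping is an assumed repair strategy, not one taken from the text.
    """
    s = list(s)
    target = k - 1
    ones = [i for i, v in enumerate(s) if v == 1]
    zeros = [i for i, v in enumerate(s) if v == 0]
    rng.shuffle(ones)
    rng.shuffle(zeros)
    while len(ones) > target:      # too many boundaries: clear some
        s[ones.pop()] = 0
    while len(ones) < target:      # too few boundaries: set some
        i = zeros.pop()
        s[i] = 1
        ones.append(i)
    return s

rng = random.Random(0)
k = 3                                    # so each string needs k-1 = 2 ones
third = [1, 1, 0, 0, 0, 0, 0, 0]
fourth = [0, 0, 0, 0, 0, 0, 1, 1]

# After swapping tails at index 4, one child has 4 ones and the other has none.
fifth, sixth = crossover(third, fourth, 4)

# Adjust both children back to exactly k-1 boundaries.
fifth = repair(fifth, k, rng)
sixth = repair(sixth, k, rng)
```

Swapping tails generally changes the number of ones in each child, which is why an adjustment step is needed before the children can be scored as valid k-segment candidates.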
According to an aspect of the present invention, the plurality of data includes a plurality of images.
According to an aspect of the present invention, the plurality of the images includes a plurality of video frames.
According to an aspect of the present invention, the plurality of data includes a plurality of presentation slides.
According to an aspect of the present invention, the plurality of data includes note pages.
According to an aspect of the present invention, the plurality of data includes a plurality of non-image data.
According to an aspect of the present invention, the method further includes a step of reducing the plurality of data.
According to an aspect of the present invention, the plurality of data is reduced by subsampling the plurality of images and/or using a standard deviation measurement.
According to an aspect of the present invention, the fitness function is based on image similarity, importance and precedence.
According to an aspect of the present invention, the fitness function is based on histogram differences between adjacent data.
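One possible way to compute a histogram difference between adjacent items is sketched below in Python. The bin count, the value range, and the choice of a normalized L1 distance are assumptions made for illustration; this section does not specify the distance measure. The helper names `histogram`, `hist_diff` and `adjacent_diffs` are hypothetical.

```python
def histogram(values, bins, lo=0, hi=256):
    """Count how many values fall in each of `bins` equal-width bins over [lo, hi)."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts

def hist_diff(h1, h2):
    """L1 distance between two histograms, normalized to the range [0, 1]."""
    total = sum(h1) + sum(h2)
    if total == 0:
        return 0.0
    return sum(abs(a - b) for a, b in zip(h1, h2)) / total

def adjacent_diffs(items, bins):
    """Histogram difference between each item and its successor."""
    hists = [histogram(item, bins) for item in items]
    return [hist_diff(hists[i], hists[i + 1]) for i in range(len(hists) - 1)]

# Toy "frames" given as flat lists of pixel intensities.
dark = [10, 20, 30, 40]
bright = [200, 210, 220, 230]
diffs = adjacent_diffs([dark, dark, bright], bins=4)
# diffs == [0.0, 1.0]: no change between the two dark frames, a full change at the cut
```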
According to an aspect of the present invention, the fitness function equals: $$f(S_k) = \sum_{i,j \in S_k} \alpha(i,j)\, h(i,j).$$
According to an aspect of the present invention, the fitness function equals: $$f(S_k) = \sum_{\substack{i,j \in S_k \\ i \neq j}} h(i,j)\, \frac{I_i + I_j}{|i-j|^2}$$
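The sum above can be evaluated directly, as in the following Python sketch. It assumes that S_k is given as a collection of integer positions, that h is supplied as a function of two positions, and that the importance values I_i are supplied as a lookup table; none of these representations is stated in this section. The sketch sums over ordered pairs, so each unordered pair contributes twice, which only scales the result by a constant factor.

```python
def fitness(selected, h, importance):
    """Sum h(i, j) * (I_i + I_j) / |i - j|**2 over ordered pairs i != j in `selected`."""
    total = 0.0
    for i in selected:
        for j in selected:
            if i != j:
                total += h(i, j) * (importance[i] + importance[j]) / abs(i - j) ** 2
    return total

# Toy inputs (assumed for illustration): constant difference and unit importance.
selected = [0, 2, 5]
importance = {0: 1.0, 2: 1.0, 5: 1.0}
value = fitness(selected, lambda i, j: 1.0, importance)
```

The division by |i − j|² means that pairs far apart in the sequence contribute less to the score than pairs that are close together.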
According to an aspect of the present invention, the fitness function equals: $$g(S_k) = \frac{f(S_k)}{1 + |k - k_0|}$$
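A short worked example of this scaling follows, in Python. The symbol k_0 is not defined in this section; the sketch assumes it is a preferred number of segments, so that the score is reduced the further k departs from it. The function name `scaled_fitness` is hypothetical.

```python
def scaled_fitness(f_value, k, k0):
    """Divide the raw fitness by 1 + |k - k0| (k0 assumed to be a preferred segment count)."""
    return f_value / (1 + abs(k - k0))

at_target = scaled_fitness(6.0, k=5, k0=5)    # 6.0 / 1 = 6.0
off_target = scaled_fitness(6.0, k=7, k0=5)   # 6.0 / 3 = 2.0
```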
According to an aspect of the present invention, the method further comprises a step of adding additional data to the plurality of data.
According to an aspect of the present invention, the method further comprises a step of varying k segments.
According to an aspect of the present invention, the fitness function includes an importance factor equal to: $$I_i = A_i\, P_i\, \log(\delta(i))\, \log\frac{1}{W_i}$$
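The importance factor can be computed term by term, as in the Python sketch below. This section does not define A_i, P_i, δ(i) or W_i, so the sketch treats them as opaque positive numbers supplied by the caller. The use of the natural logarithm is also an assumption, since the base is not stated.

```python
import math

def importance_factor(a_i, p_i, delta_i, w_i):
    """Compute A_i * P_i * log(delta(i)) * log(1 / W_i) using natural logarithms.

    Requires delta_i > 0 and w_i > 0, since the logarithm is undefined otherwise.
    """
    if delta_i <= 0 or w_i <= 0:
        raise ValueError("delta_i and w_i must be positive")
    return a_i * p_i * math.log(delta_i) * math.log(1.0 / w_i)

# With W_i = 1 the last term is log(1) = 0, so the importance vanishes.
zero_case = importance_factor(2.0, 3.0, 10.0, 1.0)
```

Under this formula a smaller W_i yields a larger log(1/W_i) term and therefore a larger importance value.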
According to an aspect of the present invention, an information system for segmenting a plurality of data into k segments is provided. The system is comprised of a processor coupled to a memory. The memory stores a software program for segmenting a plurality of data. The plurality of data is encoded with a series of 0's and 1's, where there are k−1 ones used as boundary segmentation points.
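The binary encoding can be sketched in Python as follows. The sketch assumes one bit per gap between consecutive items, so that n items are encoded by n−1 bits and a 1 at position p places a boundary between item p and item p+1. That layout is an assumption; the text states only that k−1 ones serve as boundary points. The function names are hypothetical.

```python
def encode_boundaries(n_items, boundaries):
    """Build a 0/1 string with a 1 at each gap position listed in `boundaries`."""
    bits = [0] * (n_items - 1)
    for p in boundaries:
        bits[p] = 1
    return bits

def decode_segments(bits):
    """Split item indices 0..n-1 into segments, cutting after each position holding a 1."""
    segments, start = [], 0
    for p, bit in enumerate(bits):
        if bit == 1:
            segments.append(list(range(start, p + 1)))
            start = p + 1
    segments.append(list(range(start, len(bits) + 1)))
    return segments

bits = encode_boundaries(6, [1, 3])   # [0, 1, 0, 1, 0]: two ones, so k = 3
segments = decode_segments(bits)      # [[0, 1], [2, 3], [4, 5]]
```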
According to an aspect of the present invention, an article of manufacture, including a computer-readable memory, is provided. A first software program forms a first string and a second string, each corresponding to a respective plurality of data. The first and second strings have boundary points. A second software program calculates a first and second fitness function value for the plurality of data represented by the first and second strings, respectively. A third software program creates a third and fourth string corresponding to the plurality of data based on the first and second fitness function values and a genetic operation. The second software program also calculates a third and fourth fitness function value for the plurality of data represented by the third and fourth strings. A fourth software program selects a set of data in the plurality of data based on the first, second, third and fourth fitness function values.