1. Field of the Invention
The present invention relates to an apparatus and a method for generating a motion descriptor by using an accumulated motion histogram, in which motion information of video content is processed in view of human perceptual characteristics, an accumulated motion histogram is generated, and a motion descriptor is generated from the accumulated motion histogram, so that the invention is applicable to multimedia databases and search systems, that is, to video application fields.
2. Description of the Conventional Art
Recently, owing to the continuous development of expression media, delivery media, storage media, and the means of operating these media, demand has tended to increase for the free production of large-capacity multimedia data rather than mono-media data, and for rapid search and convenient use and reuse. The demand for free multimedia production has been met by the development of electronic expression media, and consequently a huge amount of mono-media and multimedia data has become scattered across personal and public systems.
However, as the amount of multimedia data increases, the time and expense required to search for data to use or reuse increase proportionally. Therefore, in order to increase the speed and efficiency of data search, new search techniques have been researched and developed which include the widely known character-based search technique and have composite information attributes, and which are thereby suitable for efficient search of multimedia data.
In order to search and index multimedia data efficiently, it is necessary to minimize the number of information attributes which express the respective media data, to simplify the whole procedure, and to achieve real-time processing. Furthermore, it is necessary to guarantee the effectiveness and variety of the information attributes to be expressed and the flexibility of search.
Further, the subjective similarity and the objective similarity of search results are important factors in evaluating search performance. The importance of subjective similarity results from the limitations in representing the information characteristics which describe the media data. Therefore, even though the objective similarity is large, the validity and utility of the search are degraded if the search result is not as desired by users. Accordingly, methods which can reflect subjective similarity in the information attributes that express the media data characteristics have been continuously studied.
The difference between character-based search and multimedia data-based search, both of which are widely used in the field, lies in the difficulty of information attribute extraction and the variety of information attribute description.
With regard to the difficulty of information attribute extraction, character data can be searched by indexing several principal words and sentences in a document, whereas in the case of multimedia the data size is too large and many media are mixed, so that proper preprocessing is required in order to obtain new valid information attributes in which the individual information attributes are organically combined with each other.
The preprocessing is performed to extract valid characteristics information and to give validity to the procedure. In addition, even though a method is able to detect valid characteristics information, the method cannot be put into practical use in application fields in which a large amount of multimedia data must be processed in a short time, or in which terminal systems have poor performance, if the expense for hardware (H/W) and software (S/W) in the procedure is high.
Now, the variety of information attribute description will be described, taking video search as an example. Video is a mixture of various media information such as images, voice, and sound. Therefore, after valid information attributes are extracted by preprocessing the information attributes of the respective mono-media data, or of multimedia in which two or more media are mixed, data searches of various types may be conducted by using the extracted information attributes. For example, in order to search video, the information attributes of images may be utilized, and it is also possible to utilize the combined information attributes of images and voices. Therefore, a search which utilizes various multimedia attributes is more effective than one which utilizes only a single mono-media attribute.
Nowadays, most studies in multimedia data indexing and search have concentrated on the field of still images, for which data are easier to obtain. The still image is widely utilized in storage systems such as digital electronic still cameras and image databases, in transmission systems such as still image transmission devices, audio-graphic conferencing, and image conferencing, and in printing systems such as printers. Still image indexing or search is a content-based image search method. Therefore, importance resides in the methods of extracting, indexing, and searching characteristics information which remains consistent under changes such as rotation, scaling, and translation of the color, texture, and shape of the images.
In the field of video search, data are not as easy to obtain as for still images, and applications are limited by the large capacity of data to be stored and processed. However, owing to the rapid development of transmission media and storage media such as discs, tapes, and CD-ROMs, the expense required for obtaining the data is decreasing, and owing to the trend toward miniaturization of the necessary devices, study in this field is becoming vigorous. In general, video refers to a series of images which are ordered in continuous time. Therefore, video has spatial redundancy (repetition) within an image and between images, which distinguishes the characteristics of video images from those of still images. In video search methods, the redundancy between images may be put to important use in the extraction of characteristics information.
The redundancy between video frames may be measured by using the degree of motion. For example, if the redundancy is large, the size of the region is large and the motion between regions is small. On the other hand, if the redundancy is small, the size of the region is small and the motion between regions is large. At present, the video compression methods whose standardization is finished (H.261, H.263, MPEG-1, MPEG-2, MPEG-4) adopt motion estimation between images (BMA, Block Matching Algorithm) to improve data compression efficiency.
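As an illustration of the block matching mentioned above, the following is a minimal sketch of an exhaustive-search block matcher. The sum of absolute differences (SAD) cost, the block size, the search range, and the function names `sad` and `block_match` are assumptions chosen for illustration; the standards cited above do not mandate any particular search strategy.

```python
def sad(cur, ref, cx, cy, rx, ry, n):
    """Sum of absolute differences between an n x n block of `cur`
    at (cx, cy) and an n x n block of `ref` at (rx, ry)."""
    return sum(
        abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
        for j in range(n)
        for i in range(n)
    )


def block_match(cur, ref, cx, cy, n=4, search=2):
    """Exhaustive search: return the displacement (dx, dy), within
    +/- `search` pixels, from the current block to the best-matching
    block in the reference frame (hypothetical convention)."""
    h, w = len(ref), len(ref[0])
    best = (0, 0)
    best_cost = sad(cur, ref, cx, cy, cx, cy, n)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            # Skip candidate blocks that fall outside the reference frame.
            if 0 <= rx <= w - n and 0 <= ry <= h - n:
                cost = sad(cur, ref, cx, cy, rx, ry, n)
                if cost < best_cost:
                    best, best_cost = (dx, dy), cost
    return best
```

The displacement found for each block is a motion vector; its magnitude and angle are the kind of intensity and direction data from which motion histograms can be built.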
In the conventional video search method, a certain temporal position (hereinafter referred to as a “clip”) of certain units is structured on the basis of changes in the color, texture, shape, and motion between images; several key frames, which represent the meaning characteristics and signal characteristics of the images in the clip, are selected; and characteristics information is extracted with relation to the information attributes of the selected key frames for use in indexing or search.
In video structuring, the general structure is a hierarchical structure comprising the basic unit “shot”, which is a series of still images having no temporal disconnection; the “Scene”, which is a series of continuous shots having temporal and spatial continuity in content; and the “Story”, which is a series of continuous scenes in the four steps of composition.
FIG. 17B shows the general video structure. Video structuring may be achieved in the form of an event tree on the basis of signal characteristics. In video structuring, the structuring information of the signal characteristics and of the meaning characteristics may exist together on the basis of a correlated link.
In FIG. 17B, the segment tree shown on the left side and the segment tree shown on the right side are linked together in the direction shown by the arrow. For example, if the Clinton case, which is structured in the event tree, is searched, the video link is carried on the event tree from segment 1 to the video of shot 2 of sub-segment 1 and the video of shot 3 of segment 3.
Further, a single still image may also be structured. For example, in the case of a photograph of a person in a forest, the structuring proceeds in such a manner that the person and the forest make up an object, the face and body make up the person, and the eyes, nose, ears, etc. make up the face, as shown in FIG. 17A. FIG. 17A is a view for explaining the structuring of a still image. In FIG. 17A, it is possible to structure the signal in the form of a region tree, which is based on signal characteristics in the image, and the meaning in the form of an object tree, which is based on objects having sensory meaning in the image. In general, signal structuring is performed by semi-automatic and automatic still image structuring, while meaning structuring, which is conceptual structuring, is performed manually by users by means of a manual structuring method.
In the structure of the still image, as shown in FIG. 17A, the structuring information of the signal characteristics and of the meaning characteristics may exist together on the basis of a correlated link between the region tree shown on the left side and the object tree shown on the right side.
The sound signal is composed of background noise, people's conversation voices, and background music.
Video structuring becomes more precise as the number of images in a video increases, and has the advantage that a search may be conducted faster through this video structuring than through indexing which utilizes a variety of characteristics.
FIG. 18 is a view for explaining video structuring by the unit of “scene”.
The indexing and search methods using key frames do not describe the signal characteristics, the temporal and spatial distributions, or the changes with relation to the whole video, the images between the key frames, or a specific temporal period. These methods are therefore unsuitable for applications which require such characteristics, which cannot be described by the key frames.
The present invention is derived to resolve the disadvantages of the conventional techniques, and it is an object of the present invention to provide a motion descriptor generating apparatus using an accumulated motion histogram and a method therefor, in which the perceptual characteristics of users may be reflected in the motion indexing and search of video content, thereby supporting users' indexing and search in various stages.
In order to achieve the above objects of the present invention, there is provided a motion descriptor generating apparatus using an accumulated motion histogram, which includes a motion histogram generating unit for respectively generating motion histograms with relation to intensity data and direction data of an input motion, an accumulated motion histogram generating unit for generating a two-dimensional accumulated motion histogram in a predetermined sequence by using the motion histograms generated in the motion histogram generating unit, and a motion descriptor generating unit for structuring video into certain units (a hierarchical structure) according to the amount of change over time of the accumulated motion histogram generated in the accumulated motion histogram generating unit, and for generating a motion descriptor which describes motion characteristics with relation to the respective structured units.
In order to achieve the above objects of the present invention, there is provided a method for generating motion descriptors by using an accumulated motion histogram, which includes the steps of generating motion histograms for input motion intensity and direction data, generating a two-dimensional accumulated motion histogram from the generated motion histograms in a predetermined sequence, structuring video into certain units (a hierarchical structure) according to the amount of change over time of the generated accumulated motion histogram, and generating a motion descriptor for describing motion characteristics for the respective structured units.
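The structuring and descriptor steps of such a method might be sketched as follows. This is only an illustration under stated assumptions: the accumulated histogram is taken to be a list of per-frame histograms in temporal order, the "change amount" is taken to be the L1 distance between consecutive rows, a new unit starts when that distance exceeds a threshold, and the descriptor of a unit is simply its mean histogram. None of these choices, nor the names `change_amount`, `segment_by_change`, and `describe_units`, is specified in the text above, and the sketch produces a single flat level of units rather than a full hierarchy.

```python
def change_amount(h1, h2):
    """Assumed change measure: L1 distance between two histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def segment_by_change(accumulated, threshold=0.5):
    """Split the frames into units (start, end), end exclusive, cutting
    wherever the change between consecutive rows exceeds `threshold`."""
    if not accumulated:
        return []
    units, start = [], 0
    for t in range(1, len(accumulated)):
        if change_amount(accumulated[t - 1], accumulated[t]) > threshold:
            units.append((start, t))
            start = t
    units.append((start, len(accumulated)))
    return units


def describe_units(accumulated, units):
    """Hypothetical descriptor: the mean histogram of each unit."""
    descriptors = []
    for start, end in units:
        rows = accumulated[start:end]
        mean = [sum(col) / len(rows) for col in zip(*rows)]
        descriptors.append(
            {"start": start, "end": end, "mean_histogram": mean}
        )
    return descriptors
```

A hierarchical structure could be obtained, for example, by repeating the segmentation with several thresholds, although that extension is likewise an assumption.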
In order to achieve the above objects of the present invention, there is provided a motion descriptor generating apparatus using an accumulated motion histogram, which includes a motion intensity computing unit for computing the intensity (degree) of input motion intensity information, a motion intensity histogram generating unit for generating a motion intensity histogram with relation to the motion intensity information computed by the motion intensity computing unit, an accumulated motion intensity histogram generating unit for generating a two-dimensional accumulated motion intensity histogram in a predetermined sequence by using the motion intensity histogram generated in the motion intensity histogram generating unit, and a motion descriptor generating unit for structuring video into certain units (a hierarchical structure) according to the amount of change over time of the accumulated motion intensity histogram generated in the accumulated motion intensity histogram generating unit, and for generating a motion descriptor which describes motion characteristics with relation to the respective structured units.
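The intensity path of such an apparatus might look like the following minimal sketch. It assumes that the intensity of a motion vector is its Euclidean magnitude, that the histogram uses uniform bins normalized by the number of vectors, and that "accumulation in a predetermined sequence" means arranging the per-frame histograms in temporal order to form a two-dimensional array (frames by bins). These are illustrative assumptions, as are all function names and default parameters.

```python
import math


def motion_intensity(vectors):
    """Assumed intensity measure: Euclidean magnitude of each (dx, dy)."""
    return [math.hypot(dx, dy) for dx, dy in vectors]


def intensity_histogram(intensities, num_bins=4, max_intensity=8.0):
    """Normalized histogram with uniform bins over [0, max_intensity];
    larger values are clamped into the last bin."""
    counts = [0] * num_bins
    for v in intensities:
        idx = min(int(v / max_intensity * num_bins), num_bins - 1)
        counts[idx] += 1
    total = len(intensities)
    if total == 0:
        return [0.0] * num_bins
    return [c / total for c in counts]


def accumulated_intensity_histogram(frames, num_bins=4, max_intensity=8.0):
    """Two-dimensional array: one row per frame in temporal order,
    one column per intensity bin (assumed accumulation order)."""
    return [
        intensity_histogram(motion_intensity(vectors), num_bins, max_intensity)
        for vectors in frames
    ]
```

Here `frames` is a list with one entry per frame, each entry being the list of motion vectors estimated for that frame.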
In order to achieve the above objects of the present invention, there is provided a motion descriptor generating apparatus using an accumulated motion histogram, which includes a motion direction computing unit for computing the direction of input motion direction information, a motion direction histogram generating unit for generating a motion direction histogram with relation to the motion direction information computed by the motion direction computing unit, an accumulated motion direction histogram generating unit for generating a two-dimensional accumulated motion direction histogram in a predetermined sequence by using the motion direction histogram generated in the motion direction histogram generating unit, and a motion descriptor generating unit for structuring video into certain units (a hierarchical structure) according to the amount of change over time of the accumulated motion direction histogram generated in the accumulated motion direction histogram generating unit, and for generating a motion descriptor which describes motion characteristics with relation to the respective structured units.
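The direction path might be sketched in the same spirit. The sketch assumes that the direction of a motion vector is its angle in degrees in the range [0, 360), measured with `math.atan2` in a coordinate system whose orientation is left to the caller, that angles are quantized into uniform bins, and that zero-length vectors, which have no direction, are skipped. The names `motion_direction` and `direction_histogram` and the bin count are hypothetical.

```python
import math


def motion_direction(dx, dy):
    """Angle of the vector (dx, dy) in degrees, mapped into [0, 360)."""
    return math.degrees(math.atan2(dy, dx)) % 360.0


def direction_histogram(vectors, num_bins=8):
    """Normalized histogram of motion directions with uniform angular
    bins; zero vectors are ignored (an assumption of this sketch)."""
    counts = [0] * num_bins
    width = 360.0 / num_bins
    for dx, dy in vectors:
        if dx == 0 and dy == 0:
            continue
        idx = int(motion_direction(dx, dy) / width) % num_bins
        counts[idx] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * num_bins
    return [c / total for c in counts]
```

Arranging one such histogram per frame in temporal order would give a two-dimensional accumulated motion direction histogram under the same assumption about accumulation as for the intensity case.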
In order to achieve the above objects of the present invention, there is provided a motion descriptor generating apparatus using an accumulated motion histogram, which includes a motion intensity computing unit for computing the intensity (degree) of input motion intensity information, a motion intensity histogram generating unit for generating a motion intensity histogram with relation to the motion intensity information computed by the motion intensity computing unit, an accumulated motion intensity histogram generating unit for generating a two-dimensional accumulated motion intensity histogram in a predetermined sequence by using the motion intensity histogram generated in the motion intensity histogram generating unit, a motion direction computing unit for computing the direction of input motion direction information, a motion direction histogram generating unit for generating a motion direction histogram with relation to the motion direction information computed by the motion direction computing unit, an accumulated motion direction histogram generating unit for generating a two-dimensional accumulated motion direction histogram in a predetermined sequence by using the motion direction histogram generated in the motion direction histogram generating unit, and a motion descriptor generating unit for structuring video into certain units (a hierarchical structure) according to the amounts of change over time of the accumulated motion intensity and direction histograms generated in the accumulated motion intensity and direction histogram generating units, and for generating motion descriptors which describe motion characteristics with relation to the respective structured units.
In order to achieve the above objects of the present invention, there is provided a method for generating motion descriptors by using an accumulated motion histogram, which includes the steps of motion intensity computing for computing the intensity (degree) of input motion intensity information, motion intensity histogram generating for generating a motion intensity histogram with relation to the motion intensity information computed in the step of motion intensity computing, accumulated motion intensity histogram generating for generating a two-dimensional accumulated motion intensity histogram in a predetermined sequence by using the motion intensity histogram generated in the step of motion intensity histogram generating, and motion descriptor generating for structuring video into certain units (a hierarchical structure) according to the amount of change over time of the accumulated motion intensity histogram generated in the step of accumulated motion intensity histogram generating, and for generating a motion descriptor which describes motion characteristics with relation to the respective structured units.
In order to achieve the above objects of the present invention, there is provided a method for generating motion descriptors by using an accumulated motion histogram, which includes the steps of motion direction computing for computing the direction of input motion direction information, motion direction histogram generating for generating a motion direction histogram with relation to the motion direction information computed in the step of motion direction computing, accumulated motion direction histogram generating for generating a two-dimensional accumulated motion direction histogram in a predetermined sequence by using the motion direction histogram generated in the step of motion direction histogram generating, and motion descriptor generating for structuring video into certain units (a hierarchical structure) according to the amount of change over time of the accumulated motion direction histogram generated in the step of accumulated motion direction histogram generating, and for generating a motion descriptor which describes motion characteristics with relation to the respective structured units.
In order to achieve the above objects of the present invention, there is provided a method for generating motion descriptors by using an accumulated motion histogram, which includes the steps of motion intensity computing for computing the intensity (degree) of input motion intensity information, motion intensity histogram generating for generating a motion intensity histogram with relation to the motion intensity information computed in the step of motion intensity computing, accumulated motion intensity histogram generating for generating a two-dimensional accumulated motion intensity histogram in a predetermined sequence by using the motion intensity histogram generated in the step of motion intensity histogram generating, motion direction computing for computing the direction of input motion direction information, motion direction histogram generating for generating a motion direction histogram with relation to the motion direction information computed in the step of motion direction computing, accumulated motion direction histogram generating for generating a two-dimensional accumulated motion direction histogram in a predetermined sequence by using the motion direction histogram generated in the step of motion direction histogram generating, and motion descriptor generating for structuring video into certain units (a hierarchical structure) according to the amounts of change over time of the accumulated motion intensity and direction histograms generated in the steps of accumulated motion intensity and direction histogram generating, and for generating motion descriptors which describe motion characteristics with relation to the respective structured units.