With the recent increase in multimedia content, there is growing market demand for a content summarizing art that allows users to view content in a shorter time. In addition, content is becoming more varied, including movies, dramas, home videos, news, documentaries and music, and user requests are accordingly becoming more varied as well.
As user requests become more varied, there is growing demand for an art that retrieves and presents a desired image or scene in response to a user request. One known example of such an art is a content summarizing art that summarizes audiovisual content based on the audio signal data contained in it (see, for example, Patent literatures 1 and 2).
According to the content summarizing art disclosed in Patent literature 1, audio data is analyzed, and at least one of the fundamental frequency, the power, and the temporal variation characteristic of a dynamic feature quantity, and/or their inter-frame differences, is extracted as an audio feature vector. A codebook associates each representative vector, obtained by quantizing the extracted audio feature vectors, with speaker emotions and the corresponding emotion appearance probabilities; using this codebook, the appearance probabilities of emotional states such as laughter, anger and sorrow are determined.
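The codebook lookup described for Patent literature 1 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the method of Patent literature 1: it assumes per-frame F0 and power values are already available, uses F0, power and their frame-to-frame differences as the feature vector, and assigns each frame to its nearest representative vector by Euclidean distance. The codebook, the probability table and the function names are hypothetical.

```python
import math

# Hypothetical emotion labels, in the column order of the probability table.
EMOTIONS = ("laughter", "anger", "sorrow")

def frame_features(f0, power):
    """Build per-frame vectors (F0, power, dF0, dpower); deltas are differences from the previous frame."""
    feats = []
    for t in range(len(f0)):
        prev = max(t - 1, 0)  # the first frame gets zero deltas
        feats.append((f0[t], power[t], f0[t] - f0[prev], power[t] - power[prev]))
    return feats

def nearest_code(vec, codebook):
    """Index of the representative vector closest to vec (Euclidean distance)."""
    return min(range(len(codebook)), key=lambda k: math.dist(vec, codebook[k]))

def emotion_probabilities(features, codebook, emotion_table):
    """For each frame, look up the emotion appearance probabilities stored for its nearest code."""
    return [dict(zip(EMOTIONS, emotion_table[nearest_code(v, codebook)])) for v in features]

# Toy codebook (two representative vectors) and hypothetical emotion probabilities per code.
codebook = [(120.0, 0.2, 0.0, 0.0), (250.0, 0.9, 30.0, 0.3)]
emotion_table = [(0.1, 0.05, 0.3), (0.6, 0.3, 0.05)]

feats = frame_features([118.0, 122.0, 240.0, 260.0], [0.2, 0.25, 0.8, 0.95])
probs = emotion_probabilities(feats, codebook, emotion_table)
```

Because F0 (in Hz) dominates the unscaled distance in this toy example, a practical system would normalize the features before quantization; that step is omitted for brevity.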
In this emotion detecting method, a section is judged to be in an emotional state based on the emotional state appearance probabilities, and a part of the content that includes such a section is regarded as an important part and extracted.
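The extraction step can be sketched as thresholding a per-frame emotional-state probability and then widening each detected section into a surrounding important part. The threshold, the minimum section length and the margin are hypothetical parameters introduced only for illustration; Patent literature 1 may decide sections differently.

```python
def emotional_sections(probs, threshold, min_len=1):
    """Return (start, end) frame ranges, end exclusive, where probs stays >= threshold for at least min_len frames."""
    sections, start = [], None
    for t, p in enumerate(probs):
        if p >= threshold:
            if start is None:
                start = t  # a candidate section begins
        elif start is not None:
            if t - start >= min_len:
                sections.append((start, t))
            start = None
    if start is not None and len(probs) - start >= min_len:
        sections.append((start, len(probs)))  # section runs to the end of the content
    return sections

def important_parts(sections, margin, n_frames):
    """Widen each section by `margin` frames on both sides, clip to the content, and merge overlaps."""
    parts = []
    for s, e in sections:
        s, e = max(0, s - margin), min(n_frames, e + margin)
        if parts and s <= parts[-1][1]:
            parts[-1] = (parts[-1][0], max(parts[-1][1], e))
        else:
            parts.append((s, e))
    return parts

probs = [0.1, 0.7, 0.8, 0.2, 0.1, 0.9, 0.95, 0.3]
sections = emotional_sections(probs, threshold=0.5)
parts = important_parts(sections, margin=1, n_frames=len(probs))
```

With a margin of one frame, the two short emotional sections in this toy sequence are merged into a single important part covering the whole content.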
Similarly, according to the content summarizing art disclosed in Patent literature 2, audio data is analyzed, and at least one of the fundamental frequency, the power, and the temporal variation characteristic of a dynamic feature quantity, and/or their inter-frame differences, is extracted as an audio feature vector. Using a codebook that associates each representative vector, obtained by quantizing the extracted audio feature vectors, with a speech emphasis state probability and a calm state probability, the appearance probabilities of the emphasis state and the calm state are determined.
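One simple way to turn per-frame emphasis and calm probabilities into a decision for a whole section is to compare their log-likelihoods. The sketch below assumes that frames are independent and that each frame has already been mapped to a codebook index; the probability tables and the function name are hypothetical, and this is not necessarily how Patent literature 2 makes the decision.

```python
import math

def section_state(codes, emphasis_prob, calm_prob):
    """Classify a section as 'emphasis' or 'calm' and return (label, log-likelihood ratio).

    codes: codebook indices of the section's frames.
    emphasis_prob / calm_prob: per-code appearance probabilities (hypothetical tables).
    Frames are treated as independent, so each section log-likelihood is a sum over frames.
    """
    log_emph = sum(math.log(emphasis_prob[c]) for c in codes)
    log_calm = sum(math.log(calm_prob[c]) for c in codes)
    return ("emphasis" if log_emph > log_calm else "calm"), log_emph - log_calm

# Toy tables for a two-entry codebook.
emphasis_prob = [0.2, 0.7]
calm_prob = [0.8, 0.3]

label_a, llr_a = section_state([1, 1, 0], emphasis_prob, calm_prob)
label_b, llr_b = section_state([0, 0, 1], emphasis_prob, calm_prob)
```

Working in log space avoids numerical underflow when a section contains many frames, since a product of many small probabilities quickly becomes too small to represent.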
In the emotion detecting methods described above, a plurality of pieces of learning audio signal data are retained, and the emotional state is determined by comparing newly input audio signal data with the retained learning audio signal data. Increasing the determination accuracy therefore requires increasing the amount of learning audio signal data, so these methods suffer from enormous memory and calculation costs.
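The cost argument can be made concrete with a brute-force nearest-neighbour classifier, used here only as a simple stand-in for comparison with retained learning data (an assumption, not the procedure of the cited patents). Every query is compared with every retained example, so both the stored data and the number of distance computations grow linearly with the amount of learning data. The function names and toy data are hypothetical.

```python
import math

def classify_by_stored_examples(query, examples):
    """Return (label of the nearest retained example, number of distance computations).

    examples: list of (feature_vector, label) pairs kept as learning data.
    """
    best_label, best_dist, comparisons = None, math.inf, 0
    for vec, label in examples:
        d = math.dist(query, vec)
        comparisons += 1  # one distance computation per retained example
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label, comparisons

def stored_values(examples):
    """Number of feature values that must be kept in memory."""
    return sum(len(vec) for vec, _ in examples)

examples = [((0.0, 0.0), "calm"), ((10.0, 10.0), "anger")]
print(classify_by_stored_examples((9.0, 9.0), examples))        # ('anger', 2)
print(classify_by_stored_examples((9.0, 9.0), examples * 100))  # ('anger', 200)
print(stored_values(examples * 100))                             # 400
```

Multiplying the learning data by 100 multiplies both the per-query comparisons and the stored values by 100, which is the scaling behind the memory and calculation costs noted above.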
Related arts include a method of extracting a fundamental frequency and a power (see, for example, Non-patent literature 1), a method of extracting a temporal variation characteristic of a speech rate (see, for example, Non-patent literature 2), methods of estimating the parameters of a probability model (see, for example, Non-patent literatures 3 and 4), and a method of determining a generalized state space model (see, for example, Non-patent literature 5).

Patent literature 1: Japanese Patent Application Laid Open No. 2005-345496 (paragraphs 0011 to 0014, for example)
Patent literature 2: Japanese Patent No. 3803311
Non-patent literature 1: Sadaoki Furui, “Digital Speech Processing, Chapter 4, 4.9 Pitch Extraction,” Tokai University Press, September 1985, pp. 57-59
Non-patent literature 2: Shigeki Sagayama, Fumitada Itakura, “On individuality in a Dynamic Measure of Speech,” Proc. of The 1979 Spring Meeting of The Acoustic Society of Japan, 3-2-7, 1979, pp. 589-590
Non-patent literature 3: Kenichiro Ishii, Naonori Ueda, Eisaku Maeda, Hiroshi Murase, “Pattern Recognition,” Ohmsha, first edition, August 1998, pp. 52-54
Non-patent literature 4: Jinfang Wang, Syu Tezuka, Naonori Ueda, Masaaki Taguri, “Calculation of Statistics I: New technique of the probability calculation, frontier of statistics science 11, Chapter 3, 3 EM Method, 4 Variational Bayesian Method,” Iwanami Shoten, June 2003, pp. 157-186
Non-patent literature 5: Kitagawa, G., “Non-Gaussian state-space modeling of nonstationary time series,” Journal of the American Statistical Association, December 1987, pp. 1032-1063