US 7,321,854 B2
Prosody based audio/visual co-analysis for co-verbal gesture recognition
Rajeev Sharma, State College, Pa. (US); Mohammed Yeasin, Utica, N.Y. (US); and Sanshzar Kettebekov, State College, Pa. (US)
Assigned to The Penn State Research Foundation, University Park, Pa. (US)
Filed on Sep. 19, 2003, as Appl. No. 10/666,460.
Claims priority of provisional application 60/413,998, filed on Sep. 19, 2002.
Prior Publication US 2004/0056907 A1, Mar. 25, 2004
Int. Cl. G10L 15/00 (2006.01)
U.S. Cl. 704/243 [704/276]    19 Claims
1. A method of gestural behavior analysis, comprising the steps of:
performing a training process using a combined audio/visual signal as a training data set, whereby prosodic audio features
of said training data set are correlated with visual features of said training data set;
producing a statistical model based on results of said training process; and
applying said model to an actual data set to classify properties of gestural acts contained therein,
wherein said training process comprises at least the steps of:
dividing said combined audio/visual signal into an audio component and a visual component;
identifying observable visual features of said visual component;
identifying observable prosodic features of said audio component; and
co-analyzing said audio and visual components to establish a correlation between said observable visual features and said
observable prosodic features.
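The claimed training process can be sketched in code. The following is a minimal illustrative sketch, not the patentees' actual implementation: the feature choices (pitch and energy as prosodic features, hand velocity as the visual feature), the use of Pearson correlation for the co-analysis step, and the per-class Gaussian scorer standing in for the "statistical model" are all assumptions made here for illustration.

```python
import numpy as np

def co_analyze(pitch, energy, hand_velocity):
    """Co-analysis step (illustrative): correlate observable prosodic
    features with an observable visual feature via Pearson r."""
    return {
        "pitch_vs_velocity": float(np.corrcoef(pitch, hand_velocity)[0, 1]),
        "energy_vs_velocity": float(np.corrcoef(energy, hand_velocity)[0, 1]),
    }

class GestureClassifier:
    """Toy stand-in for the claimed statistical model: one diagonal
    Gaussian per gesture class over joint audio/visual feature vectors."""

    def fit(self, X, y):
        X, y = np.asarray(X, float), np.asarray(y)
        self.classes_ = sorted(set(y.tolist()))
        self.stats_ = {}
        for c in self.classes_:
            Xc = X[y == c]
            # Per-class mean and std; small floor avoids division by zero.
            self.stats_[c] = (Xc.mean(axis=0), Xc.std(axis=0) + 1e-6)
        return self

    def predict(self, X):
        X = np.asarray(X, float)
        scores = []
        for c in self.classes_:
            mu, sd = self.stats_[c]
            # Diagonal-Gaussian log-likelihood (up to a constant).
            ll = -0.5 * np.sum(((X - mu) / sd) ** 2 + 2 * np.log(sd), axis=1)
            scores.append(ll)
        return [self.classes_[i] for i in np.argmax(np.vstack(scores), axis=0)]
```

In use, `co_analyze` would run over frame-aligned audio and visual tracks from the divided training signal, and `GestureClassifier` would then be fit on labeled gestural segments and applied to new data, mirroring the "produce a model, then apply it" structure of the claim.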