1. Field of the Invention
The present invention relates to the reproduction of speech and of the facial movements that accompany the speech of a speaking person. The invention is intended for use in contexts where a person's facial movements are to be reproduced simultaneously with the produced sound.
2. Discussion of the Background
In speech synthesis there is a need to synchronize the speech with the facial movements of a speaking person. Patent application No. 9504367-5 describes how movement patterns in a face are recorded and stored together with a polyphone collection (sounds) for concatenation synthesis based on half-syllables. The recorded movement patterns of the subject's (person's) half-syllables then influence points in a polygon model of the face. Another texture, i.e. another face, can be applied on top of the polygon model and thereby obtain lip and facial movements from the polyphone synthesis.
The described model requires that the voices of men, women and children be recorded separately. Such procedures are expensive and cumbersome.
The present invention, with reference to FIG. 3, relates to a method in speech synthesis for reproducing the facial movements of a person who has been allocated speech via speech synthesis. The speech is put together from polyphones which are fetched from a database (step S1). A databank is further established containing polyphones together with the movement patterns in the face of a first person that belong to those polyphones (step S2). Polyphones from a second person are further registered and stored in a database (step S3). The sound segments of corresponding polyphones in the databank and the database are compared (step S4), and the facial movements in the databank are modified in accordance with the deviation (step S5). The modified movement patterns are stored in the database and are related to the polyphone in question (step S6). The registered polyphones are then used to put together words and sentences, while the corresponding movement patterns in the database build up a face model (step S7).
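Steps S4 to S6 can be illustrated with a minimal sketch. The text does not specify how the deviation between sound segments is measured or how it modifies the movement pattern, so the sketch makes two loud assumptions: each sound segment is summarized by its duration only, and the modification is a stretch or compression of the first person's movement pattern in time. All names (`Polyphone`, `deviation`, `modify_movement`, `build_modified_movements`) are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]  # one measured face point (x, y, z)
Frame = List[Point]                 # all measured points at one instant


@dataclass
class Polyphone:
    label: str          # identifier of the polyphone
    duration_ms: float  # ASSUMPTION: the sound segment is summarized by its duration


def deviation(first: Polyphone, second: Polyphone) -> float:
    """Step S4 (assumed measure): duration of the second speaker relative to the first."""
    return second.duration_ms / first.duration_ms


def lerp(a: Frame, b: Frame, t: float) -> Frame:
    """Linearly interpolate every point between two frames."""
    return [tuple(pa + (pb - pa) * t for pa, pb in zip(p, q)) for p, q in zip(a, b)]


def modify_movement(frames: List[Frame], ratio: float) -> List[Frame]:
    """Step S5 (assumed rule): resample the movement pattern to ratio times its length."""
    if len(frames) < 2:
        return list(frames)
    n_out = max(2, round(len(frames) * ratio))
    out = []
    for i in range(n_out):
        pos = i * (len(frames) - 1) / (n_out - 1)  # position on the original time axis
        lo = min(int(pos), len(frames) - 2)
        out.append(lerp(frames[lo], frames[lo + 1], pos - lo))
    return out


def build_modified_movements(
    first_polyphones: Dict[str, Polyphone],
    first_movements: Dict[str, List[Frame]],
    second_polyphones: Dict[str, Polyphone],
) -> Dict[str, List[Frame]]:
    """Steps S4 to S6: compare, modify, and store one pattern per polyphone label."""
    modified = {}
    for label, second in second_polyphones.items():
        if label not in first_polyphones:
            continue  # no corresponding polyphone from the first person
        ratio = deviation(first_polyphones[label], second)
        modified[label] = modify_movement(first_movements[label], ratio)
    return modified
```

A real system would presumably compare richer acoustic properties than duration; the point of the sketch is only that the second person contributes sound, while all movement data originates from the first person.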
Speech from a subject (person) is recorded at the same time as the movement pattern of the subject is registered. The recorded speech preferably consists of nonsense words from which polyphones (half-syllables) are sorted out. The registered polyphones are stored in a polyphone bank. For each polyphone, the facial movements of the subject are further stored in a movement bank. For a second person, polyphones are registered in a corresponding way in a polyphone base. The second person's facial movements, however, are not registered. The sound segments of corresponding polyphones in the polyphone base and the polyphone bank are then compared. The registered differences are then used to modify the current movement pattern in the movement bank, whereby a model is obtained with a movement pattern corresponding to the second speaker's pronunciation of the polyphones. The modified movement pattern is stored in a movement base. When polyphones from the polyphone base are put together, the movement base is then used to create a face model whose movements correspond to the speaking person's way of speaking. The created model consists of a polygon model based on the movement pattern of the first subject. To create the impression that the second person is speaking, a picture of the speaker is applied to the model. The polygon model is thereby modified to fit the second person. The picture applied to the model can consist of stills or moving pictures which have been stored in the database or have been transferred via, for instance, the telecommunication network. A three-dimensional picture is created in this way.
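The putting together of polyphones and movements described above can be sketched as follows. This is an illustration under assumptions, not the method as specified: the polyphone base is assumed to map a label to a list of audio samples, the movement base is assumed to map the same label to a list of movement frames, and the function name `synthesize` is hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]  # one face point (x, y, z)
Frame = List[Point]                 # all face points at one instant


def synthesize(
    labels: Sequence[str],
    polyphone_base: Dict[str, List[float]],   # ASSUMPTION: label -> audio samples
    movement_base: Dict[str, List[Frame]],    # ASSUMPTION: label -> movement frames
) -> Tuple[List[float], List[Frame]]:
    """Concatenate sound and movement for the same sequence of polyphone labels."""
    samples: List[float] = []
    frames: List[Frame] = []
    for label in labels:
        # The same label selects both the sound segment and its movement pattern,
        # which is what keeps speech and face in step.
        samples.extend(polyphone_base[label])
        frames.extend(movement_base[label])
    return samples, frames
```

A label missing from either store raises `KeyError`, which makes it visible when a polyphone has been registered without a corresponding modified movement pattern.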
The registered movements in the first face consist of points which have been arranged in a three-dimensional face model. The face model consists of a number of polygons put together from points. The points are measuring points in the face of a subject, which are registered during the recording of sounds/polyphones. The points in the face of the subject are preferably registered by marking selected points in the face of the subject. The points are then registered by means of, for instance, laser technology, and a bank of sounds and movement patterns is created.
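A face model of this kind, polygons put together from measured points, can be represented with a very small data structure. The sketch below assumes that a movement frame is stored as one displacement per point relative to a neutral position; the text does not state the storage format, and the class name `FaceModel` and method `posed` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float, float]


@dataclass
class FaceModel:
    points: List[Point]              # neutral positions of the measuring points
    polygons: List[Tuple[int, ...]]  # each polygon lists indices into points

    def posed(self, displacements: List[Point]) -> "FaceModel":
        """Return a new model with every point moved by its displacement.

        ASSUMPTION: one (dx, dy, dz) displacement per point, in point order.
        """
        if len(displacements) != len(self.points):
            raise ValueError("need exactly one displacement per point")
        moved = [
            (x + dx, y + dy, z + dz)
            for (x, y, z), (dx, dy, dz) in zip(self.points, displacements)
        ]
        # The polygons only reference point indices, so they are reused unchanged.
        return FaceModel(moved, self.polygons)
```

Because the polygons refer to points by index, moving the points deforms the whole surface, and a different picture can be applied to the same polygons without touching the movement data.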
The invention can be used in all cases where a reproduction of sound/speech is to be given a lifelike movement pattern in a speaking person's face. It can, for instance, relate to a person who speaks a first language but who, by means of speech synthesis, is represented speaking a second language. Such situations are expected in the future to arise in telephony, where the telecommunication system or the equipment of the calling parties translates the speech and represents the speaking persons in picture. The field of use of the invention, however, is not only telephony, but all contexts where speech produced by a person in a first language is to be translated and reproduced in a second language with lifelike facial movements.
The invention makes it possible to apply a cheaper procedure for the animation of speech with an accompanying face. This is used, for instance, when a speaking person's speech is translated from a first language to a second language. Recording requires only one subject, who is used to produce the basic movements in the face. The person or persons who are to borrow the movement pattern from the subject need only record a number of sound sequences from which polyphones can be extracted. By registering polyphones and the accompanying facial movements of a suitable selection of persons, a bank can be created which can be used in different situations for the animation of different faces. The registered faces can, for instance, relate to persons of different ages and of different sex.