For a user or an animal, changes in facial expression are among the most intuitive and striking visual experiences. Therefore, in the field of interaction interfaces for communication systems, driving a three-dimensional (3D) facial model from video of the user or animal has long been a direction of continuous development in the industry. In the existing art, the transformation of the 3D model is calculated from reliable facial feature points of the user or animal so as to drive the 3D facial animation. Generally speaking, many reliable feature points have to be adopted in order to produce natural and finely detailed 3D facial animation.
However, the acquisition of the feature points relies on two-dimensional video data. If the two-dimensional image cannot provide reliable feature information, the feature points are misjudged and the produced facial animation is inconsistent with the actual facial image. Conversely, if the feature points are obtained from less information to drive the facial animation, the inconsistency between the facial animation and the facial image may be reduced, but the animation may no longer resemble the actual face closely enough. Thus, striking a balance between the quantity of video data and the similarity of the facial animation is an important direction.