During video communication, the video camera needs to be aimed at the speaker. In existing schemes, image recognition technology is used to recognize the speaker's face, and the video camera is then controlled remotely to point at the position of that face. However, such schemes can track neither a speaker who is outside the screen range nor a different speaker who is within the screen range.