Conventional video conferencing systems support multi-picture conferences, in which the live images of the conferencing terminals participating in a conference can be presented together on the output television of each conferencing terminal. Once the conference begins, each conferencing terminal displays one of a set of preset multi-picture modes. After the conferencing terminals join the conference, the multipoint control unit (MCU) decodes and filters the image received from each conferencing terminal, and then selects the most appropriate preset multi-picture mode according to the number of conferencing terminals. For example, suppose the preset multi-picture modes are a 2-picture mode and a 4-picture mode. When three conferencing terminals join the conference, the system selects the 4-picture mode, scales each image to the size of a sub-picture, fills each scaled image into its corresponding sub-picture, and finally encodes the composed multi-picture image and sends it to every conferencing terminal in the conference.
However, because the multi-picture modes in the prior art are preset, configuring multiple pictures in this way leaves some sub-pictures blank. In the example above, only three of the four sub-pictures in the 4-picture mode are filled, so one sub-picture remains a blank screen.
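The mode selection described above, and the blank sub-pictures it produces, can be illustrated with a minimal sketch. It assumes that "most appropriate" means the smallest preset mode with at least one sub-picture per terminal. The preset list and the function names `select_mode` and `blank_sub_pictures` are hypothetical and serve only as illustration. The sketch does not represent an actual MCU implementation.

```python
# Hypothetical preset multi-picture modes, given as the number of sub-pictures
# in each layout (the 2-picture and 4-picture modes from the example).
PRESET_MODES = [2, 4]


def select_mode(num_terminals, presets=PRESET_MODES):
    """Return the smallest preset mode with a sub-picture for every terminal.

    Assumption: "most appropriate" is interpreted as the smallest fitting mode.
    """
    candidates = [m for m in sorted(presets) if m >= num_terminals]
    if not candidates:
        raise ValueError(f"no preset mode fits {num_terminals} terminals")
    return candidates[0]


def blank_sub_pictures(num_terminals, presets=PRESET_MODES):
    """Count the sub-pictures left empty when the selected preset mode is used."""
    return select_mode(num_terminals, presets) - num_terminals


if __name__ == "__main__":
    for n in range(1, 5):
        mode = select_mode(n)
        print(f"{n} terminal(s): {mode}-picture mode, {blank_sub_pictures(n)} blank")
```

With these presets, three terminals lead to the 4-picture mode with one blank sub-picture, and a single terminal likewise leaves one sub-picture of the 2-picture mode blank. A blank screen appears whenever the number of terminals does not exactly match a preset mode.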