In low bit rate video conferencing, much of the displayed video is static and shows a relatively unchanging background. This background could be predicted well if a good representation of it were available at the decoder, and a high-quality background reference would save bits that could instead be spent on the foreground speaker. Unfortunately, when the bit rate is low, the background generally cannot be transmitted in high quality, because the frame needed to carry it would be too large: sending such a frame may cause packet loss, dropped frames and extra latency. Each of these degrades the user's viewing experience.
One approach using existing technology would be to raise the quality of different regions of the video in alternate frames, spreading the extra bits of the high-quality background across a number of frames. Although useful, this approach has limitations. First, the amount of enhancement available is still limited by the need to stay within the per-frame bit budget. Second, accumulating a full high-quality reference requires storing many frames, each containing some high-quality regions, so that together they cover the whole background area. This is onerous for both the encoder and the decoder, requires more complex reference selection at the encoder, and is inefficient in coding terms because many different reference indices must be encoded.
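The cost of the alternating-region approach can be made concrete with a small planning sketch. It assumes, purely for illustration, that the background is divided into equal-cost regions, that each frame has a fixed number of spare bits for enhancement, and that regions are enhanced round-robin. The function names `plan_region_refresh` and `reference_index_map` are hypothetical and not taken from any codec.

```python
def plan_region_refresh(num_regions: int, bits_per_region: int,
                        extra_bits_per_frame: int) -> list[list[int]]:
    """Hypothetical round-robin schedule: which background regions get
    enhanced in each successive frame, given a per-frame extra-bit budget."""
    regions_per_frame = extra_bits_per_frame // bits_per_region
    if regions_per_frame == 0:
        raise ValueError("one region does not fit in the per-frame budget")
    # Each inner list is one frame; it enhances as many regions as fit.
    return [
        list(range(start, min(start + regions_per_frame, num_regions)))
        for start in range(0, num_regions, regions_per_frame)
    ]


def reference_index_map(schedule: list[list[int]]) -> dict[int, int]:
    """Map each region to the stored frame (reference index) holding its
    high-quality version; each distinct index must be kept and signalled."""
    return {region: frame for frame, regions in enumerate(schedule)
            for region in regions}


if __name__ == "__main__":
    # Illustrative numbers only: 16 regions, 4000 bits per region,
    # 10000 spare bits per frame -> 2 regions per frame.
    schedule = plan_region_refresh(16, 4000, 10000)
    refs = reference_index_map(schedule)
    print("frames to cover background:", len(schedule))
    print("distinct references to store:", len(set(refs.values())))
```

With these assumed numbers, eight frames are needed to cover the background once, and the decoder must retain eight partial references, each addressed by its own index, which is the storage and signalling overhead described above.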
Another option would be to specify an alternative reference frame. This would be a reference frame that need not be displayed, and in that sense would be artificial. However, transmitting such a frame in one go would require a large amount of bandwidth. Moreover, it could not be updated incrementally: any change would require transmitting an entire new alternative reference frame.