Video bitstreams that support “spatial scalability” may be encoded as a base layer and a set of enhancement layers, with each enhancement layer facilitating the synthesis of a higher quality/resolution frame for display. Synthesizing a given enhancement layer may involve the use of motion vector data from a previous (e.g., reference) layer in the set, where motion vectors are commonly used to track inter-frame motion within video and burst-captured still images. Communication networks, however, may often cause video bitstreams to suffer packet losses due to channel bandwidth limitations, channel noise, and so forth. If a reference layer is lost due to such a packet loss condition, conventional decoding solutions may disregard any subsequent layers that rely on the motion vector data from that layer. As a result, only the last successfully received layer can be used to synthesize the output frame. While the synthesized frame may be upsampled to achieve the target resolution/size, such an approach may lead to blurry results that lack fine detail.
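
The conventional fallback described above can be illustrated with a minimal Python sketch. It is not taken from any particular decoder. The function names `last_decodable_layer` and `upsample_nearest`, the assumption that each layer uses the immediately preceding layer as its reference, the assumption that each layer doubles the resolution, and the sample pixel values are all hypothetical and chosen only for illustration. Nearest-neighbor repetition stands in for whatever interpolation filter a real decoder would apply; either way, upsampling enlarges the frame without restoring the detail carried by the lost layers.

```python
from typing import List, Optional, Sequence


def last_decodable_layer(received: Sequence[bool]) -> Optional[int]:
    """Return the index of the highest layer the conventional fallback can use.

    received[0] is the base layer; later entries are enhancement layers.
    Assumption: layer i uses layer i - 1 as its reference layer.
    Returns None when even the base layer was lost.
    """
    last = None
    for index, ok in enumerate(received):
        if not ok:
            break  # every later layer depends on this lost one
        last = index
    return last


def upsample_nearest(frame: List[List[int]], factor: int) -> List[List[int]]:
    """Enlarge a frame by repeating each pixel `factor` times along both axes."""
    out = []
    for row in frame:
        wide = [pixel for pixel in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


# Layer 2 was lost, so layer 3 is disregarded even though it arrived.
received = [True, True, False, True]
layer = last_decodable_layer(received)

# Hypothetical 2x2 frame synthesized from the last usable layer.
frame = [[10, 20], [30, 40]]

# Assumption: each layer doubles the resolution of the one before it,
# so the gap to the target layer sets the upsampling factor.
factor = 2 ** (len(received) - 1 - layer)
upsampled = upsample_nearest(frame, factor)
```

In this sketch the output reaches the target size, but every block of repeated pixels carries only the information from the last usable layer, which corresponds to the loss of small details noted above.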