For applications such as standards conversion and generation of slow and fast motion in film, television and other video productions, images in a sequence of images may be simply repeated or dropped to achieve a desired sampling rate. Such a technique, however, generally produces unwanted visible artifacts such as jerky motion. Analysis of motion in a sequence of images is commonly used to improve interpolation of the sequence of images.
Motion analysis generally is performed by determining a set of motion parameters that describe motion of pixels between a first image and a second image. For example, the motion parameters may describe forward motion of pixels from the first image to the second image, and/or backward motion of pixels from the second image to the first image. The motion parameters may be defined at a time associated with either or both of the first and second images or at a time between the first and second images. These motion parameters are then used to warp the first and second images to obtain an interpolated image between the first and second images. This process generally is called motion compensated interpolation.
Two images are analyzed to compute a set of motion vectors that describes motion between the first and second images. A motion vector is computed for each pixel in an image at a time between the first and second images. This set of motion vectors may be defined at any time between the first and second images, such as the midpoint. The motion vectors may be computed using any of several techniques. An example technique is based on the constant brightness constraint, also referred to as optical flow. Each vector is specified at a pixel center in an image defined at the time between the first and second images. The vectors may point to points in the first and second images that are not on pixel centers.
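For example, the constant brightness constraint can be illustrated in one dimension. The following is a minimal sketch, not the method prescribed above: it estimates a single displacement for a whole one-dimensional signal by least squares on the linearized constraint Ix*u + It = 0, whereas the description calls for one vector per pixel in two dimensions. The function name `estimate_shift_1d` and the test signals are hypothetical.

```python
import math

def estimate_shift_1d(first, second):
    """Estimate one displacement u such that second[x] ~= first[x - u].

    Uses the linearized constant brightness constraint Ix * u + It = 0,
    solved in the least-squares sense over all interior samples.
    """
    num = 0.0
    den = 0.0
    for i in range(1, len(first) - 1):
        ix = (first[i + 1] - first[i - 1]) / 2.0  # spatial derivative (central difference)
        it = second[i] - first[i]                 # temporal derivative
        num += ix * it
        den += ix * ix
    if den == 0.0:
        return 0.0  # flat signal: motion cannot be observed
    return -num / den

# Hypothetical test signals: the second is the first shifted right by 0.3 samples.
first = [math.sin(0.1 * i) for i in range(100)]
second = [math.sin(0.1 * (i - 0.3)) for i in range(100)]
shift = estimate_shift_1d(first, second)
```

For a smooth signal and a small displacement such as this one, the estimate should be close to the true shift. Larger displacements generally call for coarse-to-fine or iterative schemes, which are outside this sketch.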
The motion vectors are used to warp the first and second images to a point in time of an output image between the first and second images using a factor that represents the time between the first and second images at which the output image occurs. The warped images are then blended using this factor to obtain the output image at the desired point in time between the first and second images. The point in time at which the output image occurs may be different from the time at which the motion vectors are determined. The same motion vectors may be used to determine two or more output images at different times between the first and second images.
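The warp-and-blend step can be sketched on a single row of pixels. This sketch rests on several assumptions that are not stated above: each vector `v` is the displacement from the first image to the second, a vector is already available at each output pixel, the factor `d` runs from 0 (first image) to 1 (second image), and sampling is linear with clamping at the row ends. The names `sample_linear` and `interpolate_row` are hypothetical.

```python
def sample_linear(row, x):
    """Linearly interpolate a 1-D row (length >= 2) at fractional position x,
    clamping x to the ends of the row."""
    x = min(max(x, 0.0), len(row) - 1.0)
    i = min(int(x), len(row) - 2)
    t = x - i
    return (1.0 - t) * row[i] + t * row[i + 1]

def interpolate_row(first, second, vectors, d):
    """Warp both rows to output time d and blend them with the same factor."""
    out = []
    for x, v in enumerate(vectors):
        a = sample_linear(first, x - d * v)           # point in the first image
        b = sample_linear(second, x + (1.0 - d) * v)  # point in the second image
        out.append((1.0 - d) * a + d * b)             # blend by the factor
    return out

# A bright pixel moves from index 2 to index 6 (displacement 4).
first = [0, 0, 10, 0, 0, 0, 0, 0, 0, 0]
second = [0, 0, 0, 0, 0, 0, 10, 0, 0, 0]
vectors = [4.0] * 10
halfway = interpolate_row(first, second, vectors, 0.5)
quarter = interpolate_row(first, second, vectors, 0.25)
```

Under these assumptions, the same `vectors` produce both outputs. The bright pixel lands at index 4 for `d = 0.5` and at index 3 for `d = 0.25`, rather than appearing as two faded copies as it would in a plain dissolve.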
The images may be warped using a technique in which many small triangles are defined in an image corresponding in time to the point in time between the first and second images at which the motion vectors are determined. A transform for each small triangle from the point in time at which the motion vectors are determined to the desired interpolated image time is determined, e.g., the triangle is warped using the motion vectors associated with its vertices. For each pixel in each triangle in the output image, corresponding points in the first and second images are determined, and the first and second images are spatially sampled at these points. These samples for each pixel are combined to produce a value for that pixel in the output image.
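The triangle-based warp can be sketched for a single triangle and a single output pixel. The conventions are assumptions chosen for illustration: times are fractions between 0 (first image) and 1 (second image), `m` is the map time, `d` is the output time, and each vertex vector is the displacement from the first image to the second. The function names are hypothetical, and barycentric weights are used here as one way to express the per-triangle transform.

```python
def barycentric(tri, q):
    """Barycentric weights of point q with respect to triangle tri."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    det = (y1 - y2) * (x0 - x2) + (x2 - x1) * (y0 - y2)
    w0 = ((y1 - y2) * (q[0] - x2) + (x2 - x1) * (q[1] - y2)) / det
    w1 = ((y2 - y0) * (q[0] - x2) + (x0 - x2) * (q[1] - y2)) / det
    return w0, w1, 1.0 - w0 - w1

def source_points(tri, vecs, q, m, d):
    """Return (point_in_first, point_in_second) for output pixel q,
    or None if q lies outside the triangle warped to output time d."""
    # Vertices at the output time, in the first image, and in the second image.
    at_out = [(px + (d - m) * vx, py + (d - m) * vy)
              for (px, py), (vx, vy) in zip(tri, vecs)]
    in_first = [(px - m * vx, py - m * vy)
                for (px, py), (vx, vy) in zip(tri, vecs)]
    in_second = [(px + (1.0 - m) * vx, py + (1.0 - m) * vy)
                 for (px, py), (vx, vy) in zip(tri, vecs)]
    w = barycentric(at_out, q)
    if min(w) < 0.0:
        return None
    # The same weights carry q into each input image.
    p1 = (sum(wk * x for wk, (x, _) in zip(w, in_first)),
          sum(wk * y for wk, (_, y) in zip(w, in_first)))
    p2 = (sum(wk * x for wk, (x, _) in zip(w, in_second)),
          sum(wk * y for wk, (_, y) in zip(w, in_second)))
    return p1, p2

tri = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
vecs = [(2.0, 0.0)] * 3   # uniform motion of 2 pixels to the right
points = source_points(tri, vecs, (1.5, 1.0), m=0.5, d=0.75)
```

The two returned points are where the first and second images would be spatially sampled for that output pixel. The samples would then be combined as described above.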
Motion compensated interpolation also may be performed on two or more images that are dissimilar, or that are non-sequential, or that are not contiguous in any one sequence of images. Thus, motion analysis may be used to process transitions between different sequences of images, such as a dissolve or a jump cut. If two consecutive sequences of images have corresponding audio tracks, the audio tracks may be processed to identify a point in time at which motion compensated interpolation of the transition between the sequences should be performed.
Motion compensated interpolation of a sequence of images also may be performed in conjunction with audio processing. For example, if interpolation of the sequence of images changes the duration of the sequence, the duration of a corresponding audio track may be changed to retain synchronization between the audio and the sequence of images. Resampling of the audio may be used to change the duration of the audio, but results in a change in pitch. Time scaling of the audio also may be used to change the duration of the audio without changing the pitch.
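The pitch change caused by resampling can be quantified with simple arithmetic. The following sketch assumes plain resampling, in which the same audio is played back over a new duration so that every frequency is scaled by the speed ratio. The function name is hypothetical.

```python
import math

def resample_pitch_shift_semitones(original_duration, new_duration):
    """Pitch shift, in semitones, caused by changing the duration of audio
    through plain resampling (no time scaling)."""
    speed = original_duration / new_duration  # frequencies scale by this ratio
    return 12.0 * math.log2(speed)

faster = resample_pitch_shift_semitones(10.0, 5.0)   # half the duration
slower = resample_pitch_shift_semitones(10.0, 20.0)  # double the duration
```

Halving the duration raises the pitch by one octave (12 semitones), and doubling it lowers the pitch by one octave. Time scaling avoids this shift.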
Occasionally, such interpolation creates visible artifacts in the resulting output images, particularly if there is a foreground object that occludes then reveals a background object, or if there is an object that appears or disappears in the images. In some cases, the foreground may appear to stretch or distort, or the background may appear to stretch or distort, or both. In such cases, a region in an image may be defined. The region may be segmented into foreground and background regions. A tracker then may be used to track either the foreground region or the background region or both as an object. A single motion vector or a parameterized motion model obtained from the tracker may be assigned to the tracked region. A combination map also may be defined to control which pixels of the input images are used to contribute to each pixel of an output image based on how a motion vector transforms a pixel from the input image to the output image.
For interlaced media, a vector map of motion also can be computed between fields of opposite sense, i.e., odd and even fields, by treating the two fields as if they are two images of the same type. The resulting vector map, when generated using two fields of opposite field sense, has a vertical offset of about one half of a line. This vector map is then modified by adjusting the vertical component of all of the vectors either up or down half a line. Warping operations then are performed using the modified vector map. However, when sampling a field of one field sense to generate a field of an opposite field sense, the sampling region is translated either up or down half a line.
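The adjustment of the vertical vector components can be sketched as follows. The sign convention is an assumption, since the description above leaves it dependent on the orientation of the y-axis and on which field holds the top line. In this sketch, vectors are measured in field lines, the upper field holds the top line, and for a stationary scene the measured vertical component from the upper field to the lower field is -0.5 when the y-axis points down. The function name is hypothetical.

```python
def remove_field_offset(vectors, from_upper_field, y_down=True):
    """Remove the half-line vertical offset from vectors computed between
    two fields of opposite sense.

    vectors: list of (vx, vy) in field-line units.
    from_upper_field: True if the vectors point from the upper field to the lower.
    y_down: True if the y-axis of the image space points down (assumed default).
    """
    # Offset that a stationary scene would show under the assumed convention.
    measured = -0.5 if from_upper_field else 0.5
    if not y_down:
        measured = -measured
    return [(vx, vy - measured) for vx, vy in vectors]

static_scene = remove_field_offset([(0.0, -0.5)], from_upper_field=True)
panning = remove_field_offset([(1.0, 0.5)], from_upper_field=False)
```

After the adjustment, a stationary scene yields zero vertical motion, so that warping with the modified vector map does not shift the picture vertically.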
Accordingly, in one aspect, an output image associated with a point in time between a first image and a second image is generated by determining a motion vector for each pixel in an image at a map time between the first image and the second image, wherein the map time is different from the point in time of the output image. Each motion vector describes motion of a pixel of the image at the map time to a first point in the first image and a second point in the second image. A factor that represents the point in time between the first image and the second image at which the output image occurs is calculated. The first image is warped according to the determined motion vectors and the factor. The second image is warped according to the determined motion vectors and the factor. The warped first image and the warped second image are blended according to the factor to obtain the output image.
In another aspect, a plurality of output images, wherein each output image is associated with a different point in time between a first image and a second image, is generated by determining a motion vector for each pixel in an image at a map time between the first image and the second image. Each motion vector describes motion of a pixel of the image at the map time to a first point in the first image and a second point in the second image. For each output image, a factor that represents the point in time between the first image and the second image at which the output image occurs is calculated. For each output image, the first image is warped according to the determined motion vectors and the factor for the output image. For each output image, the second image is warped according to the determined motion vectors and the factor for the output image. For each output image, the warped first image and the warped second image are blended according to the factor for the output image.
In one embodiment, the first image is in a first sequence of images and the second image is in a second sequence of images such that the first image is not contiguous with the second image in a sequence of images. In another embodiment, the first sequence has associated audio and the second sequence has associated audio, and the audio associated with the first sequence is dissolved to the audio associated with the second sequence. In another embodiment, a combination of the output image and the first and second images provides an output sequence of images with a duration at playback different from a duration of an input sequence of images containing the first and second images at playback. If the input sequence of images has associated audio with a duration, the duration of the audio may be adjusted to match the duration of the output sequence of images.
In one embodiment, the first and second images are processed to remove invalid image data. In another embodiment, during warping of an image, any motion vector that transforms a point in the output image to an area outside of one of the first and second images results in no contribution from that input image to the output image. In another embodiment, the output image is initialized to a blend of the first and second images according to the determined factor.
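The handling of vectors that point outside an input image can be sketched as a choice of blend weights for one output pixel. Giving the full weight to the remaining image when only one source point is valid is an assumption of this sketch. When both points are invalid, the sketch returns `None` so that the caller keeps the value to which the output image was initialized. The function names are hypothetical.

```python
def inside(point, width, height):
    """True if the point lies within the pixel grid of an image."""
    x, y = point
    return 0.0 <= x <= width - 1 and 0.0 <= y <= height - 1

def contribution_weights(p_first, p_second, width, height, d):
    """Return (weight_first, weight_second) for one output pixel, or None
    if neither input image contributes.

    p_first, p_second: points in the first and second images found by warping.
    d: factor between 0 (first image) and 1 (second image).
    """
    in_first = inside(p_first, width, height)
    in_second = inside(p_second, width, height)
    if in_first and in_second:
        return (1.0 - d, d)
    if in_first:
        return (1.0, 0.0)  # assumed: remaining image takes the full weight
    if in_second:
        return (0.0, 1.0)
    return None            # keep the initialized blend for this pixel

both = contribution_weights((1.0, 1.0), (2.0, 2.0), 4, 4, 0.25)
only_second = contribution_weights((-1.0, 1.0), (2.0, 2.0), 4, 4, 0.25)
neither = contribution_weights((-1.0, 1.0), (9.0, 2.0), 4, 4, 0.25)
```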
In another aspect, a plurality of output images, wherein each output image is associated with a different point in time between a first image of a first sequence of one or more images and a second image of a second sequence of one or more images, is generated. For each output image, a pair of a first image from the first sequence and a second image from the second sequence is selected. For each selected pair of first and second images, a motion vector is determined for each pixel in an image at a map time between the first image and the second image, wherein the motion vector describes motion of a pixel of the image at the map time to a first point in the first image and a second point in the second image. For each output image, a factor that represents the point in time, between the first and second images selected for the output image, at which the output image occurs is calculated. For each output image, the first image selected for the output image is warped according to the factor for the output image and the motion vectors determined for the first and second images selected for the output image. For each output image, the second image selected for the output image is warped according to the factor for the output image and the motion vectors determined for the first and second images selected for the output image. For each output image, the warped first image and the warped second image are blended according to the factor for the output image.
In another aspect, a transition of a plurality of output images is generated from a first sequence of images to a second sequence of images wherein an image at an end of the first sequence is not contiguous with an image at a beginning of the second sequence. For each output image, a pair of a first image from the first sequence and a second image from the second sequence is selected such that the output image has a point in time between the first image and the second image in the transition. For each selected pair of first and second images, a set of motion vectors is determined that describes motion between the first image and the second image. For each output image, a factor is calculated that represents the point in time, between the first and second images selected for the output image, at which the output image occurs. For each output image, motion compensated interpolation is performed to generate the output image according to the determined set of motion vectors and the calculated factor.
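One way to select the pairs and factors for such a transition is sketched below. The pairing rule is an assumption modeled on a dissolve: a transition of `n` output images overlaps the last `n` images of the first sequence with the first `n` images of the second sequence, and the factor rises evenly across the transition. Other selections are possible. The function name is hypothetical.

```python
def transition_plan(len_first, n):
    """Return a list of (index_in_first, index_in_second, factor), one entry
    per output image of an n-image transition."""
    if n > len_first:
        raise ValueError("transition is longer than the first sequence")
    plan = []
    for k in range(n):
        factor = (k + 1) / (n + 1)  # 0 = first image, 1 = second image
        plan.append((len_first - n + k, k, factor))
    return plan

plan = transition_plan(len_first=10, n=3)
```

Each entry identifies the pair for which a set of motion vectors would be determined, together with the factor used for motion compensated interpolation of that output image.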
In another aspect, a jump cut is processed from a first image at an end of a first segment of a sequence of images and corresponding audio to a second image at a beginning of a second segment in the sequence of images and corresponding audio. The corresponding audio is processed to identify an audio break between the audio corresponding to the first segment and the audio corresponding to the second segment. A set of motion vectors is determined that describes motion between the first image and the second image. Motion compensated interpolation is performed to generate one or more images between the first image and the second image according to the determined set of motion vectors at a point in time corresponding to the audio break.
In another aspect, a first image and a second image are warped and blended to obtain an output image at an output time between the first image and the second image. A set of motion vectors that describes motion between the first image and the second image is determined at a map time. A primary transform is determined for each triangle in a set of triangles, defined in an image at the map time, from the map time to the output time using the determined set of motion vectors. For each triangle, any pixels in the output image that are contained within the triangle at the output time are identified using the primary transform. A first transform is determined for each triangle in the set of triangles from the output time to a time of the first image. For each pixel in each triangle at the output time, a point in the first image is identified using the first transform and the first image is spatially sampled at the point. A second transform is determined for each triangle in the set of triangles from the output time to a time of the second image. For each pixel in each triangle at the output time, a point in the second image is identified using the second transform and the second image is spatially sampled at the point. For each pixel in each triangle at the output time, the spatially sampled first image and the spatially sampled second image are combined to obtain a value for the pixel in the output image.
In another aspect, a first image and a second image are warped to obtain an output image at an output time between the first image and the second image. A set of motion vectors that describes motion between the first image and the second image is determined at a map time. A primary transform is determined for each triangle in a set of triangles, defined in an image at the map time, from the map time to the output time using the determined motion vectors. For each triangle, any pixels in the output image that are contained within the triangle at the output time are identified using the primary transform. For each pixel in each triangle at the output time, the first image and the second image are spatially sampled at points corresponding to the pixel. The spatially sampled first image and the spatially sampled second image are combined to obtain a value for the pixel in the output image.
In one embodiment, the map time is between the first image and the second image. In another embodiment, the map time is different from the output time.
In another aspect, duration of an input sequence of images with associated audio may be changed, wherein the input sequence of images and associated audio has a duration. An indication of a selection of an operation by an operator, indicative of a desired duration of an output sequence of images, is received. In response to the received indication, a first image and a second image in the sequence of images are selected. A set of motion vectors is determined that describes motion between the first image and the second image. Motion compensated interpolation is performed to generate one or more images between the first image and the second image according to the determined motion vectors. These operations are performed for multiple pairs of first and second images in the sequence of images to provide the output sequence of images. The duration of the associated audio is adjusted to retain synchronization with the output sequence of images. In one embodiment, the output sequence of images may be played back with the audio. In another embodiment, adjusting the duration of the audio involves resampling of the audio. In another embodiment, adjusting the duration of the audio involves time scaling of the audio.
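The selection of image pairs for a duration change can be sketched as a mapping from output images to positions in the input sequence. The even spacing of output images across the input, with the first and last images preserved, is an assumption of this sketch, and the function name is hypothetical.

```python
def retime_plan(num_input, num_output):
    """Return a list of (first_index, second_index, factor), one entry per
    output image, spreading num_output images evenly over num_input images."""
    if num_input < 2:
        raise ValueError("at least two input images are required")
    plan = []
    for k in range(num_output):
        # Fractional position of output image k within the input sequence.
        pos = k * (num_input - 1) / (num_output - 1) if num_output > 1 else 0.0
        i = min(int(pos), num_input - 2)  # keep the pair inside the sequence
        plan.append((i, i + 1, pos - i))
    return plan

slow_motion = retime_plan(num_input=3, num_output=5)
```

A factor of 0.0 or 1.0 means that the output image coincides with an input image. Factors in between call for motion compensated interpolation of the selected pair.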
In another aspect, color correction may be performed by generating a first color histogram from a first image from a first sequence of images and generating a second color histogram from a second image from a second sequence of images. A set of motion vectors that describes motion between the first color histogram and the second color histogram is determined from the first and second color histograms. A table of color correction values is generated from the set of motion vectors. The table of color correction values is applied to a sequence of images.
In another aspect, artifacts in an image created using motion compensated interpolation of a first image and a second image may be reduced. A set of motion vectors is determined that describes motion between the first image and the second image. A foreground region and a background region are identified in the first and second images. Tracking is performed on at least one of the foreground region and the background region to determine a motion model for the tracked region. The set of motion vectors corresponding to the tracked region is changed according to the motion model for the tracked region. Motion compensated interpolation is performed to generate one or more images between the first image and the second image according to the changed set of motion vectors. In one embodiment, a combination map is determined using the changed set of motion vectors to indicate which of the first and second images are used to contribute to a pixel in an output image.
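Changing the motion vectors of a tracked region can be sketched as a masked replacement. The data layout is an assumption: the vector map is a grid of `(vx, vy)` tuples, the tracked region is a grid of booleans of the same shape, and the motion model is a function of pixel position, so that a single motion vector is the special case of a constant function. The function name is hypothetical.

```python
def apply_motion_model(vectors, mask, model):
    """Return a copy of the vector map in which every vector inside the
    tracked region is replaced by the vector given by the motion model.

    vectors: rows of (vx, vy) tuples.
    mask: rows of booleans, True inside the tracked region.
    model: function (x, y) -> (vx, vy).
    """
    return [
        [model(x, y) if mask[y][x] else vectors[y][x] for x in range(len(row))]
        for y, row in enumerate(vectors)
    ]

vectors = [[(1.0, 1.0), (1.0, 1.0)],
           [(1.0, 1.0), (1.0, 1.0)]]
mask = [[True, False],
        [False, True]]
# A stationary tracked region: a single zero vector for every pixel in it.
changed = apply_motion_model(vectors, mask, lambda x, y: (0.0, 0.0))
```

The changed vector map would then be used for motion compensated interpolation in place of the original set of motion vectors.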
In another aspect, two fields of interlaced video may be processed by computing motion vectors describing motion of image characteristics from a field of a frame to another field of opposite field sense. An offset corresponding to one half of a line (having a sign according to an orientation of the y-axis of the image space and which field has the top line of the image) is removed from the motion vectors. The motion vectors then are used to generate a sampling region at a desired output time. The sampling region is transformed using the motion vectors to a sample time at one of the fields. The sampling region also is transformed using the motion vectors to a sample time at the other of the fields. The field sense of an output field to be generated at the desired output time is determined. The transformed sampling region for the field with a field sense opposite the field sense of the output field is translated by an offset of one half of a line and having a sign determined by the orientation of the y-axis of the image space and which field has the top line of the image. The output field is generated using the transformed and translated sampling regions and the fields of the frame. If the desired output time is the time of one of the fields, the generated output field may be combined with the field at the desired output time to generate a progressive image. An effect may be performed on this progressive image. The progressive image with the effect may be vertically decimated (i.e., by sampling every other line) to produce a field at a desired output time.
In another aspect, two fields of interlaced video may be processed by computing motion vectors describing motion of image characteristics from a field of a frame to another field of opposite field sense. An offset corresponding to one half of a line, and having a sign according to an orientation of the y-axis of the image space and which field includes the top line, is removed from the motion vectors. The time of one of the fields is selected as a desired output time. A sampling region specified at the selected time is transformed using the motion vectors to a sample time at the field that is being warped. The transformed sampling region is translated by an offset of one half of a line and having a sign determined by the orientation of the y-axis of the image space and which field includes the top line. The output field is generated at the desired output time using the transformed and translated sampling region and the field that is being warped. The generated output field may be combined with the field at the desired output time to generate a progressive image. An effect may be performed on this progressive image. The progressive image with the effect may be vertically decimated to produce a field at the desired output time.