1. Field of the Invention
The present invention relates to an information processing device and method, and program, and specifically, relates to an information processing device and method, and program whereby a tracking target within a moving image can be tracked accurately.
2. Description of the Related Art
Heretofore, an arrangement has been conceived wherein the content of an image is analyzed, and the analyzed results thereof are employed for image processing or the like. For example, an arrangement has been conceived wherein, with a moving image such as an image shot by a video camera, a desired portion within each of the frame images is determined as a tracking point, and the image is enlarged, or the operation of the video camera is controlled, so as to track the tracking point (e.g., see Japanese Unexamined Patent Application Publication No. 2005-303983).
Various methods have been proposed as techniques for tracking a target which is included in a moving image and is specified by a user. For example, there is a method for tracking by block matching processing. Block matching is known as a method for obtaining a motion vector by employing the current field (or frame) image and the image adjacent thereto by one field (or one frame) to obtain a difference value (evaluation value) between blocks of these images. Further, in order to realize tracking, the motion vectors calculated for each field (or frame) are integrated, with the position first specified by the user as a starting point.
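The block matching and integration described above can be sketched as follows. This is a minimal illustration only, not the method of any cited publication: it assumes grayscale images held as lists of rows, uses the sum of absolute differences as the evaluation value, and performs an exhaustive search within a small window. The names `sad`, `match_block`, and `track`, and the `size` and `search` parameters, are hypothetical.

```python
def sad(prev, curr, px, py, cx, cy, size):
    """Evaluation value: sum of absolute differences between two blocks."""
    total = 0
    for dy in range(size):
        for dx in range(size):
            total += abs(prev[py + dy][px + dx] - curr[cy + dy][cx + dx])
    return total


def match_block(prev, curr, x, y, size=4, search=3):
    """Return the motion vector (vx, vy) that minimizes the evaluation value.

    (x, y) is the top-left corner of the block in the previous image
    (an assumption made for this sketch).
    """
    height, width = len(curr), len(curr[0])
    best_vec, best_cost = (0, 0), None
    for vy in range(-search, search + 1):
        for vx in range(-search, search + 1):
            cx, cy = x + vx, y + vy
            # Skip candidates whose block would fall outside the image.
            if cx < 0 or cy < 0 or cx + size > width or cy + size > height:
                continue
            cost = sad(prev, curr, x, y, cx, cy, size)
            if best_cost is None or cost < best_cost:
                best_vec, best_cost = (vx, vy), cost
    return best_vec


def track(images, start, size=4, search=3):
    """Integrate per-image motion vectors from the user-specified start point."""
    x, y = start
    path = [(x, y)]
    for prev, curr in zip(images, images[1:]):
        vx, vy = match_block(prev, curr, x, y, size, search)
        x, y = x + vx, y + vy
        path.append((x, y))
    return path
```

With an interlaced moving image, `images` would hold only the fields selected for block matching (for example, the first field of each frame).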
FIG. 1 schematically illustrates a situation in a case where a moving image according to the interlace method is subjected to such tracking processing. As shown in FIG. 1, a frame 11 which is a frame image within this moving image is configured of two field images of a first field 11-1 and second field 11-2. Similarly, a frame 12 following the frame 11 is configured of a first field 12-1 and second field 12-2, and a frame 13 following the frame 12 is configured of a first field 13-1 and second field 13-2.
The block matching of tracking processing with such a moving image is generally performed between the same fields of consecutive frames (e.g., between the first field 11-1 and first field 12-1) instead of between consecutive fields (e.g., between the first field 11-1 and second field 11-2). Further, for the sake of reduction in processing cost or the like, just one of the fields of each frame is subjected to block matching, and the other field is interpolated by employing the values of the temporally preceding and following fields (e.g., the average value of the previous and following fields is applied). That is to say, either the first field or the second field is set beforehand as the field to be subjected to block matching; for that field, block matching is performed between the same fields of consecutive frames, and a motion vector is calculated by employing the result thereof, whereas for the other field, a value such as the average of the motion vectors obtained for the temporally preceding and following fields is interpolated.
For example, when assuming that the position of the tracking point at the first field of a certain frame is P(t−1), and the motion vector calculated by block matching is V, a position P′(t−1) at the second field of that frame, and a position P(t) at the first field of the next frame, are given by the following Expressions (1) and (2).
$$P(t) = P(t-1) + V \tag{1}$$

$$P'(t-1) = P(t-1) + \frac{V}{2} = \frac{P(t-1) + P(t)}{2} \tag{2}$$
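As a worked illustration of Expressions (1) and (2), the following sketch computes both positions for two-dimensional coordinates. The function names and the sample values are hypothetical and are chosen only to show that the two forms of Expression (2) agree.

```python
def next_first_field(p_prev, v):
    """Expression (1): P(t) = P(t-1) + V."""
    return (p_prev[0] + v[0], p_prev[1] + v[1])


def second_field(p_prev, v):
    """Expression (2), first form: P'(t-1) = P(t-1) + V/2."""
    return (p_prev[0] + v[0] / 2, p_prev[1] + v[1] / 2)


def midpoint(p_prev, p_next):
    """Expression (2), second form: {P(t-1) + P(t)} / 2."""
    return ((p_prev[0] + p_next[0]) / 2, (p_prev[1] + p_next[1]) / 2)


# Sample values (assumed for illustration only).
p_prev = (100, 50)   # P(t-1): tracking point at the first field
v = (6, -4)          # V: motion vector from block matching

p_next = next_first_field(p_prev, v)   # (106, 46)
p_mid = second_field(p_prev, v)        # (103.0, 48.0)
```

The interpolated position lies halfway between the two first-field positions, which presupposes that the second field is captured midway in time between them.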
Note that, in general, as with a case of a moving image according to the progressive method, block matching is frequently performed by skipping one frame.
Incidentally, as in the case of a movie for example, conversion of a moving image according to the progressive method of 24 frames per second (hereafter, referred to as “24p image”) into a moving image according to the interlace method of 60 fields per second (hereafter, referred to as “60i image”), employed for television broadcasting or the like, is performed by dividing each frame image into two field images or three field images, which is generally referred to as “2-3 pulldown”.
FIG. 2 illustrates an example of a situation of 2-3 pulldown. As shown in FIG. 2, a frame 21 which is a frame image at a certain point-in-time within a 24p image is divided into two field images which are 60i images, with a first field 31-1 of a frame 31 as the first field, and a second field 31-2 as the second field. Also, a frame 22 following the frame 21 is similarly divided into three field images which are 60i images, with a first field 32-1 of a frame 32 following the frame 31, and a first field 33-1 of a frame 33 following the frame 32 as the first fields, and a second field 32-2 of the frame 32 as the second field.
Further, a frame 23 following the frame 22 is similarly divided into two field images which are 60i images, with a first field 34-1 of a frame 34 following the frame 33 as the first field, and a second field 33-2 of the frame 33 as the second field. Also, a frame 24 following the frame 23 is similarly divided into three field images which are 60i images, with a first field 35-1 of a frame 35 following the frame 34 as the first field, and a second field 34-2 of the frame 34 and a second field 35-2 of the frame 35 as the second fields.
As described above, each frame image within a 24p image is converted into two fields or three fields of a 60i image.
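The field assignment of FIG. 2 can be sketched as follows. This is an illustrative model only: it assumes that source frames alternately yield two and three fields, and that output fields alternate between first and second field in output order. The function name `pulldown_2_3` and the zero-based indices (source frame 0 standing for frame 21, output frame 0 for frame 31) are hypothetical.

```python
def pulldown_2_3(num_frames):
    """Map 24p frames to 60i fields.

    Returns a list of (source_frame, output_frame, field) tuples in output
    order, where field is 1 for a first field and 2 for a second field.
    """
    fields = []
    for src in range(num_frames):
        # Even-numbered source frames yield two fields, odd-numbered three.
        repeat = 2 if src % 2 == 0 else 3
        for _ in range(repeat):
            n = len(fields)
            fields.append((src, n // 2, n % 2 + 1))
    return fields


# Four source frames (frames 21 to 24) become ten fields (frames 31 to 35).
mapping = pulldown_2_3(4)
```

Under this model, 24 source frames yield 12 × 2 + 12 × 3 = 60 fields, which matches the conversion from 24 frames per second to 60 fields per second.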
Similarly, conversion of a moving image according to the progressive method of 30 frames per second (hereafter, referred to as “30p image”) into 60i images is performed by dividing each frame image into two field images, which is generally referred to as “2-2 pulldown”.
FIG. 3 is a diagram illustrating an example of a situation of 2-2 pulldown. As shown in FIG. 3, a frame 41 which is a frame image at a certain point-in-time within a 30p image is divided into two fields which are 60i images, i.e., a first field 51-1 and second field 51-2 of a frame 51. Similarly, a frame 42, frame 43, and frame 44 following the frame 41 within the 30p image are divided into fields which are 60i images, i.e., a first field image 52-1 and second field image 52-2 of a frame 52, a first field image 53-1 and second field image 53-2 of a frame 53, and a first field image 54-1 and second field image 54-2 of a frame 54, respectively.
There are various other types of such conversion processing. For example, there is conversion processing for converting into a moving image according to the progressive method of 60 frames per second (hereafter, referred to as “60p image”) in the same manner as 2-3 or 2-2 pulldown, but without the frames being divided into fields.
In either conversion case, the respective field images (or frame images) of the generated moving image are obtained by rearranging frame images which are temporally continuous in the original moving image (after dividing them into fields, where applicable), and are therefore temporally discontinuous.
For example, with the example in FIG. 1, the first field 11-1 and second field 11-2 are images at mutually different points-in-time, but the first field 31-1 and second field 31-2 in FIG. 2 have been generated from the same frame image 21, and so are images at the same point-in-time. Thus, with a moving image generated by pulldown, consecutive field images (or frame images) are not necessarily continuous temporally.
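The temporal discontinuity can be made concrete with a small sketch that records, for every output field, the index of the source frame from which it was generated. The model is an assumption for illustration (source frames yield fields according to a repeating cadence such as 2-3 or 2-2), and the function names are hypothetical.

```python
def source_indices(cadence, num_frames):
    """Source-frame index of every output field, in output order."""
    out = []
    for src in range(num_frames):
        out.extend([src] * cadence[src % len(cadence)])
    return out


def same_instant_pairs(indices):
    """For each output frame, True if both fields share one source frame."""
    return [indices[i] == indices[i + 1]
            for i in range(0, len(indices) - 1, 2)]


def first_field_steps(indices):
    """Source-time step between first fields of consecutive output frames."""
    firsts = indices[0::2]
    return [b - a for a, b in zip(firsts, firsts[1:])]


idx_23 = source_indices((2, 3), 4)   # 2-3 pulldown, as in FIG. 2
idx_22 = source_indices((2, 2), 4)   # 2-2 pulldown, as in FIG. 3
```

For the 2-3 case, `same_instant_pairs(idx_23)` gives `[True, True, False, False, True]` and `first_field_steps(idx_23)` gives `[1, 0, 1, 1]`: some output frames hold two fields from the same point-in-time, and the time elapsed between the first fields of consecutive output frames is not constant.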