This application is based upon and claims the benefit of priority from the prior Japanese Patent Application No. 11-187033, filed Jun. 30, 1999, the entire contents of which are incorporated herein by reference.
The present invention relates to a method of describing object region data such that information about an object region in a video is described, an apparatus for generating object region data such that information about an object region in a video is generated, a video processing apparatus arranged to be given an instruction about an object in a video to perform a predetermined process or retrieve an object in a video, and a video processing method therefor.
Hyper media are configured such that related information called a hyper link is given between media, such as videos, sounds or texts, to permit mutual reference. When videos are mainly used, related information is provided for each object which appears in the video. When the object is specified, related information (text information or the like) is displayed. The foregoing structure is a representative example of the hyper media. The object in the video is expressed by a frame number or a time stamp of the video and by information for identifying a region in the video, which are recorded in the video data or recorded as individual data.
Mask images have frequently been used as means for identifying a region in a video. The mask image is a bit map image constituted by giving different pixel values to the inside portion of an identified region and the outside portion of the same. The simplest method has an arrangement that a pixel value of “1” is given to the inside portion of the region and “0” is given to the outside portion of the same. Alternatively, α values which are employed in computer graphics are sometimes employed. Since the α value is usually able to express 256 levels of gray, a portion of the levels is used. The inside portion of the specified region is expressed as 255, while the outside portion of the same is expressed as 0. The latter image is called an α map. When the regions in the image are expressed by the mask images, determination whether or not a pixel in a frame is included in the specified region can easily be made by reading the value of the pixel of the mask image and by determining whether the value is 0 or 255. The mask image has freedom with which a region can be expressed regardless of the shape of the region and even a discontinuous region can be expressed. The mask image, however, must have the same size in pixels as the original image. Thus, there arises a problem in that the quantity of data cannot be reduced.
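The pixel lookup described above can be sketched as follows. This is a minimal illustration only, assuming the α map is held as a list of rows; the names `mask` and `is_inside_mask` are hypothetical and do not appear in the original description.

```python
def is_inside_mask(mask, x, y):
    """Return True when pixel (x, y) lies inside the region of a 0/255 alpha map."""
    return mask[y][x] == 255


# A 4x3 alpha map (assumed layout: one list per row).
# 255 marks the inside of the region, 0 marks the outside.
mask = [
    [0,   0,   0, 0],
    [0, 255, 255, 0],
    [0, 255,   0, 0],
]

# The mask needs one value per pixel of the original image,
# which is why its data quantity cannot be reduced.
mask_size = len(mask) * len(mask[0])
```

The lookup itself is a single read, but the storage cost grows with the frame size and with the number of frames.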
To reduce the quantity of data of the mask image, the mask image is frequently compressed. When the mask image is a binary mask image constituted by 0 and 1, a process of a binary image can be performed. Therefore, the compression method employed in facsimile machines or the like is frequently employed. In the case of MPEG-4, which has been standardized by ISO/IEC MPEG (Moving Picture Experts Group), an arbitrary shape coding method will be employed in which the mask image constituted by 0 and 1 and the mask image using the α value are compressed. The foregoing compression method is a method using motion compensation and capable of improving compression efficiency. On the other hand, complex compression and decoding processes are required.
To express a region in a video, the mask image or the compressed mask image has usually been employed. However, data for identifying a region is required to permit easy and quick extraction, to be reduced in quantity and to permit easy handling.
On the other hand, hyper media are usually assumed to involve an operation for displaying related information of a moving object in a video, and specifying such an object is somewhat more difficult than handling a still image. A user usually has difficulty in specifying a specific portion. Therefore, it can be considered that the user usually aims roughly at, for example, a portion in the vicinity of the center of the object. Moreover, a portion which is adjacent to the object but deviated from it is frequently specified because of the movement of the object. Therefore, data for specifying a region is desired to be adaptable to the foregoing media. Moreover, a system for displaying related information of a moving object in a video requires an aiding mechanism for facilitating specification of the moving object.
As described above, the conventional method of expressing a desired object region in a video by using the mask image suffers from a problem in that the quantity of data cannot be reduced. The method arranged to compress the mask image raises a problem in that coding and decoding become too complicated. What is worse, the pixels of a predetermined frame cannot be accessed directly, which makes handling difficult.
There arises another problem in that a device for permitting a user to easily instruct a moving object in a video has not been provided.
Accordingly, it is an object of the present invention to provide a method of describing object region data and an apparatus for generating object region data which are capable of describing a desired object region in a video by using a small quantity of data and facilitating generation of data and handling of the same.
Another object of the present invention is to provide a method of describing object region data, an apparatus for generating object region data, a video processing method and a video processing apparatus with which a user is permitted to easily instruct an object in a video and determine the object.
Another object of the present invention is to provide a method of describing object region data, an apparatus for generating object region data, a video processing method and a video processing apparatus with which retrieval of an object in a video can easily be performed.
According to one aspect of the present invention, there is provided a method of describing object region data such that information about an arbitrary object region in a video is described over a plurality of continuous frames, the method identifying a desired object region in a video according to at least either of a figure approximated to the object region or a characteristic point of the object region; approximating a trajectory obtained by arranging positions of representative points of the approximate figure or the characteristic points of the object region in a direction in which frames proceed with a predetermined function; and describing information about the object region by using the parameter of the function.
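As one illustration of approximating a trajectory with a function, the sketch below fits a straight line to one coordinate of a representative point over successive frames by ordinary least squares. The choice of a first-order function and the name `fit_line` are assumptions made for the example; the description above only requires a predetermined function.

```python
def fit_line(frames, values):
    """Least-squares fit of value = slope * frame + intercept.

    frames: frame numbers at which the representative point was observed.
    values: one coordinate (for example X) of the point in those frames.
    Returns the two function parameters (slope, intercept).
    """
    n = len(frames)
    mean_t = sum(frames) / n
    mean_v = sum(values) / n
    var_t = sum((t - mean_t) ** 2 for t in frames)
    cov = sum((t - mean_t) * (v - mean_v) for t, v in zip(frames, values))
    slope = cov / var_t
    return slope, mean_v - slope * mean_t


# X coordinate of a representative point in four successive frames (made-up values).
frames = [0, 1, 2, 3]
xs = [10.0, 12.0, 14.0, 16.0]
slope, intercept = fit_line(frames, xs)
```

In this example the four positions are replaced by two parameters; the same procedure would be repeated for each coordinate of each representative point.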
According to another aspect of the present invention, there is provided a method of describing object region data such that information about an arbitrary object region in a video is described over a plurality of continuous frames, the method describing the object region data by using information capable of identifying at least the frame number of a leading frame and the frame number of a trailing frame of the plurality of the subject frames or the time stamp of the leading frame and the time stamp of the trailing frame, information for identifying the type of the figure of an approximate figure approximating the object region, and the parameter of a function with which a trajectory obtained by arranging position data of representative points of the approximate figure corresponding to the object region in a direction in which frames proceed has been approximated.
According to another aspect of the present invention, there is provided a method of describing object region data such that information about an arbitrary object region in a video is described over a plurality of continuous frames, the method describing the object region data by using information capable of identifying at least the frame number of a leading frame and the frame number of a trailing frame of the plurality of the subject frames or the time stamp of the leading frame and the time stamp of the trailing frame, the number of approximate figures approximating the object region, information for identifying the type of the figure of an approximate figure and the parameters of functions with which trajectories corresponding to the approximate figures and obtained by arranging position data of representative points of each approximate figure in a direction in which frames proceed have been approximated.
According to another aspect of the present invention, there is provided a method of describing object region data such that information about an arbitrary object region in a video is described over a plurality of continuous frames, the method describing the object region data by using information capable of identifying at least the frame number of a leading frame and the frame number of a trailing frame of the plurality of the subject frames or the time stamp of the leading frame and the time stamp of the trailing frame, and the parameter of a function with which a trajectory obtained by arranging position data of characteristic points of the object region in a direction in which frames proceed has been approximated.
Information capable of identifying the frame number of a leading frame and the frame number of a trailing frame of the plurality of the subject frames or the time stamp of the leading frame and the time stamp of the trailing frame is the leading frame number and a trailing frame number or the leading frame number and the difference between the leading frame number and the trailing frame number.
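The items enumerated in the foregoing aspects can be pictured as a single record. The following is a sketch under assumptions, not a prescribed format: the class name `ObjectRegionData`, its field names and the list-of-lists layout of the function parameters are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ObjectRegionData:
    leading_frame: int            # frame number of the leading frame
    trailing_frame: int           # frame number of the trailing frame
    figure_type: str              # type of the approximate figure, e.g. "ellipse"
    # One parameter list per coordinate trajectory of each representative point.
    trajectory_params: List[List[float]] = field(default_factory=list)

    @classmethod
    def from_leading_and_difference(cls, leading_frame, difference,
                                    figure_type, trajectory_params):
        """Build the record from the leading frame number and the
        difference between the leading and trailing frame numbers."""
        return cls(leading_frame, leading_frame + difference,
                   figure_type, trajectory_params)

    def covers(self, frame):
        """Return True when the frame lies within the described span."""
        return self.leading_frame <= frame <= self.trailing_frame


region = ObjectRegionData.from_leading_and_difference(
    100, 50, "ellipse", [[2.0, 10.0]])
```

Both ways of identifying the frame span (two frame numbers, or one frame number and a difference) yield the same record.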
The parameter of the function may be position data of knots of the trajectory and information arranged to be used together with the position data of the knots to be capable of identifying the trajectory. Alternatively, the parameter of the function may be a coefficient of the function.
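When the parameter is given as position data of knots, the position in an arbitrary frame is recovered by evaluating the function between the two knots which enclose the frame. The sketch below assumes, purely for illustration, that the segments between knots are first-order (linear); the name `eval_trajectory` is hypothetical.

```python
from bisect import bisect_right


def eval_trajectory(knot_frames, knot_values, frame):
    """Evaluate one coordinate of a trajectory at the given frame.

    knot_frames: ascending frame numbers of the knots.
    knot_values: coordinate value at each knot.
    Segments between knots are assumed to be linear in this sketch.
    """
    if not knot_frames[0] <= frame <= knot_frames[-1]:
        raise ValueError("frame lies outside the described span")
    i = bisect_right(knot_frames, frame) - 1
    if i == len(knot_frames) - 1:       # exactly on the last knot
        return knot_values[-1]
    f0, f1 = knot_frames[i], knot_frames[i + 1]
    v0, v1 = knot_values[i], knot_values[i + 1]
    t = (frame - f0) / (f1 - f0)
    return v0 + t * (v1 - v0)


knot_frames = [0, 10, 20]
knot_values = [0.0, 5.0, 25.0]
```

A higher-order segment would need additional information stored with the knots, which corresponds to the information used together with the position data of the knots mentioned above.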
When a plurality of representative points of the approximate figure of the object region or characteristic points of the object region exist, it is desirable to identify the correspondence between the plural representative points or the characteristic points of the present frame and a plurality of representative points or characteristic points of an adjacent frame.
It is desirable to describe information related to the object or a method of accessing the related information.
According to another aspect of the present invention, there is provided a recording medium storing object region data containing information about regions of one or more objects described by one of the above methods.
According to another aspect of the present invention, there is provided a recording medium storing object region data containing information about regions of one or more objects described by one of the above methods and information related to each object or information indicating a method of accessing the related information.
According to another aspect of the present invention, there is provided a recording medium storing object region data containing information about regions of one or more objects described by one of the above methods and information for identifying information related to each object, and information related to each object.
According to another aspect of the present invention, there is provided a video processing method for determining whether or not a predetermined object has been specified in a screen which is displaying a video, the method obtaining information describing parameter of a function approximating a trajectory obtained by arranging position data of representative points of the approximate figure in a direction in which frames proceed when an arbitrary position has been specified in the screen in a case where a region of the predetermined object exists in the video; detecting the position of the representative point in the frame based on the obtained information; detecting the position of the approximate figure in accordance with the detected position of the representative point; determining whether or not the input position exists in the approximate figure; and determining that the predetermined object has been specified when a determination has been made that the input position exists in the approximate figure.
According to another aspect of the present invention, there is provided a video processing method for determining whether or not a predetermined object has been specified in a screen which is displaying a video, the method obtaining information describing parameter of a function approximating a trajectory obtained by arranging position data of characteristic points of the object region in a direction in which frames proceed when an arbitrary position has been specified in the screen in a case where a region of the predetermined object exists in the video; detecting the positions of the characteristic points in the frame in accordance with the obtained information; determining whether or not the distance between the input position and the detected position of the characteristic point is shorter than a reference value; and determining that the predetermined object has been specified when a determination has been made that the distance is shorter than the reference value.
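The determination based on characteristic points reduces to a distance comparison. The following sketch assumes that the positions of the characteristic points in the current frame have already been detected from the function parameters; the function name and argument names are hypothetical.

```python
import math


def is_object_specified(input_pos, characteristic_points, reference_value):
    """Return True when the input position is closer than the reference
    value to at least one characteristic point of the object region."""
    return any(math.dist(input_pos, p) < reference_value
               for p in characteristic_points)


# Characteristic points detected for the current frame (made-up values).
points = [(13.0, 14.0), (40.0, 40.0)]
```

With the input position (10, 10), the nearest characteristic point lies at a distance of 5, so the object is determined to be specified for a reference value of 6 but not for a reference value of 5, because the distance must be shorter than the reference value.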
When a determination has been made that the predetermined object has been specified, it is desirable to show information related to the predetermined object.
According to another aspect of the present invention, there is provided a video processing method of displaying a region in which a predetermined object exists when the predetermined object has been specified in a screen which is displaying a video, the video processing method obtaining information describing parameter of a function approximating a trajectory obtained by arranging position data of at least representative points of an approximate figure of the object region or characteristic points of the object region in a direction in which frames proceed when the region of the predetermined object exists in the video; detecting the representative point or the characteristic point in the frame in accordance with the obtained information; and displaying information for displaying the position of the object region in the screen in a predetermined form of display in accordance with the detected representative point or the characteristic point.
According to another aspect of the present invention, there is provided a video processing method for retrieving, from among objects which appear in a video, a predetermined object which satisfies a predetermined condition, the video processing method inputting an arbitrary position in the video and a retrieving condition determined in accordance with the input position; obtaining information describing a parameter of a function approximating a trajectory obtained by arranging position data of representative points of an approximate figure of an object region produced for each object which appears in the video or a characteristic point of the object region in a direction in which frames proceed; determining, for each object over a plurality of frames, whether or not the representative point of the approximate figure or the characteristic point and the input position have a predetermined relationship in one frame of one object obtained in accordance with the obtained information; and detecting the predetermined object satisfying the retrieving condition in accordance with a result of determination.
The predetermined relationship may be the relationship that the input position exists in the approximate figure region or the relationship that the distance from the characteristic point to the input position is shorter than a reference value. The retrieving condition may be a condition of an object which is to be extracted, which is selected from a retrieval condition group consisting of a condition that at least one frame satisfying the predetermined relationship exists at the input position, a condition that a predetermined number of frames each satisfying the predetermined relationship exist successively with regard to the input position and a condition that the predetermined relationship is not satisfied in any of the frames.
The retrieval condition group includes, as a condition which must be added to the condition which is determined in accordance with the position, an attribute condition which must be satisfied by the approximate figure of the object.
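The three position-based retrieving conditions can be sketched as tests on a per-frame sequence of determination results. The sketch assumes that `hits[i]` is True when the predetermined relationship holds in the i-th frame of one object; the function names are hypothetical, and the attribute condition on the approximate figure is omitted.

```python
def exists_in_any_frame(hits):
    """At least one frame satisfies the predetermined relationship."""
    return any(hits)


def exists_in_successive_frames(hits, count):
    """A run of `count` successive frames satisfies the relationship."""
    run = 0
    for hit in hits:
        run = run + 1 if hit else 0
        if run >= count:
            return True
    return False


def exists_in_no_frame(hits):
    """The relationship is not satisfied in any of the frames."""
    return not any(hits)


# Per-frame determination results for one object (made-up values).
hits = [False, True, True, False, True]
```

An object would be retrieved when the condition selected by the user returns True for its sequence of results.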
According to another aspect of the present invention, there is provided a video processing method for retrieving, from among objects which appear in a video, a predetermined object which satisfies a predetermined condition, the video processing method inputting information for specifying a trajectory of the position in a video which is to be retrieved; obtaining information describing a parameter of a function approximating a trajectory obtained by arranging position data of representative points of an approximate figure of the object region produced for each object which appears in a video and which is to be retrieved or a characteristic point of the object region in a direction in which frames proceed; evaluating, for each object, similarity of the trajectory of the representative point or the characteristic point of the one object detected in accordance with the obtained information and the trajectory of the input position; and detecting the predetermined object corresponding to the specified trajectory.
Information for specifying the trajectory of the position may be time sequence information including the relationship between the position and time. The similarity may be evaluated while the positional relationship is being added.
The specified trajectory may be a trajectory of an object in a video which has been specified. Alternatively, a user may be permitted to input the trajectory by drawing the trajectory on a GUI.
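One possible way of evaluating the similarity of trajectories is sketched below. The measure used here, the mean Euclidean distance between positions sampled at the same times, is an assumption chosen for simplicity and is not a measure stated in the description; the function names are likewise hypothetical.

```python
import math


def trajectory_distance(a, b):
    """Mean distance between two trajectories sampled at the same times.

    a, b: lists of (x, y) positions of equal length. A smaller value
    means that the trajectories are more similar.
    """
    if len(a) != len(b) or not a:
        raise ValueError("trajectories must be non-empty and of equal length")
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def most_similar(query, candidates):
    """Return the name of the candidate whose trajectory is closest to the query."""
    return min(candidates,
               key=lambda name: trajectory_distance(query, candidates[name]))


query = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
candidates = {
    "A": [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0)],
    "B": [(5.0, 5.0), (6.0, 6.0), (7.0, 7.0)],
}
```

Because absolute positions are compared, this measure takes the positional relationship into account; subtracting the starting point of each trajectory beforehand would compare the shapes only.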
According to another aspect of the present invention, there is provided an object-region-data generating apparatus for generating data about described information of a region of an arbitrary object in a video over a plurality of continuous frames, the object-region-data generating apparatus comprising a circuit configured to approximate an object region in the video in a plurality of the subject frames by using a predetermined figure; a detector configured to detect, in the plural frames, coordinate values of the predetermined number of representative points identifying the predetermined figure which has been used in the approximation; and a circuit configured to approximate a trajectory of a time sequence of the coordinate values of the representative points obtained over the plurality of the continuous frames with a predetermined function, so that information about the object region is generated by using the parameter of the function.
According to another aspect of the present invention, there is provided an object-region-data generating apparatus for generating data about described information of a region of an arbitrary object in a video over a plurality of continuous frames, the object-region-data generating apparatus comprising a detector configured to detect the coordinate values of the predetermined number of characteristic points of an object region in a video over the plurality of the subject frames, and a circuit configured to approximate a time sequential trajectory of the coordinate values of the characteristic points obtained over the plurality of the continuous frames with a predetermined function, wherein the parameter of the function is used to generate information about the object region.
According to another aspect of the present invention, there is provided a video processing apparatus for performing a predetermined process when a predetermined object has been specified in a screen which is displaying a video, the video processing apparatus comprising a circuit configured to obtain a parameter of a function approximating a trajectory obtained by arranging position data of representative points of an approximate figure of the object region in a direction in which frames proceed in a case where a region of a predetermined object exists in the video when an arbitrary position has been specified in the screen to detect the position of the representative point in the frame; a detector configured to detect the position of the approximate figure in accordance with the detected position of the representative point; and a circuit configured to determine whether or not the input position exists in the approximate figure.
According to another aspect of the present invention, there is provided a video processing apparatus for performing a predetermined process when a predetermined object has been specified in a screen which is displaying a video, the video processing apparatus comprising a circuit configured to obtain a parameter of a function approximating a trajectory obtained by arranging position data of a characteristic point of the object region in a direction in which frames proceed in a case where the region of the predetermined object exists in the video when an arbitrary position has been specified in the screen to detect the position of the characteristic point in the frame; and a circuit configured to determine whether or not the distance between the input position and the detected position of the characteristic point is shorter than a reference value.
According to another aspect of the present invention, there is provided a video processing apparatus for performing a predetermined process when a predetermined object has been specified in a screen which is displaying a video, the video processing apparatus comprising a circuit configured to obtain a parameter of a function approximating a trajectory obtained by arranging position data of at least a representative point of an approximate figure of the object region or a characteristic point of the object region in a direction in which frames proceed when the region of the predetermined object exists in the video to detect the representative point or the characteristic point in the frame; and a circuit configured to display information for indicating the position of the object region in the screen in a predetermined display form.
According to another aspect of the present invention, there is provided a video processing apparatus for retrieving, from among objects which appear in a video, a predetermined object which satisfies a specified condition, the video processing apparatus comprising a circuit configured to obtain information describing a parameter of a function approximating a trajectory obtained by arranging position data of representative points of an approximate figure of the object region produced for each object which appears in a video which is to be retrieved or a characteristic point of the object region in a direction in which frames proceed when an arbitrary position in the video which is to be retrieved and a retrieving condition determined in accordance with the position have been input; a circuit configured to determine, for each object over a plurality of the frames, whether or not the approximate figure or the characteristic point of one object in one frame obtained in accordance with the obtained information and the input position satisfy a predetermined relationship; and a detector configured to detect an object which satisfies the retrieving condition in accordance with a result of the determination.
According to another aspect of the present invention, there is provided a video processing apparatus for retrieving, from among objects which appear in a video, a predetermined object which satisfies a specified condition, the video processing apparatus comprising a circuit configured to obtain information describing a parameter of a function approximating a trajectory obtained by arranging position data of representative points of an approximate figure of the object region produced for each object which appears in the video which is to be retrieved or a characteristic point of the object region in a direction in which frames proceed when information for specifying a trajectory of the position in a video which is to be retrieved has been input; a circuit configured to evaluate, for each object, similarity between the trajectory of the representative point or the characteristic point of one object obtained in accordance with the obtained information and the trajectory of the input position; and a detector configured to detect the predetermined object corresponding to the specified trajectory in accordance with the evaluated similarity.
Note that the present invention relating to the apparatus may be employed as the method and the present invention relating to the method may be employed as the apparatus.
The present invention relating to the apparatus and the method may be employed as a recording medium which stores a program for causing a computer to perform the procedure according to the present invention (or causing the computer to serve as means corresponding to the present invention or causing the computer to realize the function corresponding to the present invention) and which can be read by the computer.
The present invention is configured such that the object region in a video over a plurality of frames is described as a parameter of a function approximating a trajectory obtained by arranging position data of representative points of an approximate figure of the object region or a characteristic point of the object region in a direction in which frames proceed. Therefore, the object region in the video over the plural frames can be described with a small quantity of the function parameters. Hence it follows that the quantity of data required to identify the object region can effectively be reduced. Moreover, handling can be facilitated. Moreover, extraction of a representative point or a characteristic point from the approximate figure or generation of the parameter of the approximate curve can easily be performed. Moreover, generation of an approximate figure from the parameter of the approximate curve can easily be performed.
When the representative point of the approximate figure is employed, a fundamental figure, for example, one or more ellipses, is employed such that each ellipse is represented by two focal points and another point. Thus, whether or not arbitrary coordinates specified by a user exist in the object region (the approximate figure) can be determined by using a simple discriminant. Hence it follows that the user is able to easily instruct a moving object in a video.
When the characteristic point is employed, whether or not the arbitrary coordinates specified by a user indicate the object region can be determined very easily. Thus, a moving object in a video can easily be specified by the user.
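For an ellipse represented by two focal points and another point, the discriminant can be sketched as follows. The sketch assumes that the other point lies on the ellipse, so that a position is inside when the sum of its distances to the two focal points does not exceed the corresponding sum for that point; the function name `inside_ellipse` is hypothetical.

```python
import math


def inside_ellipse(p, f1, f2, h):
    """Return True when position p lies inside or on the ellipse.

    f1, f2: the two focal points.
    h: another representative point, assumed to lie on the ellipse.
    """
    return (math.dist(p, f1) + math.dist(p, f2)
            <= math.dist(h, f1) + math.dist(h, f2))


# Ellipse with focal points (-3, 0) and (3, 0) passing through (0, 4),
# that is, semi-axes of 5 and 4 (made-up values).
f1, f2, h = (-3.0, 0.0), (3.0, 0.0), (0.0, 4.0)
```

The test needs only four distance computations per ellipse, and when the object region is approximated by several ellipses the position is inside the region if the test succeeds for any one of them.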
When display of an object region among regions of objects which can be identified by using object region data and which has related information, or display of an image indicating the object region is controlled, the user is permitted to quickly recognize whether or not related information exists and the position of the object region. Therefore, the operation which is performed by the user can effectively be aided.
According to the present invention, retrieval of an object in a video can easily be performed in accordance with a position in a video through which the object passes, residence time at a certain point or a trajectory.
Additional objects and advantages of the present invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the present invention.
The objects and advantages of the present invention may be realized and obtained by means of the instrumentalities and combinations particularly pointed out hereinafter.