Conventionally, a trimmed image is generated by trimming a video image obtained with a video camera, and is used for various purposes. A video image is the view captured by a cameraman with a video camera: the cameraman consciously or subconsciously selects a target object from that view and, following specific empirical rules, determines the composition of the video image. The area that includes the object the cameraman is aiming at can therefore be defined as the essential portion of the video image.
Methods for identifying, within a video image, the essential portion aimed at by a cameraman have been discussed, and attempts have been made, including the automation of camera operations, to develop robot cameras that can automatically detect such a portion. Robot cameras have also been designed that are provided with sensors for the detection of images and that can pick up images when no cameraman is present. For this purpose, a method has been discussed whereby line-of-sight detection and the time-series measurement of camera operation are used to analyze the animation characteristics that enable a cameraman to identify and focus on objects, so that the results can be fed back to and used to control a robot camera (e.g., “Analysis Of An Image Pickup Method Used By A Broadcast Cameraman When Recording A Moving Object”, NHK Technical Institute R&D, No. 48, pp. 34-46, February 1998).
Further, a method, based on videos recorded using a fixed camera, has been proposed whereby a video that appears to have been directly recorded by a cameraman can be generated by using a digital camera, the operation of which mimics that employed by a cameraman (“Automated Video Generation Using A Digital Camera Operation”, MIRU-2000, vol. I, pp. 331-336).
As a consequence of the development of portable computers and portable terminals, such as cellular phones, for which Internet connection service is provided, a system has been proposed for the distribution, as digital content, of various images. Furthermore, concomitant with the spreading use of the Internet, in addition to the above portable terminals, other computer systems, such as desktop computers or workstations, have come to be widely used for the distribution of images.
Image data is generally larger than either text data or hypertext markup language (HTML) data. For portable computers and other portable terminals, such as cellular phones, the overhead for delivering image data therefore tends to be excessive, and memory resources for downloading and streaming data are urgently required. This is especially true for cellular phones, the use of which is spreading, since the color liquid crystal displays mounted on these phones are too small to adequately display even reduced versions of images obtained using video cameras.
For portable terminals, the frame reproduction rate for the video formats MPEG-2 and MPEG-4, both of which provide high-compression coding, is set at 15 frames per second. Thus, when a video image is distributed to a portable terminal, an image that is clearly visible in the original video may not be discernible on the screen because of the small screen size, the low resolution and the low frame rate. It has been pointed out that this problem is especially pronounced for sports videos featuring rapid movement, where it is anticipated that the movement of objects, such as balls or players, will not be visible and the viewer will be unable to follow the contents of the video.
Thus, a demand exists for a process that either extracts only the area aimed at by a cameraman, which constitutes the essential portion of a video image, and enlarges the resulting image sequence, or that increases the resolution of only the target area while high-compression scalable coding is performed for the other portions, thereby enabling the image to be discerned even on a portable terminal. To respond to this demand, it must be ascertained which area in a video image is targeted. However, since the point of aim may differ depending on the cameraman, the uniform determination of target areas is difficult. It is therefore necessary to analyze the target area on which a cameraman focused in a video image, and to extract from that area the object tracked by the cameraman who obtained the video. This is difficult because, even after a target area in a video image has been designated, the object within it must still be extracted; i.e., it must be determined which of the multiple objects in the video image the cameraman was aiming at.
Thus, unnecessary portions of an obtained video image must be removed to the extent possible, and only a predetermined area, including the object the cameraman was aiming at, must be delivered to a portable terminal, so that only a small amount of data is needed to deliver the essential portion of the video image, which can then be presented on a display having a limited area.
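As a rough illustration of the trimming and enlargement discussed above, the following sketch crops a rectangular area from a single frame, enlarges it by an integer factor, and reports the fraction of the original pixels that would need to be delivered. It assumes that the target area is already known as a rectangle, which is precisely the determination identified above as difficult; the function names, the list-of-rows frame representation and the nearest-neighbour enlargement are hypothetical choices made for illustration only, not a method taken from the references cited here.

```python
def crop(frame, top, left, height, width):
    """Return the rectangular sub-image of `frame` (a list of pixel rows)."""
    if top < 0 or left < 0 or height <= 0 or width <= 0:
        raise ValueError("rectangle must have a non-negative origin and positive size")
    if top + height > len(frame) or left + width > len(frame[0]):
        raise ValueError("rectangle extends outside the frame")
    return [row[left:left + width] for row in frame[top:top + height]]


def enlarge(image, factor):
    """Enlarge `image` by an integer factor using nearest-neighbour repetition."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    enlarged = []
    for row in image:
        # Repeat each pixel horizontally, then repeat the widened row vertically.
        wide_row = [pixel for pixel in row for _ in range(factor)]
        enlarged.extend(list(wide_row) for _ in range(factor))
    return enlarged


def pixel_fraction(frame, region):
    """Fraction of the frame's pixels contained in the trimmed region."""
    return (len(region) * len(region[0])) / (len(frame) * len(frame[0]))


# Hypothetical 6x8 frame whose pixel value encodes its (row, column) position.
frame = [[10 * r + c for c in range(8)] for r in range(6)]

# Assumed target area: 2x2 pixels with its top-left corner at row 2, column 3.
region = crop(frame, top=2, left=3, height=2, width=2)
enlarged = enlarge(region, 2)
```

In this sketch only the cropped region would be transmitted, and the enlargement would be performed for display; for the frame above, the region holds 4 of the 48 pixels.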
It is also necessary, in order to improve delivery and display speeds and to save memory and network resources, to transmit to computers a video image that has been trimmed to its essential portion.
In addition, especially for a portable telephone, for which the display size and the memory resources are limited, efficient data distribution is required in order to enlarge only the essential portion of a video image.