A camera having a function that allows an image from the camera, located at a remote site, to be viewed via a network such as the Internet is disclosed in Japanese Patent Laid-Open No. 10-040185. Hereinafter, such a camera having a networking function is referred to as a camera server apparatus. In this conventional technique, an image from the camera server apparatus can be viewed simultaneously at a plurality of terminal apparatuses such as personal computers, and in addition, the pan and/or tilt angles and the zoom ratio of the camera can be controlled from remote locations using a plurality of terminal apparatuses.
In the case where a plurality of terminal apparatuses are allowed to control one camera in such a camera server apparatus system, the right to control the single physically available camera must be arbitrated among them. For this purpose, if the concept of a control right disclosed in Japanese Patent Laid-Open No. 10-042278 is introduced, the user can control the camera only during the period over which he or she holds the control right. Separately, a technique of superimposing information on an image from such a camera server apparatus is disclosed in Japanese Patent Laid-Open No. 11-196404.
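The idea of a control right can be illustrated with a minimal sketch. The class name `ControlRightArbiter`, the fixed holding period, and the first-come queue are assumptions made here for illustration only; they are not taken from Japanese Patent Laid-Open No. 10-042278.

```python
from collections import deque


class ControlRightArbiter:
    """Hypothetical arbiter: one client at a time holds the camera control right."""

    def __init__(self, hold_seconds):
        self.hold_seconds = hold_seconds  # assumed fixed length of one grant
        self.holder = None                # client currently holding the right
        self.expires_at = 0.0             # time at which the current grant ends
        self.waiting = deque()            # clients queued for the right

    def _advance(self, now):
        # When the current grant has expired, pass the right to the next
        # waiting client (or to nobody if the queue is empty).
        if self.holder is not None and now >= self.expires_at:
            self.holder = self.waiting.popleft() if self.waiting else None
            self.expires_at = now + self.hold_seconds

    def request(self, client, now):
        """Ask for the control right; returns True if the client holds it."""
        self._advance(now)
        if self.holder is None:
            self.holder = client
            self.expires_at = now + self.hold_seconds
        elif client != self.holder and client not in self.waiting:
            self.waiting.append(client)
        return self.holder == client

    def may_control(self, client, now):
        """Pan, tilt and zoom commands are accepted only from the holder."""
        self._advance(now)
        return self.holder == client
```

With a 10-second holding period, a terminal that requests the right at time 0 may control the camera until time 10, after which a terminal that queued in the meantime takes over.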
In recent years, owing to advances in cellular phone and portable terminal technology, it has become possible for camera images to be viewed and manipulated from such apparatuses. However, if the image from the camera server apparatus is to be provided not only to a terminal such as a personal computer but also to a portable terminal such as a cellular phone, the camera server apparatus needs two interfaces, one for each kind of terminal, because the portable terminal differs from the personal computer terminal in image providing scheme, image format and the like. As a result, the cost of the camera server apparatus is increased. Similarly, a dedicated interface for controlling the camera from the portable terminal must be provided separately on the camera server apparatus side, resulting in increased complexity and cost of the camera server apparatus.
In addition, an advertisement cannot be flexibly superimposed on the image by a camera server apparatus that has no function for superimposing an advertisement or the like on the image. Moreover, if the volume of information to be superimposed on the image is large, retaining that information in the camera server apparatus goes beyond the apparatus's original function of delivering an image, and superimposition of information is therefore not feasible in terms of cost. Furthermore, it is impossible in the conventional technique to, for example, superimpose advertisement information on an image provided to the cellular phone while superimposing no advertisement information on an image provided to the conventional terminal.
In addition, the technique in which a camera located at a remote site is controlled via a network to obtain and display an image is characterized by a high degree of freedom in camera control, such as pan, tilt, zoom and backlight correction of the camera. In addition, the television conference system, in which images and voices at a plurality of sites are sent and received via a network with the image and the voice combined as a pair, is in general use. In addition, the technique in which image and sound are played back while they are being downloaded via a network is called streaming, and a live delivery technique in which the coding, network delivery, reception and playback of the image and sound are performed concurrently is also used.
As for the matching of the image with voice, an image sensing apparatus outputting the image and sound with camera parameters matched with sound is described in Japanese Patent Laid-Open No. 11-305318. In addition, an apparatus selecting and outputting the image and sound is described in Japanese Patent Laid-Open No. 08-56326. In addition, an example of the television conference system in which a plurality of sites are connected together, and the switching is made between the image and voice to be used is disclosed in Japanese Patent Laid-Open No. 10-93941.
In a so-called web camera, in which a camera located at a remote site is controlled via a network, only the image can be obtained, and in general no sound is obtained. On the other hand, the television conference system allows the image and voice to be sent and received in addition to camera control, but, because of its intended use, employs a method in which the image and voice are inputted to the same bidirectional communication apparatus at the same point. In addition, the destination to which the image and voice are communicated is generally specified deliberately by the user of the terminal.
In addition, in the image streaming technique, one image with sound is delivered to numerous receiving apparatuses, and an arbitrary image is not normally combined with an arbitrary sound. In addition, the previously disclosed apparatus that selects and combines the image and sound cannot combine an image with an arbitrary sound on the network.
In addition, image delivery systems that continuously deliver images via a data transmission medium such as the Internet or an intranet have already come into widespread use, and are used in a variety of fields such as transmission of live images, indoor and outdoor monitoring, and observation of animals and plants.
These image delivery systems use image delivery servers for delivering images, and many of the image delivery servers employ the JPEG coding mode (international standard image coding mode defined by ISO/IEC 10918) as an image coding mode.
On the other hand, coded image data conforming to the JPEG coding mode (JPEG coded data) sent from the image delivery server is received by a client terminal, decoded, and then displayed on the screen. Since many currently popular PCs (personal computers) and PDAs (personal digital assistants) have a function for decoding JPEG coded data as a standard function, the PC and PDA are used as client terminals.
In recent years, the cellular phone has sprung into wide use, and for the portable terminal used in Japan, the cellular phone surpasses the notebook PC and PDA in penetration rate. In addition, the function of the cellular phone has been rapidly improved, and the cellular phone compatible with the third generation communication mode recently commercialized in Japan is provided as a standard function with a function for decoding coded data (MPEG4 coded data) conforming to the MPEG4 coding mode (international standard voice and image coding mode defined by ISO/IEC 14496). However, the cellular phone is not normally provided with a function for decoding JPEG coded data, and it is therefore impossible to directly send JPEG coded data from the image delivery server to the cellular phone.
For solving this problem, two methods are presented. The first method is a method in which the image delivery server is modified so that MPEG4 coded data can be sent. In this method, however, the existing image delivery server should be replaced with a new image delivery server, and thus the cost for the replacement is considerably increased in proportion to the number of image delivery servers to be installed.
The second method is a method in which a relay server is installed at some midpoint in the communication path between the image delivery server and the cellular phone, and JPEG coded data is converted into MPEG4 coded data by this relay server. The advantage of this method is that a plurality of image delivery servers are connected to one relay server, whereby the number of relay servers to be installed can significantly be reduced, and thus the cost for installation is significantly reduced.
However, the method in which the relay server is installed has a disadvantage. That is, since the image size normally decodable by the cellular phone is the QCIF (Quarter CIF) size (lateral: 176 pixels; longitudinal: 144 pixels) while the image size normally dealt with by the conventional image delivery server is the QVGA (Quarter VGA) size (lateral: 320 pixels; longitudinal: 240 pixels) or 1/16 VGA size (lateral: 160 pixels; longitudinal: 120 pixels), JPEG coded data of the QVGA size or 1/16 VGA size must be converted into MPEG4 coded data of the QCIF size, and the image quality may be degraded due to this conversion of coded data.
For example, in the conventional method of converting the resolution of JPEG coded data disclosed in Japanese Patent Laid-Open No. 4-229382, the image size is reduced by a factor of m/8 laterally and n/8 longitudinally (m and n are each an integer from 1 to 7 inclusive) by taking only the low-order coefficient components from the orthogonal transform data of each block obtained during JPEG image decoding and subjecting them to an inverse orthogonal transform.
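The coefficient-truncation idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the cited publication: the helper names are hypothetical, an orthonormal DCT is assumed as the orthogonal transform, and the forward transform in `reduce_block` merely stands in for the coefficients that a real decoder would already have obtained from the JPEG data.

```python
import math


def dct_matrix(size):
    """Orthonormal DCT-II basis; row k holds the k-th cosine basis vector."""
    rows = []
    for k in range(size):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / size)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * size))
                     for i in range(size)])
    return rows


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def reduce_block(block, m, n):
    """Shrink an 8x8 block to n rows by m columns (1 <= m, n <= 7)."""
    c8 = dct_matrix(8)
    # Forward 2-D DCT; a decoder would take these from the coded data instead.
    coeffs = matmul(matmul(c8, block), transpose(c8))
    # Keep only the low-order n x m coefficients, rescaled so that the
    # average brightness of the block is preserved.
    gain = math.sqrt(m * n) / 8.0
    low = [[coeffs[v][u] * gain for u in range(m)] for v in range(n)]
    # Inverse 2-D DCT of the smaller size yields the reduced block.
    return matmul(matmul(transpose(dct_matrix(n)), low), dct_matrix(m))
```

For instance, `reduce_block(block, 4, 5)` turns one 8x8 block into a block 4 pixels wide and 5 pixels high, which corresponds to factors of 4/8 laterally and 5/8 longitudinally.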
However, conversion from the QVGA size to the QCIF size requires a factor of 0.55 laterally (4.4/8) and 0.6 longitudinally (4.8/8), and conversion from the 1/16 VGA size to the QCIF size requires a factor of 1.1 laterally (8.8/8) and 1.2 longitudinally (9.6/8). Thus, neither m nor n is an integer, and it is impossible to perform conversion from the QVGA size or 1/16 VGA size to the QCIF size by this method.
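The factors quoted above can be checked with a short calculation. The names `SIZES`, `eighths` and `usable` are hypothetical helpers introduced only for this check; the image sizes are those given in the text.

```python
from fractions import Fraction

# Image sizes (width, height) in pixels, as stated in the text.
SIZES = {"QVGA": (320, 240), "1/16 VGA": (160, 120), "QCIF": (176, 144)}


def eighths(src, dst):
    """Return (m, n) such that the conversion factor is m/8 by n/8."""
    (src_w, src_h), (dst_w, dst_h) = SIZES[src], SIZES[dst]
    return Fraction(8 * dst_w, src_w), Fraction(8 * dst_h, src_h)


def usable(value):
    """True if value is an integer from 1 to 7, as the m/8 method requires."""
    return value.denominator == 1 and 1 <= value <= 7


for src in ("QVGA", "1/16 VGA"):
    m, n = eighths(src, "QCIF")
    print(src, float(m), float(n), usable(m), usable(n))
# prints: QVGA 4.4 4.8 False False
# prints: 1/16 VGA 8.8 9.6 False False
```

Exact fractions are used so that the non-integer results cannot be attributed to floating-point rounding.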
In addition, conventional general methods of converting the image resolution include a method in which the image is thinned out by taking pixels in a fixed ratio (scaledown), a method in which the same pixels are repeatedly inserted (scaleup), and a method in which the weighted average value of a plurality of neighboring pixels is calculated to generate a new pixel value. These methods allow the image size to be converted in any ratio. However, these conventional methods have problems described below with reference to FIGS. 44 to 47.
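The three general methods can be sketched for a single row of pixels; a two-dimensional image would apply the same operation to rows and then to columns. The function names and the pixel-centre alignment used in the weighted-average variant are assumptions chosen for illustration.

```python
def resample_nearest(row, new_len):
    """Thin out pixels (new_len < len(row)) or repeat them (new_len > len(row))."""
    old_len = len(row)
    return [row[i * old_len // new_len] for i in range(new_len)]


def resample_linear(row, new_len):
    """Each new pixel is a weighted average of its two nearest old pixels."""
    old_len = len(row)
    out = []
    for i in range(new_len):
        # Centre of output pixel i expressed in input pixel coordinates.
        pos = (i + 0.5) * old_len / new_len - 0.5
        pos = min(max(pos, 0.0), old_len - 1.0)   # clamp at the row ends
        left = int(pos)
        right = min(left + 1, old_len - 1)
        weight = pos - left
        out.append((1.0 - weight) * row[left] + weight * row[right])
    return out


print(resample_nearest([10, 20, 30, 40], 2))  # prints [10, 30]
print(resample_nearest([10, 20], 4))          # prints [10, 10, 20, 20]
print(resample_linear([0, 10], 4))            # prints [0.0, 2.5, 7.5, 10.0]
```

Because both functions accept any target length, a 320-pixel row can be converted directly to 176 pixels, which the m/8 method cannot do.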
FIG. 44 shows the correspondence between image areas before and after the QVGA size image is converted into the QCIF size image by the conventional technique. As shown in this figure, an image area of laterally 320 pixels and longitudinally 240 pixels is scaled down to an image area of laterally 176 pixels and longitudinally 144 pixels. It corresponds to a conversion factor of laterally 0.55 times (4.4/8 times) and longitudinally 0.6 times (4.8/8 times) as described previously.
FIG. 45 illustrates the shifting of block border lines caused by the conversion of image size in FIG. 44. In this figure, solid lines show positions of border lines laterally spaced by 8 pixels and longitudinally spaced by 8 pixels, and dotted lines show positions of border lines laterally spaced by 4.4 (=8×0.55) pixels and longitudinally spaced by 4.8 (=8×0.6) pixels. That is, positions of block border lines in the image before conversion are shifted from positions shown by solid lines to positions shown by dotted lines due to the conversion of image size in FIG. 44. Then, the image after conversion is divided again along block border lines in positions shown by solid lines, and is subjected to MPEG4 image coding, and therefore the image obtained after being subjected to MPEG4 image decoding has block border lines in both positions shown by dotted lines and solid lines.
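The shifting of block border lines can be quantified with a small calculation. The helper names are hypothetical, and an 8-pixel block grid is assumed both before and after conversion, as in the description above.

```python
from fractions import Fraction


def border_positions(src_len, dst_len, block=8):
    """Return (shifted, regrid): old borders after scaling, and the new grid."""
    scale = Fraction(dst_len, src_len)
    # Interior borders of the source image, mapped into the converted image.
    shifted = [k * block * scale for k in range(1, src_len // block)]
    # Interior borders of the block grid applied again after conversion.
    regrid = [Fraction(k * block) for k in range(1, dst_len // block)]
    return shifted, regrid


def misaligned(src_len, dst_len, block=8):
    """Shifted borders that do not fall on the new block grid."""
    shifted, regrid = border_positions(src_len, dst_len, block)
    grid = set(regrid)
    return [p for p in shifted if p not in grid]


lateral = misaligned(320, 176)       # QVGA width to QCIF width
longitudinal = misaligned(240, 144)  # QVGA height to QCIF height
print(float(lateral[0]), len(lateral))            # prints 4.4 38
print(float(longitudinal[0]), len(longitudinal))  # prints 4.8 24
```

Under these assumptions, 38 of the 39 interior lateral borders and 24 of the 29 interior longitudinal borders of the QVGA image land between the borders of the new grid, so the converted image carries two distinct sets of block border lines.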
Block border lines in positions shown by dotted lines are created at the time of coding the image by JPEG coding in the image delivery server, and block deformations become more noticeable in positions shown by dotted lines as the compression rate of JPEG coding is increased. In addition, block border lines in positions shown by solid lines are created at the time of coding the image by MPEG4 image coding in the relay server, and block deformations also become more noticeable in positions shown by solid lines as the compression rate of MPEG4 image coding is increased.
The communication bandwidth between the image delivery server and the cellular phone is currently several tens to several hundred kilobits per second, which is insufficient for transmitting a smoothly moving image, and therefore the compression rate of the image is normally set to a high level. Thus, block deformations appear plainly in both the positions shown by dotted lines and the positions shown by solid lines in FIG. 45, and consequently the quality of images viewed by the user of the cellular phone is significantly reduced.
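A rough calculation shows why a high compression rate is needed. The frame rate of 15 frames per second, the 12 bits per pixel (corresponding to 4:2:0 chroma subsampling) and the two link rates below are illustrative assumptions, not figures from the text.

```python
def raw_bitrate(width, height, fps, bits_per_pixel=12):
    """Uncompressed video bit rate in bits per second."""
    return width * height * bits_per_pixel * fps


def required_ratio(width, height, fps, link_bps, bits_per_pixel=12):
    """Compression ratio needed to fit the video into the given link rate."""
    return raw_bitrate(width, height, fps, bits_per_pixel) / link_bps


# QCIF at an assumed 15 frames per second.
print(raw_bitrate(176, 144, 15))                        # prints 4561920
print(round(required_ratio(176, 144, 15, 64_000), 2))   # prints 71.28
print(round(required_ratio(176, 144, 15, 384_000), 2))  # prints 11.88
```

Under these assumptions, even the faster link requires the image data to be reduced to less than a tenth of its uncompressed size, which is consistent with block deformations becoming plainly visible.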
FIG. 46 shows the correspondence between image areas before and after the 1/16 VGA size image is converted into the QCIF size image by the conventional technique. As shown in this figure, an image area of laterally 160 pixels and longitudinally 120 pixels is scaled up to an image area of laterally 176 pixels and longitudinally 144 pixels. It corresponds to a conversion factor of laterally 1.1 times (8.8/8 times) and longitudinally 1.2 times (9.6/8 times) as described previously.
FIG. 47 illustrates the shifting of block border lines caused by the conversion of image size in FIG. 46. In this figure, solid lines show positions of border lines laterally spaced by 8 pixels and longitudinally spaced by 8 pixels, and dotted lines show positions of border lines laterally spaced by 8.8 (=8×1.1) pixels and longitudinally spaced by 9.6 (=8×1.2) pixels. That is, positions of block border lines existing in the image before conversion are shifted from positions shown by solid lines to positions shown by dotted lines due to the conversion of image size in FIG. 46. Then, the image after conversion is divided again along block border lines in positions shown by solid lines, and is subjected to MPEG4 image coding, and therefore the image obtained after being subjected to MPEG4 image decoding has block border lines in both positions shown by dotted lines and solid lines.
That is, in the case of the 1/16 VGA size image, block deformations occur in both positions shown by dotted lines and solid lines, and consequently the quality of images viewed by the user of the cellular phone is significantly reduced.