1. Field of the Invention
The present invention relates to an image processing apparatus and an image processing system.
2. Description of the Related Art
As communication services using digital lines, such as Integrated Services Digital Network (ISDN) lines, come into practical use, multimedia information communications over these digital lines, carrying video image information, sound information, data, etc., are being implemented. In the International Telecommunications Union-Telecommunication Sector (ITU-TS), service definitions, video image coding methods, multimedia multiplexing structure definitions, protocol definitions, and so on, are recommended as International Telecommunication Union-Telecommunication (ITU-T) recommendations. Additionally, high-speed processors that can process video images in real time, compression coding/decoding chips, etc. are becoming less expensive. This further accelerates the spread of video image communications.
As a typical known communication apparatus of the above type, a videoconference system is available. Videoconferences and videotelephones through personal computers (hereinafter referred to as "PCs") and workstations (hereinafter referred to as "WSs") are becoming feasible. As the above-described standards are being further promoted, processors are becoming faster, various types of image-processing chips are becoming less expensive, and PCs and WSs are becoming more powerful and less expensive.
In a videoconference system, a conference usually proceeds by the following typical communication pattern. In addition to sound information communications, portrait images from both communication ends are coded according to coding algorithms, such as the ITU-T recommendation H.261, using the interframe coding or intra-frame coding method, and are mutually transmitted and received. A conference thus proceeds with the participants observing each other via moving pictures. On the other hand, a camera for picking up a document image, which is referred to as "a document camera", is used to transmit and receive the image, whereby the common document can be displayed on both screens.
In most cases, an actual conference proceeds while observing not only the portrait images but also the common documents shared by both communication ends. Accordingly, there is an increasing demand for an improved method of easily transmitting and receiving a clear and sharp document while a conference is under way. The following methods have been attempted for communicating and displaying documents: directly transmitting and receiving a document produced on a PC or a WS in a videoconference system using a PC or a WS, transmitting and receiving a document read by a scanner, and transmitting and receiving still images that have been captured with a still video camera or the like. However, the above-mentioned methods encounter the problems of incompatibility between PCs or WSs, and complexity of operation. Problems may also arise when sections of the displayed document must be pointed to or when the documents must be changed. Further, an extra camera and scanner used for this specific purpose increase the costs. Thus, in reality, these problems hamper the widespread use of the above-described methods. Accordingly, a document camera is best used since it can be simply handled and even images of three-dimensional objects can be transmitted and received. Document cameras include a camera specifically used for this purpose and a camera serving to pick up both portrait and document images.
Hitherto, when a document is transmitted with a document camera or in a document camera mode, the document must first be set under a document camera table or a document camera; then the focus and the zoom ratio of the camera are adjusted and the document is positioned while observing a screen that displays the document image picked up by the camera. However, with an ordinary camera that outputs video image signals according to the NTSC or PAL system, an entire A4-size document consisting primarily of characters can be displayed, but the characters cannot be read satisfactorily because of poor resolution. In order to overcome this drawback, the zoom is adjusted so that the necessary portion of the document set on the camera table or on the camera can be read. A document image is captured as an ordinary signal according to the NTSC or PAL system and is transmitted as a coded moving picture image, in a manner similar to a portrait image. Alternatively, a document image is captured in a similar manner and is coded and transmitted as a still picture image in response to an operation by the user. In either case, the image is decoded and displayed at a receiving end in a manner similar to a portrait image.
In a typical CCD camera with 380,000-pixel resolution, which is at present commercially available, the effective resolution for reading and displaying an image is, in general, approximately 450 lines of horizontal resolution and 350 lines of vertical resolution. Expressed in facsimile terms, this is equivalent to approximately 2 pel/mm (approximately 50 dpi) when an image of an A4-size document is taken. Also, high-resolution monitors used in offices, for example, those of the latest PCs and WSs, with 1280 by 1024 or 1024 by 768 dots, are becoming less expensive. If an A4-size document is read by a facsimile machine with 200 dpi resolution, 1728 by 2339 dots can be obtained. Only one half of this resolution, i.e., 100 dpi, equivalent to approximately 800 by 1000 dots, is sufficient to display an A4-size document on a high-resolution display monitor. It has been determined that a document read by a scanner with this resolution is legible.
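The resolution figures above can be checked with a short calculation. The following Python sketch is illustrative only: the function names are hypothetical, and it assumes the nominal A4 sheet size of 210 mm by 297 mm. It converts pel/mm to dpi and computes the dot count of an A4 sheet at a given scanning resolution.

```python
# Illustrative arithmetic only; function names are hypothetical.
MM_PER_INCH = 25.4
A4_MM = (210, 297)  # nominal A4 width and height in millimetres


def pel_per_mm_to_dpi(pel_per_mm):
    """Convert a facsimile-style resolution (pel/mm) to dots per inch."""
    return pel_per_mm * MM_PER_INCH


def a4_dots(dpi):
    """Return (width, height) in dots for an A4 sheet scanned at `dpi`."""
    width_mm, height_mm = A4_MM
    return (round(width_mm / MM_PER_INCH * dpi),
            round(height_mm / MM_PER_INCH * dpi))


if __name__ == "__main__":
    print(pel_per_mm_to_dpi(2))  # about 50 dpi for the camera
    print(a4_dots(200))          # facsimile-class resolution
    print(a4_dots(100))          # half resolution for monitor display
```

At 200 dpi the sketch gives a height of 2339 dots, in agreement with the figure above; the computed width of the sheet itself is somewhat less than 1728 dots, which appears to correspond to the standard facsimile scan line rather than to the 210 mm sheet width. At 100 dpi the result is of the same order as the approximately 800 by 1000 dots cited.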
A video image signal according to the NTSC or PAL system has an aspect ratio of 4:3, and the A4-size document is 210 mm by 297 mm, which in a landscape position is approximately 4:3 (more precisely about 1.41:1, against 1.33:1 for the video signal). However, although a landscape-positioned document roughly matches the NTSC/PAL video signal, a portrait-positioned document, having an aspect ratio of approximately 3:4, is inconsistent with the NTSC/PAL video signal.
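The mismatch between document orientation and the video frame can be quantified. The following Python sketch is an illustration under stated assumptions, not part of the described apparatus: the function name is hypothetical, and the document is assumed to be scaled uniformly so that it fits entirely within a 4:3 frame. It returns the fraction of the frame area that the document occupies.

```python
# Illustrative sketch; the function name and the fit-inside-frame
# assumption are hypothetical, not taken from the description.
def frame_usage(doc_w_mm, doc_h_mm, frame_w=4.0, frame_h=3.0):
    """Fraction of a frame_w:frame_h frame covered by a document that is
    scaled uniformly to fit entirely inside the frame."""
    # The limiting dimension determines the largest usable scale factor.
    scale = min(frame_w / doc_w_mm, frame_h / doc_h_mm)
    doc_area = (doc_w_mm * scale) * (doc_h_mm * scale)
    return doc_area / (frame_w * frame_h)


if __name__ == "__main__":
    print(frame_usage(297, 210))  # A4 in a landscape position
    print(frame_usage(210, 297))  # A4 in a portrait position
```

Under these assumptions a landscape-positioned A4 document fills roughly 94% of the frame, whereas a portrait-positioned one fills only about 53%, so nearly half of the already limited video resolution goes unused.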
An attempt has been made to divide an A4-size document into portions and input them separately, on the assumption, based on the above-described resolution, that a satisfactory image can then be obtained. In reality, however, the camera is mechanically moved to input each portion, and even a slight document displacement due to the operator's erroneous operation causes the resulting synthesized image to be distorted at its boundaries. Further, it is necessary to change the image-reading method depending on whether the A4-size document is located in a landscape or portrait position. The camera should also be controlled differently depending on the size of the document, such as A4, B4 or A5. Additionally, the positions to which the camera is moved should vary depending on where the document is located on the document table. Namely, only the user can execute the above-described camera control. On the other hand, in an apparatus that transmits a document as a still image with the use of a document camera, every time the document camera mode and the portrait camera mode are switched, very complicated operations are entailed, including not only the changing of the camera modes but also the switching of video image-coding transmission modes, the document positioning, the starting operation required for capturing still images, etc. Moreover, in an actual conference using this document camera, in most cases, the user makes a presentation while continuously changing the documents. This type of apparatus requires a complicated and troublesome operation for camera control every time the user changes the documents.
The following problems are encountered in the apparatus of the above conventional type that transmits and receives a document as a moving picture. Since an input video image is captured as an ordinary video signal according to the NTSC or PAL system, an entire manual or A4-size document used in an office and produced by a word processor or the like is not legible, unless it is written in large characters, because of poor resolution. On the other hand, the following problems are presented in an apparatus that transmits and receives a high-resolution still image by dividing the document into portions and synthesizing them. A large mechanism is required to move the overall camera, which increases the cost. Also, there are variations in the types of documents, such as differences in sizes, for example, B4, A5, B5, etc., and differences in orientations, for example, landscape-positioned documents and portrait-positioned documents. Thus, only users can perform these very complicated and troublesome operations. Further, in an actual conference, the user, in general, makes a presentation using a plurality of documents, and accordingly has to frequently change the documents. In a conventional apparatus that codes and transmits a document as a still image, complicated and troublesome operations are required every time the documents are changed. Consequently, it is difficult to use such an apparatus without the aid of an operator skilled in the above-described operations. Further, the operator may forget to switch from the document mode to the portrait mode even though there is no document to be transmitted, and may be unaware that the portrait image is no longer transmitted to the other end, or a meaningless image of the document table may continue to be transmitted to the other end.
In an apparatus selectively using the interframe coding and the intra-frame coding methods according to a coding algorithm, such as H.261 or the like, an input image with less motion is compressed using the interframe coding method, thereby accomplishing the transmission of highly-compressed images.
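The selection between the two coding methods described above can be illustrated with a minimal Python sketch. This is a simplified assumption, not the decision procedure of H.261 or of any particular coder: the function names, the whole-frame mean-absolute-difference measure, and the threshold value are all hypothetical, and an actual coder typically makes this decision on smaller blocks of the image.

```python
# Simplified illustration; names, the whole-frame difference measure,
# and the threshold are hypothetical assumptions.
def mean_abs_diff(prev, curr):
    """Mean absolute pixel difference between two equally sized frames,
    each given as a list of rows of luminance values."""
    total = 0
    count = 0
    for row_prev, row_curr in zip(prev, curr):
        for p, c in zip(row_prev, row_curr):
            total += abs(p - c)
            count += 1
    return total / count


def choose_mode(prev, curr, threshold=8.0):
    """Return "inter" when the frame differs little from the previous one,
    and "intra" when there is no previous frame or the motion is large."""
    if prev is None:
        return "intra"
    return "inter" if mean_abs_diff(prev, curr) < threshold else "intra"


if __name__ == "__main__":
    still = [[10, 10], [10, 10]]
    moved = [[200, 10], [10, 200]]
    print(choose_mode(None, still))   # first frame
    print(choose_mode(still, still))  # document at rest
    print(choose_mode(still, moved))  # document being positioned
```

In this sketch, a document at rest yields the interframe method and hence high compression, while a document that is still being positioned yields the intra-frame method.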
Hitherto, since a document image is transmitted while the document is still being prepared for the image input operation, i.e., before the document becomes still, an unnecessary image that is difficult to identify is transmitted to the receiving end. Further, since such an unnecessary image contains considerable motion, it is compressed according to the intra-frame coding method. The resulting image, which has not been highly compressed, is then uselessly transmitted, thereby wasting transmission costs.
Additionally, hitherto, images are always transmitted in real time even after still images are transmitted. Accordingly, only a slight displacement due to the operator's erroneous operation, though the operator does not intend to remove the document, causes an interframe-coded or intraframe-coded image that is actually the same image as the previous image to be transmitted. This also increases transmission costs.