The present invention relates to methods and apparatuses for compressing, processing, transmitting, and receiving data representing multiple views of an object. More particularly, the present invention relates to methods and apparatuses for compressing, processing, transmitting, and receiving multiple views of an object over a network of computer systems.
Digital processing systems, such as conventional computer systems, can often display various different views of an object on a display device which is coupled to the digital processing system. In many such systems, the user of the system may manipulate the object in such a way as to see various views of the object. The views, in one example, may be considered to be obtained from the surface of a virtual sphere which surrounds the object. FIG. 1A shows a virtual sphere 100 which surrounds an object 101. The different views of the object may be considered to be taken from various points on the surface of the virtual sphere 100. The virtual sphere 100 includes an equator 12 and a meridian or longitudinal line 14. Point 15B represents the North Pole of the virtual sphere and point 15A represents the South Pole of the virtual sphere. Points 16, 17, 18, and 19 on the equator 12 represent the locations 0°, 90°, 180°, and 270° respectively along the equator. If the view at point 16 along the equator 12 is considered to be a front view of the object 101, which is shown as a house, then the view from point 18 is a rear view while views from points 17 and 19 are views of the right and left sides respectively. A view from the North Pole shows the roof of the house, and a view from the South Pole shows the bottom of the house.
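To make the latitude/longitude description of the virtual sphere concrete, the following minimal sketch converts a viewpoint given in degrees into a camera position. The function name `view_position` and the coordinate convention (object 101 at the origin, +z toward the North Pole 15B, longitude 0° at point 16 on the +x axis) are assumptions chosen for illustration, not part of the described system.

```python
import math


def view_position(lat_deg: float, lon_deg: float, radius: float = 1.0) -> tuple[float, float, float]:
    """Return the (x, y, z) camera position on a virtual sphere around the object.

    Hypothetical helper. Assumed convention: the object sits at the origin,
    +z points toward the North Pole, and longitude 0 lies on the +x axis.
    """
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    return (
        radius * math.cos(lat) * math.cos(lon),
        radius * math.cos(lat) * math.sin(lon),
        radius * math.sin(lat),
    )


# The North Pole view looks straight down at the roof; the 180° equator
# view sits on the opposite side of the object from the 0° front view.
north_pole = view_position(90, 0)
rear = view_position(0, 180, radius=2.0)
```

Small floating-point residues (on the order of 1e-16) appear where a coordinate should be exactly zero, so comparisons should use a tolerance.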
Various methods exist in the prior art for manipulating such an object in order to see various views of the object. For example, U.S. Pat. No. 5,019,809 by Michael Chen describes a method for direct manipulation of an object by using a two-dimensional cursor control device, such as a mouse, to simulate three-dimensional movement over the surface of a virtual sphere in order to see views of the object which is surrounded by the virtual sphere. Other methods, such as the use of sliders displayed on the screen or of physical, mechanical sliders which may be manipulated by a user, are also well known in the art. These various techniques allow a user to rotate or otherwise manipulate the object in order to see various different views of the object.
It is well known in the art that these views may be used to make a sequence of views which appears to be a movie. Typically, these views are displayed in a particular sequence which makes the object appear to be smoothly rotating. For example, the house (object 101) at the center of the virtual sphere 100 may appear to rotate on an axis defined by the North and South Poles. This “movie” is merely the playback, in sequence, of some or all of the views of the object taken along the equator 12, from point 16 through points 17, 18, and 19 and back to point 16. This “movie” may be further enhanced by providing views at different latitudes.
FIG. 1B shows an example of the various views which may be provided at each selected latitude. FIG. 1B includes rows 21 through 33, each of which specifies at least four longitudinal views; in the case of latitude zero (along the equator), the views are from points 16, 17, 18, and 19 of FIG. 1A. It will be appreciated that additional views may be obtained and stored to provide greater resolution along each latitude. For example, views at every 5° or 10° along each latitude provide fine angular resolution of the object and also make any “movie” seem more realistic. It will also be appreciated that additional views along additional latitudes may be stored in order to provide greater resolution in the north and south directions.
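The grid of viewpoints described for FIG. 1B can be sketched as one row of (latitude, longitude) pairs per selected latitude. The helper `view_grid` is hypothetical, and it assumes that every row is sampled at the same longitude step starting at 0°; real capture rigs may sample the rows differently.

```python
def view_grid(latitudes, lon_step_deg):
    """Return one row of (latitude, longitude) viewpoints per latitude.

    Hypothetical helper: each row is one circular pass around the object,
    sampled every lon_step_deg degrees starting at longitude 0.
    """
    if lon_step_deg <= 0 or 360 % lon_step_deg != 0:
        raise ValueError("lon_step_deg must be a positive divisor of 360")
    longitudes = range(0, 360, lon_step_deg)
    return [[(lat, lon) for lon in longitudes] for lat in latitudes]


equator_only = view_grid([0], 90)
print(equator_only)  # [[(0, 0), (0, 90), (0, 180), (0, 270)]]
```

With a 90° step the equator row reduces to the four views at points 16, 17, 18, and 19; a 10° step yields 36 views per latitude and a 5° step yields 72.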
Table 35 of FIG. 1B represents a typical way in the prior art in which the various views are stored and transmitted between systems. Essentially, the views are stored in circular passes of the object at various vertical levels along the north/south axis. Typically, the physical arrangement of the data on a storage device mirrors this arrangement, which is often the order in which the data was originally captured from the object. For example, a camera may be positioned at each of the different viewpoints in series and the data from the camera may be stored in this order, such that there are essentially circular passes of the object at various vertical levels which are captured and stored on a storage device, such as a hard disk or other computer readable media.
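Storing the views as consecutive circular passes amounts to a row-major layout: the position of a view in storage follows directly from its pass (row) and its place within the pass (column). The sketch below assumes 0-based indices and a fixed number of views per pass; the function names are hypothetical.

```python
def storage_index(row: int, col: int, views_per_row: int) -> int:
    """0-based storage position of the view at (row, col) when views are
    stored one circular pass (row) after another."""
    if not 0 <= col < views_per_row:
        raise ValueError("col out of range")
    return row * views_per_row + col


def view_at(index: int, views_per_row: int) -> tuple[int, int]:
    """Inverse mapping: (row, col) of the view stored at a 0-based index."""
    return divmod(index, views_per_row)
```

For example, with five views per pass, the fourth view of the second pass is stored at position 8, i.e., the ninth stored view.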
On a storage device which has random access capabilities and which provides reasonably fast rates of data retrieval, this storage arrangement provides adequate data rates such that a “movie” may be displayed from these various views. However, if this data is stored in a remote location and is accessed through a network or through a slow input/output port, then storage of this data in this arrangement does not provide adequate or satisfactory display of the object, particularly when the object is to be displayed as a “movie” which may be referred to as an “object movie”. This often happens in the case of transmission of objects through the Internet or other networks.
One major obstacle to using these types of object movies, especially three-dimensional object movies in which there are multiple views of the object, is the extremely large amount of data associated with them. To be able to transmit, store, or export the sequences of the object movies, the data must be substantially compressed. It is well known in the art that data compression is a translation of data (e.g., still images, video, audio, other digital data, or a combination) using a variety of computer compression algorithms and other techniques to reduce the amount of data required to accurately represent the content of the data.
There are at least two ways of compressing object movies: compressing every frame individually and compressing based on frame differencing. Compressing every frame individually is the same as compressing still images; JPEG compression is one example of such a method. Because this approach does not exploit the similarity between neighboring frames, it requires much more space. Compressing based on frame differencing is accomplished by first compressing a key frame using still image compression; obtaining a delta frame, which is the difference between the current frame and the previous frame; and, optionally, compressing the delta frame. This process continues for subsequent frames, each of which is compared to the previous frame to obtain a delta frame. This is typically referred to as a linear compression model.
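The frame differencing scheme above can be illustrated with a minimal sketch in which each frame is a flat list of integer pixel values, the first frame is kept as the key frame, and every later frame is stored as its element-wise difference from the previous frame. This is an assumption-laden toy: it omits the still-image compression of the key frame and the optional compression of the delta frames, and the names `encode` and `decode` are hypothetical.

```python
from typing import List, Tuple

Frame = List[int]
Encoded = List[Tuple[str, Frame]]


def encode(frames: List[Frame]) -> Encoded:
    """Encode a linear sequence: first frame as key, the rest as deltas
    from the previous frame."""
    if not frames:
        return []
    encoded: Encoded = [("key", list(frames[0]))]
    for prev, cur in zip(frames, frames[1:]):
        encoded.append(("delta", [c - p for c, p in zip(cur, prev)]))
    return encoded


def decode(encoded: Encoded) -> List[Frame]:
    """Rebuild every frame in order; each delta needs the previous frame."""
    frames: List[Frame] = []
    for kind, data in encoded:
        if kind == "key":
            frames.append(list(data))
        else:
            prev = frames[-1]
            frames.append([p + d for p, d in zip(prev, data)])
    return frames


frames = [[10, 10, 10], [10, 11, 10], [12, 11, 10]]
encoded = encode(frames)
print(encoded[1])  # ('delta', [0, 1, 0])
assert decode(encoded) == frames
```

When consecutive views are similar, the delta frames consist mostly of zeros, which is what makes them cheap to compress further. Note that `decode` can only rebuild a frame after it has rebuilt the frame before it.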
Compression of an object movie with multiple views can be done utilizing linear sequence compression with a frame differencing compression method. FIG. 1C illustrates that a current video compression technology 102 assumes that the frames (e.g., the views) of the object movie are arranged in a linear way and that compression is accomplished linearly in one direction. Each frame of the object movie represents a view of the object, and the multiple views can be captured using the method described above for object 101. The frames of the object movie can be arranged in a two-dimensional array of images as shown in FIG. 1C. As shown by the arrows pointing in direction A, compression is performed in order from frame 1 through frame 25, assuming that the object movie has 25 frames.
To compress a video sequence, for instance, a video sequence that starts with frame 1 and ends with frame 25, the sequence can be arranged as shown in FIG. 1C. There are five rows in this arrangement: rows 102-a, 102-b, 102-c, 102-d, and 102-e. Using frame differencing compression, a compressor usually starts from a key frame, in this example frame 1 in row 102-a, and performs frame differencing compression. The compressor first compresses frame 1; then a delta frame, based on the difference between the current frame, frame 2, and the previous frame, frame 1, is compressed. This process is repeated until all of the frames in row 102-a are compressed. The compressor then compresses rows 102-b, 102-c, 102-d, and 102-e, in that order, in the same manner as row 102-a (see arrows A). The number of key frames in a video sequence may be chosen by the compressor; for instance, when there is a big enough difference between two frames, the compressor will assign a key frame. Alternatively, key frames can be defined, for instance, with a command that assigns a key frame every five or ten frames in the sequence. One advantage of this compression is that a delta frame is usually smaller than a key frame, since consecutive video frames are usually very similar.
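Both key-frame policies mentioned above (a fixed interval, and a new key frame whenever consecutive frames differ by a big enough amount) can be sketched as follows. The sum of absolute differences (SAD) used as the "difference" measure, the function names, and the parameter names `interval` and `threshold` are assumptions for illustration; actual compressors use their own scene-change heuristics.

```python
def sad(a, b):
    """Sum of absolute differences between two equal-length frames."""
    return sum(abs(x - y) for x, y in zip(a, b))


def assign_key_frames(frames, interval=None, threshold=None):
    """Return 0-based indices of the frames to encode as key frames.

    The first frame is always a key frame. A later frame becomes a key frame
    if `interval` frames have passed since the last key frame, or if it
    differs from the previous frame by more than `threshold` (SAD).
    """
    keys = []
    for i, frame in enumerate(frames):
        if i == 0:
            keys.append(i)
        elif interval is not None and i - keys[-1] >= interval:
            keys.append(i)
        elif threshold is not None and sad(frame, frames[i - 1]) > threshold:
            keys.append(i)
    return keys
```

With `interval=5` on a twelve-frame sequence, the key frames are frames 0, 5, and 10; with a threshold, a sudden change such as a jump between very different views starts a new key frame.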
FIG. 1E summarizes the current compression method 100-a for an object movie discussed above. Here, in step 104, images or views of the object (e.g., object 101 above) are captured from various perspectives. In step 106, the frames representing the images are arranged and stored in a linear sequence, for example, as a two-dimensional array of images. In step 108, the key frame or key frames for the video sequence are determined, for instance, by assigning a key frame to an image or view when it is the first frame of the sequence, or when there is a big enough difference between consecutive images or views. Finally, in step 110, a compression method, for example frame differencing, is applied to the video sequence. The compression method is linear in that it compresses in only one direction.
Object movies may comprise many views and hence numerous frames. For instance, an object movie typically has hundreds of frames, or even more depending on the horizontal and vertical resolution (e.g., with thirty pictures in each row and eighteen rows in total, the object movie has five hundred forty frames). The ability to access frames randomly during an interactive session is particularly in demand with object movies. For example, the user may wish to select views of the top and the sides of the object 101 and skip some other views. The user may also wish to designate the playback sequence, which means that the user must be allowed to access any frame in any order. However, random access to frames of object movies compressed under the current compression method is extremely slow and cumbersome.
For the user to access a particular view, the frame for that view must be decompressed. Under compression technology 102, decompression must always start with the key frame associated with the frame of that view, followed by decompression of as many delta frames as necessary to reach that view. For example, FIG. 1D illustrates that to access frame 7, the key frame, frame 1, must be decompressed first. Next, frame 2 must be decompressed, i.e., the delta frame between frame 2 and frame 1 must be decompressed. Then frame 3 must be decompressed, i.e., the delta frame between frame 3 and frame 2. Decompression continues through frames 4, 5, and 6 and finally frame 7. This sequence assumes that frame 1 is the only key frame among frames 1 through 7. Depending on the complexity of, and the similarities (or lack thereof) between, the images taken of each object, there may be more or fewer key frames. As can be seen, one key frame and six delta frames (frames 2 through 7) must be decompressed to access frame 7. Similarly, to access frame 25, one key frame and twenty-four delta frames must be decompressed.
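The access cost described for FIG. 1D can be computed directly: the decoder must start at the nearest key frame at or before the requested frame and decode every frame from there up to the target. This sketch uses 1-based frame numbers to match the figure and assumes a sorted list of key-frame numbers; the function name is hypothetical.

```python
import bisect


def frames_to_decode(target, key_frames):
    """Return the 1-based frame numbers that must be decoded, in order,
    to display `target` under linear frame-differencing compression.

    `key_frames` must be sorted in ascending order.
    """
    pos = bisect.bisect_right(key_frames, target) - 1
    if pos < 0:
        raise ValueError("no key frame precedes the target frame")
    start = key_frames[pos]
    return list(range(start, target + 1))


print(frames_to_decode(7, [1]))     # [1, 2, 3, 4, 5, 6, 7]
print(frames_to_decode(7, [1, 6]))  # [6, 7]
```

With frame 1 as the only key frame, reaching frame 7 requires the key frame plus six delta frames, and reaching frame 25 requires all 25 frames; adding a key frame at frame 6 cuts the cost for frame 7 to two frames.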
Because linear compression typically proceeds in only one direction, decompression for random access is slow and not optimized. As illustrated, numerous decompression steps are necessary, which slows down random-access interactivity. Designating more key frames in a video sequence can reduce the number of decompression steps. However, more key frames make the compressed file larger, which slows compression, transmission, and export. Furthermore, the data will be more costly to generate.
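The trade-off between key-frame spacing and random-access cost can be quantified with a short sketch. It assumes key frames are placed at a fixed interval starting with the first frame, and it counts only the number of key frames (the text identifies additional key frames as what enlarges the file) rather than modeling file size in bytes. The function name is hypothetical; the 540-frame figure comes from the eighteen-row example above.

```python
import math


def key_interval_tradeoff(n_frames, interval):
    """Return (number of key frames, worst-case frames decoded per access)
    for a linear sequence with a key frame every `interval` frames."""
    key_count = math.ceil(n_frames / interval)
    # Worst case: the key frame plus every delta frame up to the last
    # frame of a key-frame group.
    worst_case = min(interval, n_frames)
    return key_count, worst_case


for k in (1, 5, 10, 30):
    keys, worst = key_interval_tradeoff(540, k)
    print(f"interval={k:>2}: {keys:>3} key frames, up to {worst} frames decoded")
```

An interval of 1 makes every frame a key frame, which is equivalent to compressing every frame individually: random access is immediate, but none of the savings of frame differencing remain.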
It is thus desirable to have compression methods that enable quick and simple decompression while keeping the cost of the method low.
Demand for efficient compression and decompression has grown even more, because users now typically request views of an object over the Internet. FIG. 2A shows several computer systems which are coupled together through the Internet 103. It will be appreciated herein that the term “Internet” refers to a network of networks which uses certain protocols (e.g., the TCP/IP protocol and possibly other protocols such as HTTP (hypertext transfer protocol) for HTML (hypertext markup language) documents). The physical connections of the Internet and the protocols and communication procedures of the Internet are well known to those in the art. Access to the Internet 103 is typically provided by Internet service providers (ISPs) such as ISPs 105 and 107. Users on client systems, such as client computer systems 121, 125, 135, and 137, obtain access to the Internet through the Internet service providers. Access to the Internet allows users of the client computer systems to exchange information, to receive and send e-mails, and to view and manipulate objects as they are received. For example, web server system 109 may contain data representing the object 101 shown in FIG. 1A and provide this data to a client computer system such as client system 121 upon request by the client system 121. Often these web servers are provided by ISPs, such as ISP 105, although a computer system may be set up and connected to the Internet without that system also being an ISP, as is well known in the art.
The web server system 109 is typically at least one computer system which operates as a server computer system and is configured to operate with the protocols of the World Wide Web (WWW) and is coupled to the Internet. Optionally, the web server 109 may be part of an ISP which provides access to the Internet for client systems. The web server 109 is shown coupled to other computers in the Internet 103. Client computer systems 121, 125, 135, and 137 may each, with the appropriate web browsing software, view HTML pages provided by the web server 109. These web pages may provide movies, such as QuickTime movies, which may be viewed by users of the particular client computer system.
The ISP 105 provides Internet connectivity for the client computer system 121 through the modem interface 123, which may be considered part of the client computer system 121. The client computer system may be a conventional computer system such as a Macintosh computer, a “network” computer, a Web TV system, or another type of digital processing system, such as a cellular telephone having digital processing capabilities. Similarly, the ISP 107 provides Internet connectivity for client systems 125, 135, and 137, although as shown in FIG. 2A, the connections are not the same for these three computer systems. Client system 125 is coupled through a modem interface 127 while client computer systems 135 and 137 are part of a Local Area Network (LAN). While FIG. 2A shows the interfaces 123 and 127 as modems, it will be appreciated that each of these interfaces may be an analog modem, an ISDN modem, a cable modem, a satellite transmission interface (e.g., “Direct PC”), or another interface for coupling a computer system or a digital processing system to other digital processing systems. Client computer systems 135 and 137 are coupled to a LAN bus 133 through network interfaces 139 and 141, which may be Ethernet network interfaces or other network interfaces. The LAN bus is also coupled to a gateway computer system 131 which may provide firewall and other Internet-related services for the local area network. This gateway computer system 131 is coupled to the ISP 107 to provide Internet connectivity to the client computer systems 135 and 137. The gateway computer system 131 may be a conventional server computer system. Also, the web server system 109 may be a conventional server computer system.
Even with modern, high-speed analog modems, data transmission rates through the Internet are often painfully slow. A user of a client system may request various views representing an object in order to inspect the object or to manipulate the order in which the object is viewed. This request will be processed by a server system or some other digital processing system, and the data will be transmitted to the requesting client system and decompressed, for example, in the order shown in FIGS. 1C-1D, which is typically also the order used to play back a movie of the object. For example, a series of views along the equator beginning at 0° and progressing consecutively in 5° increments back to 0° may be transmitted from the server system to a client system. However, the user may request views in any order along the virtual sphere surrounding the object. Transmitting and decompressing each of these views, which are often high-resolution digital data, can take a considerable amount of time.
Random access to the data of an “object movie” requires sequential decompression of all the prior frames back to the nearest key frame, as illustrated in FIG. 1D. For each selected frame, the user must wait for the decompression of all of these prior frames to complete. The interactive experience is thus painfully slow. Therefore, it is desirable to provide methods and apparatuses for improved compression and transmission of data representing views of an object.