1. Field of the Invention
This invention relates to handheld devices for video transmission, including video capture, wired and wireless file transfer and live streaming, and display. Embodiments of the invention relate to data compression, specifically to the compression and decompression of video and still images, and relate to graphical user interfaces for controlling video transmission and display.
2. Description of Prior Art
In the last few years, there have been tremendous advances in the speed of computer processors and in the availability of bandwidth of worldwide computer networks such as the Internet. These advances have led to a point where businesses and households now commonly have both the computing power and network connectivity necessary to have point-to-point digital communications of audio, rich graphical images, and video. However, the transmission of video signals with the full resolution and quality of television is still out of reach. In order to achieve an acceptable level of video quality, the video signal must be compressed significantly without losing either spatial or temporal quality.
A number of different approaches have been taken but each has resulted in less than acceptable results. These approaches and their disadvantages are disclosed by Mark Nelson in a book entitled The Data Compression Book, Second Edition, published by M&T Books in 1996. Mark Morrison also discusses the state of the art in a book entitled The Magic of Image Processing, published by Sams Publishing in 1993.
Video Signals
Standard video signals are analog in nature. In the United States, television signals contain 525 scan lines of which 480 lines are visible on most televisions. The video signal represents a continuous stream of still images, also known as frames, that are fully scanned, transmitted and displayed at a rate of 30 frames per second. This frame rate is considered full motion.
A television screen has a 4:3 aspect ratio.
When an analog video signal is digitized, each of the 480 lines is sampled 640 times, and each sample is represented by a number. Each sample point is called a picture element, or pixel. A two dimensional array is created that is 640 pixels wide and 480 pixels high. This 640×480 pixel array is a still graphical image that is considered to be full frame. The human eye can perceive approximately 16.7 million colors. A pixel value composed of 24 bits can represent each perceivable color. A graphical image made up of 24-bit pixels is considered to be full color. A single, second-long, full frame, full color video requires over 220 million bits of data.
The transmission of 640×480 pixels×24 bits per pixel times 30 frames requires the transmission of 221,184,000 bits per second. A T1 Internet connection can transfer up to 1.54 million bits per second. A high-speed (56 Kb) modem can transfer data at a maximum rate of 56 thousand bits per second. The transfer of full motion, full frame, full color digital video over a T1 Internet connection, or 56 Kb modem, will require an effective data compression of roughly 144:1, or 3950:1, respectively.
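The arithmetic above can be checked with a short sketch. The function and constant names below are illustrative only and are not part of the original disclosure.

```python
# Raw bit rate of uncompressed full frame, full color, full motion video,
# and the compression ratio needed to fit a given link.
WIDTH, HEIGHT = 640, 480      # pixels per full frame
BITS_PER_PIXEL = 24           # full color
FRAMES_PER_SECOND = 30        # full motion

T1_BPS = 1_540_000            # T1 connection, bits per second
MODEM_BPS = 56_000            # 56 Kb modem, bits per second


def raw_bits_per_second(width=WIDTH, height=HEIGHT,
                        bpp=BITS_PER_PIXEL, fps=FRAMES_PER_SECOND):
    """Bits per second of the uncompressed video stream."""
    return width * height * bpp * fps


def required_ratio(link_bits_per_second):
    """Compression ratio needed to send the raw stream over the link."""
    return raw_bits_per_second() / link_bits_per_second


if __name__ == "__main__":
    print(raw_bits_per_second())                 # 221184000
    print(round(required_ratio(T1_BPS), 1))      # 143.6
    print(round(required_ratio(MODEM_BPS), 1))   # 3949.7
```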
A video signal typically will contain some signal noise. In the case where the image is generated based on sampled data, such as an ultrasound machine, there is often noise and artificial spikes in the signal. A video signal recorded on magnetic tape may have fluctuations due to irregularities in the recording media. Fluorescent or improper lighting may cause a solid background to flicker or appear grainy. Such noise exists in the real world but may reduce the quality of the perceived image and lower the compression ratio that could be achieved by conventional methods.
Basic Run-Length Encoding
An early technique for data compression is run-length encoding, where a repeated series of items is replaced with one sample item and a count of the number of times the sample repeats. Prior art shows run-length encoding of both individual bits and bytes. These simple approaches by themselves have failed to achieve the necessary compression ratios.
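As a minimal sketch of byte-wise run-length encoding, each run can be stored as a (count, value) pair. The pair layout and function names are assumptions chosen for illustration; practical formats pack the count and value into bytes.

```python
def rle_encode(data):
    """Collapse each run of identical bytes into a (count, value) pair."""
    runs = []
    for value in data:
        if runs and runs[-1][1] == value:
            # Same value as the current run: extend its count.
            runs[-1] = (runs[-1][0] + 1, value)
        else:
            # A different value starts a new run.
            runs.append((1, value))
    return runs


def rle_decode(runs):
    """Expand (count, value) pairs back into the original bytes."""
    return b"".join(bytes([value]) * count for count, value in runs)


if __name__ == "__main__":
    encoded = rle_encode(b"aaabcc")
    print(encoded)              # [(3, 97), (1, 98), (2, 99)]
    print(rle_decode(encoded))  # b'aaabcc'
```

Data with few repeats gains nothing from this scheme, since every byte becomes its own run.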
Variable Length Encoding
In the late 1940s, Claude Shannon at Bell Labs and R. M. Fano at MIT pioneered the field of data compression. Their work resulted in a technique of using variable length codes where codes with low probabilities have more bits, and codes with higher probabilities have fewer bits. This approach requires multiple passes through the data to determine code probability and then to encode the data. This approach also has failed to achieve the necessary compression ratios.
D. A. Huffman disclosed a more efficient approach to variable length encoding, known as Huffman coding, in a paper entitled “A Method for the Construction of Minimum-Redundancy Codes,” published in 1952. This approach also has failed to achieve the necessary compression ratios.
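The two-pass character of variable length coding can be illustrated with a small Huffman code builder: one pass counts symbol frequencies, and the resulting table is then used to encode. This is a textbook-style sketch with hypothetical names, not a reproduction of any cited implementation.

```python
import heapq
from collections import Counter


def huffman_codes(data):
    """Return a {symbol: bit string} table; frequent symbols get shorter codes."""
    freq = Counter(data)               # first pass: symbol frequencies
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries are (weight, tie_breaker, {symbol: code_so_far}).
    # The unique tie_breaker keeps the dictionaries from being compared.
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)    # two least probable subtrees
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def huffman_encode(data, codes):
    """Second pass: replace each symbol with its variable length code."""
    return "".join(codes[s] for s in data)


if __name__ == "__main__":
    table = huffman_codes("aaaabbc")
    print(len(huffman_encode("aaaabbc", table)))  # 10 bits instead of 56
```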
Arithmetic, Finite Context, and Adaptive Coding
In the 1980s, arithmetic, finite context, and adaptive coding provided a slight improvement over the earlier methods. These approaches require extensive computer processing and have failed to achieve the necessary compression ratios.
Dictionary-Based Compression
Dictionary-based compression uses a completely different method to compress data. Variable length strings of symbols are encoded as single tokens. The tokens form an index to a dictionary. In 1977, Abraham Lempel and Jacob Ziv published a paper entitled, “A Universal Algorithm for Sequential Data Compression” in IEEE Transactions on Information Theory, which disclosed a compression technique commonly known as LZ77. The same authors published a 1978 sequel entitled, “Compression of Individual Sequences via Variable-Rate Coding,” which disclosed a compression technique commonly known as LZ78 (see U.S. Pat. No. 4,464,650). Terry Welch published an article entitled, “A Technique for High-Performance Data Compression,” in the June 1984 issue of IEEE Computer, which disclosed an algorithm commonly known as LZW, which is the basis for the GIF algorithm (see U.S. Pat. Nos. 4,558,302, 4,814,746, and 4,876,541). In 1989, Stac Electronics implemented an LZ77-based method called QIC-122 (see U.S. Pat. No. 5,532,694, U.S. Pat. No. 5,506,580, and U.S. Pat. No. 5,463,390).
These lossless (method where no data is lost) compression methods can achieve up to 10:1 compression ratios on graphic images typical of a video image. While these dictionary-based algorithms are popular, these approaches require extensive computer processing and have failed to achieve the necessary compression ratios.
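The idea of replacing variable length strings with dictionary tokens can be sketched with a simplified LZW coder. The sketch assumes an unbounded dictionary and emits tokens as plain integers, whereas practical implementations such as GIF pack tokens into variable width bit fields and reset the dictionary when it fills.

```python
def lzw_encode(data):
    """Encode bytes as a list of integer dictionary tokens."""
    dictionary = {bytes([i]): i for i in range(256)}
    current = b""
    tokens = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in dictionary:
            current = candidate                    # keep growing the match
        else:
            tokens.append(dictionary[current])     # emit the longest match
            dictionary[candidate] = len(dictionary)
            current = bytes([byte])
    if current:
        tokens.append(dictionary[current])
    return tokens


def lzw_decode(tokens):
    """Rebuild the bytes, regenerating the same dictionary on the fly."""
    if not tokens:
        return b""
    dictionary = {i: bytes([i]) for i in range(256)}
    previous = dictionary[tokens[0]]
    out = [previous]
    for token in tokens[1:]:
        if token in dictionary:
            entry = dictionary[token]
        elif token == len(dictionary):
            # Token refers to the entry that is about to be created.
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid LZW token")
        out.append(entry)
        dictionary[len(dictionary)] = previous + entry[:1]
        previous = entry
    return b"".join(out)


if __name__ == "__main__":
    text = b"TOBEORNOTTOBEORTOBEORNOT"
    tokens = lzw_encode(text)
    print(len(text), len(tokens))
    print(lzw_decode(tokens) == text)  # True
```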
JPEG and MPEG
Graphical images have an advantage over conventional computer data files: they can be slightly modified during the compression/decompression cycle without affecting the perceived quality on the part of the viewer. By allowing some loss of data, compression ratios of 25:1 have been achieved without major degradation of the perceived image. The Joint Photographic Experts Group (JPEG) has developed a standard for graphical image compression. The JPEG lossy (method where some data is lost) compression algorithm first divides the color image into three color planes and divides each plane into 8 by 8 blocks, and then the algorithm operates in three successive stages:
(a) A mathematical transformation known as Discrete Cosine Transform (DCT) takes a set of points from the spatial domain and transforms them into an identical representation in the frequency domain.
(b) A lossy quantization is performed using a quantization matrix to reduce the precision of the coefficients.
(c) The zero values are encoded in a zig-zag sequence (see Nelson, pp. 341-342).
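The three stages can be sketched for a single 8 by 8 block as follows. This is an illustration only: it uses a single uniform quantization step (an assumption; JPEG uses a full quantization matrix), a naive DCT rather than a fast one, and it stops at the zig-zag ordering without the final entropy coding.

```python
import math

N = 8  # JPEG block size


def dct_2d(block):
    """Stage (a): naive 2-D DCT-II of an N by N block."""
    def alpha(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = alpha(u) * alpha(v) * total
    return out


def quantize(coeffs, step=16):
    """Stage (b): reduce precision; 'step' is a hypothetical uniform value."""
    return [[round(c / step) for c in row] for row in coeffs]


def zigzag(matrix):
    """Stage (c): read the block along anti-diagonals, alternating direction."""
    order = sorted(
        ((r, c) for r in range(N) for c in range(N)),
        key=lambda rc: (rc[0] + rc[1],
                        rc[1] if (rc[0] + rc[1]) % 2 == 0 else rc[0]),
    )
    return [matrix[r][c] for r, c in order]


if __name__ == "__main__":
    flat = [[128] * N for _ in range(N)]       # a uniformly gray block
    sequence = zigzag(quantize(dct_2d(flat)))
    print(sequence[:4])                        # [64, 0, 0, 0]
```

For the flat block, all of the energy lands in the first coefficient and the remaining 63 values quantize to zero, which is what makes the zig-zag sequence easy to compress.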
JPEG can be scaled to achieve higher compression ratios by allowing more loss in the quantization stage of the compression. However, this loss results in certain blocks of the image being compressed such that areas of the image have a blocky appearance and the edges of the 8 by 8 blocks become apparent because they no longer match the colors of their adjacent blocks. Another disadvantage of JPEG is smearing. The true edges in an image get blurred due to the lossy compression method.
The Moving Picture Experts Group (MPEG) uses JPEG-based techniques combined with forward and reverse temporal differencing. MPEG compares adjacent frames and, for those blocks that are identical to those in a previous or subsequent frame, only a description of the previous or subsequent identical block is encoded. MPEG suffers from the same blocking and smearing problems as JPEG.
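The block comparison between adjacent frames can be illustrated with a heavily simplified sketch that compares only co-located blocks and reports the ones that changed. It is not MPEG: motion search, backward references, and the DCT coding of the changed blocks are all omitted, and the function name is hypothetical.

```python
def changed_blocks(previous, current, block=8):
    """Return (top, left, tile) for each block that differs between frames.

    Frames are equal-sized lists of pixel rows. Blocks that are identical to
    the co-located block of the previous frame are skipped, since a decoder
    could reuse the earlier data for them.
    """
    height, width = len(current), len(current[0])
    changed = []
    for top in range(0, height, block):
        for left in range(0, width, block):
            prev_tile = [row[left:left + block]
                         for row in previous[top:top + block]]
            cur_tile = [row[left:left + block]
                        for row in current[top:top + block]]
            if prev_tile != cur_tile:
                changed.append((top, left, cur_tile))
    return changed


if __name__ == "__main__":
    frame_a = [[0] * 16 for _ in range(16)]
    frame_b = [row[:] for row in frame_a]
    frame_b[9][3] = 255                      # one pixel changes
    print([(t, l) for t, l, _ in changed_blocks(frame_a, frame_b)])
    # [(8, 0)]
```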
These approaches require extensive computer processing and have failed to achieve the necessary compression ratios without unacceptable loss of image quality and artificially induced distortion.
QuickTime: CinePak, Sorenson, H.263
Apple Computer, Inc. released a component architecture for digital video compression and decompression, named QuickTime. Any number of methods can be encoded into a QuickTime compressor/decompressor (codec). Some popular codecs are CinePak, Sorenson, and H.263. CinePak and Sorenson both require extensive computer processing to prepare a digital video sequence for playback in real time; neither can be used for live compression. H.263 compresses in real time but does so by sacrificing image quality, resulting in severe blocking and smearing.
Fractal and Wavelet Compression
Extremely high compression ratios are achievable with fractal and wavelet compression algorithms. These approaches require extensive computer processing and generally cannot be completed in real time.
Sub-Sampling
Sub-sampling is the selection of a subset of data from a larger set of data. For example, when every other pixel of every other row of a video image is selected, the resulting image has half the width and half the height. This is image sub-sampling. Other types of sub-sampling include frame sub-sampling, area sub-sampling, and bit-wise sub-sampling.
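Image sub-sampling as described above can be sketched in a few lines. The image is assumed to be a list of pixel rows, and the function name is illustrative.

```python
def subsample(image, step=2):
    """Keep every step-th pixel of every step-th row."""
    return [row[::step] for row in image[::step]]


if __name__ == "__main__":
    image = [
        [1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
        [13, 14, 15, 16],
    ]
    print(subsample(image))  # [[1, 3], [9, 11]]
```

With a step of 2, the result has half the width and half the height, so it holds one quarter of the original pixels.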
Image Stretching
If an image is to be enlarged but maintain the same number of pixels per inch, data must be filled in for the new pixels that are added. Various methods of stretching an image and filling in the new pixels to maintain image consistency are known in the art. Some methods known in the art are dithering (using adjacent colors that appear to be a blended color), error diffusion, “nearest neighbor”, bilinear, and bicubic.
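Of the methods listed, “nearest neighbor” is the simplest to illustrate: each new pixel copies the closest original pixel. The sketch below assumes an integer enlargement factor and an image stored as a list of pixel rows.

```python
def stretch_nearest(image, factor):
    """Enlarge an image by an integer factor using nearest neighbor filling."""
    height, width = len(image), len(image[0])
    return [
        [image[y // factor][x // factor] for x in range(width * factor)]
        for y in range(height * factor)
    ]


if __name__ == "__main__":
    small = [[1, 2],
             [3, 4]]
    for row in stretch_nearest(small, 2):
        print(row)
    # [1, 1, 2, 2]
    # [1, 1, 2, 2]
    # [3, 3, 4, 4]
    # [3, 3, 4, 4]
```

Bilinear and bicubic methods instead blend several neighboring pixels, which gives smoother results at a higher processing cost.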
Portable Hand Held Devices: Pen-Based Computers and PDAs
In the early 1990s, a number of pen-based computers were developed. These portable computers were characterized by a display screen that could also be used as an input device when touched or stroked with a pen or finger. For example, in 1991, NCR developed a “notepad” computer, the NCR 3125. Early pen-based computers ran three operating systems: DOS, Microsoft's Windows for Pen Computing and Go Corp.'s PenPoint. In 1993, Apple developed the Newton MessagePad, an early personal digital assistant (PDA). Palm developed the Palm Pilot in 1996. In 2000, the Sony Clie used the Palm OS and could play audio files. Later versions included a built-in camera and could capture and play Apple QuickTime™ video. In 2002, Handspring released the Treo, which runs the Palm OS and features a QWERTY keyboard. Compaq (now Hewlett Packard) developed the iPAQ in 2000. The iPAQ and other PocketPCs run a version of Windows CE. Some PocketPCs and PDAs have wireless communication capabilities.
In 2001, Apple released a music player, called the iPod, featuring a small, internal hard disk drive that could hold over 1000 songs and fit in your pocket. The original iPod has a display, a set of controls, and ports for connecting to a computer, such as a Macintosh or PC, via Firewire, and for connecting to headphones. However, the original iPod did not have a color display, a built-in camera, built-in speakers, built-in microphone or wireless communications.
Portable Hand Held Devices: Cell Phone and Picture Phones
The first cellular telephones had simple LCD displays suitable for displaying only a limited amount of text. More recently, cell phones have been developed which have larger, higher resolution displays that are both grayscale and color. Some cell phones have been equipped with built-in cameras with the ability to save JPEG still photos to internal memory. In April 2002, Samsung introduced a cell phone with a built-in still photo camera and a color display. The Samsung SCH-X590 can store up to 100 photos in its memory and can transfer still photos wirelessly.
Cell phones can be used as wireless modems. Initially they had limited data bandwidth. Next, digital cell phones were developed. By early 2002, bandwidth was typically 60-70 Kbps. Higher bandwidth wireless networks are being developed.
Hand Held Devices are Limited in Size and Weight
Hand held devices are limited in size and weight. Many users are only willing to use a handheld device that weighs a few ounces and can fit inside a typical shirt pocket, or can even be worn on their waist or arm. These size and weight limitations prevent handheld devices from having the electronic circuitry, processors, and batteries found in laptops and other larger computers. These limitations have made it impossible to provide full frame, full motion video display or live transmission on handheld devices.
PDAs, PocketPCs, and Picture Phones are Limited by Battery Life, Processor Speed, and Network Bandwidth
The existing, commercially available hand held devices have not been able to support live or streaming video for a number of reasons. Uncompressed full-motion, full frame video requires extremely high bandwidth that is not available to handheld portable devices. In order to reduce the bandwidth, lossy compression such as MPEG has been used to reduce the size of the video stream. While MPEG is effective in desktop computers with broadband connections to the Internet, decoding and displaying MPEG encoded video is very processor intensive. The processors of existing handheld devices are slower or less powerful than those used in desktop computers. If MPEG were used in a handheld device, the processor would quickly drain the battery of most handheld devices. Further, the higher bandwidth wireless communications interfaces would also place a large strain on the already strained batteries. Live video transmission and reception would be even more challenging. For these reasons, handheld devices have not been able to transmit or receive streaming, or especially, live video.
What is needed is an enhanced handheld device that is capable of receiving streaming and live video. Further, a handheld device that could capture and transmit live video would provide live coverage of events that would otherwise not be able to be seen. With handheld video devices that both transmit and receive live video, handheld wireless videoconferencing could become a reality. Also needed is a video compression method that requires significantly reduced processing power and would be less draining on the battery of a handheld device. Additionally, since handheld video display screens are smaller than typical computer screens, a user of a handheld video receiver needs to be able to control the portion of a video being transmitted, so that a smaller, higher quality video can be received and viewed on a handheld screen with dimensions smaller than the original video.