Several products support the capture and storage of handwritten information in the form of electronic ink. As used herein, electronic ink is a sequence of inked strokes, each stroke being a sequence of (x,y) coordinate pairs measured by a digitizer tablet at a constant sample rate. Numerous digitizing devices are known in the art. One such device, for example, has a sample rate of 100 Hz, with the coordinate pairs given in units of milli-inches and represented by 16 bit integers. For this exemplary device, the raw representation requires roughly 1 kB to store a typical handwritten word.
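The storage figure for the exemplary device follows from simple arithmetic, which can be sketched as below. The 2.5 seconds of pen-down time per word is an assumed value chosen for illustration and does not come from the device description; the function name is likewise hypothetical.

```python
SAMPLE_RATE_HZ = 100      # sample rate of the exemplary device
BYTES_PER_COORD = 2       # each coordinate is a 16 bit integer
COORDS_PER_SAMPLE = 2     # one x and one y value per sample


def raw_ink_bytes(pen_down_seconds):
    """Bytes needed to store the raw samples for the given pen-down time."""
    samples = round(pen_down_seconds * SAMPLE_RATE_HZ)
    return samples * COORDS_PER_SAMPLE * BYTES_PER_COORD


# Assumption: a typical word keeps the pen on the tablet for about 2.5 s.
print(raw_ink_bytes(2.5))  # 1000
```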
For hand-held products, the limitations on memory size require some data compression of the electronic ink to allow storage of a useful volume of notes and sketches. Data compression systems are known in the prior art that encode digital data into compressed digital code and decode the compressed digital code back into the original digital data. Data compression refers to any process that attempts to convert data in a given format into an alternative format requiring less space than the original. The objective of data compression systems is to effect a saving in the amount of storage required to hold, or the amount of time required to transmit, a given body of digital information. Data compression systems can be divided into two major categories: loss-less and lossy.
To be of practical utility, a data compression system should satisfy certain criteria. A loss-less system should have reciprocity. For a loss-less data compression system to possess the property of reciprocity, it must be possible to re-expand or decode the compressed data back into its original form without any alteration or loss of information. The decoded and original data must be identical and indistinguishable with respect to each other. A lossy data compression system may allow some alteration or loss of information during the compression/de-compression process, provided the overall perception of the data is unchanged.
Loss-less compression of electronic ink is certainly possible. However, when the ink is only required to be rendered to a screen, it contains significant redundant information which can be discarded. Given the requirement for high compression ratios, a lossy compression technique is most suitable.
A data compression system should also provide sufficient performance with respect to the data rates provided by and accepted by the devices with which the data compression and de-compression systems are communicating. Performance of an electronic ink application is of great importance because, generally, the application is used on small computers with relatively modest CPU power. Even systems of greater computing power might be significantly slowed if a complex compression technique must be applied to each stroke.
Another important criterion in the design of data compression and de-compression systems is compression effectiveness, which is typically characterized by the compression ratio. The compression ratio is generally defined as the size of the data in uncompressed form divided by its size in compressed form. In order for data to be compressible, the data must contain redundancy. Compression effectiveness is determined by how effectively the compression procedure uses the redundancy in the input data.
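The definition of compression ratio, and its dependence on redundancy, can be illustrated with a short sketch. Python's standard zlib module stands in here as a general-purpose loss-less compressor; it is not an ink compression technique, and the two test inputs are artificial examples chosen only to contrast highly redundant data with data that has almost none.

```python
import random
import zlib


def compression_ratio(uncompressed_size, compressed_size):
    """Uncompressed size divided by compressed size."""
    if compressed_size <= 0:
        raise ValueError("compressed size must be positive")
    return uncompressed_size / compressed_size


# Artificial inputs: 1000 identical bytes versus 1000 pseudo-random bytes.
redundant = bytes(1000)
rng = random.Random(0)
noisy = bytes(rng.getrandbits(8) for _ in range(1000))

for label, data in (("redundant", redundant), ("noisy", noisy)):
    packed = zlib.compress(data)
    print(label, round(compression_ratio(len(data), len(packed)), 2))
```

The redundant input should yield a ratio far above 1, while the pseudo-random input should yield a ratio close to 1, since there is little redundancy for the compressor to exploit.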
An electronic ink application must balance compression effectiveness against performance and ink degradation. Relatively high compression ratios are possible because the electronic ink has several sources of redundancy that allow for compression.
Published International Patent Application WO 94/03853 discloses a method and apparatus for the compression of electronic ink in which extrema points are saved to preserve fidelity. To reduce the number of points stored between successive extrema, each point is tested for local curvature and points with low curvature are discarded. This local test risks distorting ink which turns smoothly over an extended stroke, in which case there may be no point which triggers the curvature test but the cumulative curvature is enough to introduce kinks.
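The weakness of a purely local curvature test can be shown with a minimal sketch. This is not the method of WO 94/03853; it assumes, for illustration only, that curvature is measured as the change of heading at each point relative to its immediate neighbours in the original sample sequence, and that a point is discarded when that change falls below a fixed threshold. The function names and the 5 degree threshold are hypothetical.

```python
import math


def turning_angle(p, q, r):
    """Absolute change in heading (radians) at q on the path p -> q -> r."""
    heading_in = math.atan2(q[1] - p[1], q[0] - p[0])
    heading_out = math.atan2(r[1] - q[1], r[0] - q[0])
    diff = heading_out - heading_in
    # Wrap the difference into the range [-pi, pi).
    diff = (diff + math.pi) % (2 * math.pi) - math.pi
    return abs(diff)


def thin_by_local_curvature(stroke, threshold_rad):
    """Keep the end points and any interior point that turns sharply enough."""
    if len(stroke) < 3:
        return list(stroke)
    kept = [stroke[0]]
    for i in range(1, len(stroke) - 1):
        if turning_angle(stroke[i - 1], stroke[i], stroke[i + 1]) >= threshold_rad:
            kept.append(stroke[i])
    kept.append(stroke[-1])
    return kept


# A smooth quarter circle of radius 1000 milli-inches sampled at 50 points.
arc = [(1000 * math.cos(i * (math.pi / 2) / 49),
        1000 * math.sin(i * (math.pi / 2) / 49)) for i in range(50)]
thinned = thin_by_local_curvature(arc, math.radians(5))
print(len(arc), len(thinned))  # 50 2
```

Each interior point of the arc turns by less than 2 degrees, so none passes the test, and the 90 degree arc collapses to a straight chord between its end points.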
In the exemplary device, the constant sampling frequency (100 Hz) allows the capture of dynamic information on the pen movements. However, the sampling frequency is well above the Nyquist limit for handwriting and the data could be down-sampled by a factor of 2-4 and still retain the full dynamic information. The dynamic information is very important for signature verification and is useful in some approaches to handwriting recognition and scribble matching. However, the dynamic information is unnecessary for the purposes of rendering the ink trace onto a display device.
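Down-sampling by an integer factor can be sketched as plain decimation, keeping every n-th sample so that the remaining samples are still evenly spaced in time. The sketch assumes, as stated above, that the original rate is well above the Nyquist limit so that no additional low-pass filtering is needed; the function name is hypothetical.

```python
def downsample(stroke, factor):
    """Keep every `factor`-th sample of a stroke, starting with the first."""
    if factor < 1:
        raise ValueError("factor must be at least 1")
    return list(stroke[::factor])


# One second of ink at 100 Hz, represented here by sample indices only.
one_second = list(range(100))
reduced = downsample(one_second, 4)
print(len(reduced))  # 25, an effective rate of 25 Hz
```

Plain decimation may drop the final sample of a stroke. An application that needs the exact end point for rendering could store it separately, at the cost of one unevenly spaced sample.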
The use of absolute coordinate positions in the default ink representation allows the pen position to vary by the complete range between successive 10 ms samples. This is about two orders of magnitude higher than the maximum slew rate the human hand achieves. In normal handwriting the velocity peaks are another order of magnitude lower (2-5 in/s). Furthermore, the pen trajectory between velocity minima is a smooth, slowly varying curve and so the location of the next sample point is reasonably predictable using linear prediction, curve fitting or a dynamic model. Thus, a coding of the model together with the deviations from that model can offer further compression, though in practice knot points occur with sufficient density (thus resetting the models) that the saving is modest.
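The predictability of successive samples can be illustrated with two simple codings of a single coordinate channel: differences between successive samples, and residuals against a linear predictor that extrapolates from the previous two samples. This is a sketch only; the function names and sample values are hypothetical, and the predictor shown is one common choice rather than a technique prescribed here.

```python
def delta_encode(values):
    """First value as-is, then the difference from each previous value."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    """Invert delta_encode by accumulating the differences."""
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out


def linear_prediction_residuals(values):
    """Residuals against the predictor x[n] ~ 2*x[n-1] - x[n-2].

    The first two samples are stored unchanged.
    """
    residuals = list(values[:2])
    for n in range(2, len(values)):
        predicted = 2 * values[n - 1] - values[n - 2]
        residuals.append(values[n] - predicted)
    return residuals


# Hypothetical x coordinates in milli-inches, 10 ms apart.
xs = [1000, 1012, 1025, 1039, 1052, 1064]
print(delta_encode(xs))                 # [1000, 12, 13, 14, 13, 12]
print(linear_prediction_residuals(xs))  # [1000, 1012, 1, 1, -1, -1]
```

At a pen velocity of 5 in/s, successive 10 ms samples differ by about 50 milli-inches, so the differences fit comfortably in far fewer bits than the 16 used for absolute coordinates.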
Both the digitizer and display resolutions are significantly below the standardized milli-inch resolution of the internal pen data. Reducing the stored resolution to match a known specific digitizer/display setup or to simply use a lower standardized resolution (e.g. 300 dpi) offers some small additional saving.
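The size of the saving from a lower stored resolution can be estimated with a short sketch. It assumes a simple rescaling from milli-inches (1000 units per inch) to the target resolution with rounding to the nearest unit; the function names are hypothetical.

```python
import math

SOURCE_UNITS_PER_INCH = 1000  # milli-inch resolution of the internal pen data


def requantize(value_milli_inch, target_dpi=300):
    """Rescale a coordinate from milli-inches to units of 1/target_dpi inch."""
    return round(value_milli_inch * target_dpi / SOURCE_UNITS_PER_INCH)


def bits_saved_per_coordinate(target_dpi=300):
    """Reduction in bits needed to span the same physical range."""
    return math.log2(SOURCE_UNITS_PER_INCH / target_dpi)


print(requantize(1000))                        # 300
print(round(bits_saved_per_coordinate(), 2))   # 1.74
```

A saving of under two bits per coordinate is consistent with the description of this step as a small additional gain.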