This invention relates to a data encoding system, and more particularly but not exclusively to an encoding system for encoding, transmitting and decoding image data.
Encoding systems for encoding, transmitting and decoding image data, for example facsimile systems, are well known in the prior art. These systems comprise transceivers which are linkable to one another through one or more serial communication channels, for example a telephone link. In one of these systems, an image to be communicated from a first transceiver is conveyed therefrom as a stream of information through one or more channels to a second transceiver whereat the image is received and then reproduced. The image is partitioned at the first transceiver into a series of parallel image bands which are scanned to provide a sequence of data packets in which each band is represented by a corresponding data packet. End of line (EOL) data are inserted between each of these packets to punctuate them and thereby provide composite encoded data suitable for transmission. The packets are not each individually identifiable by an address reference defining their corresponding band position within the image but are arranged relative to one another in a sequence in which the bands are abuttable to form the image. In the sequence, the packets are said to be relatively addressed by their position therein.
These systems suffer from a problem that relatively addressed data loses spatial accuracy if EOL data has been lost as a result of data corruption. Moreover, synchronisation problems may also result when EOL data following corrupted data are not reliably recognised. Data corruption may render a received image unintelligible.
The probability of a data corruption occurring in the systems described above increases as transmission duration increases. For example, an image communicated by facsimile at standard CCITT (Consultative Committee on International Telegraph and Telephone) resolution may involve transfer of data representing approximately two million bits of information. This is described in a book "FAX: Facsimile Technology and Applications Handbook" ISBN 0 89006 495 4 McConnell, Bodson and Schaphorst 1992. If these data are communicated in uncompressed form through a communication channel at a rate of 2400 bits per second (bps), data transmission duration will be approximately fourteen minutes. Data corruption during such an interval is likely to occur in systems in which fading and interference phenomena are experienced over shorter timescales than this.
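The fourteen minute figure quoted above follows directly from the image size and the channel rate. The short calculation below is only a check of that arithmetic; the variable names are illustrative and are not taken from the cited handbook.

```python
# Check of the transmission duration quoted in the text.
# Both figures come from the passage above; the names are illustrative.
image_bits = 2_000_000      # approximate size of an uncompressed facsimile image
channel_rate_bps = 2400     # channel rate in bits per second

duration_seconds = image_bits / channel_rate_bps
duration_minutes = duration_seconds / 60

print(f"{duration_seconds:.0f} s, about {duration_minutes:.1f} minutes")
```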
Restricted communication bandwidth limiting information communication rates to approximately 2400 bps is particularly characteristic of high frequency (HF) radio systems which operate by emitting and receiving electromagnetic radiation in a frequency range of 3 to 30 MHz. Such radio systems are prone to transmission problems such as interference, signal fading and multipath effects which may result in errors being introduced into information conveyed through them; this is particularly pertinent when transmission durations are long. Despite the problems, HF radio systems provide an important advantage of beyond line of sight communication and are presently employed, for example, in maritime applications. Techniques for coping with the transmission problems are clearly important for such systems.
Current compression techniques for reducing problems of data corruption when transmitting packets of data through error prone channels rely on using compression algorithms for decreasing redundancy in the data, thereby reducing data transmission duration. Robustness of the compressed data thereby generated to transmission errors is further increased by adding error control code data to it. Such techniques decrease image transmission time although inclusion of the control code data tends to offset data size reduction benefits arising from data compression. Although such techniques are effective for removing occasional errors occurring during data transmission, a problem arises when errors occur more frequently than the control codes are able to compensate. This results in extensive damage to the compressed data on account of its reduced redundancy. These excess errors render an image conveyed to the receiving transceiver possibly unintelligible and, at best, flawed.
The error control codes described above include forward error correction (FEC) codes incorporating parity bits. Automatic repeat request (ARQ) codes are sent in reply from a second transceiver receiving data to a first transceiver transmitting the data when the data are corrupted during transmission to instruct the first transceiver to retransmit the data. In the case of prior art facsimile systems, ARQ codes returned when transmission errors have occurred invoke retransmission of an entire image to which the ARQ codes relate. Retransmission of parts of the image is not possible in these prior art systems because they are devoid of facilities for relating isolated retransmitted parts of the image together.
In a modified Huffman encoding technique, for example as used for CCITT group 3 standard facsimile, each data packet is encoded into a series of variable length codewords separated by a robust EOL code. This is described in a publication "International digital facsimile coding standards" Hunter and Robinson, Proc. IEEE-68, pp. 854-867. Although this Huffman technique which employs relative addressing is effective at limiting error propagation from one packet to another and providing data compression, it is frequently unable to provide error free conveyed images when bit error rates (BERs) of 2% or more are experienced during image transmission.
As an alternative to the Huffman technique described above, a SEA-RL (Sequential Edge Addressing-Run Length) encoding technique involves:
(i) representing an image as a two colour (black-white) image in a two dimensional array of pixel elements;
(ii) partitioning the array into bands of single pixel element width and encoding each band in terms of colour transitions and run lengths relative to a reference end of the band to provide a corresponding data packet; and
(iii) assembling the packets into a sequence of data wherein each packet is separated from its successive packet by EOL code data.
The SEA-RL technique was developed for improving transmission reliability when transmitting low resolution documents using very high frequency (VHF) radio communication apparatus arranged to transmit and receive modulated electromagnetic radiation in a frequency range of 30 MHz to 300 MHz. In the technique, run lengths correspond to sizes of groups of consecutive similar colour image pixel elements which are present in the bands. The data packets are not individually addressed in the sequence of data because relative addressing is employed where the data packets are arranged in the sequence in an order in which their respective bands are abuttable to form the image. A description of the SEA-RL technique is provided in a publication "Joint source-channel coding for raster document transmission over mobile radio" by Wyrwas and Farrell, IEE Proceedings, Vol. 136, Pt. I, No. 6 pp. 375-380, December 1989. The SEA-RL technique differs from other encoding techniques described above in that absolute rather than relative addressing of groups of image pixels is employed within each packet of data. Moreover, the technique is limited to communicating two tone black-white images only.
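The idea of encoding a band by colour transitions addressed relative to a reference end of the band can be illustrated with a minimal sketch. It is not the published SEA-RL bit format: the function names `encode_band` and `decode_band`, the choice of the left end as the reference, and the representation of a band as a non-empty list of 0 (white) and 1 (black) values are all assumptions made for illustration.

```python
# Illustrative sketch only: a band is a non-empty list of 0/1 pixel values,
# and each colour transition is recorded as an absolute position measured
# from the left (reference) end of the band.

def encode_band(band):
    """Return (first colour, absolute transition positions, band length)."""
    edges = [i for i in range(1, len(band)) if band[i] != band[i - 1]]
    return band[0], edges, len(band)

def decode_band(first_colour, edges, length):
    """Rebuild the band from its first colour and transition positions."""
    band = []
    colour = first_colour
    previous = 0
    for edge in edges + [length]:
        band.extend([colour] * (edge - previous))  # run length = edge spacing
        colour = 1 - colour                        # two-tone: flip colour
        previous = edge
    return band

band = [0, 0, 1, 1, 1, 0, 1]
first_colour, edges, length = encode_band(band)
restored = decode_band(first_colour, edges, length)
```

In this sketch the run lengths are implicit in the spacing between successive transition positions, so each transition is located within its band without reference to the runs that precede it.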
In the SEA-RL technique, EOL code data which punctuate data packets in a sequence are susceptible to transmission errors; corruption of EOL data may result in loss of individual packets or possibly loss of two successive packets. Due to their importance, EOL code data are therefore included in duplicate into the sequence. In a situation where one or more of the packets become corrupted in the sequence, entire retransmission of image data is required because relative addressing is employed for the packets. The SEA-RL technique is, in common with alternative techniques described above, based upon encoding an image as a series of bands to generate packets of data for transmission and is therefore heavily reliant on robustness of EOL codes punctuating each packet for coping with transmission errors. The reliance of the SEA-RL technique on its EOL symbols has proved in practice to be its major weakness, especially when bursts of interference corrupt SEA-RL encoded data and EOL codes incorporated therein.
In general, prior art approaches to encoding images described above all rely on:
(i) partitioning an image to be encoded into bands;
(ii) providing packets of data associated with each of these bands;
(iii) adding control information such as EOL and error control code data to each of the packets of data to provide composite data; and
(iv) compressing the composite data.
They suffer from a problem that corruption of the control information within the compressed composite data causes extensive damage to an image reconstructed from the corrupted composite data. Interference giving rise to bit slips may also cause decoder synchronisation errors which result in incorrect decoding of data subsequent to the bit slips, even when the subsequent data has not been corrupted.
It is an object of the invention to provide an alternative data encoding system.
According to the present invention, a data encoding system is provided incorporating encoding means for encoding input data including:
(a) analysing means for analysing the input data and generating corresponding digits; and
(b) processing means for searching the digits to provide encoded output data, characterised in that
(c) the analysing means is arranged to generate digits which are representative both of data distributions and of data occurrence probabilities; and
(d) the processing means is arranged to select groupings of digits for use in generating the output data,
thereby providing at least one of data compression in the output data and increasing its robustness to data corruption.
The invention provides the advantage that the encoding system provides one or both of data compression and increase in data robustness to corruption.
Moreover, EOL codes are not required in the encoded data and errors occurring during communication of an image through the system generally give rise to damage disseminated throughout the image when received and reconstructed rather than damage which is locally concentrated therein as observed in the prior art.
Damage disseminated generally throughout an image tends to render it more intelligible than localised damage to it because interpolation may be more easily applied to correct for the disseminated damage. Humans possess an eye-brain visual recognition system which is very good at discerning valid visual information imposed upon a noisy background and hence coping with damage disseminated throughout an image.
The encoding system may incorporate decoding means for decoding the encoded data, the means including:
(a) translating means for decoding the encoded data to provide groupings of digits which preserve information on data distributions and data occurrence probabilities encoded in the encoded data; and
(b) interpreting means for interpreting the digits according to the occurrence probabilities to recreate the data in decoded form.
This provides an advantage that the input data received at the encoding means may be reconstructed at the decoding means from the encoded output data. As a consequence of encoding the input data by grouping digits, EOL codes are not required. This provides an advantage that the system is able to recommence transmission of the data after disruption thereof without a need to abandon data which has already been successfully transmitted.
The system treats an image provided to it as a whole composition, dissecting it into a mosaic of variable geometry elements. An absolute address is associated with each element to provide data independence thereby allowing, for example, the elements to be interleaved with one another in the encoded data. The absolute address here defines spatial position of its associated element relative to a reference point for use when reconstructing the image.
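The data independence conferred by absolute addressing can be illustrated with a minimal sketch. It assumes, purely for illustration, that each element is a square described by a hypothetical tuple `(row, column, size, value)` whose address is measured from the top left corner of the image; the specification itself refers only to variable geometry elements and a reference point.

```python
# Illustrative sketch only: each element carries its own absolute address,
# so elements may arrive in any order and a lost element affects only the
# region that it covers. The tuple layout is an assumption.

def reconstruct(elements, height, width, background=0):
    """Paint absolutely addressed square elements onto a blank image."""
    image = [[background] * width for _ in range(height)]
    for row, col, size, value in elements:
        for r in range(row, row + size):
            for c in range(col, col + size):
                image[r][c] = value
    return image

# Three non-overlapping elements of a 4 x 4 two-tone image.
elements = [(0, 0, 2, 1), (2, 2, 2, 1), (0, 3, 1, 1)]

in_order = reconstruct(elements, 4, 4)
interleaved = reconstruct(list(reversed(elements)), 4, 4)  # same result
one_lost = reconstruct(elements[:1] + elements[2:], 4, 4)  # damage stays local
```

Because no element depends on its neighbours in the data stream, reordering leaves the reconstruction unchanged, and omitting one element leaves the others correctly placed.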
The encoding system may be arranged to represent the input data in an array of digits and the processing means may be arranged to identify groupings of digits of mutually similar value in the array for use in generating the encoded output data. This provides an advantage of data compression in the output data by encoding groupings of digits of mutually similar value, and an advantage of being straightforward to implement, for example, in software implementing the encoding means.
The encoding system may be adapted to identify groupings of digits comprising digits which neighbour one another in the array for use in generating the output data. This provides an advantage of data compression in the output data.
The decoding means may be adapted to respond to corruption of digit grouping information in the encoded data received thereat by requesting selective retransmission of the digit grouping information and combining this information with previously received uncorrupted information for decoding the encoded data to recreate the data in decoded form. This provides a benefit that it is only necessary to retransmit the digit grouping information which has been corrupted during transmission from the encoding means to the decoding means rather than retransmitting the input data in entirety as undertaken in the prior art.
The processing means may be adapted to represent the digits in each grouping by a single corresponding grouping size parameter in the encoded output data. This provides an advantage of more efficient data compression in the output data.
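One way of representing each grouping by a single size parameter is to restrict groupings to squares, so that an address and a side length describe the grouping completely. The sketch below makes that assumption, together with a simple greedy raster search and a two-tone array in which 1 is the non-default value. The function name `square_groupings` is hypothetical, and the search strategy is not asserted to be that of the invention.

```python
# Illustrative sketch only: greedy search for square groupings of
# neighbouring digits of value 1 in a two-dimensional array. Each grouping
# is reported as (row, column, size), where size is the single parameter
# standing in for all size * size digits in the square.

def square_groupings(array):
    height, width = len(array), len(array[0])
    covered = [[False] * width for _ in range(height)]

    def fits(r, c, size):
        """True if a size x size square at (r, c) holds only uncovered 1s."""
        if r + size > height or c + size > width:
            return False
        return all(array[i][j] == 1 and not covered[i][j]
                   for i in range(r, r + size)
                   for j in range(c, c + size))

    groupings = []
    for r in range(height):
        for c in range(width):
            if array[r][c] != 1 or covered[r][c]:
                continue
            size = 1
            while fits(r, c, size + 1):   # grow the square while it fits
                size += 1
            for i in range(r, r + size):
                for j in range(c, c + size):
                    covered[i][j] = True
            groupings.append((r, c, size))
    # Largest groupings first (stable sort keeps raster order within a size).
    return sorted(groupings, key=lambda g: -g[2])

array = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
]
groupings = square_groupings(array)
```

For this array the four digits in the top left corner are conveyed by one grouping of size 2 rather than by four separate entries.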
The analysing means may be arranged to despeckle the input data represented by the array of digits by identifying isolated digits in the array whose associated values are dissimilar to those of neighbouring digits thereto and to modify the isolated digit values for at least one of increasing numbers of digits included in the groupings and decreasing the number of groupings required for representing the input data in the encoded output data. This provides a benefit of removing features in the input data which convey relatively little information and which result in reduced data compression in the encoded output data.
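A minimal despeckling sketch is given below. It assumes that a digit is isolated when its value differs from all of its horizontal and vertical neighbours, and that an isolated digit is replaced by the most common value among those neighbours. Both the neighbourhood and the replacement rule are illustrative assumptions, and the function name `despeckle` is hypothetical.

```python
# Illustrative sketch only: replace digits whose value differs from every
# horizontally and vertically adjacent digit. The four-neighbour rule and
# the majority replacement are assumptions made for this example.
from collections import Counter

def despeckle(array):
    height, width = len(array), len(array[0])
    result = [row[:] for row in array]   # decisions use the original array
    for r in range(height):
        for c in range(width):
            neighbours = [array[i][j]
                          for i, j in ((r - 1, c), (r + 1, c),
                                       (r, c - 1), (r, c + 1))
                          if 0 <= i < height and 0 <= j < width]
            if neighbours and all(n != array[r][c] for n in neighbours):
                result[r][c] = Counter(neighbours).most_common(1)[0][0]
    return result

speckled = [
    [0, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]
cleaned = despeckle(speckled)

feature = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
kept = despeckle(feature)   # a two-digit feature is not isolated
```

Removing the single isolated digit leaves an array of uniform value, which can then be represented by fewer and larger groupings.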
The analysing means may be arranged to partition the input data into a plurality of zones, and to despeckle digits corresponding to each zone by using a despeckling process preselected for that zone. This provides an advantage that zones incorporating non-critical information may, for example, be despeckled and thereby more efficiently data compressed in the encoded output data whereas zones which incorporate critical information may be un-despeckled and thereby be losslessly encoded into the encoded output data.
The processing means may be adapted to encode the groupings of digits into the encoded output data in an interleaved order corresponding to groupings with progressively smaller numbers of pixels therein, and the translating means may be adapted to check that groupings are encoded in an interleaved order in the encoded data for identifying that the groupings are uncorrupted. This provides a benefit that corrupted data may be identified when corruption has resulted in it being in non-interleaved order.
The translating means may be arranged to filter the encoded data for identifying incorrectly interleaved data therein and the interpreting means may be arranged to disregard the incorrectly interleaved data when recreating the data in decoded form. This provides an advantage that corrupted data which is not in an interleaved order is not used when recreating the data in decoded form.
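The ordering check can be sketched as follows, under the assumption that groupings are transmitted in order of non-increasing size, so that an entry whose size breaks this order is treated as suspect. The sketch retains a longest run of entries consistent with that order and disregards the remainder. This selection rule and the function name `consistent_indices` are illustrative assumptions rather than the filter of the invention.

```python
# Illustrative sketch only: given the received grouping sizes, keep the
# indices of a longest subsequence whose sizes never increase, and treat
# all other entries as incorrectly ordered (possibly corrupted).

def consistent_indices(sizes):
    n = len(sizes)
    if n == 0:
        return []
    best = [1] * n        # best[i]: longest valid subsequence ending at i
    previous = [-1] * n   # back-pointers for recovering that subsequence
    for i in range(n):
        for j in range(i):
            if sizes[j] >= sizes[i] and best[j] + 1 > best[i]:
                best[i] = best[j] + 1
                previous[i] = j
    end = max(range(n), key=lambda i: best[i])
    keep = []
    while end != -1:
        keep.append(end)
        end = previous[end]
    return sorted(keep)

received_sizes = [8, 8, 4, 64, 4, 2, 1]   # the 64 is out of order
kept = consistent_indices(received_sizes)
rejected = [i for i in range(len(received_sizes)) if i not in kept]
```

Here only the out-of-order entry is disregarded; the groupings on either side of it remain available for recreating the data.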
The processing means may be adapted to search for digit groupings containing a number of digits, said number being in a predefined range, and encode them into the output data. This provides a benefit of providing a compromise between data compression and data resolution in the output data.
The analysing means may be adapted to represent each data distribution of the input data in corresponding digits using entropy encoding by assigning to at least some of the distributions relatively more or fewer digits with non-default values according to whether occurrence probability of the distribution within the input data is relatively higher or lower compared to that of other distributions, subject to each data distribution being unambiguously encoded into the output data by the processing means searching the digits. This provides enhanced data compression in the encoded output data.
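The assignment of digit combinations according to occurrence probability can be sketched as below. The sketch adopts one reading of the passage, namely that the default digit value is 0 and that more probable distributions receive combinations containing fewer non-default digits, so that fewer digits need to be searched and encoded. The fixed codeword width, the function name `assign_codewords` and the example labels are assumptions made for illustration.

```python
# Illustrative sketch only: rank data distributions by how often they
# occur, then give the most frequent ones the fixed-width codewords that
# contain the fewest non-default (1) digits. Distinct codewords keep every
# distribution unambiguous.
from collections import Counter

def assign_codewords(distributions, width):
    counts = Counter(distributions)
    if len(counts) > 2 ** width:
        raise ValueError("width too small to give each distribution a code")
    ranked = [label for label, _ in counts.most_common()]
    # All codewords of the given width, fewest set digits first.
    codewords = sorted(range(2 ** width),
                       key=lambda code: (bin(code).count("1"), code))
    return {label: format(codewords[i], f"0{width}b")
            for i, label in enumerate(ranked)}

# Hypothetical labels for four kinds of local pixel pattern.
observed = ["white"] * 5 + ["black"] * 3 + ["edge"] * 2 + ["dot"]
table = assign_codewords(observed, width=2)
```

With this table the most frequent distribution consists entirely of default digits, and the rarest distribution carries the most non-default digits.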
The analysing means may be adapted to represent each data distribution of the input data using cluster encoding by assigning to at least some of the distributions relatively more or fewer digits with non-default values according to whether occurrence probability of the distribution within the input data is relatively higher or lower compared to that of other distributions, subject to:
(a) each data distribution being unambiguously encoded into the output data by the processing means searching the digits; and
(b) each data distribution being represented by a corresponding digit combination which differs from that of data distributions most frequently occurring in combination with it by as few digit value differences as possible.
This provides an advantage of more efficient data compression in the encoded output data than that provided by entropy encoding.
The analysing means may be arranged to represent data distributions in the input data which occur most frequently by assigning them a default value and the processing means is arranged to:
(i) omit such default values when selecting digit groupings to be encoded into the output data for providing enhanced data compression; and
(ii) include header information in the output data defining the data distribution which occurs most frequently;
and the decoding means is arranged to recreate the data distributions which occur most frequently from the header information data when recreating the output data in decoded form. This enhances data compression.
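The omission of default values together with header information can be illustrated with a minimal sketch. It assumes a one dimensional sequence of digits, a header holding the most frequent value and the sequence length, and a body listing only the non-default digits with their absolute positions. The names `encode_sparse` and `decode_sparse` and the header layout are hypothetical.

```python
# Illustrative sketch only: the most frequent value is named once in a
# header and omitted from the body; the decoder refills it everywhere that
# the body does not mention.
from collections import Counter

def encode_sparse(digits):
    default = Counter(digits).most_common(1)[0][0]
    header = {"default": default, "length": len(digits)}
    body = [(i, d) for i, d in enumerate(digits) if d != default]
    return header, body

def decode_sparse(header, body):
    digits = [header["default"]] * header["length"]
    for position, value in body:
        digits[position] = value
    return digits

digits = [0, 0, 0, 3, 0, 0, 1, 0]
header, body = encode_sparse(digits)
restored = decode_sparse(header, body)
```

In this example eight digits are conveyed by a header and two body entries, and decoding restores the original sequence exactly.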
The processing means may be arranged to apply additional Reed-Muller encoding to the encoded output data, and the translating means is adapted to decode the Reed-Muller encoded data by using trellis decoding means. This provides greater robustness of the encoded output data against corruption.
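The protective effect of Reed-Muller encoding can be illustrated with a first order code of length eight, RM(1,3), which maps four message bits to eight code bits and corrects any single bit error. The code parameters are chosen only for illustration and are not stated in the text. The decoder shown is an exhaustive nearest codeword search, substituted here for brevity in place of the trellis decoding means referred to above.

```python
# Illustrative sketch only: first order Reed-Muller code RM(1, 3).
# A message (a0, a1, a2, a3) is encoded by evaluating the affine function
# a0 + a1*x1 + a2*x2 + a3*x3 (mod 2) at all eight points of {0, 1}^3.
# Decoding is a brute-force nearest codeword search, NOT trellis decoding.
from itertools import product

M = 3  # number of variables; codeword length is 2 ** M = 8

def rm1_encode(message):
    constant, coefficients = message[0], message[1:]
    return tuple((constant + sum(a * x for a, x in zip(coefficients, point))) % 2
                 for point in product((0, 1), repeat=M))

def rm1_decode(word):
    """Return the message whose codeword is closest in Hamming distance."""
    return min(product((0, 1), repeat=M + 1),
               key=lambda m: sum(a != b for a, b in zip(rm1_encode(m), word)))

message = (1, 0, 1, 1)
codeword = rm1_encode(message)

corrupted = list(codeword)
corrupted[5] ^= 1                      # a single bit error in transmission
recovered = rm1_decode(tuple(corrupted))
```

The minimum distance of this code is four, so a word containing one bit error remains closer to the transmitted codeword than to any other, and the original message is recovered.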
The processing means may be adapted to incorporate one or more synchronisation codes into the encoded data for assisting resynchronisation of the translating means to the data after loss or absence of synchronisation. This provides an advantage of improved synchronisation when decoding the encoded output data.
The analysing means may be adapted to analyse input data comprising image colour information and to incorporate the information into the digits, the processing means may be adapted to select groupings of the digits for use for providing encoded output data incorporating the colour information, and the decoding means may be adapted to receive the encoded data incorporating colour information, decode it and recreate from it the input data including its colour information. This provides an advantage that the system may be used for communicating colour images, for example where the system is incorporated into a colour facsimile system.
The analysing means may incorporate digit estimating means for estimating occurrence of neighbouring digits occurring in association with groupings of digits selected by the processing means which neighbour onto one another and the processing means may be adapted to include estimation parameters in the encoded output data indicative of the neighbouring digits, and the decoding means may be adapted to decode encoded data incorporating estimation parameters and estimate therefrom presence of neighbouring digits for use in recreating the data.
In another aspect, the invention provides a method of encoding input data in a data encoding system to provide encoded output data, the encoding system incorporating analysing means for analysing the input data and representing it in terms of digits, and processing means for processing the digits to generate the output data, and the method including the steps of:
(a) performing a frequency analysis on the input data to represent both data distributions and data occurrence probabilities in terms of digits to obtain at least one of data compression and increased corruption robustness; and
(b) processing the digits to select groupings of digits for use in generating the output data.