Such data sequences are used widely in computer processing fields, as many computer applications involve the creation and manipulation of structured data. For instance, such data sequences are used extensively in database systems. Generally in such systems, there will be a database server computer arranged to manage the data within the database. Client computers will connect to the server computer via a network in order to send database queries to the server computer. The server will then process those queries, and pass the results back to the client. These results will generally take the form of a structured data sequence of the type discussed above (i.e. having a plurality of records, each record having a plurality of fields with data items stored therein). For example, a database containing details of a company's employees would typically have a data record for each employee. Each such data record would have a number of fields for storing data such as name, age, sex, job description, etc. Within each field, there will be stored a data item specific to the individual, for example, Mr Smith, 37, Male, Sales executive, etc. Hence a query performed on that database will generally result in a data sequence being returned to the client which contains a number of records, one for each employee meeting the requirements of the database query.
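By way of illustration only, the employee example above might be sketched in Python as follows. The names EmployeeRecord, EMPLOYEES and run_query are hypothetical, as are the records for Ms Jones and Mr Brown; only the Mr Smith record is taken from the example above. The sketch simply shows a data sequence made up of records, each having the same fields, and a query returning the subset of records that meet a requirement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmployeeRecord:
    """One record; each attribute is a field holding a data item."""
    name: str
    age: int
    sex: str
    job_description: str


# The data held by the (hypothetical) database server.
EMPLOYEES = [
    EmployeeRecord("Mr Smith", 37, "Male", "Sales executive"),
    EmployeeRecord("Ms Jones", 45, "Female", "Engineer"),        # assumed record
    EmployeeRecord("Mr Brown", 29, "Male", "Sales executive"),   # assumed record
]


def run_query(records, job_description):
    """Return the data sequence of records matching the query requirement."""
    return [r for r in records if r.job_description == job_description]


# The result passed back to the client: one record per matching employee.
result = run_query(EMPLOYEES, "Sales executive")
for record in result:
    print(record.name, record.age, record.sex, record.job_description)
```

Each record in the result carries the same set of fields, which is the regularity that distinguishes structured data from an arbitrary byte sequence.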
Since data storage is expensive, it is clearly desirable to minimise the amount of storage required to store structured data. Additionally, when a data sequence is copied or transferred between storage locations, it is desirable to minimise the overhead in terms of CPU cycles, network usage, etc. Within the database field, much research has been carried out into techniques for maintaining copies of data. Generally, these techniques are referred to as `data replication` techniques. The act of making a copy of data may result in a large sequence of data being transferred from a source to a target, which as mentioned earlier is typically very costly in terms of CPU cycles, network usage, etc. Within the database arena, this `data replication` is often a repeated process, with the copies being made at frequent intervals. Hence, the overhead involved in making each copy is an important issue, and it is clearly advantageous to minimise such overhead.
To reduce the volume of data needing to be transferred and the time required to copy a set of data, an area of database technology called `change propagation` has been developed. Change propagation involves identifying the changes made to one copy of a set of data, and forwarding only those changes to the locations where other copies of that data set are stored. For example, if on Monday system B establishes a complete copy of a particular data set stored on system A, then on Tuesday it will only be necessary to send system B a copy of the changes made to the original data set stored on system A since the time on Monday that the copy was made. By such an approach, a copy can be maintained without the need for a full refresh of the entire data set. However, even when employing change propagation techniques, the set of changes from one copy to the other may be quite large, and hence the cost may still be significant.
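The Monday and Tuesday example above can be sketched, purely for illustration, as follows. The functions compute_changes and apply_changes, the integer record keys and the sample records other than Mr Smith are all assumptions made for this sketch and do not describe any particular database product. For simplicity the changes are found by comparing two snapshots held in memory; a real system might instead record changes as they are made.

```python
def compute_changes(old, new):
    """Identify the changes that turn data set `old` into data set `new`.

    Both arguments are dicts mapping a record key to a tuple of field values.
    """
    upserts = {k: v for k, v in new.items() if k not in old or old[k] != v}
    deletes = [k for k in old if k not in new]
    return {"upserts": upserts, "deletes": deletes}


def apply_changes(copy, changes):
    """Bring a copy up to date using only the forwarded changes."""
    updated = dict(copy)
    updated.update(changes["upserts"])
    for key in changes["deletes"]:
        updated.pop(key, None)
    return updated


# Monday: system B takes a complete copy of the data set on system A.
system_a_monday = {
    1: ("Mr Smith", 37, "Male", "Sales executive"),
    2: ("Ms Jones", 45, "Female", "Engineer"),        # assumed record
    3: ("Mr Brown", 29, "Male", "Sales executive"),   # assumed record
}
system_b = dict(system_a_monday)

# Tuesday: system A has modified record 2, deleted record 3, added record 4.
system_a_tuesday = {
    1: ("Mr Smith", 37, "Male", "Sales executive"),
    2: ("Ms Jones", 45, "Female", "Engineering manager"),
    4: ("Ms Green", 31, "Female", "Accountant"),      # assumed record
}

# Only the changes are forwarded to system B, not the whole data set.
changes = compute_changes(system_a_monday, system_a_tuesday)
system_b = apply_changes(system_b, changes)
print(changes)
```

In this sketch the unchanged record for Mr Smith is never transmitted, while every changed record is still sent in full, so the set of changes grows with the number of records modified.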
Given the above problems, it is an object of the present invention to provide a technique for compressing structured data which will alleviate the cost of maintaining and replicating structured data.