In computer science, a data file is a computer file which stores data for use by a computer application or system. A data file may use a specific data structure. A data structure is a way of storing data in computer memory such that it can be used efficiently, e.g. by the computer's central processing unit (CPU). As a result, a data structure may impact the functioning of the computer's CPU. Data structures may be implemented using e.g. data types, references and operations on them provided by a programming language. Common data structures include lists, arrays, linked lists, graphs, etc. (see e.g. Wikipedia contributors, “Data structure,”).
A data file generally does not refer to files that contain instructions or code to be executed (typically called program files), or to files that define the operation or structure of an application or system (which include configuration files, directory files, etc.), but more specifically to information used as input and/or written as output by some other software program. In the following, however, a data file will be understood as a computer file which may store both instructions and data for use by the CPU (for a detailed discussion of data files, see e.g. Wikipedia contributors, “Data file,”).
Besides, machine code or machine language is known to be a system of instructions and data directly understandable by a computer's CPU. Instructions are encoded as patterns of bits, with different patterns corresponding to different commands to the machine. Every CPU model has its own machine code, or instruction set. There is usually a substantial overlap between the various machine codes in use. Some machine languages give all their instructions the same number of bits, while the instruction length differs in others. How the patterns are organized depends on the specification of the machine code. Common to most is the definition of one field (the opcode) which specifies the exact operation (for example “add”). Other fields may give the type of the operands, their location, or their value directly.
In more detail, an opcode (the term is an abbreviation of operation code) is the portion of a machine language instruction that specifies the operation to be performed. Its specification and format are laid out in the instruction set architecture (ISA) of the computer hardware component in question, e.g. a CPU or a more specialized unit. A complete machine language instruction contains an opcode and, optionally, the specification of one or more operands, that is, what data the operation should act upon. Some operations have implicit operands or none. Some ISAs have instructions with defined fields for opcodes and operands, while others have a more complicated structure.
The operands upon which opcodes operate may consist of registers, values in memory, values stored on the stack, I/O ports, the bus, etc., depending on the CPU architecture. The operations an opcode may specify can include e.g. arithmetic, data copying, logical operations, and program control.
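The split of an instruction into an opcode field and an operand field can be illustrated with a toy 8-bit ISA. The mnemonics and encoding below are invented for illustration only and do not correspond to any real processor:

```python
# Toy example: decoding a hypothetical 8-bit instruction whose upper
# 4 bits are the opcode and lower 4 bits an operand (a register index).
# This invented ISA exists only to illustrate opcode/operand fields.

OPCODES = {0x1: "ADD", 0x2: "SUB", 0x3: "LOAD", 0x4: "STORE"}

def decode(instruction: int) -> tuple:
    """Split an 8-bit instruction into (mnemonic, operand)."""
    opcode = (instruction >> 4) & 0xF   # upper nibble selects the operation
    operand = instruction & 0xF         # lower nibble names a register
    return OPCODES[opcode], operand

print(decode(0x13))  # ('ADD', 3)
```

A real ISA would of course define many more fields (addressing modes, immediate values, condition flags), but the principle of fixed bit fields selecting the operation and its operands is the same.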
Opcodes can also be found in byte codes interpreted by a byte code interpreter (or virtual machine, in one sense of that term). In these, an instruction set architecture is created to be interpreted by software, rather than a hardware device. Often, byte code interpreters work with higher-level data types and operations than a hardware instruction set, but are constructed along similar lines. Examples include the Java programming language's Java Virtual Machine (JVM), the byte code used in GNU Emacs for compiled LISP code, and many others (see e.g. Wikipedia, “Machine code,” “Opcode,”).
Opcodes shall hereafter denote low-level instructions to a processor.
Aspects of compression and “Delta” or “Deltafile” technology are now discussed. According to Wikipedia, Delta encoding is a way of storing or transmitting data in the form of differences between sequential data rather than complete files. Delta encoding is sometimes called Delta compression, particularly where archival histories of changes are required (e.g., in software projects).
The differences are recorded in discrete files called “Deltas” or “diffs”, after the Unix file comparison utility, “diff”. Because changes are often small (typically 2% of the total size on average), Delta encoding greatly reduces data redundancy. Collections of unique Deltas are substantially more space-efficient than their non-encoded equivalents.
Perhaps the simplest example is storing values of bytes as differences (Deltas) between sequential values, rather than the values themselves. So, instead of 2, 4, 6, 9, 7, one would store 2, 2, 2, 3, −2. This is not very useful when used alone, but it can help further compression of data in which sequential values occur often. Compression algorithms often choose to Delta encode only when the compression is better than without. For example, in video compression Delta frames can considerably reduce frame size, and are used in a number of video compression codecs.
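The byte-difference scheme above can be sketched in a few lines; encoding stores each value as the difference from its predecessor, and decoding inverts this with a running sum:

```python
def delta_encode(values):
    """Store each value as the difference from its predecessor
    (the first value is taken relative to an implicit 0)."""
    prev = 0
    deltas = []
    for v in values:
        deltas.append(v - prev)
        prev = v
    return deltas

def delta_decode(deltas):
    """Invert delta_encode by accumulating a running sum."""
    total = 0
    values = []
    for d in deltas:
        total += d
        values.append(total)
    return values

print(delta_encode([2, 4, 6, 9, 7]))  # [2, 2, 2, 3, -2]
```

Note that on a slowly varying sequence the Deltas are small numbers, which a subsequent entropy coder can represent in fewer bits than the original values.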
A Deltafile can be defined in two ways, a symmetric Delta or a directed Delta. A symmetric Delta between two versions v1 and v2 consists of properties specific to both v1 and v2 (v1\v2 and v2\v1). A directed Delta, also called a change, is a sequence of (elementary) change operations which, when applied to one version v1, yields another version v2.
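Modelling the two versions as sets of properties, the two definitions can be sketched as follows (the change-operation names “add” and “remove” are illustrative, not taken from any particular Delta format):

```python
# Two versions modelled as sets of properties.
v1 = {"a", "b", "c"}
v2 = {"b", "c", "d"}

# Symmetric Delta: the properties specific to each version,
# i.e. the pair (v1\v2, v2\v1).
symmetric_delta = (v1 - v2, v2 - v1)
print(symmetric_delta)  # ({'a'}, {'d'})

# Directed Delta: a sequence of change operations which, applied
# to v1, yields v2.
directed_delta = [("remove", "a"), ("add", "d")]

def apply_delta(version, delta):
    """Apply a directed Delta (a list of change operations) to a version."""
    result = set(version)
    for op, item in delta:
        if op == "add":
            result.add(item)
        elif op == "remove":
            result.discard(item)
    return result

assert apply_delta(v1, directed_delta) == v2
```

A symmetric Delta lets either version be reconstructed from the other, whereas a directed Delta is one-way but typically more compact.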
The nature of the data to be encoded influences the effectiveness of a particular compression algorithm. Delta encoding performs best when data has small or constant variation; for an unsorted data set, there may be little to no compression possible with this method.
Thus, Delta technology is a way of creating new versions by reusing old versions, exploiting their likely similarity. It is worth mentioning that the primary purpose of a Delta Language is to binary-encode differences between reference data or “ancestor” (e.g. the “old” version of a file, typically a binary file) and updated data or “descendant” (e.g. the “new” binary version of the file) in a highly compressed manner.
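A minimal sketch of such a directed binary Delta follows, assuming an invented two-instruction format: COPY reuses a slice of the ancestor, and ADD inserts literal new bytes. The instruction names and layout are illustrative only, not those of any real Delta format:

```python
# Rebuild a descendant from an ancestor by replaying a directed Delta.
# Each instruction is either ("COPY", offset, length), which reuses
# bytes already present in the ancestor, or ("ADD", literal_bytes),
# which supplies the genuinely new data.

def apply_patch(ancestor: bytes, delta) -> bytes:
    out = bytearray()
    for instr in delta:
        if instr[0] == "COPY":
            _, offset, length = instr
            out += ancestor[offset:offset + length]
        elif instr[0] == "ADD":
            out += instr[1]
    return bytes(out)

old = b"the quick brown fox"
patch = [("COPY", 0, 10), ("ADD", b"red"), ("COPY", 15, 4)]
print(apply_patch(old, patch))  # b'the quick red fox'
```

Because most of the descendant is expressed as COPY references into the ancestor, only the literal ADD payloads need to be stored or transmitted, which is what makes the encoding highly compressed when the two versions are similar.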
There are many ways of compressing computer information prior to transmission, but in these days of hundreds of megabytes of memory and multi-gigahertz processors, there are none aimed specifically at the embedded market, where both memory and clock cycles are typically in short supply. Additionally, the embedded market has simultaneously embraced advanced compiler technology combined with compact processors with correspondingly difficult-to-compress instruction sets. The result is that embedded software updates (by remote download) are taking longer because the software is growing in size faster than the typical bandwidth available. A new approach to updating data is therefore required for embedded systems. Said approach should preferably combine the sophistication of Delta technology with the small memory footprint and relatively low-cycle processors found on today's embedded systems.
There are many Delta algorithms available (some of which are free). However, on examination and evaluation, such algorithms were found to exhibit unsatisfactory performance and/or use too much memory, at least for specific practical implementations such as the update of embedded software.
Accordingly, there is a need for a new computer-implemented method for updating data, which improves performance and/or needs less memory compared with known methods.