Electronic binary files exist in many different formats for many different uses. These include formats suitable for storing images, sound, text, data, executable files, and so on.
Binary files containing data, if not encrypted, tend towards a structured format. There is usually header information, text, repetition, and positioning amongst other components. Generally the first few bytes in a binary file contain an indicator of a file type and therefore the application with which the binary file is compatible. Executable files or files used to perform functions of any type have a considerably less structured format. There is, however, an element of structure as these files either have to interact with an operating system to perform a function, or they are part of the operating system.
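The leading-byte indicator described above can be illustrated with a minimal sketch. The signature table lists a handful of widely documented file signatures. The names `SIGNATURES` and `guess_file_type` are hypothetical, introduced only for this example, and the table is far from exhaustive.

```python
# Illustrative table of a few well-known leading-byte signatures.
# SIGNATURES and guess_file_type are hypothetical names for this sketch.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"%PDF-", "PDF document"),
    (b"PK\x03\x04", "ZIP archive"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"\xff\xd8\xff", "JPEG image"),
]


def guess_file_type(data: bytes) -> str:
    """Return the label of the first signature that prefixes `data`."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"
```

Only the first few bytes of a file need to be read for this check, for example `guess_file_type(open(path, "rb").read(16))`, where `path` is whatever file is being inspected.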
Compressed and encrypted files have the least structure as, by design, they remove repetitive values within a file. In the case of encryption, a key is used to define the substituted values. For compression, a “shorthand” is used for repetitive structures. Encrypting or compressing a file therefore changes its internal structure and, particularly in the case of compression, its size.
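The effect of repetition on compressed size can be seen in a short sketch. It assumes Python's standard `zlib` module as a stand-in for a generic compression routine, and uses `os.urandom` output as a stand-in for content that has already been encrypted or compressed. Neither choice comes from the text above.

```python
import os
import zlib

# Highly repetitive input: the same four bytes repeated 4,096 times.
repetitive = b"ABCD" * 4096

# Random bytes of the same length; an assumption standing in for
# encrypted or already-compressed content with no usable repetition.
random_like = os.urandom(len(repetitive))

packed_repetitive = zlib.compress(repetitive)
packed_random = zlib.compress(random_like)

print(len(repetitive), "->", len(packed_repetitive))
print(len(random_like), "->", len(packed_random))
```

The repetitive input shrinks to a small fraction of its original size, while the random input stays at roughly its original size because there are no repeated structures for the “shorthand” to replace.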
Mathematically, for a binary file of size 1,048,576 bytes (1 MB), there are 256^1,048,576 (256 raised to the power 1,048,576) possible arrangements of bytes. In actual usage only a fraction of this number is used. The number actually used can only be approximated, based on an estimate of the number of different file types, the functionality of executable or operational files, and the compression and encryption routines available.
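To give a sense of scale, the count of possible byte arrangements for a 1,048,576-byte file (256 raised to the power 1,048,576) can be sized without computing the number itself. The sketch below uses the identity 256^n = 2^(8n) and a base-10 logarithm. The variable names are illustrative only.

```python
import math

FILE_SIZE_BYTES = 1_048_576   # file size used in the text
VALUES_PER_BYTE = 256         # each byte takes one of 256 values

# 256**n equals 2**(8*n), so the count needs 8*n bits to index.
bits = FILE_SIZE_BYTES * 8

# Number of decimal digits of 256**n is floor(n * log10(256)) + 1.
decimal_digits = math.floor(FILE_SIZE_BYTES * math.log10(VALUES_PER_BYTE)) + 1

print(f"2**{bits} arrangements, about {decimal_digits:,} decimal digits")
```

The result is a number with roughly 2.5 million decimal digits, which is why only a fraction of the possible arrangements can ever occur in practice.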
There are many existing techniques to perform data compression on a data file. Some data compression algorithms are based on indexing techniques and involve the calculation and indexing of unique values within a data file. In most compressed data files, there is some repetition of data values within each 256 byte code segment. In average files, there are only 160 to 170 unique non-repeated values per 256 byte segment of code. Data compression techniques based on factorial calculations do not work very well with this number of values.
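The per-segment figure quoted above can be measured for any file with a short sketch. It assumes that "unique values" means the number of distinct byte values occurring in each 256-byte segment, which is one reading of the text. The function name `unique_values_per_segment` is hypothetical.

```python
def unique_values_per_segment(data: bytes, segment_size: int = 256) -> list[int]:
    """Count distinct byte values in each consecutive segment of `data`.

    Assumption: "unique values" is read as distinct byte values.
    A final partial segment, if any, is counted as well.
    """
    return [
        len(set(data[start:start + segment_size]))
        for start in range(0, len(data), segment_size)
    ]


def average_unique_values(data: bytes, segment_size: int = 256) -> float:
    """Mean distinct-value count over all segments (0.0 for empty input)."""
    counts = unique_values_per_segment(data, segment_size)
    return sum(counts) / len(counts) if counts else 0.0
```

Running `average_unique_values` over the bytes of a given file gives a figure that can be compared against the 160 to 170 range stated above. The result depends on the file's content and type.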