Many computing tasks require a file, containing a block of data, to be transferred from the file system of one computer to the file system of another computer.
There are a number of benefits to performing this transfer over a messaging infrastructure rather than through a mechanism based on a direct network connection, such as the File Transfer Protocol (FTP). These include the reliability and indirect routing provided by messaging infrastructure technologies, as well as the asynchronous nature of messaging, which allows files to be placed into the messaging infrastructure for delivery whenever the target machine becomes available.
File transfer over a messaging infrastructure may involve transferring large numbers of files. Individually large files being transferred through a messaging infrastructure may be split into messages containing separate parts of a file.
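The splitting of a large file into messages can be illustrated with a minimal sketch. The message layout shown here (a file identifier, a sequence number, a last-part flag and a payload) and the names `FilePartMessage` and `split_into_messages` are assumptions made for illustration only; they are not taken from any particular messaging product.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class FilePartMessage:
    """Hypothetical message carrying one part of a file."""
    file_id: str    # identifies which file the part belongs to
    sequence: int   # zero-based position of the part within the file
    is_last: bool   # True for the final part of the file
    payload: bytes  # the bytes of this part


def split_into_messages(file_id: str, data: bytes,
                        part_size: int) -> Iterator[FilePartMessage]:
    """Yield messages that each carry at most part_size bytes of data."""
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    # Ceiling division; an empty file still produces one (empty) message.
    total = max(1, -(-len(data) // part_size))
    for seq in range(total):
        start = seq * part_size
        yield FilePartMessage(file_id, seq, seq == total - 1,
                              data[start:start + part_size])
```

Carrying a sequence number in every message is what later allows a receiver to restore the original order of the parts.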
Messages containing parts of an individual file may be delayed, for short or long amounts of time, or may arrive out of sequence. Messages containing parts of different files from multiple locations may also arrive concurrently on a single target queue.
These issues can be overcome using the following known approaches.
In a first existing approach, the complete file must arrive on a queue before it is written to the file system in the correct order using sequence numbers (or technology built into the messaging infrastructure). This has the disadvantage of requiring enough space on the target queue to contain the complete file. It can also delay availability of the file compared to writing the parts of the file as they arrive in messages.
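The first approach can be sketched as follows, assuming that each part carries a zero-based sequence number and that the receiver knows how many parts to expect. The dictionary stands in for the target queue, and the function name `write_when_complete` is hypothetical.

```python
from typing import BinaryIO, Dict


def write_when_complete(parts: Dict[int, bytes], expected_parts: int,
                        out: BinaryIO) -> bool:
    """Write the file only once every part is present.

    parts maps a sequence number to the payload of that part. Nothing is
    written until all parts from 0 to expected_parts - 1 have arrived, so
    the whole file has to be held (here in memory, in practice on the
    queue) before any of it reaches the file system.
    """
    if not all(seq in parts for seq in range(expected_parts)):
        return False  # still waiting for at least one part
    for seq in range(expected_parts):
        out.write(parts[seq])  # sequence order, not arrival order
    return True
```

The sketch makes the storage cost visible: `parts` grows until it holds the entire file, and the output stays empty until the last part arrives.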
In a second existing approach, separate state data is kept relating to the messages that arrive containing parts of a file, and the complete file is built from these messages in the correct order using this state data. This requires mechanisms to reliably persist this state data, for example, in a file, on a queue, or in a database. Updating this state data must also be carefully coordinated with writing to the file system, and complex logic may be required in the case of a failure or restart of the machine.
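A minimal sketch of the second approach is given below, assuming fixed-size parts and a JSON file as the persisted state. The function name `apply_part`, the state format and the file paths are all hypothetical. Note that the data write and the state update are two separate steps; a failure between them leaves the two out of step, which is the coordination problem described above.

```python
import json
import os
from typing import Set


def apply_part(data_path: str, state_path: str, sequence: int,
               payload: bytes, part_size: int) -> Set[int]:
    """Write one part at its offset, then record it in the state file."""
    received: Set[int] = set()
    if os.path.exists(state_path):
        with open(state_path, "r", encoding="utf-8") as f:
            received = set(json.load(f))
    if sequence in received:
        return received  # duplicate delivery, nothing to do

    # Step 1: write the payload at the position implied by its sequence.
    mode = "r+b" if os.path.exists(data_path) else "w+b"
    with open(data_path, mode) as f:
        f.seek(sequence * part_size)
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())

    # Step 2: persist the updated state via a temporary file and rename.
    received.add(sequence)
    tmp_path = state_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(sorted(received), f)
    os.replace(tmp_path, state_path)
    return received
```

In this sketch a crash after step 1 but before step 2 causes the part to be written again on redelivery, which is harmless here only because the write is idempotent; more elaborate state would need correspondingly more recovery logic.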
In addition, it may be a requirement to ensure non-repudiation of the file data and detection of any alteration of files during transit. It is known to use a hash function or message digest to ensure a block of data has not been tampered with accidentally or maliciously. A cryptographic hash function summarises an arbitrarily large quantity of data into a fixed size summary.
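The fixed-size property can be illustrated with Python's standard `hashlib` module. SHA-256 is used here purely as an example of a cryptographic hash function; the text does not prescribe a particular algorithm, and the helper name `digest_of` is hypothetical.

```python
import hashlib


def digest_of(data: bytes) -> bytes:
    """Return the SHA-256 digest of data (always 32 bytes long)."""
    return hashlib.sha256(data).digest()


small = digest_of(b"a")               # 1 byte of input
large = digest_of(b"a" * 1_000_000)   # 1,000,000 bytes of input
altered = digest_of(b"b")             # input differing from the first

# Both digests have the same length regardless of the input size, and
# changing the input changes the digest.
print(len(small), len(large), small != altered)
```

Because the summary has a fixed size, it can be carried alongside the data at negligible cost and recomputed by the recipient to detect alteration.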
Existing uses of hash functions or message digests take a single hash or digest of the data when it is known to be valid (i.e., at the source) and then pass the hash around with the data to ensure consistency.
Examples of known uses of a message digest include:
- A signed email, where the hash is sent encrypted with the message, and hence ensures against tampering.
- A hash sent with a packet of data traveling across a network connection protected by a cryptographic protocol, such as SSL (Secure Sockets Layer) or TLS (Transport Layer Security).
- An HTTP/FTP (Hypertext Transfer Protocol/File Transfer Protocol) download of a file from the Internet, to check that the final downloaded file has not been broken in transit. For example, a file may be broken into chunks and the chunks sent from A to B. The final re-assembled chunks are then checked against a hash taken at A for the entire file.
In the case of file transfer using a messaging infrastructure, non-repudiation can be ensured by gathering a hash or message digest of the complete file and sending it with the messages comprising that complete file. Once the file has been completely transferred, a hash or message digest can be computed over the file on the target file system and compared with the expected value. This approach requires the complete file to be read back from disk after writing has been completed, which is a slow operation for a large file.
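The read-back step can be sketched as follows, again assuming SHA-256 and a hexadecimal digest sent by the source. The function names `file_digest` and `verify_after_transfer` are hypothetical.

```python
import hashlib
import hmac


def file_digest(path: str, block_size: int = 1 << 20) -> str:
    """Read the whole file back from disk and return its SHA-256 hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Every byte of the file is read again here, which is the cost
        # that grows with file size.
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify_after_transfer(path: str, expected_hex: str) -> bool:
    """Compare the digest of the written file with the one sent by the source."""
    return hmac.compare_digest(file_digest(path), expected_hex)
```

The verification cannot start until the final part has been written, and it then costs a second full pass over the data in addition to the pass made when writing.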