1. Field of Invention
The invention relates to remote file operations, particularly to secure remote file operations, and more particularly to performing such operations at the block level.
2. Description of the Related Art
Today's business environment often requires reviewing and editing documents from remote locations. Networking technologies, such as the Internet, allow a user to remotely access, via laptops, PDAs, etc., documents securely stored at a central location. For example, a user may create a document at work, save it on an online file depository, and access the same file later at home. In another example, the user may create a presentation document, upload it to the online file depository, and give colleagues around the world access to the same file.
Uploading and retrieving user files to and from an online file server is typically carried out using a client-server architecture. FIG. 1 shows a client side network stack 100 employed by typical prior art file transfer programs. The network stack includes a User Application 101, a Remote Drive Interface 103, a Data Transfer layer 105, and a LAN Driver 109. The horizontal line 107 labeled “OS” indicates that the layers below the line are implemented within the operating system (OS) kernel. Although not shown in FIG. 1, it is implied that the client side network stack communicates with a server, which includes remote file storage. The user application 101 communicates with the remote drive interface layer 103 to request or store files. The remote drive interface layer 103 may offer additional primitives such as select remote drive, rename file, select compression algorithm, select encryption algorithm, etc. The remote drive interface layer 103, in turn, communicates with the data transfer layer 105, which offers primitives such as “get,” “put,” etc. The data transfer layer may implement the primitives with a transfer protocol, such as file transfer protocol (FTP), simple file transfer protocol (SFTP), etc. Finally, the LAN driver is used to transmit or receive the data payload requested by various higher level operations.
For various reasons, some of which are discussed below, these operations have been limited to treating files as a whole. For example, if data needs to be appended to a file stored on the server, the user has to upload the complete file, with the new data appended, to replace the version stored on the server. In general, the data transfer layer 105 of FIG. 1 does not allow uploading only the portions of a file that need updating, or downloading only selected portions; instead, the whole updated file needs to be uploaded or downloaded. For a given bandwidth, the amount of time taken to upload a file to the server is primarily dictated by the file size. Where file sizes are very large, transferring the whole file each time, even for a small modification, is inefficient. Further, the protocols associated with the data transfer layer 105 transfer files sequentially. Consequently, for this reason as well, transferring large files results in large delays.
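The cost difference described above can be illustrated with a minimal sketch. It assumes a hypothetical fixed block size and hypothetical function names, none of which come from the prior art stack of FIG. 1, and it only counts payload bytes, ignoring protocol overhead and a possibly shorter final block.

```python
def bytes_sent_whole_file(file_size: int) -> int:
    """Whole-file transfer: every update resends the entire file."""
    return file_size


def bytes_sent_block_level(modified_ranges, block_size: int) -> int:
    """Block-level transfer (hypothetical): resend only the blocks touched.

    modified_ranges is a list of (offset, length) byte ranges that changed.
    Each touched block is counted once, at its full nominal size.
    """
    touched = set()
    for offset, length in modified_ranges:
        if length <= 0:
            continue  # nothing changed in an empty range
        first = offset // block_size
        last = (offset + length - 1) // block_size
        touched.update(range(first, last + 1))
    return len(touched) * block_size


# Illustrative numbers only: a 1 GiB file, 4 KiB blocks, one 100-byte edit.
FILE_SIZE = 1 << 30
BLOCK_SIZE = 4096
edit = [(5000, 100)]

whole = bytes_sent_whole_file(FILE_SIZE)
partial = bytes_sent_block_level(edit, BLOCK_SIZE)
print(f"whole file: {whole} bytes, block level: {partial} bytes")
```

Under these assumptions, the 100-byte edit falls inside a single block, so a block-level scheme would send one block where a whole-file scheme sends the full gigabyte.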
To reduce the large temporal costs associated with large file transfers, many applications compress a file before transferring it. For example, in FIG. 1, the remote drive interface layer 103 compresses the file before it is passed to the data transfer layer 105. Lossless file compression algorithms are usually based on two popular methods: minimum redundancy coding (e.g., Huffman coding) and dictionary-based methods (e.g., Lempel-Ziv). By its very nature, compression substitutes a shorter (measured in bits) representation (or code) of a symbol for a relatively longer representation of the same symbol in the original file. For example, an 8-bit representation of the letter ‘E’ in the original file may be replaced by a unique 3-bit code in the compressed file. Note that the uniqueness of an encoding of a symbol is limited to the same file or dataset. In other words, the letter ‘E’ in one file may be encoded with an entirely different code and code length than the letter ‘E’ in a second file (for example, Huffman coding encodes each symbol based on that symbol's frequency in the file or dataset). As a result, the offset from the start of the file to the code for a symbol in the compressed file may be different from that for the same symbol in the original file. In other words, the block list mapping from the original file to the compressed file cannot be determined a priori. This makes it very difficult to merely update a block of data in a file that is stored compressed, or to retrieve a selected block from the stored compressed file.
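The dependence of a symbol's code length on the dataset can be illustrated with a minimal Huffman sketch. The function name and the two sample strings are hypothetical and chosen only for illustration; the sketch computes code lengths rather than the codes themselves, and is not tied to any particular compression program.

```python
import heapq
from collections import Counter


def huffman_code_lengths(data: str) -> dict:
    """Return {symbol: code length in bits} for a Huffman code built on data."""
    freq = Counter(data)
    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(freq)): 1}
    # Heap entries are (weight, tie_breaker, {symbol: depth}); the unique
    # tie_breaker guarantees the dictionaries are never compared.
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


# Two hypothetical datasets: 'E' is frequent in the first, rare in the second.
dataset_a = "EEEEEEEEAB"
dataset_b = "EABCDABCDABCD"

print("E in dataset_a:", huffman_code_lengths(dataset_a)["E"], "bit(s)")
print("E in dataset_b:", huffman_code_lengths(dataset_b)["E"], "bit(s)")
```

In this sketch the letter ‘E’ receives a 1-bit code in the first dataset and a 3-bit code in the second, so the bit offset of any later symbol in the compressed output depends on the contents of everything that precedes it.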
Therefore, a solution is needed that enables overwriting or retrieving any section of an existing file on the remote file server in an efficient and cost-effective manner.