Storage blocks are located at a low level within a computing environment. Block-level devices are agnostic to all application, system, file and user operations. Therefore, it is difficult to perform block-level optimization for higher-level elements of a computing system.
In particular, block-based storage applications that operate over a Wide Area Network (WAN), such as but not limited to the Internet (Web), pack the data to be sent over the WAN into objects without knowledge of which user, file or application those blocks belong to. The result is that objects sent for storage at a remote location will contain blocks belonging to multiple files (and, conversely, the blocks of a single file will be stored in multiple objects at the remote location); objects will contain blocks belonging to different applications; objects will contain blocks belonging to different users, and so on. This chaotic way of building objects for remote storage results in significant penalties in performance, data recovery speed and bandwidth utilization.
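The scattering effect described above can be illustrated with a minimal simulation. This is only a sketch under stated assumptions: the file names, the object size of four blocks, and the helper names `pack_blocks` and `objects_per_file` are hypothetical, and the ownership labels exist only so the simulation can measure the result (a real block device would not have them).

```python
from collections import defaultdict

def pack_blocks(write_stream, blocks_per_object):
    """Pack blocks into fixed-size objects in arrival order, ignoring ownership."""
    return [write_stream[i:i + blocks_per_object]
            for i in range(0, len(write_stream), blocks_per_object)]

def objects_per_file(objects):
    """Map each file to the set of object indices that hold its blocks."""
    spread = defaultdict(set)
    for obj_id, obj in enumerate(objects):
        for file_name, _block_no in obj:
            spread[file_name].add(obj_id)
    return dict(spread)

# Interleaved writes from three hypothetical files, four blocks each.
# Each block is a (file_name, block_number) tuple; the label is for
# measurement only and is invisible to a real block-level device.
stream = [(f, n) for n in range(4) for f in ("a.doc", "b.db", "c.log")]

objects = pack_blocks(stream, blocks_per_object=4)
spread = objects_per_file(objects)
files_in_each_object = [len({f for f, _ in obj}) for obj in objects]
```

In this toy case every one of the three objects holds blocks from all three files, and every file is spread over all three objects.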
For example, operations that change or delete files stored on block-level devices that use remote storage targets result in highly inefficient procedures. In order for file operations to be properly reflected on the remote storage, the remote objects that contain blocks for the affected file(s) need to be brought back from the remote location and split so that the blocks belonging to the affected file can be marked as free or be overwritten. In some cases it is more efficient to simply leave the deleted files in the remote storage instead of managing them.
The blocks inside the affected objects that belong to other files need to be preserved and packed into new objects. Once all these operations are completed, the objects with the changed blocks and the objects with the preserved blocks need to be re-transmitted to the remote storage system.
If another file is subsequently deleted and its blocks are in objects that were just changed by the procedure described above, those objects need to be brought back again to go through the same split and re-package procedure.
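The fetch, split, re-pack and re-transmit cycle can be sketched as follows. This is a simplified model, not a definitive implementation: it assumes deleted blocks are simply dropped (rather than marked free and re-sent), counts WAN traffic in blocks, and uses hypothetical names (`delete_file`, `remote`) and a hypothetical object size of four blocks.

```python
def delete_file(remote, file_name, blocks_per_object):
    """Delete a file from file-agnostic remote objects.

    `remote` maps object id -> list of (owner, block_number) tuples.
    Returns the number of blocks moved over the WAN (download + upload).
    """
    # Every object holding at least one block of the file is affected.
    affected = [oid for oid, obj in remote.items()
                if any(owner == file_name for owner, _ in obj)]
    next_id = max(remote, default=-1) + 1
    transferred = 0
    survivors = []
    for oid in affected:
        obj = remote.pop(oid)            # bring the whole object back
        transferred += len(obj)
        # Split: keep only the blocks that belong to other files.
        survivors.extend(blk for blk in obj if blk[0] != file_name)
    # Re-pack the preserved blocks into new objects and re-transmit them.
    for i in range(0, len(survivors), blocks_per_object):
        new_obj = survivors[i:i + blocks_per_object]
        remote[next_id] = new_obj
        next_id += 1
        transferred += len(new_obj)
    return transferred

# Three hypothetical files, four blocks each, written interleaved and
# packed four blocks per object without regard to ownership.
stream = [(f, n) for n in range(4) for f in ("a.doc", "b.db", "c.log")]
remote = {i // 4: stream[i:i + 4] for i in range(0, len(stream), 4)}

cost_first = delete_file(remote, "a.doc", blocks_per_object=4)
cost_second = delete_file(remote, "b.db", blocks_per_object=4)
```

Under these assumptions, deleting the four-block file `a.doc` moves 20 blocks over the WAN, and deleting `b.db` immediately afterwards moves another 12, because the objects that were just re-packed have to be fetched and split again.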
The operations just described result in severe penalties in terms of input/output (I/O) performance and bandwidth utilization, especially if files are constantly being changed or deleted. Moreover, since a block device is not file aware, the blocks for deleted files or modified files that have not been overwritten remain in objects at the remote location even if they were marked as free. This means that the remote storage resource utilization will not be reduced even if files are deleted or reduced in size when block based systems control the remote storage.
Data retention policies and day-to-day data management may require the elimination of unused files. The behavior of block-based systems when performing such operations makes them complex and inefficient.
The problem is similar for applications. An application that requires data that is not cached locally, and that must therefore be fetched from the remote location, causes large numbers of objects to be transferred back and forth over the WAN even though each object contains little of the data the application requires.
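A rough way to quantify this overhead is to compare the blocks an application needs with the blocks that must be transferred to obtain them. The sketch below assumes whole objects are the unit of transfer; the object contents, the application labels and the function name `fetch_cost` are hypothetical.

```python
def fetch_cost(objects, app):
    """Return (blocks_needed, blocks_transferred) to read all of app's blocks."""
    needed = sum(1 for obj in objects for owner in obj if owner == app)
    # Any object holding at least one of the app's blocks is fetched whole.
    transferred = sum(len(obj) for obj in objects if app in obj)
    return needed, transferred

# Hypothetical remote objects; each entry names the application that
# owns the block at that position.
objects = [
    ["db", "mail", "mail", "backup"],
    ["backup", "backup", "db", "mail"],
    ["mail", "mail", "backup", "backup"],
    ["db", "backup", "mail", "mail"],
]

needed, transferred = fetch_cost(objects, "db")
useful_fraction = needed / transferred
```

Here the `db` application needs 3 blocks but 12 are transferred, so only a quarter of the traffic is useful to it.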
Read-ahead optimizations, which can greatly improve application performance over the WAN, are not possible because the remote objects that hold the data for the application also contain blocks that belong to other applications or files. These constraints make the use of remote storage for primary data very difficult and inefficient.
If user activity needs to be audited, having the data blocks produced by the users scattered across hundreds of objects makes tracking that activity very expensive in terms of performance and bandwidth utilization.
At the system level, blocks that are written after a system shutdown command may end up being unnecessarily stored in the remote location even though they could have been flushed.
The operation of current block level systems poses a number of significant penalties when storing data in remote locations.