The demand for storage has been rapidly increasing. As the amount of data, such as digital media, stored by users grows, so does their need to store that data reliably over extended periods of time. Traditional backup solutions periodically copy data to, for example, backup tapes, compact discs (CDs), or other local storage media. However, such solutions are not optimal, as the backup media are stored in a single location and are prone to failure.
Other solutions include storing data files on a local hard drive of a personal computer (PC) and synchronizing the data remotely using hosted storage services. Having a remote backup ensures that data is stored in multiple locations and is protected from local disasters, such as fires or floods. However, such solutions require installation of special client software on each individual PC, which leads to software incompatibilities, a lack of central control, and high deployment costs.
Commercially available services, referred to as cloud storage services, provide mass storage through a web service interface available through the Internet. FIG. 1 illustrates a storage system 100 designed to provide cloud storage services. The system 100 includes an array of geographically distributed data centers 110-1 to 110-M connected to a plurality of clients 120-1 to 120-N through a wide area network (WAN) 130.
A data center 110 typically consists of servers and mass storage that facilitate cloud storage services for the clients 120. Such services enable applications including, for example, backup and restoration of data, data migration, data sharing, data collaboration, and so on. Cloud storage services are accessible from anywhere in the world. To this end, each client 120 implements a web services interface designed to at least synchronize data with the data centers 110. Applications enabled by the cloud storage services are not aware of the specifics of the services and the underlying data synchronization operations. One disadvantage of commercially available cloud storage services is that such services do not implement standard file sharing protocols (e.g., Common Internet File System (CIFS) or Network File System (NFS)). Furthermore, accessing files stored in the cloud storage is typically slower than accessing files stored in local storage devices.
Existing cloud storage networks do not permit background and real-time processing of files or other types of unstructured data uploaded to a cloud storage system. Such background processing may include a variety of tasks, such as thumbnail creation, automatic document summarization, image resizing, video transcoding, metadata indexing, sending user notifications, and file scanning.
File scanning may be particularly desirable because it enables additional services on the stored data, for example, security, indexing, and analytics services. Such services can be provided by scanning engines, which may be internal to the cloud storage service or implemented as external services. Such engines may scan files or other unstructured data, store the scanning results, and potentially perform some action based on those results.
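By way of a non-limiting illustration, the scan, store, and act behavior of such a scanning engine may be sketched in Python as follows. The names ScanningEngine, ScanResult, signatures, and quarantined, as well as the byte-pattern matching and the quarantine action, are hypothetical and are assumed here only for the purpose of the example; they do not describe any particular engine or cloud storage service.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ScanResult:
    """Outcome of scanning one file (hypothetical record layout)."""
    name: str
    sha256: str
    size: int
    flagged: bool


@dataclass
class ScanningEngine:
    """Illustrative engine: scans data, stores the result, then acts on it."""
    # Assumed blocklist of byte patterns; a real engine would use its own rules.
    signatures: tuple = (b"EICAR-TEST",)
    results: dict = field(default_factory=dict)
    quarantined: set = field(default_factory=set)

    def scan(self, name: str, data: bytes) -> ScanResult:
        # Step 1: scan the file contents against the assumed signatures.
        flagged = any(sig in data for sig in self.signatures)
        result = ScanResult(
            name=name,
            sha256=hashlib.sha256(data).hexdigest(),
            size=len(data),
            flagged=flagged,
        )
        # Step 2: store the scanning result.
        self.results[name] = result
        # Step 3: perform an action based on the result (here, quarantine).
        if flagged:
            self.quarantined.add(name)
        return result


engine = ScanningEngine()
clean = engine.scan("report.txt", b"quarterly numbers")
bad = engine.scan("payload.bin", b"xxEICAR-TESTxx")
```

In this sketch the results are kept in memory for brevity; an engine internal or external to the storage service would presumably persist them alongside the stored data.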
It would therefore be advantageous to provide a solution that would overcome the deficiencies of the prior art by permitting background and real-time scanning of data uploaded to or stored in a cloud storage system.