Collaboration systems, such as Microsoft SharePoint™, comprise a collocated collection of applications that are accessible to multiple users through one or more user interfaces, and comprise one or more data stores. Each application or module contributes a distinct set of features to the collaboration system. Such features may include a Web server, a collaborative document repository, a blogging framework, and an authentication server. The user interfaces comprise dedicated client applications and web pages that provide access to the collaboration system. The data stores are used to save data that users create or upload via a network connection and which can be updated or modified by other users.
Most commonly, a data store is provided by a relational database, such as a Structured Query Language (SQL) database. SQL databases are well suited for saving and accessing large numbers of small data items that have inherent structure. Many applications provided by collaboration systems deal with such small structured data items. Examples are intranet applications that comprise HTML data, blog applications, and discussion forums that comprise data featuring information about authorship, date of creation, discussion thread information, and time stamps.
Collaboration systems often comprise document management and repository services that allow users to keep track of different versions of data, such as text processing or spreadsheet documents, over time. Databases can be used to store and handle such unstructured data files as well as the structured metadata that relates to them.
A user does not access the data stores directly, but interacts with the collaboration system. The collaboration system acts as a gatekeeper and enforces a set of rules on each user request before eventually proceeding to retrieve or update the requested information in the data store. The client application is then notified accordingly. By enforcing a set of rules on each transaction request, the collaboration system can make sure that every user is given access to the latest version of any data item in the data stores. Collaboration systems generally provide options for data backup, although more commonly in large organizations this process is handled by third-party tools. In known collaboration systems, the data stores themselves are generally kept on site with the server or servers that implement the collaboration system. A backup is normally performed periodically as a scheduled task and the backup data is stored separately as per user requirements.
Different kinds of data nowadays tend to merge into complex files that have a large volume and lack easily accessible structure. Typical examples of such files include images or other media files such as encoded audio or video signals. Audio-visual content is generally encoded in order to reduce its size, resulting in a stream of binary digits that is not, as such, readable or usable. The data can only be interpreted for viewing or listening once a decoding step has been performed on the binary data. Binary files of that type are typically known as Binary Large OBjects or BLOBs. However, a BLOB may refer to any unstructured binary data, including text documents, spreadsheets, or any data that would generally be considered a file on a computing system.
In collaboration systems, or generally in document repository systems, the inclusion of BLOBs can pose problems to the efficiency of the system. While the underlying SQL data stores are efficient for storing and accessing large numbers of structured data items, they are not efficient for storing and accessing unstructured data items, such as BLOBs. However, as collaboration systems are increasingly used for sharing and collaboratively working on unstructured data files, BLOB data can rapidly come to represent over 90% of a collaboration system's data store volume. This leads to poor performance when retrieving stored data and may cause prolonged outages of the system when the data stores are being backed up by the system.
It has been proposed to offload BLOB storage to different, unstructured data stores such as disk file systems that can handle large data files. Such storage can be provided on site using a dedicated storage device, or even at a physically remote storage site. While the metadata related to a BLOB remains stored in an SQL database, the BLOB itself is stored in a remote store. The metadata comprises an access token and a unique identifier for the BLOB. Using the access token, the BLOB can be retrieved and accessed. While such a solution can improve the overall performance of an SQL store in a collaboration system, the system may still encounter prolonged outage times when the large remote data stores are being backed up by the system.
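The offloading arrangement described above can be illustrated with a minimal sketch. It is not the implementation of any particular collaboration system: the class name `OffloadedBlobStore`, the table layout, and the choice to use the access token as the file name in the BLOB store are assumptions made purely for illustration, and an in-memory SQLite database stands in for the SQL data store.

```python
import os
import secrets
import sqlite3
import tempfile
import uuid


class OffloadedBlobStore:
    """Hypothetical sketch: metadata lives in SQL, BLOB bytes live in a file store."""

    def __init__(self, blob_dir):
        self.blob_dir = blob_dir  # stand-in for the (possibly remote) BLOB store
        self.db = sqlite3.connect(":memory:")  # stand-in for the SQL data store
        self.db.execute(
            "CREATE TABLE blob_meta ("
            "blob_id TEXT PRIMARY KEY, access_token TEXT, name TEXT, size INTEGER)"
        )

    def put(self, name, data):
        """Store the BLOB outside the database and record only its metadata."""
        blob_id = str(uuid.uuid4())           # unique identifier for the BLOB
        access_token = secrets.token_hex(16)  # assumed: token doubles as file name
        with open(os.path.join(self.blob_dir, access_token), "wb") as f:
            f.write(data)
        self.db.execute(
            "INSERT INTO blob_meta VALUES (?, ?, ?, ?)",
            (blob_id, access_token, name, len(data)),
        )
        self.db.commit()
        return blob_id

    def get(self, blob_id):
        """Look up the access token in SQL, then fetch the BLOB from the file store."""
        row = self.db.execute(
            "SELECT access_token FROM blob_meta WHERE blob_id = ?", (blob_id,)
        ).fetchone()
        if row is None:
            raise KeyError(blob_id)
        with open(os.path.join(self.blob_dir, row[0]), "rb") as f:
            return f.read()


if __name__ == "__main__":
    store = OffloadedBlobStore(tempfile.mkdtemp())
    blob_id = store.put("video.bin", b"\x00\x01\x02 encoded media bytes")
    print(len(store.get(blob_id)), "bytes retrieved via metadata lookup")
```

In this sketch the SQL table only ever holds small structured rows, while the bulk of the data volume sits in the file store, which is the property the offloading proposal relies on.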
As with any kind of data stored by a collaboration system to which several users may have access, it is important to provide effective and secure access control to BLOB data. One possibility is to encrypt stored BLOB data, and to make sure that it is improbable for unauthorized users to access or decipher an encrypted BLOB.
Collaboration systems such as Microsoft SharePoint™ generally do not natively support inline encryption of data, including remotely stored BLOB data. However, there is often an interface that allows the externalisation of BLOB data. In SharePoint, such an interface is implemented by External BLOB Storage (EBS) and Remote BLOB Storage (RBS). During the process of such externalisation it is possible to encrypt the data. This may be achieved using a block cipher such as the AES-256 algorithm. In such a case the encryption key is stored on the local server in a key store. The encrypted BLOB is stored in the remote BLOB store. A third element that is used for encrypting and deciphering the BLOB may be provided by an Initialization Vector (IV). Initialization Vectors are commonly used with block ciphers. A block of data that is to be encrypted is first randomized by combining it with the IV, for example by a bitwise exclusive-OR (XOR) operation as in cipher block chaining (CBC) mode. The randomized block is then encrypted using the block cipher. This process makes sure that two identical blocks of data will not be encrypted to the same bit sequence by the block cipher, as they will have been randomized by two different IVs.
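The role of the IV can be shown with a short sketch. It assumes that the block is combined with the IV by a bitwise XOR, as is done for the first block in cipher block chaining (CBC) mode. Because the Python standard library has no AES implementation, a toy four-round Feistel construction built on HMAC-SHA256 stands in for the block cipher; it is purely illustrative, is not AES, and must not be used to protect real data. All function names are hypothetical.

```python
import hashlib
import hmac
import secrets

BLOCK_SIZE = 16  # bytes; the same block size as AES
HALF = BLOCK_SIZE // 2


def xor_bytes(a, b):
    """Bitwise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def _round(key, i, half):
    """Keyed round function of the toy Feistel cipher (illustrative only)."""
    return hmac.new(key, bytes([i]) + half, hashlib.sha256).digest()[:HALF]


def toy_encrypt_block(key, block):
    """Stand-in for a real block cipher such as AES-256; NOT secure."""
    left, right = block[:HALF], block[HALF:]
    for i in range(4):
        left, right = right, xor_bytes(left, _round(key, i, right))
    return left + right


def toy_decrypt_block(key, block):
    """Inverse of toy_encrypt_block: undo the Feistel rounds in reverse order."""
    left, right = block[:HALF], block[HALF:]
    for i in reversed(range(4)):
        left, right = xor_bytes(right, _round(key, i, left)), left
    return left + right


def encrypt_with_iv(key, block):
    """Randomize the block with a fresh IV, then apply the block cipher."""
    iv = secrets.token_bytes(BLOCK_SIZE)
    return iv, toy_encrypt_block(key, xor_bytes(block, iv))


def decrypt_with_iv(key, iv, ciphertext):
    """Decipher the block, then remove the IV randomization."""
    return xor_bytes(toy_decrypt_block(key, ciphertext), iv)


if __name__ == "__main__":
    key = secrets.token_bytes(32)
    block = b"A" * BLOCK_SIZE
    iv1, c1 = encrypt_with_iv(key, block)
    iv2, c2 = encrypt_with_iv(key, block)
    print("same plaintext, different ciphertexts:", c1 != c2)
    print("round trip ok:", decrypt_with_iv(key, iv1, c1) == block)
```

Without the IV, the stand-in cipher maps identical blocks to identical outputs under the same key; with a fresh random IV per block, the outputs differ, and deciphering requires the key, the ciphertext, and the matching IV.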
In known collaboration systems and in general use of block ciphers, the IV is stored together with the encrypted data as part of the encrypted data, for example, in the remote BLOB store. A sufficiently privileged user or administrator, or a hacker maliciously gaining the privileges of such a user, can therefore access an unencrypted form of the binary data by gaining access to the key store and to the BLOB store.
The present application describes methods and systems that alleviate at least some of the problems relating to the secure storage of Binary Large Objects in collaboration systems.