The exemplary embodiment relates to the field of document processing. It finds particular application in connection with selectively encrypting XML documents for processing to enhance security.
Multiple service providers offer remote processing services for structured documents, such as extensible markup language (XML) documents. For example, a customer may request that a service provider perform batch operations on a set of XML documents, such as indexing, validation and transformation, through a World Wide Web (WWW) interface. Currently, when a customer wants an external service provider to host and manage confidential documents, the customer has to trust the service provider, along with the service provider's information system and internal policies regarding confidential material. Confidential documents may be transmitted to the service provider's hosting system over an encrypted, secured channel to protect the sensitive information from being intercepted during transmission. Additionally, the documents themselves may be encrypted in a manner that allows only the receiving party (e.g., the service provider) to decrypt and read the documents. Provided that the decryption key is not known by the service provider, pure storage and archiving of encrypted documents is highly secure, but of little interest, as no meaningful operations can be performed on the customer's documents.
However, an XML document, once encrypted using standard approaches, is like an opaque and flat bit packet on which only two basic operations can be undertaken: integrity checking and decryption. Therefore, once transmitted to and hosted at the service provider, the document must be decrypted before the service provider can perform the complex processing involved in services such as indexing, validation and transformation. To allow for decryption of the customer's documents at the service provider, the customer shares the decryption key with the service provider, which can be risky: the decryption key may be intercepted, or used by the intended recipient in an unauthorized manner. Moreover, there is the problem of data remanence (information persisting on a disk after file system deletion), as well as bugs or viruses on the service provider's system that may compromise the security of any stored documents. Thus, in order for services to be provided to a customer, the underlying data and structure of the customer's documents must be readable by the service provider without risk to the confidentiality of the customer's data. Accordingly, it is desirable to have a method and system for preserving the security of confidential documents while retaining the ability of a service provider to process the documents remotely.