This invention relates to data processing, and in particular to computer processing of set systems such as can be used in various applications including, for example, broadcast encryption and certificate revocation.
EXCLUSIVE SET SYSTEMS. In [8] Kumar and Russell formalized the notion of an exclusive set system, which is a family of sets for which every large subset of the universe can be written as the union of some collection of subsets from the family. More formally,
Definition 1. A family of subsets CC={S1, . . . , Sk} over [n] is (n,k,r,t)-exclusive if for any subset R⊂[n] with |R|≦r, we can write $[n]\setminus R=\bigcup_{j=1}^{t} S_{i_j}$ for some indices $1 \le i_j \le k$. (The indices $i_j$ do not have to be distinct, so [n]\R can be the union of fewer than t distinct sets $S_{i_j}$.) Here [n] denotes the set of positive integers {1, . . . , n}. Clearly, [n] can be replaced with any set U of n entities.
The family $\{S_{i_1}, \ldots, S_{i_t}\}$ is called a cover or covering for the set [n]\R, and is sometimes denoted $C_R$ herein. In practical applications, we can think of R as the set of “revoked” entities, and the set P=U\R as the set of “privileged” entities. We will sometimes refer to P as the “target set”. We denote $C(P)=C_R$.
In the example of FIG. 1, the elements of [n] are shown as crosses in a two-dimensional plane. Each element i∈[n] is marked with reference numeral 104.i. The set R consists of elements 104.1, 104.2, 104.3 (r≧3). The set [n]\R is covered by three sets S1, S2, S3 (t≧3 and k≧3), where S1={4,5,6}, S2={6,7}, and S3={8}.
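Definition 1 can be checked mechanically on very small instances. The following is a minimal brute-force sketch, not part of the disclosure; the function names `covers` and `is_exclusive` are hypothetical, and the value n=8 used for the FIG. 1 family is an assumption inferred from the listed sets.

```python
from itertools import combinations, combinations_with_replacement


def covers(family, indices, target):
    """Return True if the union of the chosen sets equals target exactly."""
    union = set()
    for i in indices:
        union |= family[i]
    return union == target


def is_exclusive(n, family, r, t):
    """Brute-force test of Definition 1 (exponential time; tiny inputs only)."""
    universe = set(range(1, n + 1))
    k = len(family)
    for size in range(r + 1):
        for revoked in combinations(sorted(universe), size):
            target = universe - set(revoked)
            # Indices may repeat, so fewer than t distinct sets are allowed.
            if not any(covers(family, idx, target)
                       for idx in combinations_with_replacement(range(k), t)):
                return False
    return True


# The FIG. 1 family, assuming n = 8 and R = {1, 2, 3}.
fig1_family = [{4, 5, 6}, {6, 7}, {8}]
fig1_ok = covers(fig1_family, (0, 1, 2), set(range(4, 9)))
```

Under these assumptions `fig1_ok` is true: the three sets together cover exactly {4, . . . , 8}. The checker enumerates every R and every choice of t indices, so it is only useful for illustrating the definition.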
Determining the exact tradeoff between n, k, r, and t is a fundamental combinatorial problem with significant applications in computer science.
APPLICATION TO BROADCAST ENCRYPTION. In a broadcast encryption scheme, there is a server 210 (FIG. 2A) sending a broadcast to n clients 104.1-104.n. The broadcast content B is encrypted with some symmetric encryption algorithm 1 (as shown at 220) using a secret key bk. The encrypted content $E1_{bk}(B)$ is broadcast to the clients 104. Each client 104.i possesses an encryption key $k_i$ for a symmetric encryption algorithm 2. In this example, the set R of revoked clients consists of terminals {1, . . . , r}, i.e. {104.1, . . . , 104.r}. The server encrypts the key bk with the algorithm 2 (as shown at 230) n−r times using the respective keys $k_{r+1}, \ldots, k_n$ of the non-revoked clients. The resulting encryptions are shown as $E2_{k_{r+1}}(bk), \ldots, E2_{k_n}(bk)$. The server broadcasts these encryptions (possibly over a wireless network).
Each client 104 (FIG. 2B) receives these broadcasts. The non-revoked clients 104.r+1, . . . , 104.n each execute a decryption algorithm 2 (as shown at 240) corresponding to the encryption algorithm 2. At step 240, each of these clients i (i=r+1, . . . , n) uses the corresponding key $k_i$ and the encryption $E2_{k_i}(bk)$ to recover the key bk. The key bk and the broadcast encryption $E1_{bk}(B)$ are then provided as inputs to a decryption algorithm 1 corresponding to the encryption algorithm 1, as shown at 250. The output is the broadcast content B.
The revoked clients 104.1, . . . , 104.r cannot recover the broadcast content B because they do not receive the encryptions of the broadcast key bk with the keys k1, . . . , kr.
In this example, each broadcast includes n−r encryptions at step 230. The number of encryptions can be reduced to at most t if each set $S_i$ is associated with an encryption key $k_{S_i}$ provided to all clients 104 which are members of the set $S_i$. See FIG. 3. The server determines the set cover $\{S_{i_j} \mid j=1, \ldots, t\}$ for the set [n]\R. At step 230 (FIG. 4A), the server 210 encrypts the key bk using the corresponding keys $k_{S_{i_j}}$. Since only the non-revoked clients each have one or more of the keys $k_{S_{i_j}}$, only these clients will be able to recover the key bk at step 240 (FIG. 4B) and recover the broadcast content B. At step 240, the client can use any key $k_{S_{i_j}}$ for the set $S_{i_j}$ to which the client belongs. Any coalition of the revoked members (revoked clients) learns no information from the broadcast even if they collude.
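The key-wrapping flow of steps 230 and 240 can be sketched as follows. This is an illustration under stated assumptions only: `toy_wrap` is a hypothetical, insecure stand-in for "encryption algorithm 2" (an XOR with a hash-derived pad), the subset names and the helper functions are invented for the example, and the set system is the one from FIG. 1.

```python
import hashlib
import os


def toy_wrap(key: bytes, data: bytes) -> bytes:
    """Toy stand-in for encryption algorithm 2: XOR with a hash-derived pad.

    NOT secure. It is its own inverse, which keeps the sketch short.
    """
    pad = hashlib.sha256(key).digest()
    if len(data) > len(pad):
        raise ValueError("toy_wrap handles at most 32 bytes")
    return bytes(a ^ b for a, b in zip(data, pad))


# Assumed set system (FIG. 1): one key k_S per subset S.
subsets = {"S1": {4, 5, 6}, "S2": {6, 7}, "S3": {8}}
subset_keys = {name: os.urandom(32) for name in subsets}


def client_keys(client: int) -> dict:
    """Keys given to a client: one per subset that contains the client."""
    return {name: subset_keys[name]
            for name, members in subsets.items() if client in members}


def make_header(cover, bk: bytes) -> dict:
    """Server side (step 230): wrap bk once per subset in the cover."""
    return {name: toy_wrap(subset_keys[name], bk) for name in cover}


def recover_bk(header: dict, keys: dict):
    """Client side (step 240): use any held key whose subset is in the header."""
    for name, wrapped in header.items():
        if name in keys:
            return toy_wrap(keys[name], wrapped)
    return None  # a revoked client holds none of the cover's keys


bk = os.urandom(32)
header = make_header(["S1", "S2", "S3"], bk)
```

In this sketch the header carries three wrapped copies of bk rather than one per non-revoked client. A real system would replace `toy_wrap` with an authenticated symmetric cipher.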
Since each subset of t keys can correspond to at most one set [n]\R, we need
$$\binom{k}{t} \ge \sum_{i=0}^{r} \binom{n}{i} \ge \binom{n}{r},$$
which, using the standard estimates $\binom{k}{t} \le (ek/t)^t$ and $\binom{n}{r} \ge (n/r)^r$, implies
$$k = \Omega\!\left(t \left(\frac{n}{r}\right)^{r/t}\right).$$
(The lower bound we use here is the same as that given by Lemma 11 in [10], and is not known to be tight for general n, r, and t. We note that the bounds in that paper are generally not tight. For instance, their Theorem 12 can be improved by using the sunflower lemma with relaxed disjointness (p. 82 in [6]) instead of the sunflower lemma.)
This general technique of using exclusive set systems for broadcast encryption is known in the art as the subset-cover framework. In particular, an exclusive set system forms a “subset cover”:
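For concrete parameters, the counting inequality above can be evaluated directly. The sketch below is illustrative only; the helper name `min_keys` is hypothetical, and it computes the smallest k that satisfies the inequality exactly as stated in the text, not an achievable construction.

```python
from math import comb


def min_keys(n: int, r: int, t: int) -> int:
    """Smallest k with C(k, t) >= sum_{i=0}^{r} C(n, i)."""
    needed = sum(comb(n, i) for i in range(r + 1))
    k = t
    while comb(k, t) < needed:
        k += 1
    return k
```

For example, with n=8, r=3, and t=3 the right-hand sum is 1+8+28+56=93; since C(9,3)=84 and C(10,3)=120, `min_keys(8, 3, 3)` evaluates to 10.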
Definition 2. A collection CC(U) of subsets S1, . . . , Sw with Sj⊂U, 1≦j≦w, is called a subset cover for U if for any subset P⊂U, there exist indices $i_1, \ldots, i_m \in [1, w]$ such that
$$P = \bigcup_{j=1}^{m} S_{i_j}.$$
We call m the size of the covering.
Although problems like set-cover are NP-hard, it turns out that it is very simple to determine if a given collection of subsets forms a cover for a given target set.
Lemma. A collection S1, . . . , Sw with Sj⊂U, 1≦j≦w, forms a subset cover if and only if all the singleton sets (that is, {u} for all u∈U) are included among the Sj.
Proof: It is easy to show the “if” direction since for any (finite) subset P⊂U, we have
$$P = \bigcup_{u \in P} \{u\},$$
and the {u} subsets are among the cover by hypothesis. For the “only if” direction, observe that if P={u}, then the set {u} must necessarily be part of the subset cover. Therefore, for each u∈U, we have that {u} is in the subset cover.
Broadcast encryption techniques can be used for secure multicast of privileged content such as premium video, audio, and text content. One may also use broadcast encryption for content protection on external storage devices such as the hard disk of a mobile phone, USB storage devices, Flash cards, CD and DVD ROMs, etc. Other applications include Symmetric Key Infrastructures, and other situations in which valuable content must be transmitted to multiple recipients in a secure manner.
APPLICATION TO CERTIFICATE REVOCATION. In FIG. 5, elements 104 are digital certificates used in public key infrastructures (PKI) to facilitate secure use and management of public keys in a networked computer environment. Each certificate 104 contains a user's public key PK and the user's name and may also contain the user's email address or addresses, the certificate's serial number SN (generated by a certificate authority 610 (FIG. 6A) to simplify the certificate management), the certificate issue date D1, the expiration date D2, an identification of algorithms to be used with the public and secret keys, an identification of the CA 610, validity proof data 104-V (described below) and possibly other data. The data mentioned above is shown at 104D. Certificate 104 also contains CA's signature 104-SigCA on the data 104D. CA 610 sends the certificate 104 to the user's (key owner's) computer system (not shown). Either the owner or the CA 610 can distribute the certificate to other parties to inform them of the user's public key PK. Such parties can verify the CA's signature 104-SigCA with the CA's public key to ascertain that the certificate's public key PK does indeed belong to the person whose name and email address are provided in the certificate.
If a certificate 104 is revoked, other parties must be prevented from using the certificate. Validity proof data 104-V is used to ascertain that the certificate is valid. In existing certificate revocation schemes known in the art, such as that of Micali [12,13,14] and, subsequently, that of Aiello et al. [2], in each period m (e.g. each day), certificate authority 610 issues a validation proof cm (possibly over a wireless network or some other type of network) for each non-revoked certificate in the public-key infrastructure. CA's clients 620 (FIG. 6B) provide the validation proof cm for the certificate, together with the certificate's validity data 104-V, to a verification algorithm, as shown at 630. The verification algorithm's output indicates whether or not the certificate is valid in the period m.
In the original work of Micali, one validation proof was issued per non-revoked certificate. Thus the overall communication complexity of the system was proportional to n−r, where n is the total number of users and r is the number of revoked certificates. Aiello et al. observed that instead of having one validity proof apply to one individual user, one could group users together into various subsets Si as in Definition 1 or 2. In FIGS. 3 and 6A, each subset Si is associated with cryptographic information kSi from which the CA can generate a validation proof cm(Si) for the period m. This single validation proof proves the validity of all the certificates in the subset Si. For each period m, the CA determines a cover {Sij} for the set of non-revoked certificates, computes the validation proofs cm(Sij), and distributes the validation proofs to the clients 620 (which may include the certificate owners and/or other parties).
Since each subset Si must be provided with a validity proof cm(Si), the number of total validity proofs may increase, but the communication complexity for transmitting the proofs is now proportional to the t parameter in the underlying subset-cover system, and generally speaking, t<n−r, so the overall communication needed for this approach is less than that needed for the original Micali approach.
Subset covers can be constructed using trees. FIG. 7 illustrates a binary tree 710 for eight certificates, numbered 1 through 8. Each node represents a set Si in a subset cover CC(U) for the set U of the certificates. Each leaf node (labeled 1, 2, . . . ) represents a singleton set for a respective certificate 1, 2, . . . Each higher level node represents the union of its children. E.g., node 1-4 represents the set of certificates 1 through 4. The root represents all the certificates. (We will use the numeral 710 to represent both the tree and the subset cover, i.e. the system of subsets.)
If a certificate is revoked, then the corresponding leaf is revoked, i.e. represents a set for which a validity proof must not be provided. Also, each node in the path from the leaf to the root is revoked. In the example of FIG. 7, the certificates 3, 7 and 8 are revoked (as indicated by the “x” mark). The sets 3-4, 1-4, 1-8, 7-8, 5-8 are therefore revoked. The minimal cover of the non-revoked certificates consists of nodes 1-2, 4, 5-6. Generally, the minimal cover C(P) consists of all the unrevoked nodes that are immediate children of the revoked nodes. Computer algorithms for tree traversal are known that can be implemented on CA 610 to mark revoked nodes when a certificate is revoked, and to find all the immediate unrevoked children of the revoked nodes. FIG. 8 illustrates an algorithm for generating C(P) as the set of the immediate unrevoked children. C(P) is initialized to empty (step 1). At step 2, for each revoked node v, if v has unrevoked children, the children are added to C(P).
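The computation of C(P) as the set of unrevoked children of revoked nodes can be sketched as below. This is an illustrative implementation, not the one of FIG. 8: the function name `complete_subtree_cover` is hypothetical, nodes are represented as (lo, hi) ranges of leaf numbers, n is assumed to be a power of two, and returning the root when nothing is revoked is an added assumption.

```python
def complete_subtree_cover(n, revoked):
    """Cover of the non-revoked leaves 1..n in a complete binary tree.

    Nodes are (lo, hi) leaf ranges; n is assumed to be a power of two.
    """
    revoked = set(revoked)
    if not revoked:
        return [(1, n)]  # assumption: with no revocations the root covers all
    cover = []

    def is_revoked(lo, hi):
        # A node is revoked if any leaf below it is revoked.
        return any(lo <= leaf <= hi for leaf in revoked)

    def visit(lo, hi):
        # Called only on revoked nodes: add unrevoked children to the cover
        # and descend into revoked children.
        if lo == hi:
            return
        mid = (lo + hi) // 2
        for child in ((lo, mid), (mid + 1, hi)):
            if is_revoked(*child):
                visit(*child)
            else:
                cover.append(child)

    visit(1, n)
    return cover
```

For the FIG. 7 example, `complete_subtree_cover(8, {3, 7, 8})` returns `[(1, 2), (4, 4), (5, 6)]`, i.e. the nodes 1-2, 4, and 5-6 named in the text.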
The tree method of FIG. 7 is known as a complete subtree method. Each user then stores $1+\log_2 n$ keys; these correspond to the keys associated with the vertices (i.e. nodes) on the path from the user's assigned leaf to the root of the tree. Using the complete subtree method, there is a covering of size O(r log(n/r)). See [9].
FREE RIDERS. Most of the prior art on broadcast encryption assumes a stringent security requirement. In particular, any member of the privileged set P can decrypt a broadcast, but no one outside this set should be able to. For many applications this is unnecessary. In particular, the system might be able to tolerate some number of non-privileged users being able to decrypt a broadcast. Such a user is termed a free rider.
We now consider why it is desirable to permit free riders. In many situations, it is conceivable that relaxing the system to allow free riders might decrease other system costs substantially. For example, we may be able to decrease the amount of data communicated during the broadcast—if there are a large number of users to whom data is broadcast, then this might decrease the overall network traffic by a substantial amount. Consider, for example, the subset cover of FIG. 7 (which was described in the context of certificate revocation but can also be constructed for broadcast encryption). In this example, the optimal (minimal) covering C(P) of the set of privileged users consists of three nodes 1-2, 4, 5-6 as described above. If one free rider is allowed, then we could make node 3 a free rider. The minimal covering would consist then of only two nodes 1-4 and 5-6. The network traffic would be reduced.
The notion of free riders was introduced by Abdalla, Shavit, and Wool [1]. (The bracketed numbers denote references listed at the end of this disclosure before the claims.) The authors of [1] first demonstrate by example that allowing free riders can reduce the amount of data communicated (and hence the overall network traffic) in many situations. Next, they consider the question of how one might be able to arrive at the “optimal” solution that minimizes traffic for a given number of free riders. They observe that this problem is NP-complete in the worst case; i.e., an efficient algorithm is unlikely to exist. They even provide theoretical evidence that designing an algorithm whose performance always closely approximates the optimal solution is also challenging. Next, they suggest some “heuristic” approaches that apply to broadcast encryption schemes in the subset-cover framework. Finally, they perform experiments to analyze the performance of their heuristics on the Complete Subtree method. Their heuristics seem to perform well in these experiments, and they use these results to bolster their claims that free riders are a useful notion in the context of broadcast encryption.
More particularly, [1] defines an f-redundant cover as a cover CC={Si} such that, for each set P⊂U, there is a covering $C_f(P)=\{S_{i_j}\}$ satisfying the condition:
$$\frac{\left|\bigcup_{j=1}^{m} S_{i_j}\right|}{|P|} \le f \qquad (1)$$
For a complete subtree method, their heuristic approach is as follows. The covering Cf(P) is initialized to empty. The tree (such as in FIG. 7) is traversed starting at the root (node 1-8 in the example of FIG. 7) in a depth-first-search or a breadth-first-search order. Whenever a “good” set (i.e. node) v is found, v is added to the cover Cf(P) and v's children are ignored. The set v is good if the following condition holds. Let $v_U$ denote all the users in v that are not yet covered by Cf(P) (i.e. are not in any set in Cf(P)). Let $v_P$ denote all the privileged users (i.e. members of P) in v that are not yet covered by Cf(P). Then v is good if
$$\frac{|v_U|}{|v_P|} \le f \qquad (2)$$
[1] notes that this approach works also on non-binary trees.
This approach provides an f-redundant covering in time O(n). However, the authors of [1] do not guarantee that the covering Cf(P) is minimal.
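The heuristic described above can be sketched for a complete binary tree as follows. This is a minimal illustration under stated assumptions, not the code of [1]: the function name `f_redundant_cover` is hypothetical, nodes are (lo, hi) ranges of leaf numbers, n is assumed to be a power of two, f is treated as the ratio bound of conditions (1) and (2) with f≧1, and nodes containing no uncovered privileged user are skipped.

```python
from fractions import Fraction


def f_redundant_cover(n, privileged, f):
    """Top-down greedy heuristic on a complete binary tree over leaves 1..n.

    Nodes are (lo, hi) leaf ranges; n is assumed to be a power of two and
    f >= 1 is the redundancy ratio of conditions (1) and (2).
    """
    f = Fraction(f)
    if f < 1:
        raise ValueError("f must be at least 1")
    privileged = set(privileged)
    cover, covered = [], set()

    def visit(lo, hi):
        members = set(range(lo, hi + 1))
        v_u = members - covered        # users in v not yet covered
        v_p = v_u & privileged         # privileged users among them
        if not v_p:
            return                     # nothing under v needs covering
        if len(v_u) <= f * len(v_p):   # condition (2), written without division
            cover.append((lo, hi))
            covered.update(members)
            return                     # ignore v's children
        # A privileged leaf always satisfies (2) when f >= 1, so lo < hi here.
        mid = (lo + hi) // 2
        visit(lo, mid)
        visit(mid + 1, hi)

    visit(1, n)
    return cover
```

With the FIG. 7 example, P={1, 2, 4, 5, 6}, the call `f_redundant_cover(8, {1, 2, 4, 5, 6}, 1)` returns `[(1, 2), (4, 4), (5, 6)]`, while `f_redundant_cover(8, {1, 2, 4, 5, 6}, Fraction(3, 2))` returns `[(1, 4), (5, 6)]`, admitting user 3 as a free rider. Because the chosen nodes are disjoint subtrees, each satisfying (2), their union satisfies (1); as noted in the text, the result is not guaranteed to be minimal.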