Secure Multi-Party Computation
With the widespread availability of communication networks, such as the Internet, it is now common to ‘outsource’ data processing tasks to third parties for a number of reasons. For example, the third party can do the processing at a reduced cost, or it has better computational resources or technologies. One concern with outsourcing data processing is the inappropriate use of confidential information by third parties.
Speech recognition could easily be part of this trend, with third-party servers providing speech recognition services to remote clients. A user or client computer system, perhaps with limited data processing resources, desires to recognize speech, and a third-party or server computer system has some of the necessary resources. Conventionally, the client and the server are called Alice and Bob, respectively.
The private nature of speech data, however, is a stumbling block for such a development. Individuals, corporations, and governments are understandably reluctant to send private speech through a network to another party that cannot be trusted.
It is desired to have the third-party server recognize the speech without revealing the speech to the third party. At the same time, the third party wants to keep its speech recognizer secret. For such applications, conventional cryptography protects the speech only during transport, not during processing by the third party.
Secure multi-party computation (SMC) protocols are often analyzed for correctness, security, and overhead. Correctness measures how closely a secure process approaches an ideal solution. Security measures the amount of information that can be gained from the multi-party exchange. Overhead measures complexity and efficiency.
A multi-party protocol between Alice and Bob is secure when privacy and correctness are guaranteed for both Alice and Bob. The protocol protects privacy when the information that is ‘leaked’ by the distributed computation is limited to the information that can be learned from the designated output of the computation.
In the semi-honest case, both parties follow the protocol as prescribed but may record all messages and subsequently try to deduce information not derivable solely from the protocol output. In the malicious case, however, no assumption is made about the behavior of the parties: the privacy of one party must be preserved even if the other party behaves arbitrarily. A protocol that is secure in the semi-honest case can be made secure in the malicious case when accompanied by zero-knowledge proofs that both parties follow the protocol.
Zero Knowledge Protocols
Zero-knowledge or secure multi-party protocols were first described by Yao, “How to generate and exchange secrets,” Proceedings of the 27th IEEE Symposium on Foundations of Computer Science, pp. 162-167, 1986, incorporated herein by reference. Later, that zero-knowledge technique was extended to other problems, see Goldreich et al., “How to play any mental game—a completeness theorem for protocols with honest majority,” 19th ACM Symposium on the Theory of Computing, pp. 218-229, 1987, incorporated herein by reference. However, those theoretical constructs are still too demanding to be of use in practical applications.
Since then, many secure methods have been described, see Chang et al., “Oblivious Polynomial Evaluation and Oblivious Neural Learning,” Advances in Cryptology, Asiacrypt '01, Lecture Notes in Computer Science Vol. 2248, pages 369-384, 2001; Clifton et al., “Tools for Privacy Preserving Distributed Data Mining,” SIGKDD Explorations, 4(2):28-34, 2002; Koller et al., “Protected Interactive 3D Graphics Via Remote Rendering,” SIGGRAPH 2004; Lindell et al., “Privacy preserving data mining,” Advances in Cryptology—Crypto 2000, LNCS 1880, 2000; Naor et al., “Oblivious Polynomial Evaluation,” Proc. of the 31st Symp. on Theory of Computing (STOC), pp. 245-254, May 1999; and Du et al., “Privacy-preserving cooperative scientific computations,” 4th IEEE Computer Security Foundations Workshop, pp. 273-282, Jun. 11, 2001, all incorporated herein by reference.
Secure Inner Dot Products
Many computer-implemented methods and applications require computing an inner product. Therefore, protocols and procedures for determining a secure inner dot product (SIP) have been developed. It is understood that these protocols are known to those of ordinary skill in the art.
The protocols can be broadly categorized as cryptographic protocols and algebraic protocols, which provide different levels of security and efficiency. Because computational costs are, in general, constantly decreasing, the main concerns in evaluating the protocols are security and communication costs.
Generally, a secure inner dot product of two vectors x and y produces two scalar values a and b, one held by each party, such that $x^T y = a + b$, where T is the transpose operator.
Cryptographic Protocols
Several methods are known for providing secure inner dot products, see B. Goethals, S. Laur, H. Lipmaa, and T. Mielikainen, “On private scalar product computation for privacy-preserving data mining,” C. Park and S. Chee, editors, Intl. Conference on Information Security and Cryptology, volume 2506 of Lecture Notes in Computer Science, pages 104-120, 2004; J. Feigenbaum, Y. Ishai, T. Malkin, K. Nissim, M. Strauss, and R. N. Wright, “Secure multi-party computation of approximations,” Proceedings of the Intl. Colloquium on Automata, Languages and Programming, volume 2076 of Lecture Notes in Computer Science, pages 927-938, 2001; R. Canetti, Y. Ishai, R. Kumar, M. K. Reiter, R. Rubinfeld, and R. N. Wright, “Selective private function evaluation with applications to private statistics,” Proceedings of the ACM symposium on principles of distributed computing, pages 293-304, 2001, all incorporated herein by reference.
The Goethals et al. protocol takes as input the private vectors x and y, belonging to Bob and Alice, respectively, and outputs the scalar values a and b such that $a + b = x^T y$. During an initialization phase, Bob generates a private and public key pair (sk, pk) and sends pk to Alice. Then, for each $i \in \{1, \ldots, d\}$, Bob generates a new random string $r_i$ and sends $c_i = En(pk; x_i, r_i)$ to Alice. In response, Alice sets

$$z \leftarrow \prod_{i=1}^{d} c_i^{y_i},$$

which, because En is additively homomorphic, is an encryption of $\sum_i x_i y_i = x^T y$. Alice then generates a random plaintext b and a random nonce r′, and sends $z' = z \cdot En(pk; -b, r')$ to Bob. Then, Bob determines $a = De(sk; z') = x^T y - b$. Goethals et al. give a proof that the protocol is correct and secure. It can also be shown that the communication overhead is $k'/\tau$, where k′ is the bit size of each encrypted message sent by Alice, and $\tau = \lfloor \sqrt{m/d} \rfloor$ for a large m. In a homomorphic cryptosystem, typical values would be $k' \approx 2048$ and $m \approx 2^{1024}$, see Pascal Paillier, “Public-key cryptosystems based on composite degree residuosity classes,” J. Stern, editor, Proceedings of Advances in Cryptology—EUROCRYPT '99, volume 1592 of Lecture Notes in Computer Science, pages 223-238, 1999, and Ivan Damgard and Mads Jurik, “A generalization, simplification and some applications of Paillier's probabilistic public-key system,” Proceedings of the Intl. Workshop on Practice and Theory in Public Key Cryptography, volume 1992 of Lecture Notes in Computer Science, pages 119-136, 2001, all incorporated herein by reference.
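The exchange above can be sketched in Python as follows, assuming textbook Paillier encryption with generator g = n + 1 as the homomorphic scheme En/De. This is an illustrative sketch, not the Goethals et al. implementation: both parties run in one process, the key is built from two small Mersenne primes (2^31 − 1 and 2^61 − 1) purely for demonstration, and the helper names `keygen`, `enc`, `dec`, and `secure_inner_product` are hypothetical. The shares are additive modulo n, so a + b recovers x^T y whenever x^T y is non-negative and smaller than n.

```python
import math
import secrets

# Toy Paillier primes (Mersenne primes 2^31 - 1 and 2^61 - 1); far too small for real security.
P, Q = 2**31 - 1, 2**61 - 1


def keygen(p=P, q=Q):
    """Bob's key pair: public key n, secret key (n, lambda, mu) for generator g = n + 1."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # inverse of lambda mod n; valid because g = n + 1
    return n, (n, lam, mu)


def random_unit(n):
    """Random nonce r in Z_n^* (coprime to n)."""
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            return r


def enc(n, m, r):
    """En(pk; m, r) = (1 + n)^m * r^n mod n^2; negative m is reduced mod n."""
    n2 = n * n
    return pow(n + 1, m % n, n2) * pow(r, n, n2) % n2


def dec(sk, c):
    """De(sk; c) = L(c^lambda mod n^2) * mu mod n, where L(u) = (u - 1) / n."""
    n, lam, mu = sk
    u = pow(c, lam, n * n)
    return (u - 1) // n * mu % n


def secure_inner_product(x, y):
    """Bob holds x, Alice holds y (non-negative ints). Returns (a, b, n) with a + b = x^T y mod n."""
    n, sk = keygen()                                # Bob: key pair, sends pk = n to Alice
    n2 = n * n
    c = [enc(n, xi, random_unit(n)) for xi in x]    # Bob -> Alice: c_i = En(pk; x_i, r_i)
    z = 1
    for ci, yi in zip(c, y):                        # Alice: z = prod c_i^{y_i}, encrypts x^T y
        z = z * pow(ci, yi, n2) % n2
    b = secrets.randbelow(n)                        # Alice: random plaintext share b
    z_prime = z * enc(n, -b, random_unit(n)) % n2   # Alice -> Bob: z' = z * En(pk; -b, r')
    a = dec(sk, z_prime)                            # Bob: a = x^T y - b (mod n)
    return a, b, n


a, b, n = secure_inner_product([1, 2, 3], [4, 5, 6])
print((a + b) % n)  # 32
```

Neither share alone reveals x^T y: b is uniformly random in [0, n), and a is x^T y shifted by that same unknown b.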
Another protocol uses the technique of oblivious polynomial evaluation (OPE), see Moni Naor and Benny Pinkas, “Oblivious transfer and polynomial evaluation,” Proceedings of the thirty-first annual ACM symposium on Theory of computing, pages 245-254, 1999, incorporated herein by reference. The protocol uses an additive oracle, which computes every term $x_j y_j$ of the dot product $x^T y$ and expresses the jth term as $S_{A_j} + S_{B_j}$. Alice and Bob receive $S_{A_j}$ and $S_{B_j}$, respectively, for all j. The result $x^T y$ is given by

$$x^T y = \sum_{j=1}^{d} S_{A_j} + \sum_{j=1}^{d} S_{B_j}.$$

Alice and Bob can implement the additive oracle using OPE.
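To make the share structure concrete, the sketch below simulates the additive oracle with a trusted dealer rather than with OPE; it is not the Naor-Pinkas construction and only shows what each party ends up holding. The share modulus 2^64 and the function name `additive_oracle` are illustrative assumptions.

```python
import secrets

MODULUS = 2**64  # hypothetical share modulus; all share arithmetic is mod 2^64


def additive_oracle(x, y, modulus=MODULUS):
    """Trusted-dealer stand-in for the additive oracle: split each x_j*y_j into S_Aj + S_Bj."""
    s_a, s_b = [], []
    for xj, yj in zip(x, y):
        share_a = secrets.randbelow(modulus)      # uniformly random, reveals nothing on its own
        share_b = (xj * yj - share_a) % modulus   # S_Aj + S_Bj = x_j * y_j (mod modulus)
        s_a.append(share_a)
        s_b.append(share_b)
    return s_a, s_b


s_a, s_b = additive_oracle([1, 2, 3], [4, 5, 6])
alice_sum = sum(s_a) % MODULUS  # Alice's share of x^T y
bob_sum = sum(s_b) % MODULUS    # Bob's share of x^T y
print((alice_sum + bob_sum) % MODULUS)  # 32
```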
Canetti et al. describe a protocol that uses homomorphic encryption to determine a weighted sum of a set of numbers from a database. Feigenbaum et al. describe a protocol to determine the L2 distance between two vectors, see J. Feigenbaum, Y. Ishai, T. Malkin, K. Nissim, M. Strauss, and R. N. Wright, “Secure multi-party computation of approximations,” Proceedings of the Intl. Colloquium on Automata, Languages and Programming, volume 2076 of Lecture Notes in Computer Science, pages 927-938, 2001, all incorporated herein by reference. Both of those protocols can be used to determine dot products securely.
Algebraic Protocols
Algebraic protocols can also be used for determining secure dot products. However, most of these protocols leak some information, see W. Du and M. J. Atallah, “Privacy-preserving cooperative statistical analysis,” Proceedings of the 17th Annual Computer Security Applications Conference, December 2001; W. Du and Z. Zhan, “A practical approach to solve secure multi-party computation problems,” Proceedings of New Security Paradigms Workshop, Sep. 23-26, 2002; I. Ioannidis, A. Grama, and M. Atallah, “A secure protocol for computing dot-products in clustered and distributed environments,” Proceedings of the Intl. Conf. on Parallel Processing, 2002; P. Ravikumar, W. W. Cohen, and S. E. Fienberg, “A secure protocol for computing string distance metrics,” Proceedings of the Workshop on Privacy and Security Aspects of Data Mining, pages 40-46, Brighton, UK, 2004; and J. Vaidya and C. Clifton, “Privacy preserving association rule mining in vertically partitioned data,” Proceedings of the Intl. Conf. on Knowledge Discovery and Data Mining, 2002. The properties and weaknesses of some of these protocols have been analyzed, see E. Kiltz, G. Leander, and J. Malone-Lee, “Secure computation of the mean and related statistics,” Proceedings of the Theory of Cryptography Conference, volume 3378 of Lecture Notes in Computer Science, pages 283-302, 2005, and B. Goethals, S. Laur, H. Lipmaa, and T. Mielikainen, “On private scalar product computation for privacy-preserving data mining,” C. Park and S. Chee, editors, Intl. Conference on Information Security and Cryptology, volume 2506 of Lecture Notes in Computer Science, pages 104-120, 2004, all incorporated herein by reference.
The basic idea in the protocols of Du et al. is to express the vector x as a sum of M random vectors, i.e.,

$$x = \sum_{m=1}^{M} u_m.$$

For every m, Alice sends $u_m$ concealed in a set of k random vectors. Bob computes the dot product with y for all the vectors and adds a random number $r_m$ to all products. The results are sent to Alice. Alice and Bob repeat this M times. Finally, Alice and Bob have

$$\sum_{m=1}^{M} \left(u_m^T y + r_m\right) \quad \text{and} \quad -\sum_{m=1}^{M} r_m,$$

respectively, which when added give the required result $x^T y$.
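The splitting-and-masking idea can be simulated in a few lines of Python. This is a sketch under stated assumptions: both parties run in one process, integers stand in for the vector entries, and Python's non-cryptographic `random` module is used with a fixed seed only for reproducibility. How Alice obtains only the masked product for her real vector is not modeled; the simulation simply returns that entry. The names `split_vector`, `du_atallah_round`, and `du_atallah_inner_product`, and the defaults M = 3 and k = 4, are hypothetical.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def split_vector(x, M, rng):
    """Express x as the sum of M random integer vectors u_1..u_M."""
    parts = [[rng.randint(-100, 100) for _ in x] for _ in range(M - 1)]
    parts.append([xi - sum(p[i] for p in parts) for i, xi in enumerate(x)])
    return parts


def du_atallah_round(u_m, y, k, rng):
    """One round: u_m is hidden among k random decoys; every product gets the same mask r_m."""
    candidates = [[rng.randint(-100, 100) for _ in u_m] for _ in range(k)]
    position = rng.randrange(k + 1)                 # Alice: secret position of the real u_m
    candidates.insert(position, u_m)
    r_m = rng.randint(-10**6, 10**6)                # Bob: random mask for this round
    masked = [dot(c, y) + r_m for c in candidates]  # Bob: c^T y + r_m for every candidate
    return masked[position], -r_m                   # Alice keeps u_m^T y + r_m; Bob keeps -r_m


def du_atallah_inner_product(x, y, M=3, k=4, seed=0):
    """Return (Alice's share, Bob's share) whose sum is x^T y."""
    rng = random.Random(seed)
    alice, bob = 0, 0
    for u_m in split_vector(x, M, rng):
        a_m, b_m = du_atallah_round(u_m, y, k, rng)
        alice += a_m  # accumulates sum_m (u_m^T y + r_m)
        bob += b_m    # accumulates -sum_m r_m
    return alice, bob


alice, bob = du_atallah_inner_product([1, 2, 3], [4, 5, 6])
print(alice + bob)  # 32
```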
Du et al. also use random permutations. Let π(x) be a vector whose elements are a random permutation of the elements of x. When the same permutation is applied to both vectors, the product $\pi(x)^T \pi(y)$ is equal to $x^T y$. Bob expresses y as the sum of M random vectors, i.e.,

$$y = \sum_{m=1}^{M} v_m.$$

Bob generates M random vectors $r_m$ and M random permutations $\pi_m$. For each m, Alice learns $\pi_m(x + r_m)$ without learning $\pi_m$ or $r_m$, using a homomorphic encryption scheme as in Du et al. Bob sends $\pi_m(v_m)$ to Alice, and Alice computes $\pi_m(x + r_m)^T \pi_m(v_m)$. Finally, Alice has

$$\sum_{m=1}^{M} \left(x^T v_m + r_m^T v_m\right),$$

and Bob has

$$-\sum_{m=1}^{M} r_m^T v_m,$$

which together form the result. Alice's chance of successfully guessing all elements in y is

$$\left(\frac{1}{d!}\right)^{M}.$$
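A Python simulation of the permutation-based protocol is sketched below. It assumes both parties run in one process and replaces the homomorphic step, through which Alice obtains $\pi_m(x + r_m)$, with a direct computation; the function names `permute`, `du_permutation_protocol`, and `guess_probability`, the value ranges, and the fixed seed are hypothetical choices for illustration.

```python
import random
from fractions import Fraction
from math import factorial


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def permute(perm, v):
    """Apply the permutation pi: element i of the result is v[perm[i]]."""
    return [v[i] for i in perm]


def du_permutation_protocol(x, y, M=3, seed=0):
    """Return (Alice's share, Bob's share) whose sum is x^T y."""
    rng = random.Random(seed)
    d = len(y)
    # Bob: express y as the sum of M random vectors v_1..v_M.
    vs = [[rng.randint(-50, 50) for _ in range(d)] for _ in range(M - 1)]
    vs.append([yi - sum(v[i] for v in vs) for i, yi in enumerate(y)])
    alice, bob = 0, 0
    for v_m in vs:
        r_m = [rng.randint(-50, 50) for _ in range(d)]  # Bob's random vector r_m
        perm = list(range(d))
        rng.shuffle(perm)                                # Bob's random permutation pi_m
        # Stand-in for the homomorphic step: Alice ends up with pi_m(x + r_m) only.
        masked_x = permute(perm, [xi + ri for xi, ri in zip(x, r_m)])
        alice += dot(masked_x, permute(perm, v_m))       # x^T v_m + r_m^T v_m
        bob -= dot(r_m, v_m)                             # Bob keeps -r_m^T v_m
    return alice, bob


def guess_probability(d, M):
    """Alice's chance of guessing all elements of y: (1/d!)^M."""
    return Fraction(1, factorial(d)) ** M


alice, bob = du_permutation_protocol([1, 2, 3], [4, 5, 6])
print(alice + bob)              # 32
print(guess_probability(3, 3))  # 1/216
```

Because the same $\pi_m$ is applied to both vectors in each round, the permutation cancels in the dot product, which is why Alice's and Bob's totals still sum to $x^T y$.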
Another protocol assumes that the number of elements d in the data vector x is even. A vector $x_1$ is defined as the d/2-dimensional vector including the first d/2 elements of x, and the vector $x_2$ includes the last d/2 elements of x, so that $x^T y = x_1^T y_1 + x_2^T y_2$. Alice and Bob jointly generate a random invertible d×d matrix M. Alice computes $x'^T = x^T M$, which is partitioned into $x'_1$ and $x'_2$, and sends $x'_2$ to Bob. Bob computes $y' = M^{-1} y$, splits it into $y'_1$ and $y'_2$, and sends $y'_1$ to Alice. Because $x'^T y' = x^T M M^{-1} y = x^T y$, Alice computes $x_1'^T y'_1$ and Bob computes $x_2'^T y'_2$, so that their sum is the required result.
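The invertible-matrix protocol can be checked numerically with NumPy. The sketch below assumes real-valued floating-point vectors and a Gaussian random matrix (invertible with probability 1); the text does not specify the number field, so this choice is an assumption, as is the function name `split_matrix_protocol`. Bob's step uses `np.linalg.solve` to compute $M^{-1}y$ without forming the inverse explicitly.

```python
import numpy as np


def split_matrix_protocol(x, y, seed=0):
    """Return Alice's and Bob's shares of x^T y using a jointly generated random matrix M."""
    d = len(x)
    assert d % 2 == 0, "the protocol assumes d is even"
    rng = np.random.default_rng(seed)
    M = rng.normal(size=(d, d))        # jointly generated; invertible with probability 1
    x_prime = x @ M                    # Alice: x'^T = x^T M
    y_prime = np.linalg.solve(M, y)    # Bob: y' = M^{-1} y
    h = d // 2
    alice = x_prime[:h] @ y_prime[:h]  # Alice receives y'_1 and computes x'_1^T y'_1
    bob = x_prime[h:] @ y_prime[h:]    # Bob receives x'_2 and computes x'_2^T y'_2
    return alice, bob


x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([5.0, 6.0, 7.0, 8.0])
alice, bob = split_matrix_protocol(x, y)
print(np.isclose(alice + bob, x @ y))  # True
```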
Other algebraic protocols for determining dot products are also known, see I. Ioannidis, A. Grama, and M. Atallah, “A secure protocol for computing dot-products in clustered and distributed environments,” Proceedings of the Intl. Conf. on Parallel Processing, 2002; J. Vaidya and C. Clifton, “Privacy preserving association rule mining in vertically partitioned data,” Proceedings of the Intl. Conf. on Knowledge Discovery and Data Mining, 2002; and P. Ravikumar, W. W. Cohen, and S. E. Fienberg, “A secure protocol for computing string distance metrics,” Proceedings of the Workshop on Privacy and Security Aspects of Data Mining, pages 40-46, 2004, all incorporated herein by reference.
Classifiers
Data classification is well known. If the data have a multivariate distribution, then the classifier typically uses multivariate Gaussian distributions. Each class can be modeled by either a single multivariate Gaussian distribution, or a mixture of multivariate Gaussian distributions.
The classifier determines the value of a discriminant function

$$g_i(x) = \ln p(x \mid \omega_i) + \ln P(\omega_i)$$

for all classes $\omega_i$ and assigns the data x to class $\omega_i$ if $g_i(x) > g_j(x)$ for all $j \neq i$. Here, $p(x \mid \omega_i)$ is a class-conditional probability density function, and $P(\omega_i)$ is the a priori probability of class $\omega_i$.
Single Multivariate Gaussian Distribution
The mean vector and covariance matrix of the ith class are $\mu_i$ and $\Sigma_i$, respectively. Hence, $p(x \mid \omega_i) \sim \mathcal{N}(\mu_i, \Sigma_i)$, and the above equation can be expressed as:
$$g_i(x) = -\frac{1}{2}(x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i) - \frac{d}{2}\ln 2\pi - \frac{1}{2}\ln|\Sigma_i| + \ln P(\omega_i).$$

Because the (d/2) ln 2π term is the same for all classes, it can be ignored, and the discriminant can be rewritten as

$$g_i(x) = x^T \bar{W}_i x + \bar{w}_i^T x + w_{i0},$$

where

$$\bar{W}_i = -\frac{1}{2}\Sigma_i^{-1}, \qquad \bar{w}_i = \Sigma_i^{-1}\mu_i, \qquad w_{i0} = -\frac{1}{2}\mu_i^T \Sigma_i^{-1}\mu_i - \frac{1}{2}\ln|\Sigma_i| + \ln P(\omega_i).$$
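The expansion into $\bar{W}_i$, $\bar{w}_i$, and $w_{i0}$ can be checked numerically. The NumPy sketch below computes $g_i(x)$ both directly from $(\mu_i, \Sigma_i, P(\omega_i))$ and from the expanded quadratic form, then assigns x to the class with the largest discriminant. The two classes, their parameters, and the function names `discriminant_params`, `g_expanded`, and `g_direct` are hypothetical.

```python
import numpy as np


def discriminant_params(mu, sigma, prior):
    """W_i, w_i, w_i0 for one Gaussian class (the (d/2) ln 2*pi term is dropped)."""
    sigma_inv = np.linalg.inv(sigma)
    _, logdet = np.linalg.slogdet(sigma)  # ln|Sigma_i| for a covariance matrix
    W = -0.5 * sigma_inv
    w = sigma_inv @ mu
    w0 = -0.5 * mu @ sigma_inv @ mu - 0.5 * logdet + np.log(prior)
    return W, w, w0


def g_expanded(x, W, w, w0):
    """g_i(x) = x^T W_i x + w_i^T x + w_i0."""
    return x @ W @ x + w @ x + w0


def g_direct(x, mu, sigma, prior):
    """g_i(x) = -1/2 (x - mu)^T Sigma^-1 (x - mu) - 1/2 ln|Sigma| + ln P(omega_i)."""
    diff = x - mu
    _, logdet = np.linalg.slogdet(sigma)
    return -0.5 * diff @ np.linalg.solve(sigma, diff) - 0.5 * logdet + np.log(prior)


# Two hypothetical classes in d = 2 dimensions: (mean, covariance, prior).
classes = [
    (np.array([0.0, 0.0]), np.eye(2), 0.5),
    (np.array([3.0, 3.0]), np.array([[2.0, 0.5], [0.5, 1.0]]), 0.5),
]
x = np.array([2.5, 2.8])
scores = [g_expanded(x, *discriminant_params(mu, s, p)) for mu, s, p in classes]
assert all(np.isclose(sc, g_direct(x, mu, s, p)) for sc, (mu, s, p) in zip(scores, classes))
print(int(np.argmax(scores)))  # 1: x is assigned to the second class
```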
Vectors of dimension d+1, $\bar{x}$ and $\tilde{w}_i$, can be generated by appending the value 1 to the vector x and appending the element $w_{i0}$ to the vector $\bar{w}_i$. By extending the matrix $\bar{W}_i$ into a (d+1)×(d+1) matrix $W_i$, whose last row has zeros in its first d components and whose last column is equal to $\tilde{w}_i$, the expression $g_i(x) = x^T \bar{W}_i x + \bar{w}_i^T x + w_{i0}$ can be written as $g_i(x) = \bar{x}^T W_i \bar{x}$.
Writing x for $\bar{x}$, the above equation becomes $g_i(x) = x^T W_i x$.
Mixture of Gaussian Distributions
The mean vector and covariance matrix of the jth Gaussian distribution in class $\omega_i$ are $\mu_{ij}$ and $\Sigma_{ij}$, respectively. Hence,

$$p(x \mid \omega_i) = \sum_{j=1}^{J_i} \alpha_{ij}\, \mathcal{N}(\mu_{ij}, \Sigma_{ij}),$$

where $J_i$ is the number of Gaussian distributions describing class $\omega_i$, and $\alpha_{ij}$ are the coefficients of the mixture of Gaussian distributions.
A log likelihood for the jth Gaussian distribution in the ith class is given by
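$$l_{ij}(x) = x^T \bar{W}_{ij} x + \bar{w}_{ij}^T x + w_{ij},$$

where, again ignoring the (d/2) ln 2π term,

$$\bar{W}_{ij} = -\frac{1}{2}\Sigma_{ij}^{-1}, \qquad \bar{w}_{ij} = \Sigma_{ij}^{-1}\mu_{ij}, \qquad w_{ij} = -\frac{1}{2}\mu_{ij}^T \Sigma_{ij}^{-1}\mu_{ij} - \frac{1}{2}\ln|\Sigma_{ij}|.$$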
Expressing x as a (d+1)-dimensional vector and combining $\bar{W}_{ij}$, $\bar{w}_{ij}$, and $w_{ij}$ into the (d+1)×(d+1) matrix $W_{ij}$, as described before, the above equation becomes $l_{ij}(x) = x^T W_{ij} x$.
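The augmentation step can be verified with a short NumPy sketch: pack a d×d matrix, a d-vector, and a scalar into one (d+1)×(d+1) matrix whose last row is zero except for the constant in the corner, append 1 to x, and compare the single quadratic form with the original three-term expression. The function name `augment` and the random test values are hypothetical.

```python
import numpy as np


def augment(W, w, w0):
    """Pack W (d x d), w (d,), w0 into one (d+1) x (d+1) matrix.

    The last row is zero except for its final entry w0, and the last column is (w, w0).
    """
    d = W.shape[0]
    A = np.zeros((d + 1, d + 1))
    A[:d, :d] = W
    A[:d, d] = w
    A[d, d] = w0
    return A


rng = np.random.default_rng(1)
d = 3
W = rng.normal(size=(d, d))
w = rng.normal(size=d)
w0 = 0.7
x = rng.normal(size=d)
x_bar = np.append(x, 1.0)  # append the value 1 to x

lhs = x_bar @ augment(W, w, w0) @ x_bar
rhs = x @ W @ x + w @ x + w0
print(np.isclose(lhs, rhs))  # True
```

The identity holds because the last column contributes $\bar{w}^T x$ (each entry is multiplied by the appended 1 on the right), the zero last row contributes nothing, and the corner entry contributes the constant.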
Hence, the discriminant function for the ith class can be written as

$$g_i(x) = \operatorname{logsum}\left(\ln\alpha_{i1} + l_{i1}(x), \ldots, \ln\alpha_{iJ_i} + l_{iJ_i}(x)\right) + \ln P(\omega_i),$$

where

$$\operatorname{logsum}(x_1, \ldots, x_{J_i}) = \max(x_1, \ldots, x_{J_i}) + \ln\left(\sum_{j=1}^{J_i} e^{\Delta_j}\right)$$

and

$$\Delta_j = x_j - \max(x_1, \ldots, x_{J_i}) \quad \forall j \in \{1, \ldots, J_i\}.$$
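The logsum function is a numerically stable way to compute $\ln \sum_j e^{x_j}$: subtracting the maximum keeps every exponent at or below zero, so nothing overflows, and at least one term equals 1, so the sum never underflows to zero. A minimal Python sketch follows; the names `logsum` and `mixture_discriminant` are hypothetical, and the component log-likelihoods $l_{ij}(x)$ are assumed to be computed beforehand.

```python
import math


def logsum(values):
    """max(values) + ln(sum_j exp(values_j - max)), i.e. a stable ln(sum_j exp(values_j))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def mixture_discriminant(component_loglik, weights, prior):
    """g_i(x) = logsum(ln alpha_ij + l_ij(x) for j = 1..J_i) + ln P(omega_i)."""
    terms = [math.log(a) + l for a, l in zip(weights, component_loglik)]
    return logsum(terms) + math.log(prior)


# Without the max shift, exp(-1000.0) underflows to 0.0 and ln(0) fails.
print(logsum([-1000.0, -1000.0]))  # approximately -1000 + ln 2, i.e. about -999.307
```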
Hereinafter, the data vector x denotes a (d+1)-dimensional vector with a last component equal to 1, unless stated otherwise.
Many conventional speech recognizers use hidden Markov models (HMMs) and mixtures of Gaussian distributions. It is desired to train such a speech recognizer in a secure manner. That is, the speech supplied by a client for the training remains secret, and the HMMs remain secret as well. It is also desired to evaluate the performance of the recognizer as it is being trained and, after the recognizer is trained, to use it to recognize speech in a secure manner. The recognized speech is called a transcription.