Probabilistic inference based on Hidden Markov Models (HMMs) is common in machine learning, speech processing, and gene sequence analysis. Probabilistic inference with privacy constraints is a relatively unexplored area of research and applies to a multi-party scenario in which the data and the HMMs belong to different parties and cannot be shared. For example, a client computer (Alice) needs to analyze speech data from telephone calls. Alice outsources the speech recognition task to a server computer (Bob), who possesses the HMMs obtained from a trained database. Alice cannot share the speech data with Bob owing to privacy concerns, while Bob cannot disclose the HMM parameters, which can reveal information about the training database.
One method for secure inference via HMMs is based on privacy-preserving two-party maximization methods, in which both parties incur exactly the same protocol overhead. However, that method is not suited for applications where a thin client encrypts the data and transmits the encrypted data to the server for performing most of the computationally intensive tasks.
HMM and Three Basic Problems of HMMs
The HMM is a generalization of a Markov chain, in which a state of the HMM is not directly known but generates an output which can be analyzed. The outputs are also referred to as “observations.” Because the observations depend on a hidden state of the HMM, the observation can reveal information about the hidden state.
The HMM λ is a triple of parameters λ=(A, B, Π). A matrix A, A=(aij), is a state transition matrix, where aij is a transition probability from a state Si to a state Sj, 1≦i, j≦N, and N is a number of states of the HMM. Thus, aij=Pr{qt+1=Sj|qt=Si}, 1≦i, j≦N, where {S1, S2, . . . , SN} is a set of states, qt is the state at time t, and Pr denotes a probability.
A matrix B, B=(b1, b2, . . . , bN), is a matrix of probabilities of observations, bj is a column vector of the matrix of probabilities over a known alphabet of the observation sequence, j=1, 2, . . . , N. Thus, bj(vk)=Pr{xt=vk|qt=Sj}, 1≦j≦N, 1≦k≦M, where {v1, v2, . . . , vM} is the alphabet of observation symbols, and xt is the observation at time t. A vector Π, Π=(π1, π2, . . . , πN), is an initial state probability vector of the HMM, wherein πi=Pr{q1=Si}.
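The parameter triple λ=(A, B, Π) can be illustrated with a small sketch. Everything named here is an assumption made for illustration: the toy values for N=2 states and M=2 symbols, the helper names is_distribution and is_valid_hmm, and the storage layout, in which B[j] holds the vector bj as a row (the transpose of the column convention used in the text).

```python
# Toy HMM with N = 2 states and an alphabet of M = 2 symbols.
# All numeric values are illustrative assumptions.
A = [[0.7, 0.3],   # A[i][j] = a_ij = Pr{q_{t+1} = S_j | q_t = S_i}
     [0.4, 0.6]]
B = [[0.9, 0.1],   # B[j][k] = b_j(v_k) = Pr{x_t = v_k | q_t = S_j}
     [0.2, 0.8]]
PI = [0.6, 0.4]    # PI[i] = pi_i = Pr{q_1 = S_i}


def is_distribution(p, tol=1e-9):
    """Return True if p is non-negative and sums to 1 within tol."""
    return all(v >= 0.0 for v in p) and abs(sum(p) - 1.0) <= tol


def is_valid_hmm(A, B, PI):
    """Check the shape and stochastic constraints of lambda = (A, B, PI)."""
    n = len(PI)
    if len(A) != n or len(B) != n:
        return False
    if any(len(row) != n for row in A):
        return False
    return (is_distribution(PI)
            and all(is_distribution(row) for row in A)
            and all(is_distribution(row) for row in B))


print(is_valid_hmm(A, B, PI))
```

Each row of A, each vector bj, and the vector Π is a probability distribution, which is the only constraint the check enforces.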
For the observation sequence x1, x2, . . . , xT and the HMM λ=(A, B, Π), one problem is to determine the probability of the observation sequence with respect to the HMM, i.e., Pr{x1, x2, . . . , xT|λ}. Solutions for this problem in the unsecure domain include a forward algorithm and a backward algorithm.
In statistical parsing, e.g., gene sequence analysis and natural language processing, a main problem is to determine a most likely sequence of states corresponding to the observation sequence with respect to the HMM. The problem is to efficiently determine the state sequence q1, q2, . . . , qT that maximizes the joint probability Pr{q1, q2, . . . , qT, x1, x2, . . . , xT|λ} for the HMM λ=(A, B, Π). The problem is usually solved in the unsecure domain by a Viterbi algorithm.
Another problem is to determine parameters of the HMM based on the observation sequence. One solution to this problem in the unsecure domain includes the Baum-Welch algorithm.
Forward Algorithm
A joint probability of an observation sequence in state Sj at time t is
\[ \alpha_t(j) = \Pr\{x_1, x_2, \ldots, x_t,\ q_t = S_j \mid \lambda\}. \tag{1} \]
The forward algorithm in the unsecure domain includes the following steps:
1. initializing α1(j)=πjbj(x1), 1≦j≦N;
2. determining, for each state Sj, 1≦j≦N, and for all observations t, 1≦t≦T−1, a likelihood of the observation sequence according to
\[ \alpha_{t+1}(j) = \left[ \sum_{i=1}^{N} \alpha_t(i)\, a_{ij} \right] b_j(x_{t+1}); \tag{2} \]
and
3. determining the probability according to
\[ \Pr\{x_1, x_2, \ldots, x_T \mid \lambda\} = \sum_{j=1}^{N} \alpha_T(j). \]
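The forward recursion of Equation (2) can be sketched in plain Python as follows. This is a minimal illustration rather than a reference implementation: the function name forward, the list-of-lists layout (A[i][j] for aij, B[j][k] for bj(vk)), the 0-based symbol indices, and the numeric values of the toy model are all assumptions introduced here.

```python
def forward(A, B, PI, obs):
    """Return (alpha, prob), where alpha[t][j] is alpha_{t+1}(j) (0-based t)."""
    n = len(PI)
    # Initialization: alpha_1(j) = pi_j * b_j(x_1)
    alpha = [[PI[j] * B[j][obs[0]] for j in range(n)]]
    # Recursion: alpha_{t+1}(j) = [sum_i alpha_t(i) * a_ij] * b_j(x_{t+1})
    for x in obs[1:]:
        prev = alpha[-1]
        alpha.append([
            sum(prev[i] * A[i][j] for i in range(n)) * B[j][x]
            for j in range(n)
        ])
    # Termination: Pr{x_1, ..., x_T | lambda} = sum_j alpha_T(j)
    return alpha, sum(alpha[-1])


# Toy model (assumed values): 2 states, 2 symbols indexed 0 and 1.
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
PI = [0.6, 0.4]
obs = [0, 1, 0]

alpha, prob = forward(A, B, PI, obs)
print(prob)
```

For this toy model the printed probability is approximately 0.10893. For long observation sequences the products underflow, so practical implementations typically rescale αt(j) at each step or work with logarithms.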
Backward Algorithm
A backward probability is defined according to
\[ \beta_t(j) = \Pr\{x_{t+1}, x_{t+2}, \ldots, x_T \mid q_t = S_j,\ \lambda\}. \tag{3} \]
The backward algorithm in the unsecure domain includes:
1. initializing βT(j)=1, 1≦j≦N;
2. for each 1≦i≦N and for all 1≦t≦T−1, determining
\[ \beta_t(i) = \sum_{j=1}^{N} \beta_{t+1}(j)\, a_{ij}\, b_j(x_{t+1}); \tag{4} \]
and
3. determining the probability according to
\[ \Pr\{x_1, x_2, \ldots, x_T \mid \lambda\} = \sum_{j=1}^{N} \pi_j\, b_j(x_1)\, \beta_1(j). \]
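The backward recursion of Equation (4) can be sketched in the same way. The function name backward, the list-of-lists layout (A[i][j] for aij, B[j][k] for bj(vk)), and the toy model values are assumptions made for illustration. The time index runs downward from T−1 to 1, because each βt depends on βt+1.

```python
def backward(A, B, PI, obs):
    """Return (beta, prob), where beta[t][i] is beta_{t+1}(i) (0-based t)."""
    n, T = len(PI), len(obs)
    # Initialization: beta_T(j) = 1 (earlier rows are overwritten below)
    beta = [[1.0] * n for _ in range(T)]
    # Recursion, t = T-1 down to 1:
    # beta_t(i) = sum_j beta_{t+1}(j) * a_ij * b_j(x_{t+1})
    for t in range(T - 2, -1, -1):
        beta[t] = [
            sum(beta[t + 1][j] * A[i][j] * B[j][obs[t + 1]] for j in range(n))
            for i in range(n)
        ]
    # Termination: Pr{x_1, ..., x_T | lambda} = sum_j pi_j * b_j(x_1) * beta_1(j)
    prob = sum(PI[j] * B[j][obs[0]] * beta[0][j] for j in range(n))
    return beta, prob


# Toy model (assumed values): 2 states, 2 symbols indexed 0 and 1.
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
PI = [0.6, 0.4]
obs = [0, 1, 0]

beta, prob = backward(A, B, PI, obs)
print(prob)
```

For this toy model the result is approximately 0.10893, the same value that the forward recursion yields for the same model and observations.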
Viterbi Algorithm
A probability of the most probable state sequence ending in the state Sj for the observation sequence at time t is determined according to
\[ \delta_t(j) = \max_{q_1, q_2, \ldots, q_{t-1}} \Pr\{q_1, q_2, \ldots, q_{t-1},\ q_t = S_j,\ x_1, x_2, \ldots, x_t \mid \lambda\}, \tag{5} \]
where max is a function of a maximum value.
The Viterbi algorithm in the unsecure domain includes the following steps:
1. Initializing, for all 1≦j≦N, the probability of the most probable state sequence according to δ1(j)=πjbj(x1) and initializing a matrix of indexes of probable states as φ1(j)=0;
2. Determining a probability of the most probable state sequence ending in a state Sj for a next time t+1 and the matrix of indexes according to
\[ \delta_{t+1}(j) = \max_{i=1,\ldots,N} \{\delta_t(i)\, a_{ij}\}\; b_j(x_{t+1}) \tag{6} \]
\[ \phi_{t+1}(j) = \arg\max_{i=1,\ldots,N} \{\delta_t(i)\, a_{ij}\} \tag{7} \]
for all 1≦j≦N and at each 1≦t≦T−1;
3. Determining an index of the most likely final state and backtracking the indexes according to
\[ i^*_T = \arg\max_{i=1,\ldots,N} \{\delta_T(i)\}, \qquad i^*_t = \phi_{t+1}(i^*_{t+1}) \quad \text{for } t = T-1, T-2, \ldots, 1; \]
and
4. Determining the most probable state sequence
\[ S_{i^*_1}, S_{i^*_2}, \ldots, S_{i^*_T}. \]
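The Viterbi steps can be sketched as follows. The function name viterbi, the list-of-lists layout (A[i][j] for aij, B[j][k] for bj(vk)), and the toy model values are assumptions made for illustration. In this sketch the emission term in the recursion is evaluated at the new observation xt+1, which matches the initialization δ1(j)=πjbj(x1), and backtracking runs from t=T−1 down to t=1.

```python
def viterbi(A, B, PI, obs):
    """Return (path, prob): 0-based state indexes of the most probable
    state sequence and the joint probability delta_T of that sequence."""
    n, T = len(PI), len(obs)
    # Initialization: delta_1(j) = pi_j * b_j(x_1), phi_1(j) = 0
    delta = [[PI[j] * B[j][obs[0]] for j in range(n)]]
    phi = [[0] * n]
    # Recursion: maximize delta_t(i) * a_ij over i, then apply b_j(x_{t+1})
    for t in range(1, T):
        prev = delta[-1]
        d_row, p_row = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: prev[i] * A[i][j])
            d_row.append(prev[best_i] * A[best_i][j] * B[j][obs[t]])
            p_row.append(best_i)
        delta.append(d_row)
        phi.append(p_row)
    # Termination: most likely final state
    path = [0] * T
    path[-1] = max(range(n), key=lambda i: delta[-1][i])
    # Backtracking from t = T-1 down to t = 1
    for t in range(T - 2, -1, -1):
        path[t] = phi[t + 1][path[t + 1]]
    return path, delta[-1][path[-1]]


# Toy model (assumed values): 2 states, 2 symbols indexed 0 and 1.
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
PI = [0.6, 0.4]
obs = [0, 1, 0]

path, best = viterbi(A, B, PI, obs)
print(path, best)
```

For this toy model the most probable state sequence has the 0-based indexes [0, 1, 0], i.e., S1, S2, S1, with a joint probability of approximately 0.046656.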
Baum-Welch Algorithm
The Baum-Welch algorithm, also known as the forward-backward algorithm, estimates the HMM parameters for a given observation sequence x1, x2, . . . , xT so as to maximize the probability of the observation sequence over all hidden Markov models, maxλPr{x1, x2, . . . , xT|λ}.
For a given HMM λ=(A, B, Π), the probability of being in the state Si at time t and being in the state Sj at the time t+1, is defined as a conditional probability ζt(i, j) according to
\[ \zeta_t(i,j) = \Pr\{q_t = S_i,\ q_{t+1} = S_j \mid x_1, x_2, \ldots, x_T,\ \lambda\}, \tag{8} \]
which is equal to
\[ \zeta_t(i,j) = \frac{\Pr\{q_t = S_i,\ q_{t+1} = S_j,\ x_1, x_2, \ldots, x_T \mid \lambda\}}{\Pr\{x_1, x_2, \ldots, x_T \mid \lambda\}}. \tag{9} \]
By employing the notations of αt(i) and βt(i) defined in Equation (1) and Equation (3), a conditional probability ζt(i, j) is
\[ \zeta_t(i,j) = \frac{\alpha_t(i)\, a_{ij}\, \beta_{t+1}(j)\, b_j(x_{t+1})}{\Pr\{x_1, x_2, \ldots, x_T \mid \lambda\}}. \tag{10} \]
A total conditional probability γt(i) of being in the state Si at time t is determined according to
\[ \gamma_t(i) = \sum_{j=1}^{N} \zeta_t(i,j) \quad \text{for all } 1 \le i \le N,\ 1 \le t \le T. \]
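The quantities ζt(i, j) of Equation (10) and γt(i) can be computed from the forward and backward variables, as in the following sketch. The function names, the list-of-lists layout (A[i][j] for aij, B[j][k] for bj(vk)), and the toy model values are assumptions made for illustration. Because Equation (10) involves the observation xt+1, this sketch evaluates both quantities only for 1≦t≦T−1.

```python
def forward(A, B, PI, obs):
    """alpha[t][j] = alpha_{t+1}(j) of Equation (1), 0-based t."""
    n = len(PI)
    alpha = [[PI[j] * B[j][obs[0]] for j in range(n)]]
    for x in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(n)) * B[j][x]
                      for j in range(n)])
    return alpha


def backward(A, B, obs):
    """beta[t][i] = beta_{t+1}(i) of Equation (3), 0-based t."""
    n, T = len(A), len(obs)
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(beta[t + 1][j] * A[i][j] * B[j][obs[t + 1]]
                       for j in range(n)) for i in range(n)]
    return beta


def zeta_gamma(A, B, PI, obs):
    """Return (zeta, gamma, alpha, beta, prob) for time steps 1..T-1."""
    n, T = len(PI), len(obs)
    alpha, beta = forward(A, B, PI, obs), backward(A, B, obs)
    prob = sum(alpha[-1])  # Pr{x_1, ..., x_T | lambda}
    # Equation (10): alpha_t(i) * a_ij * beta_{t+1}(j) * b_j(x_{t+1}) / prob
    zeta = [[[alpha[t][i] * A[i][j] * beta[t + 1][j] * B[j][obs[t + 1]] / prob
              for j in range(n)] for i in range(n)] for t in range(T - 1)]
    # gamma_t(i) = sum_j zeta_t(i, j)
    gamma = [[sum(zeta[t][i]) for i in range(n)] for t in range(T - 1)]
    return zeta, gamma, alpha, beta, prob


# Toy model (assumed values): 2 states, 2 symbols indexed 0 and 1.
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
PI = [0.6, 0.4]
obs = [0, 1, 0]

zeta, gamma, alpha, beta, prob = zeta_gamma(A, B, PI, obs)
print(gamma)
```

Two properties are useful sanity checks: for every t, the entries of ζt(i, j) sum to 1 over all pairs (i, j), and γt(i) equals αt(i)βt(i) divided by the probability of the observation sequence.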
The Baum-Welch Algorithm updates, in the unsecure domain, the HMM λ as follows:
1. Initializing an HMM λ=(A, B, Π) randomly.
2. Determining an initial state probability vector based on an initial probability of the state according to
\[ \bar{\pi}_i = \gamma_1(i), \quad 1 \le i \le N; \tag{11} \]
3. Determining transition probability according to
\[ \bar{a}_{ij} = \frac{\sum_{t=1}^{T-1} \zeta_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)}, \quad 1 \le i, j \le N; \tag{12} \]
4. Determining, based on observation symbol of an alphabet vk, probabilities of observations according to
\[ \bar{b}_j(v_k) = \frac{\sum_{t=1,\ x_t = v_k}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}, \quad 1 \le j \le N,\ 1 \le k \le M; \tag{13} \]
5. Determining the probability of the observation sequence with respect to the updated HMM according to Pr{x1, x2, . . . , xT|λ̄}, where λ̄=(Ā, B̄, Π̄); and
6. If Pr{x1, x2, . . . , xT|λ̄}−Pr{x1, x2, . . . , xT|λ}≦D, where D is a predetermined threshold, then stopping and selecting the parameters of the updated HMM λ̄ as final parameters. Otherwise, updating the HMM λ with the HMM λ̄ and going back to step 2.
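One way to sketch the re-estimation of Equations (11) through (13) and the stopping rule is shown below. The function names, the max_iter safeguard, the list-of-lists layout (A[i][j] for aij, B[j][k] for bj(vk)), and the toy values are assumptions made for illustration. The sketch computes γt(i) as αt(i)βt(i) divided by the probability of the observation sequence, which agrees with the sum of ζt(i, j) over j for t≦T−1 and also covers t=T. It further assumes that every state keeps a non-zero posterior probability, so that no denominator vanishes.

```python
def forward(A, B, PI, obs):
    n = len(PI)
    alpha = [[PI[j] * B[j][obs[0]] for j in range(n)]]
    for x in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(n)) * B[j][x]
                      for j in range(n)])
    return alpha


def backward(A, B, obs):
    n, T = len(A), len(obs)
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(beta[t + 1][j] * A[i][j] * B[j][obs[t + 1]]
                       for j in range(n)) for i in range(n)]
    return beta


def likelihood(A, B, PI, obs):
    """Pr{x_1, ..., x_T | lambda} via the forward recursion."""
    return sum(forward(A, B, PI, obs)[-1])


def reestimate(A, B, PI, obs):
    """One update of (A, B, PI) following Equations (11)-(13)."""
    n, m, T = len(PI), len(B[0]), len(obs)
    alpha, beta = forward(A, B, PI, obs), backward(A, B, obs)
    prob = sum(alpha[-1])
    zeta = [[[alpha[t][i] * A[i][j] * beta[t + 1][j] * B[j][obs[t + 1]] / prob
              for j in range(n)] for i in range(n)] for t in range(T - 1)]
    # Assumption: gamma_t(i) = alpha_t(i) * beta_t(i) / prob for t = 1..T
    gamma = [[alpha[t][i] * beta[t][i] / prob for i in range(n)]
             for t in range(T)]
    new_pi = gamma[0][:]                                        # Eq. (11)
    new_A = [[sum(zeta[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1))
              for j in range(n)] for i in range(n)]             # Eq. (12)
    new_B = [[sum(gamma[t][j] for t in range(T) if obs[t] == k) /
              sum(gamma[t][j] for t in range(T))
              for k in range(m)] for j in range(n)]             # Eq. (13)
    return new_A, new_B, new_pi


def baum_welch(A, B, PI, obs, D=1e-6, max_iter=100):
    """Repeat the update until the likelihood gain is at most D.
    max_iter (assumed to be >= 1) is a safeguard added in this sketch."""
    old = likelihood(A, B, PI, obs)
    for _ in range(max_iter):
        A, B, PI = reestimate(A, B, PI, obs)
        new = likelihood(A, B, PI, obs)
        if new - old <= D:
            break
        old = new
    return A, B, PI, new


# Toy starting model and observations (assumed values).
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
PI = [0.6, 0.4]
obs = [0, 1, 0, 0, 1, 1, 0, 1]

A_hat, B_hat, PI_hat, p_hat = baum_welch(A, B, PI, obs, max_iter=5)
print(likelihood(A, B, PI, obs), p_hat)
```

The toy model is fixed rather than randomly initialized so that the run is reproducible. Each update leaves the rows of Ā and B̄ and the vector Π̄ normalized, and the likelihood does not decrease from one update to the next, which is why a threshold on the likelihood gain serves as a stopping rule.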
Accordingly, there is a need in the art to determine the forward, the backward, the Viterbi and the Baum-Welch algorithms in a secure domain.