Assignment 3 of Pattern recognition is on HMMs. It should contain a detailed report on HMMs. The topics covered should include:

1. An introduction to HMM and its uses.

1. Problems of HMM, their explanation and relation to prior, posterior, and evidence.

2. Solution to the problems of HMM and their algorithms.

__Pattern Recognition__

__Assignment # 3__

Name: Muhammad Sohaib Jamal

__An Introduction to HMM and it’s Uses__

A Hidden Markov Model HMM is a stochastic model which has a series of observable variable X which is generated by hidden state Y. In an indirect way HMM consist of hidden states which has output that is comprised of a set of observations. Simple Markov Model models the states are directly observables means the states are directly output while in HMM the states are hidden and different form the observables or output. HMM is very reliable model for probabilistic estimation. HMM have applications in pattern recognitions such as speech recognition, gesture and hand writing recognition, computational Bioinformatics, etc.

Suppose we are considering three trails of a coin toss experiment and the person who is observing only know the results of the experiment when another person announces the result who is hidden in a closed room from the person noting the results. The result of this coin experiment can be any set of heads and tails e.g. THT, HHH, THH, TTT, THT etc. The person observing the results can get any sequence of heads and tails, and it is not possible to predict any specific sequence that will occur. The Observation Set is completely unpredictable and random.

- Let’s assume that the third trail of coin toss experiment will produce more Head than the Tails. The resulting sequence will obviously have more number of heads then tails for this particular case. This is called “Emission probability” denoted by B
_{j}(O).

- Now we suppose that the chance of flipping the third trail after the first and second trail is approximately zero. Then, the transition from 1
^{st}and 2^{nd}trail to 3^{rd}trail will be actually very small and as an outcome yields very little number heads if the person starts flipping the coin from 2^{nd}trail to 3^{rd}trail. This is called “Transition probability” denoted by a_{ij}.

- Assume that each trail has some probability associated with the previous trail, then the person will start the process of flipping from that particular coin. This is known to be the “Initial probability” denoted by Ï€
_{i.}

The sequence of number of heads or tails is known to be the observables and the number of trail is said to be the state of the HMM.

HMM is composed of:

- N number of hidden states S
_{1}, S_{2}â€¦â€¦â€¦., S_{N}

- M number of observations O
_{1}, O_{2}, â€¦â€¦, O_{M}

- The Ï€
_{i }(Initial state probability)

- Output Probability or Emission Probability B: P (O
_{M}| S_{N}), where O_{M}is observation and S_{N}is the state.

- Transition probability matrix A = [ a
_{ij}].

- Transition probabilities a
_{ij}.

- Mathematically the model is represented as HMM Î» = {Î , A, B}

__Problems of HMM and their explanations__

HMM has three basic types of problems:

**The Evaluation problem:**

Suppose we have an HMM, complete with transition probabilities a_{ij} and output probabilities b_{jk}. We need to determine the probability that a particular sequence of observables states O_{T} was generated by that model.

**The Decoding problem:**

The transition probabilities, output probabilities and set of observations O_{T} is given and we want to determine the most likely sequence of hidden states S_{T} that led to those observations.

**The Learning problem:**

In such problem the number of states and observation are given but we need to find the probabilities a_{ij} and b_{jk}. With the given set of training observations, we will determine the probabilities a_{ij} and b_{jk}.

__Relation of HMM to Prior, Posterior and evidence__

The Ï€i (Initial state probability) is analogous to the Prior probability. Because the initial probability is given before the set of experiments take place. This property of initial probability is identical to that of prior probability.

Similarly, the output probability or emission probability B: P (O_{M} | S_{N}) is analogous to the posterior probability. The posterior probability is used in forward backward algorithm.

In the same manner, evidence is the probability the next state is C given that the current state is state S_{j}. So the evidence is analogous to the transition probability A.

__Solution to the problems of HMM and their algorithms__

From the above mentioned discussion, we know that there are three different of problems in HMM. In this section we will briefly know how these problems are solved

**Evaluation problem**, this type of problem is solved the using Forward-Backward algorithm.

**Decoding problem**, for such type of HMM problem we use the Viterbi algorithm or posterior decoding

**Training problem**, in case of this type of problem we have the Baun-Welch re-estimation algorithm to solve it.

__Forward-Backward algorithm__

The forward and backward steps are combined by the Forward-Backward algorithm to estimate the probability of each state for a specific time t, and repeating these steps for each t can result in the sequence having the most likely probability. This algorithm doesn’t guarantee that the sequence is valid sequence because it considers every individual step.

The forward algorithm has the following three steps:

- Initialization step
- Iterations
- Summation of overall states

.

Similarly, for backward algorithm we have the same steps like the forward algorithm:

- Initialization step
- Iterations
- Summation of overall states

__Viterbi algorithm__

Viterbi algorithm is used to find the most likely hidden states, resulting in a sequence of observed events. The relationship between observations and states can be inferred from the given image.

In first step Viterbi algorithm initialize the variable

In second step the process is iterated for every step

In third step the iteration ends

In Fourth step we track the best path

__Baun-Welch re-estimation algorithm__

Baun-Welch re-estimation algorithm is used to compute the unknown parameters in hidden Markov model HMM. Baun-Welch re-estimation algorithm can be best described using the following example.

Assume we collect eggs from chicken every day. The chicken had lay eggs or not depends upon unknown factors. For simplicity assume that there are only 2 states (S1 and S2) that determine that the chicken had lay eggs. Initially we don’t know about the state, transition and probability that the chicken will lay egg given specific state. To find initial probabilities, suppose all the sequences starting with S1 and find the maximum probability and then repeat the same procedure for S2. Repeat these steps until the resulting probabilities converge. Mathematically it can be

__References__

- Andrew Ng (2013), an online course for Machine learning, Stanford University, Stanford, https://class.coursera.org/ml-004/class.

- Duda and Hart, Pattern Classification (2001-2002), Wiley, New York.