#################
Lecture 8 Reading
#################
**Read Jones & Pevzner sections 11.1 - 11.3, and the following material.**

Markov Processes
----------------
We first define Markov processes in terms of a sequence of
observable variables :math:`X_1,X_2,X_3,...X_n`, as a
*Markov chain*. We then introduce the problem of inferring
a sequence of hidden variables from an associated sequence
of observable variables; this is called a *hidden Markov model*,
commonly abbreviated as "HMM".

The Markov Property
...................
Recall that the general conditional probability chain rule
always permits us to factor a joint probability
:math:`p(X_1,X_2,X_3,...X_n)` into :math:`n` factors of
the form :math:`p(X_t|X_1,X_2,...,X_{t-1})`, where term :math:`t`
depends on *all* of the variables that precede it in the
factorization. You could say that each variable "remembers" *every*
previous variable, in the sense that it depends directly on all of
them.
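For example, with :math:`n=3` the chain rule gives

.. math:: p(X_1,X_2,X_3)=p(X_1)\,p(X_2|X_1)\,p(X_3|X_1,X_2)

where the last factor conditions on *both* of the variables that
precede it.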
A process has the *Markov property* when each of its variables
depends on only a *fixed* number of preceding variables (by
default, just one). In other words, the joint probability can be
written as a product of conditional probabilities of the form
:math:`p(X_t|X_{t-1})`. Such a process is sometimes called
"memoryless", because each conditional probability retains no
direct memory of any earlier variables.

Markov Chains
.............
The simplest version of such a process is simply a linear
chain of variables that obey the Markov property:

.. math:: p(X_1,X_2,X_3,...X_n)=p(X_1)p(X_2|X_1)p(X_3|X_2)...p(X_n|X_{n-1})

This is called a *Markov chain*.
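As a concrete illustration, the chain-rule product above can be
evaluated directly from an initial distribution and a transition
matrix. The nucleotide alphabet and all of the probabilities below
are made-up numbers for illustration, not estimates from real
sequence data:

```python
import numpy as np

# A toy first-order Markov chain over DNA nucleotides.
# All probabilities here are illustrative, not real estimates.
STATES = ["A", "C", "G", "T"]
INDEX = {s: i for i, s in enumerate(STATES)}

initial = np.array([0.25, 0.25, 0.25, 0.25])  # p(X_1)
transition = np.array([                        # row = X_{t-1}, col = X_t
    [0.4, 0.2, 0.2, 0.2],
    [0.2, 0.4, 0.2, 0.2],
    [0.2, 0.2, 0.4, 0.2],
    [0.2, 0.2, 0.2, 0.4],
])

def chain_probability(sequence):
    """Joint probability p(x_1) p(x_2|x_1) ... p(x_n|x_{n-1})."""
    prob = initial[INDEX[sequence[0]]]
    for prev, cur in zip(sequence, sequence[1:]):
        prob *= transition[INDEX[prev], INDEX[cur]]
    return prob

# p("AACG") = p(A) p(A|A) p(C|A) p(G|C) = 0.25 * 0.4 * 0.2 * 0.2
print(chain_probability("AACG"))
```

Note that each factor looks up only one row of the transition
matrix: the joint probability never needs more than the immediately
preceding symbol.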
It follows that any pair of variables :math:`X_t,X_v` separated by at least
one intervening variable (i.e. :math:`v>t+1`) are conditionally
independent given any variable :math:`X_u` between them
(i.e. :math:`t<u<v`).
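This conditional-independence property can be checked numerically by
brute-force enumeration. The sketch below builds the full joint
distribution of a length-4 chain (again with made-up probabilities)
and confirms that conditioning on :math:`X_1` in addition to
:math:`X_2` does not change the distribution of :math:`X_4`:

```python
import itertools

import numpy as np

# Toy 4-state Markov chain of length 4; illustrative numbers only.
initial = np.array([0.1, 0.2, 0.3, 0.4])
transition = np.array([
    [0.4, 0.3, 0.2, 0.1],
    [0.1, 0.4, 0.3, 0.2],
    [0.2, 0.1, 0.4, 0.3],
    [0.3, 0.2, 0.1, 0.4],
])

# Joint probability of every path (x1, x2, x3, x4).
joint = {}
for path in itertools.product(range(4), repeat=4):
    p = initial[path[0]]
    for a, b in zip(path, path[1:]):
        p *= transition[a, b]
    joint[path] = p

def conditional(x4, x2, x1=None):
    """p(X4=x4 | X2=x2), or p(X4=x4 | X2=x2, X1=x1) if x1 is given."""
    num = sum(p for path, p in joint.items()
              if path[3] == x4 and path[1] == x2
              and (x1 is None or path[0] == x1))
    den = sum(p for path, p in joint.items()
              if path[1] == x2 and (x1 is None or path[0] == x1))
    return num / den

# Additionally conditioning on X1 leaves the distribution of X4 unchanged.
for x1 in range(4):
    assert abs(conditional(x4=3, x2=1, x1=x1) - conditional(x4=3, x2=1)) < 1e-9
print("X1 and X4 are conditionally independent given X2")
```

Because :math:`p(X_4|X_3)` and :math:`p(X_3|X_2)` are the only
factors linking :math:`X_4` back to the rest of the chain, summing
out :math:`X_3` leaves an expression that depends on :math:`X_2`
alone, which is exactly what the assertions verify.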