Expectation Maximization Algorithm
General Algorithm
Suppose we have a latent variable model $p(x, z; \theta)$, with $z$ being the latent variable. The density of $x$ can be obtained by marginalizing over $z$:

$$p(x; \theta) = \sum_z p(x, z; \theta)$$

Now, to find the maximum likelihood estimate, we need to maximize the log-likelihood:

$$\ell(\theta) = \sum_{i=1}^{n} \log p(x^{(i)}; \theta) = \sum_{i=1}^{n} \log \sum_{z^{(i)}} p(x^{(i)}, z^{(i)}; \theta)$$

However, the above objective is not concave in $\theta$, so gradient ascent is not guaranteed to find the maximum likelihood estimate; it may converge to a local optimum.
Moreover, even if $p(x, z; \theta)$ is an exponential family distribution, setting the derivative of the above expression with respect to $\theta$ to zero typically does not yield a closed-form solution, because the sum over $z$ sits inside the logarithm.
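To make the difficulty concrete, here is a minimal sketch of evaluating $\ell(\theta)$ for a hypothetical two-component 1-D Gaussian mixture (all parameter and data values below are made up for illustration); note the sum over $z$ inside the log:

```python
import math

# Hypothetical two-component 1-D Gaussian mixture; all values are made up.
# theta = (mixing weights phi, means mu, standard deviations sigma)
phi = [0.3, 0.7]
mu = [-1.0, 2.0]
sigma = [1.0, 0.5]

def gaussian_pdf(x, m, s):
    # Density of N(m, s^2) at x
    return math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))

def log_p(x):
    # log p(x; theta) = log sum_z p(x | z; theta) p(z; theta)
    # The sum over z sits INSIDE the log, which breaks concavity in theta.
    return math.log(sum(phi[j] * gaussian_pdf(x, mu[j], sigma[j]) for j in range(2)))

data = [-1.2, 0.3, 1.9, 2.1]          # made-up observations
ll = sum(log_p(x) for x in data)      # l(theta) = sum_i log p(x_i; theta)
```

Maximizing `ll` jointly over `phi`, `mu`, and `sigma` has no closed form here; EM is a way around that.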
To get around this, we imagine $Q(z)$ to be some distribution over $z$, so that

$$\sum_z Q(z) = 1 \quad \text{and} \quad Q(z) \ge 0$$

Jensen's Inequality
Jensen's inequality states that for a convex function $f$ (i.e., $f''(x) \ge 0$), the following holds:

$$\mathbb{E}[f(X)] \ge f(\mathbb{E}[X])$$

And for a concave function:

$$\mathbb{E}[f(X)] \le f(\mathbb{E}[X])$$

Also, if $X$ is a constant, then $X = \mathbb{E}[X]$ and:

$$\mathbb{E}[f(X)] = f(\mathbb{E}[X])$$

Since $\log$ is concave, applying Jensen's inequality with $z \sim Q$ to the random variable $p(x, z; \theta)/Q(z)$ gives:

$$\log p(x; \theta) = \log \sum_z Q(z) \frac{p(x, z; \theta)}{Q(z)} \ge \sum_z Q(z) \log \frac{p(x, z; \theta)}{Q(z)}$$

From Jensen's inequality, we also know that the inequality becomes an equality (the bound becomes tight) when the expectation is taken over a constant:
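As a quick numerical sanity check of the concave case $\mathbb{E}[\log X] \le \log \mathbb{E}[X]$, using arbitrary made-up values for a discrete positive random variable:

```python
import math

# Arbitrary positive random variable X taking values xs with probabilities ps
xs = [0.5, 1.0, 4.0]
ps = [0.2, 0.5, 0.3]

e_x = sum(p * x for p, x in zip(ps, xs))                 # E[X]
e_log_x = sum(p * math.log(x) for p, x in zip(ps, xs))   # E[log X]

# Jensen's inequality for the concave log: E[log X] <= log E[X]
assert e_log_x <= math.log(e_x)
```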
$$\frac{p(x, z; \theta)}{Q(z)} = c$$

It turns out that this bound is tight when:

$$Q(z) = p(z \mid x; \theta)$$

For convenience, let's define the Evidence Lower Bound (ELBO) as:

$$\mathrm{ELBO}(x; Q, \theta) = \mathbb{E}_{z \sim Q(z)}\left[\log \frac{p(x, z; \theta)}{Q(z)}\right] = \sum_z Q(z) \log \frac{p(x, z; \theta)}{Q(z)}$$

Therefore,

$$\log p(x; \theta) \ge \mathrm{ELBO}(x; Q, \theta)$$

The EM algorithm is then:

Repeat until convergence {

  (E-step) For each $i$, set
  $$Q_i(z^{(i)}) \leftarrow p(z^{(i)} \mid x^{(i)}; \theta)$$

  (M-step) Set
  $$\theta \leftarrow \arg\max_\theta \sum_{i=1}^{n} \mathrm{ELBO}(x^{(i)}; Q_i, \theta)$$

}

In the E-step, we find the estimated distribution over $z^{(i)}$ using our current value of $\theta$, and thus our current estimated distribution over $x^{(i)}$ given $z^{(i)}$. We do so using Bayes' rule:

$$p(z^{(i)} = j \mid x^{(i)}; \theta) = \frac{p(x^{(i)} \mid z^{(i)} = j; \theta)\, p(z^{(i)} = j; \theta)}{\sum_{l=1}^{k} p(x^{(i)} \mid z^{(i)} = l; \theta)\, p(z^{(i)} = l; \theta)}$$

In the
M-step, we update $\theta$ to maximize the ELBO which, for the $Q_i$ chosen in the E-step, equals $\log p(x; \theta)$ at our current value of $\theta$. Since the ELBO lower-bounds the log-likelihood everywhere, maximizing it can only increase the likelihood.
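The full loop can be sketched as follows, for a hypothetical two-component 1-D Gaussian mixture with shared unit variance (all data and initial values below are made up, and the M-step updates are the standard closed-form mixture updates for this model). After each E-step the summed ELBO is checked to be tight against $\ell(\theta)$, and the likelihood is checked to be non-decreasing:

```python
import math

def gaussian_pdf(x, m):
    # N(x; m, 1): unit-variance Gaussian density
    return math.exp(-0.5 * (x - m) ** 2) / math.sqrt(2 * math.pi)

data = [-2.1, -1.9, -2.3, 1.8, 2.2, 2.0]   # made-up observations
phi, mu = [0.5, 0.5], [-1.0, 1.0]          # initial theta (weights, means)

def log_likelihood():
    # l(theta) = sum_i log sum_j p(x_i, z_i = j; theta)
    return sum(math.log(sum(phi[j] * gaussian_pdf(x, mu[j]) for j in range(2)))
               for x in data)

prev = -math.inf
for _ in range(50):                        # repeat until convergence
    # E-step: Q_i(j) = p(z_i = j | x_i; theta) by Bayes' rule
    Q = []
    for x in data:
        joint = [phi[j] * gaussian_pdf(x, mu[j]) for j in range(2)]
        total = sum(joint)
        Q.append([w / total for w in joint])
    # With Q_i set to the posterior, the summed ELBO equals l(theta)
    elbo = sum(q[j] * math.log(phi[j] * gaussian_pdf(x, mu[j]) / q[j])
               for x, q in zip(data, Q) for j in range(2))
    assert abs(elbo - log_likelihood()) < 1e-9
    # M-step: closed-form argmax of the summed ELBO for this model
    for j in range(2):
        nj = sum(q[j] for q in Q)
        phi[j] = nj / len(data)
        mu[j] = sum(q[j] * x for q, x in zip(Q, data)) / nj
    cur = log_likelihood()
    assert cur >= prev - 1e-9              # EM never decreases the likelihood
    prev = cur
```

With this well-separated data the means settle near the two cluster averages ($-2.1$ and $2.0$), and the asserts exercise both properties derived above: tightness of the bound after the E-step and monotone improvement of the likelihood.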