Convergence of the MM Algorithm

Andrew Landgraf
November 18, 2014

Majorization Minimization

Goal: minimize \( f(x) \)

Use a surrogate function \( g(x|x_m) \) such that

\[ g(x_m | x_m) = f(x_m) \]

and

\[ g(x | x_m) \geq f(x) \quad \text{for all } x \]

Let

\[ M(x_m) = x_{m+1} = \operatorname{argmin}_x ~ g(x | x_m) \]
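Together, the tangency and majorization conditions yield the hallmark descent property of MM (a standard consequence, spelled out here for completeness):

\[ f(x_{m+1}) \leq g(x_{m+1} | x_m) \leq g(x_m | x_m) = f(x_m), \]

where the first inequality holds because \( g(\cdot|x_m) \) majorizes \( f \), and the second because \( x_{m+1} \) minimizes \( g(\cdot|x_m) \).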

MM and the EM Algorithm

The local and global convergence properties of [MM] exactly parallel the corresponding properties of the EM and EM gradient algorithms. This is hardly surprising because the relevant theory relies entirely on optimization transfer and never mentions missing data.

– “Optimization Transfer Using Surrogate Objective Functions” by Lange, Hunter, and Yang (2000)

Convergence of the EM Algorithm

From “On the convergence properties of the EM algorithm” by Wu (1983)

Under regularity conditions (e.g., continuity of the surrogate in both arguments and compact level sets of the loss):

  • All limit points of any MM sequence are stationary points of the loss function
  • If the loss is strictly convex, the MM sequence converges to its unique minimizer
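As an illustration (not from the slides), here is a minimal Python sketch of MM on a strictly convex loss. The loss \( f \), its gradient bound \( L \), and the starting point are assumptions chosen for the example: since \( f'' \leq L \), the quadratic \( g(x|x_m) = f(x_m) + f'(x_m)(x - x_m) + \tfrac{L}{2}(x - x_m)^2 \) majorizes \( f \) and touches it at \( x_m \), so minimizing it gives the MM update \( x_{m+1} = x_m - f'(x_m)/L \).

```python
import math

# Example strictly convex loss: f(x) = (x - 2)^2 + log(1 + e^x).
# Its second derivative is 2 + sigmoid'(x) <= 2.25 = L, so the
# quadratic surrogate above is a valid majorizer.
def f(x):
    return (x - 2.0) ** 2 + math.log1p(math.exp(x))

def f_prime(x):
    return 2.0 * (x - 2.0) + 1.0 / (1.0 + math.exp(-x))

L = 2.25  # upper bound on f'' (assumed for this example)

def mm_step(x_m):
    # argmin_x of the quadratic majorizer g(x | x_m)
    return x_m - f_prime(x_m) / L

x = 0.0
for _ in range(200):
    x_next = mm_step(x)
    # the descent property f(x_{m+1}) <= f(x_m) holds at every step
    assert f(x_next) <= f(x) + 1e-12
    x = x_next

print(x)                # the unique minimizer of f
print(abs(f_prime(x)))  # near zero: the limit point is stationary
```

Because the loss is strictly convex, the iterates settle at its unique minimizer, matching the second bullet above.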

Counter-Example

(figure: counter-example plot)

Multiple Limits

(figure: multiple limit points plot)
