James Hensman’s Weblog

January 28, 2009

Some notes on Factor Analysis (FA)

Filed under: Uncategorized — jameshensman @ 9:21 am

Factor analysis is a statistical technique which can uncover (linear) latent structures in a set of data.  It is very similar to PCA, but with a different noise model.  The model consists of a set of observed variables $\{\bf{x}_n \}_{n=1}^N$ (this is your collected data) and some latent variables $\{\bf{z}_n \}_{n=1}^N$, with distribution $p(\bf{z}_n) = \mathcal{N}(\bf{0}, \bf{I})$.  There is a noisy linear map $\bf{A}$ from the latent space to the observed variables: $p(\bf{x}_n \mid \bf{z}_n) = \mathcal{N}\left(\bf{Az}_n, \bf{\Psi}\right)$, where $\bf{\Psi}$ is a diagonal matrix.
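The generative model above can be sketched in a few lines of numpy; the dimensions (D observed, K latent, N points) are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: D observed dimensions, K latent dimensions, N data points.
D, K, N = 5, 2, 1000

A = rng.normal(size=(D, K))                    # linear map from latent to observed space
Psi = np.diag(rng.uniform(0.1, 0.5, size=D))   # diagonal noise covariance

# Generative process: z_n ~ N(0, I), then x_n | z_n ~ N(A z_n, Psi).
Z = rng.normal(size=(N, K))
noise = rng.multivariate_normal(np.zeros(D), Psi, size=N)
X = Z @ A.T + noise   # rows of X are the observed x_n
```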

It’s also possible to assume a mean vector for the observed variables, but I’m going to ignore that for a moment for clarity.

Marginalising over $\bf{z}_n$ (some simple Gaussian algebra) yields $p(\bf{x}_n) = \mathcal{N}\left(\bf{0}, \bf{AA}^\top + \bf{\Psi}\right)$. It should now be clear that we are modelling the distribution of $\bf{X}$ as a Gaussian distribution with limited degrees of freedom.
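This marginal covariance is easy to check numerically: sample from the generative model and compare the sample covariance of the data against $\bf{AA}^\top + \bf{\Psi}$ (again, all sizes here are made-up toy values):

```python
import numpy as np

rng = np.random.default_rng(1)
D, K, N = 4, 2, 200_000

A = rng.normal(size=(D, K))
Psi = np.diag(rng.uniform(0.1, 0.5, size=D))

# Sample x_n = A z_n + noise, with z_n ~ N(0, I) and noise ~ N(0, Psi).
Z = rng.normal(size=(N, K))
X = Z @ A.T + rng.multivariate_normal(np.zeros(D), Psi, size=N)

empirical = np.cov(X, rowvar=False)     # sample covariance of the data
theoretical = A @ A.T + Psi             # the marginal covariance from the algebra
print(np.max(np.abs(empirical - theoretical)))  # small, and shrinks as N grows
```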

Variational Approach

The variational approach involves placing conjugate priors over the model parameters ($\bf{A}$ and $\bf{\Psi}$) and finding a factorised approximation to the posterior.

The variational approach to FA yields a distinct advantage: by placing an ARD prior over the columns of $\bf{A}$, unnecessary components get ‘switched off’, and the corresponding columns of $\bf{A}$ go to zero. This is dead useful: you don’t need to know the dimension of the latent space beforehand; it just drops out of the model.

Papers
There is a super Master's thesis on variational FA here by a chap called Frederik Brink Nielsen.

An immediate extension of factor analysis springs to mind: if the distribution of the data is just a Gaussian, why not have a mixture of them? It seems Ghahramani was there first: GIYF

More recently, Zhao and Yu proposed an alteration to the variational FA model, which achieves a tighter bound by making $\bf{A}$ dependent on $\bf{\Psi}$. This apparently makes the model less prone to under-fitting (i.e. it drops factors more easily). Neural Networks Journal