James Hensman’s Weblog

January 28, 2009

Some notes on Factor Analysis (FA)

Filed under: Uncategorized — jameshensman @ 9:21 am

Factor analysis is a statistical technique which can uncover (linear) latent structure in a set of data.  It is very similar to PCA, but with a different noise model.  The model consists of a set of observed variables \{\bf{x}_n \}_{n=1}^N (this is your collected data) and some latent variables \{\bf{z}_n \}_{n=1}^N, with distribution p(\bf{z}_n) = \mathcal{N}(\bf{0}, \bf{I}).  There is a noisy linear map \bf{A} from the latent space to the observed variables: p(\bf{x}_n \mid \bf{z}_n) = \mathcal{N}\left(\bf{Az}_n, \bf{\Psi}\right), where \bf{\Psi} is a diagonal covariance matrix.

It’s also possible to assume a mean vector for the observed variables, but I’m going to ignore that for the moment, for clarity.

Integrating out the latent variables (a simple bit of Gaussian algebra) yields the marginal p(\bf{x}_n) = \mathcal{N}\left(\bf{0}, \bf{AA}^\top + \bf{\Psi}\right). It should now be clear that we are modelling the distribution of \bf{X} as a Gaussian with a limited number of degrees of freedom.
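To make that concrete, here is a minimal numpy sketch (the dimensions and variable names are my own choices, not from any particular paper) that draws data from the generative model and checks that the sample covariance approaches \bf{AA}^\top + \bf{\Psi}:

```python
import numpy as np

rng = np.random.default_rng(0)
D, K, N = 5, 2, 100_000  # observed dim, latent dim, number of samples

A = rng.standard_normal((D, K))          # linear map, latent -> observed
Psi = np.diag(rng.uniform(0.1, 1.0, D))  # diagonal noise covariance

Z = rng.standard_normal((N, K))                        # z_n ~ N(0, I)
noise = rng.standard_normal((N, D)) * np.sqrt(np.diag(Psi))
X = Z @ A.T + noise                                    # x_n ~ N(A z_n, Psi)

emp = np.cov(X.T)           # empirical covariance of the draws
theory = A @ A.T + Psi      # the marginal covariance derived above
print(np.abs(emp - theory).max())  # small, and shrinks as N grows
```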

Variational Approach

The variational approach involves placing conjugate priors over the model parameters (\bf{A} and \bf{\Psi}) and finding a factorised approximation to the posterior.
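In the usual mean-field setup this means choosing q(\bf{Z}, \bf{A}, \bf{\Psi}) = q(\bf{Z})\, q(\bf{A})\, q(\bf{\Psi}) and cycling through updates of each factor, with each update increasing a lower bound on the marginal likelihood \log p(\bf{X}).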

The variational approach to FA yields a distinct advantage: by placing an ARD prior over the columns of \bf{A}, unnecessary components get ‘switched off’, and the corresponding column of \bf{A} goes to zero. This is dead useful: you don’t need to know the dimension of the latent space beforehand; it just drops out of the model.
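For reference, the standard ARD construction (this is the usual form, as in Bishop’s Bayesian PCA, rather than anything specific to this post) places an independent zero-mean Gaussian prior on each column \bf{a}_k of \bf{A}: p(\bf{A} \mid \bf{\alpha}) = \prod_{k=1}^K \mathcal{N}(\bf{a}_k \mid \bf{0}, \alpha_k^{-1}\bf{I}), with a Gamma hyperprior on each precision \alpha_k. If the data don’t support the k-th factor, the inferred \alpha_k grows very large and the posterior over \bf{a}_k concentrates on zero, switching that factor off.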

Papers
There is a super Master’s thesis on variational FA here by a chap called Frederik Brink Nielsen.

An immediate extension of factor analysis springs to mind: if the distribution of the data is just a Gaussian, why not have a mixture of them? It seems Ghahramani was there first: GIYF

More recently, Zhao and Yu proposed an alteration to the variational FA model, which achieves a tighter bound by making \bf{A} dependent on \bf{\Psi} in the posterior approximation. This apparently makes the model less prone to under-fitting (i.e. it drops factors more easily). Neural Networks Journal
