Factor analysis is a statistical technique which can uncover (linear) latent structures in a set of data. It is very similar to PCA, but with a different noise model. The model consists of a set of observed variables $\mathbf{x}$ (this is your collected data) and some latent variables $\mathbf{z}$, with distribution $p(\mathbf{z}) = \mathcal{N}(\mathbf{0}, \mathbf{I})$. There exists a noisy linear map from the latent space to the observed variables: $\mathbf{x} = \mathbf{A}\mathbf{z} + \boldsymbol{\epsilon}$, with $\boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Psi})$, where $\boldsymbol{\Psi}$ is a diagonal matrix.

It’s also possible to assume a mean vector for the observed variables, but I’m going to ignore that for a moment for clarity.

Some simple algebra yields $p(\mathbf{x}) = \mathcal{N}(\mathbf{0}, \mathbf{A}\mathbf{A}^\top + \boldsymbol{\Psi})$. It should now be clear that we are modeling the distribution of $\mathbf{x}$ as a Gaussian distribution with limited degrees of freedom.
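To make the generative story concrete, here's a quick simulation (a sketch with arbitrary dimensions and a fixed seed of my choosing) checking that data drawn from the model really does have covariance $\mathbf{A}\mathbf{A}^\top + \boldsymbol{\Psi}$:

```python
import numpy as np

rng = np.random.default_rng(0)

D, K, N = 6, 2, 200_000          # observed dim, latent dim, sample count
A = rng.normal(size=(D, K))      # loading matrix (the linear map)
psi = rng.uniform(0.1, 0.5, D)   # diagonal noise variances

z = rng.normal(size=(N, K))                   # latent draws ~ N(0, I)
eps = rng.normal(size=(N, D)) * np.sqrt(psi)  # diagonal Gaussian noise
x = z @ A.T + eps                             # observed data

emp_cov = np.cov(x, rowvar=False)
model_cov = A @ A.T + np.diag(psi)
print(np.abs(emp_cov - model_cov).max())  # small: the two agree
```

The empirical covariance matches the model covariance up to Monte Carlo error, which is the "limited degrees of freedom" point: a full covariance has $D(D+1)/2$ free parameters, but here it is built from only $DK + D$.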

**Variational Approach**

The variational approach involves placing conjugate priors over the model parameters ($\mathbf{A}$ and $\boldsymbol{\Psi}$), and finding a factorised approximation to the posterior.
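Concretely (this is the standard mean-field setup; the notation here is mine), the approximation factorises as

$$ q(\mathbf{z}, \mathbf{A}, \boldsymbol{\Psi}) = q(\mathbf{z})\,q(\mathbf{A})\,q(\boldsymbol{\Psi}), $$

and the factors are fitted by maximising the evidence lower bound

$$ \log p(\mathbf{x}) \;\geq\; \mathbb{E}_{q}\!\big[\log p(\mathbf{x}, \mathbf{z}, \mathbf{A}, \boldsymbol{\Psi})\big] \;-\; \mathbb{E}_{q}\!\big[\log q(\mathbf{z}, \mathbf{A}, \boldsymbol{\Psi})\big]. $$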

The variational approach to FA yields a distinct advantage: by placing an ARD prior over the columns of $\mathbf{A}$, unnecessary components get ‘switched off’, and the corresponding column of $\mathbf{A}$ goes to zero. This is dead useful: you don’t need to know the dimension of the latent space beforehand: it just drops out of the model.
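For contrast, without ARD you would typically have to pick the latent dimension by hand, e.g. by comparing fitted models. A rough sketch of that chore, using scikit-learn's `FactorAnalysis` (which is maximum-likelihood, not variational; the synthetic data with true latent dimension 3 is my own construction):

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(1)

# Synthetic data: true latent dimension 3, observed dimension 10.
N, D, K_true = 2000, 10, 3
A = rng.normal(size=(D, K_true))
x = rng.normal(size=(N, K_true)) @ A.T + rng.normal(scale=0.5, size=(N, D))

# Average log-likelihood for a range of candidate dimensions; it rises
# sharply until K_true and then roughly plateaus.
scores = {k: FactorAnalysis(n_components=k, random_state=0).fit(x).score(x)
          for k in range(1, 7)}
for k, s in scores.items():
    print(k, round(s, 3))
```

With the ARD prior, this model-comparison loop disappears: you fit once with a generous number of components and let the switched-off columns tell you the dimension.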

**Papers**

There is a super Master’s thesis on variational FA here by a chap called Frederik Brink Nielsen.

An immediate extension of factor analysis springs to mind: if the distribution of the data is just a Gaussian, why not have a *mixture* of them? It seems Ghahramani was there first: GIYF

More recently, Zhao and Yu proposed an alteration to the variational FA model, which apparently achieves a tighter bound by making the approximate posterior over the latent variables dependent on $\mathbf{A}$. This apparently makes the model less prone to under-fitting (i.e. it drops factors more easily). Neural Networks Journal