James Hensman’s Weblog

February 12, 2009

Expected Values (Matrices)

Filed under: Uncategorized — jameshensman @ 9:41 am

I’ve been fiddling around in python, implementing a Variational Bayesian Principal Component Analysis (VBPCA) algorithm. It’s true, I could do this happily in vibes , but where’s the fun in that? Besides, I think I learned more doing it this way.

Happily, there’s a nice paper by Chris Bishop, which explains clearly what’s going on, and gives you the update equations. For example:

{\bf m_x}^{(n)} = \langle \tau \rangle {\bf \Sigma_x} \langle {\bf W}^\top \rangle (t_n - \langle {\mathbf \mu} \rangle)

which involves taking the expected value of \tau, {\bf W} and \mathbf \mu. These three are straightforward:

The distribution for \tau is p(\tau) = \textit{Gamma}(\tau \mid a,b), and so the expected value of \tau is a/b.

The distribution for \mu is p(\mu) = \mathcal{N}(\mu \mid {\bf m_\mu},{\bf \Sigma_\mu}), and so the expected value of \mu is simply {\bf m_\mu}.

The distribution for \bf W is a slightly more complex beast: \bf W is a non-square matrix, and we have a Gaussian distribution for each row. That is: p({\bf W}) = \prod_{i=1}^d \mathcal{N}({\bf w}_i \mid {\bf m}_w^{(i)}, {\bf \Sigma}_w). Bizarrely, the rows share a covariance matrix. The expected value of {\bf W} is simple: is just a matrix made of a stack of {\bf m}_w^{(i)}s.

The tricky bit comes in a different update equation, where we need to evaluate \langle {\bf W}^\top {\bf W} \rangle . The first thing to notice is that (where {\bf W} is a d by q matrix):

{\bf W^\top W} = \sum_{i=1}^d {\bf w}_i {\bf w}_i^\top.

Since p({\bf w}_i) is Gaussian, \langle {\bf w}_i {\bf w}_i^\top \rangle = {\bf m}_i{\bf m}_i^\top + {\bf \Sigma}_w. Now we can write:

\langle {\bf W^\top W} \rangle = \langle \sum_{i=1}^d {\bf w}_i {\bf w}_i^\top \rangle =  \sum_{i=1}^d \langle {\bf w}_i {\bf w}_i^\top \rangle = \sum_{i=1}^d \left({\bf m}_i{\bf m}_i^\top + {\bf \Sigma}_w \right)

Simple when you know how.

Edit: Anyone know how to get a bold \mu? I’ve tried {\bf \mu} and \mathbf{\mu}.


February 4, 2009

Back to school for me…

Filed under: Uncategorized — jameshensman @ 6:32 pm

I’ve been having a preliminary flick through Andrew Gelman’s book, (amazon) which so far seems excellent. I thought I’d have a shot at the questions in the introductory chapter.

First question. Easy, no problem.

Second question. Ooh, bit trickier. Got there in the end.

Third question. This has taken me 40 minutes, which seems like justification for posting a solution on this ‘ere weblog.

so the question is asked:

Suppose that in each individual in a population there is a pair of genes, each of which can be either X or x, that controls eye colour: those with xx have blue eyes, those with XX or Xx or xX have brown eyes. Those with Xx are known as heterozygotes.

The proportion of individuals with blue eyes (xx) is a^2, and the proportion of heterozygotes is 2a(1-a).

Each parent transmits one gene to the child: if the parent is a heterozygote, the probability that they transmit X is 0.5. Assuming random mating, show that amongst brown eyed parents with brown eyed children, the proportion of heterozygotes is 2a/(1+2a).

Okay says I, it’s just Bayes’ rule, no? Let’s denote all heterozygotes as Xx (this should save significant keypresses…), children as ch and parents as pa.

Under Bayes rule we need a likelihood p(\text{ch}=Xx | \text{pa}), prior p(\text{pa}) and a ‘marginal likelihood’ term, which we get by summing the above. So we’re going to get something like:

p(\text{ch}=Xx) = \frac{p(\text{ch}=Xx | \text{pa}) p(\text{pa})} {\sum_\text{ch} p(\text{ch}=Xx | \text{pa}) p(\text{pa})}

We also need to make sure that we only consider brown eyed individuals (XX, Xx, and xX. not xx). let’s have a look at the priors:

p(xx) = a^2
p(Xx) = 2a(1-a)
and a little manipulation yeilds:
p(XX) = (1-a)^2

So looking at combinations of parents who have brown eyes:

p(\text{pa}=XX,XX) = (1-a)^4
p(\text{pa}=XX,Xx) = 2*2a(1-a)*(1-a)^2
p(\text{pa}=Xx,xx) = 4a^2(1-a)^2

Each of which can be considered a prior, with correspondings likelihoods (of the child being Xx):

p(\text{ch}=Xx | \text{pa}=XX,XX) = 0
p(\text{ch}=Xx | \text{pa}=XX,Xx) = 0.5
p(\text{ch}=Xx | \text{pa}=Xx,xx) = 0.5

Now, the top line of Thomas Bayes’ famous rule looks like:

0.5*4a(1-a)^3 + 0.5*4a^2(1-a)^2

remembering that the bottom line must consist of all brown eyed children of brown eyed parents (not just the heterozygotes), the bottom line looks like:

(1-a)^4 + 4a(1-a)^3 + 0.75*4a^2(1-a)^2

Phew. Cancelling a few terms does indeed leave

p(\text{ch}=Xx | \text{pa}=(Xx,XX),(XX,XX),(Xx,Xx)) = \frac{2a}{1+2a}

Not going to school for years switches your brain off: It took me as long to figure out how to do this as it did to blog it…

There’s actually a second part to the problem, (first set by Lindley, 1965, apparently), which turns out to have some rather messy terms in. I think I’ll save blogging that for another day.

Blog at WordPress.com.