import numpy as np

# using arrays
B = np.random.randn(2,2) # random array
C = np.random.randn(2,2)
A = np.dot(B.T, np.linalg.inv(C))

# using matrices
B = np.mat(B) # cast as matrix
C = np.mat(C)
A = B.T*C.I

Despite the nicer syntax in the matrix version, I prefer to use arrays. This is in part because they play nicer with the rest of numpy, but mostly because they behave better in higher dimensions. This is really useful when you have to deal with lots of matrices, which seems to occur all the time in my work:

A = np.random.randn(100,2,2) # a hundred 2x2 arrays
A2 = [np.mat(np.random.randn(2,2)) for i in range(100)] # a hundred 2x2 matrices

Transposing the list of matrices is easy: just use the .T operator. This doesn’t work for the 100x2x2 array though, since it switches the axes in a way we don’t want. The solution is to manually specify which axes to switch.

A2T = [a.T for a in A2] # matrix version
AT = np.transpose(A,(0,2,1)) # array version 1
# or
AT = np.rollaxis(A,2,1) # array version 2

Suppose you have a series of matrices which you want to (right) multiply by another matrix. This is messy with the matrix object, where you need to do list comprehension, but nice as pie with the array object.

# matrix version
A = [np.mat(np.random.randn(2,2)) for i in range(100)]
B = np.mat(np.random.randn(2,2))
AB = [a*B for a in A]

# array version
A = np.random.randn(100,2,2)
B = np.random.randn(2,2)
AB = np.dot(A,B)

Left-multiplication is a little harder, but possible using a transpose trick:

# matrix version
BA = [B*a for a in A]

# array version
BA = np.transpose(np.dot(np.transpose(A,(0,2,1)),B.T),(0,2,1))

Okay, the syntax is getting ugly there, I’ll admit. Suppose now that you had two sets of matrices, and wanted the product of each element, as in

# matrix version
A = [np.mat(np.random.randn(2,2)) for i in range(100)]
B = [np.mat(np.random.randn(2,2)) for i in range(100)]
AB = [a*b for a,b in zip(A,B)]

# array version
A = np.random.randn(100,2,2)
B = np.random.randn(100,2,2)
AB = np.sum(np.transpose(A,(0,2,1)).reshape(100,2,2,1)*B.reshape(100,2,1,2),-3)

I’ll admit that the syntax of this is very weird. To see how it works, consider taking the (matrix) product of two arrays in a similar manner:

A = np.random.randn(2,2)
B = np.random.randn(2,2)
np.sum(A.T.reshape(2,2,1)*B.reshape(2,1,2),0)

The reshaping persuades numpy to broadcast the multiplication, which results in a 2x2x2 cube of numbers. Summing the numbers along the first dimension of the cube results in matrix multiplication. I have some scribbles which illustrate this, which I’ll post if anyone wants.
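If my scribbles don't convince you, the trick is easy to check against np.dot directly; nothing here beyond plain numpy:

```python
import numpy as np

A = np.random.randn(2,2)
B = np.random.randn(2,2)

# broadcasting gives a 2x2x2 cube: cube[k,i,j] = A[i,k]*B[k,j]
cube = A.T.reshape(2,2,1)*B.reshape(2,1,2)

# summing over the first axis (k) leaves the matrix product
AB = np.sum(cube,0)

print(np.allclose(AB, np.dot(A,B))) # True
```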

The main motivation for using arrays in this manner is speed. I find for loops in python to be rather slow (including within list comps), so I prefer to use numpy array methods whenever possible. The following runs a quick test, multiplying 1000 3×3 matrices together. It’s a little crude, but it shows the numpy.array method to be 10 times faster than the list comp of np.matrix.

import numpy as np
import timeit

# compare multiple matrix multiplication using list comps of matrices and deep arrays

# 1) the matrix method
setup1 = """
import numpy as np
A = [np.mat(np.random.randn(3,3)) for i in range(1000)]
B = [np.mat(np.random.randn(3,3)) for i in range(1000)]
"""
test1 = """
AB = [a*b for a,b in zip(A,B)]
"""
timer1 = timeit.Timer(test1,setup1)
print(timer1.timeit(100))

# 2) the array method
setup2 = """
import numpy as np
A = np.random.randn(1000,3,3)
B = np.random.randn(1000,3,3)
"""
test2 = """
AB = np.sum(np.transpose(A,(0,2,1)).reshape(1000,3,3,1)*B.reshape(1000,3,1,3),-3)
"""
timer2 = timeit.Timer(test2,setup2)
print(timer2.timeit(100))
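As an aside, and this is me speculating about alternatives rather than part of the test above: np.einsum can express the same stacked product directly, which at least reads more clearly than the reshape/sum trick:

```python
import numpy as np

A = np.random.randn(1000,3,3)
B = np.random.randn(1000,3,3)

# 'nik,nkj->nij': for each n, contract over k, i.e. a matrix product per pair
AB = np.einsum('nik,nkj->nij', A, B)

# agrees with the reshape/broadcast/sum version
AB2 = np.sum(np.transpose(A,(0,2,1)).reshape(1000,3,3,1)*B.reshape(1000,3,1,3),-3)
print(np.allclose(AB, AB2)) # True
```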

In the likely event that there’s a better way to do this, or I’ve screwed up some code somewhere, please leave me a comment.

I’m not sure what the tapping of the finger is about (at the start of the vid), but I have data from the experiment which I’d like to correlate to the position of the finger on the rig. The idea is to follow that black blob.

Having broken the vid into png images, I loaded them into python using PIL:

from PIL import Image
images = [Image.open('./tmp/'+f) for f in imagenames]

and then made them into numpy arrays, slicing off only the red channel and the middle part of the image (by inspection, the dot was most visible on the red channel):

images = [np.asarray(image)[upper:lower,:,0] for image in images] # 0 indexes red

Now, scipy.ndimage has an awesome tool for identifying regions of an image. I used a threshold (set by inspection) first:

images = [image > threshold for image in images] # threshold set by inspection
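From there, the region-finding step looks roughly like this. It's a sketch on a synthetic frame (I'm making up the image and the threshold, and `ndimage.label` / `ndimage.center_of_mass` are the tools I mean by ‘identifying regions’):

```python
import numpy as np
from scipy import ndimage

# a made-up 'frame': dark background, one bright blob
frame = np.zeros((50,50))
frame[20:25,30:36] = 200.0

mask = frame > 100 # threshold set by inspection

# label connected regions, then take the centre of region 1 as the blob position
labels, nregions = ndimage.label(mask)
row, col = ndimage.center_of_mass(mask, labels, 1)
print(nregions, row, col) # 1 22.0 32.5
```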

Luckily numpy/Scipy has all the tools I need to get started… on images. I need to break the (avi) video file into a series of png files (I could use jpeg, I suppose, but it’s not like hdd space is limited here…). The tool I’ve been using for this is ffmpeg.

Now, ffmpeg is a phenomenal beast. It can re-scale videos, re-encode them, crop them, and all kinds of other things that I’m not interested in. The command I’ve used to break an avi file into pngs is:

ffmpeg -i <whatever>.avi ./tmp/%03d.png

It’s nice that ffmpeg recognises the %03d format string, huh? Now that I’ve got some images, I can load them into numpy and process them (future blog post!)

After doing said processing, ffmpeg should be able to re-code my processed images into a video file for convenience. However, I’m really lazy, and can’t be bothered to read the whole of the ffmpeg doc. So I’m copying Mike here and using:

mencoder 'mf://tmp/*.png' -mf type=png:fps=10 -ovc lavc -lavcopts vcodec=wmv2 -oac copy -o <whatever>.mpg

I’m sure ffmpeg can do the job too, but it just isn’t working for me.

In fact, I’m wrapping both of those commands up in python, using the os module.
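Something like this, I mean — file names here are placeholders, and I’m only building the command strings (hand them to os.system to actually run them):

```python
import os

def split_cmd(avi, outdir='./tmp'):
    # break an avi into numbered pngs with ffmpeg
    return 'ffmpeg -i %s %s/%%03d.png' % (avi, outdir)

def join_cmd(out, indir='./tmp', fps=10):
    # re-encode the processed pngs with mencoder
    return ("mencoder 'mf://%s/*.png' -mf type=png:fps=%d "
            "-ovc lavc -lavcopts vcodec=wmv2 -oac copy -o %s") % (indir, fps, out)

# os.system(split_cmd('experiment.avi'))
# ...process the frames...
# os.system(join_cmd('processed.mpg'))
```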

Happily, there’s a nice paper by Chris Bishop, which explains clearly what’s going on, and gives you the update equations. For example:

which involves taking the expected value of $\tau$, $\mu$ and $W$. These three are straightforward:

The distribution for $\mu$ is Gaussian, $\mathcal{N}(\mu \mid \bar{\mu}, \Sigma_\mu)$, and so the expected value of $\mu$ is $\bar{\mu}$.

The distribution for $\tau$ is $\mathrm{Gamma}(\tau \mid a, b)$, and so the expected value of $\tau$ is simply $a/b$.

The distribution for $W$ is a slightly more complex beast: $W$ is a non-square matrix, and we have a Gaussian distribution for each *row*. That is: $q(W) = \prod_{k=1}^d \mathcal{N}(w_k \mid \bar{w}_k, \Sigma)$. Bizarrely, the rows share a covariance matrix. The expected value of $W$ is simple: $\langle W \rangle$ is just a matrix made of a stack of $\bar{w}_k$s.

The tricky bit comes in a different update equation, where we need to evaluate $\langle W^T W \rangle$. The first thing to notice is that (where $W$ is a d by q matrix):

$W^T W = \sum_{k=1}^d w_k w_k^T$, where $w_k$ is the $k$-th row of $W$, written as a column vector.

Since $w_k$ is Gaussian, $\langle w_k w_k^T \rangle = \Sigma + \bar{w}_k \bar{w}_k^T$. Now we can write: $\langle W^T W \rangle = \sum_k (\Sigma + \bar{w}_k \bar{w}_k^T) = d\,\Sigma + \langle W \rangle^T \langle W \rangle$.

Simple when you know how.
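Here’s a quick Monte Carlo check of that: if each row of W is Gaussian with mean given by the matching row of M and a shared covariance S, then the average of WᵀW should come out at dS + MᵀM (symbols here are mine, not Bishop’s):

```python
import numpy as np

rng = np.random.RandomState(0)
d, q, n = 3, 2, 100000

M = rng.randn(d,q)                  # row means, stacked into a matrix
L = np.array([[1.0,0.0],[0.5,0.8]])
S = np.dot(L,L.T)                   # shared row covariance

# sample n copies of W: each row is its mean plus correlated noise
E = rng.randn(n,d,q)
W = M + np.matmul(E, L.T)

# average W^T W over the samples
emp = np.matmul(np.transpose(W,(0,2,1)), W).mean(0)

print(np.allclose(emp, d*S + np.dot(M.T,M), atol=0.15)) # True
```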

Edit: Anyone know how to get a bold $\mu$? I’ve tried {\bf \mu} and \mathbf{\mu}.

First question. Easy, no problem.

Second question. Ooh, bit trickier. Got there in the end.

Third question. This has taken me 40 minutes, which seems like justification for posting a solution on this ‘ere weblog.

so the question is asked:

Suppose that in each individual in a population there is a pair of genes, each of which can be either X or x, that controls eye colour: those with xx have blue eyes, those with XX or Xx or xX have brown eyes. Those with Xx are known as heterozygotes.

The proportion of individuals with blue eyes (xx) is $p^2$, and the proportion of heterozygotes is $2p(1-p)$.

Each parent transmits one gene to the child: if the parent is a heterozygote, the probability that they transmit X is $\frac{1}{2}$. Assuming random mating, show that amongst brown eyed parents with brown eyed children, the proportion of heterozygotes is $\frac{2p}{1+2p}$.

Okay says I, it’s just Bayes’ rule, no? Let’s denote all heterozygotes as Xx (this should save significant keypresses…), children as ch and parents as pa.

Under Bayes’ rule we need a likelihood, a prior, and a ‘marginal likelihood’ term, which we get by summing the above. So we’re going to get something like:

We also need to make sure that we only consider brown eyed individuals (XX, Xx and xX, not xx). Let’s have a look at the priors:

and a little manipulation yields:

So looking at combinations of parents who have brown eyes:

Each of which can be considered a *prior*, with corresponding likelihoods (of the child being Xx):

Now, the top line of Thomas Bayes’ famous rule looks like:

remembering that the bottom line must consist of all brown eyed children of brown eyed parents (not just the heterozygotes), the bottom line looks like:

Phew. Cancelling a few terms does indeed leave $\frac{2p}{1+2p}$.

Not going to school for years switches your brain off: It took me as long to figure out how to do this as it did to blog it…
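Given how rusty I am, a brute-force check seems wise. The sketch below enumerates the genotype combinations directly — assuming the usual Hardy–Weinberg-style priors (x allele at frequency p, so blue eyes with probability p² and heterozygotes with probability 2p(1−p)) — and computes the proportion of heterozygotes among brown eyed children of brown eyed parents:

```python
def hetero_child(p):
    # P(Xx | brown) for a single parent, normalised over {XX, Xx}
    h = 2*p*(1-p) / (2*p*(1-p) + (1-p)**2)
    parent = {'Xx': h, 'XX': 1-h}
    transmit_x = {'Xx': 0.5, 'XX': 0.0} # P(parent passes on x)

    num = 0.0 # child heterozygous (and hence brown eyed)
    den = 0.0 # child brown eyed
    for g1, p1 in parent.items():
        for g2, p2 in parent.items():
            t1, t2 = transmit_x[g1], transmit_x[g2]
            num += p1*p2*(t1*(1-t2) + (1-t1)*t2)
            den += p1*p2*(1 - t1*t2)
    return num/den

print(hetero_child(0.5)) # ~0.5, matching 2p/(1+2p) at p = 1/2
```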

There’s actually a second part to the problem, (first set by Lindley, 1965, apparently), which turns out to have some rather messy terms in. I think I’ll save blogging that for another day.

In [1]: from numpy import matlib as ml

In [2]: %bg ml.rand(2,2)

Starting job # 0 in a separate thread.

In [3]: %bg ml.zeros(3)

Starting job # 1 in a separate thread.

In [4]: jobs[0].result

Out[4]:

matrix([[ 0.97556473, 0.67794221],

[ 0.9331659 , 0.78887001]])

In [5]: jobs[1].result

Out[5]: matrix([[ 0., 0., 0.]])

Mucho coolness, especially if you have a long process (or lots of them) to complete. I wonder if it’s actually multi-threaded, as in using both of my CPUs?
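As far as I can tell, %bg just runs the expression on a background thread via IPython’s background-jobs machinery — so it’s real threading, but the global interpreter lock means pure-Python bytecode won’t use both CPUs at once (numpy can release the GIL inside big C-level operations, mind). A rough sketch of the same idea with the plain threading module, with np.random.rand standing in for ml.rand:

```python
import threading
import numpy as np

result = {}

def job():
    # the 'background job': build a random matrix
    result['r'] = np.random.rand(2,2)

t = threading.Thread(target=job)
t.start() # returns immediately, like %bg
# ...carry on working here...
t.join()  # wait for the job, then read result['r']
print(result['r'].shape) # (2, 2)
```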

For example, within my Variational Factor Analysis (VBFA) class, I need to keep a record of something I’m calling `b_phi_hat`. One of the methods in the class involves the update of this little vector, which depends on its initial (prior) value, `b_phi`. Like this:

import numpy as np

class VBFA:
    def __init__(self):
        self.b_phi = np.mat(np.zeros((5,1)))
        #blah blah...
    def update_phi(self):
        self.b_phi_hat = self.b_phi
        for i in range(5):
            self.b_phi_hat[i] = self.something()

`update_phi()` gets called 100s of times when the class is used. Spot the problem? It’s on line 8, where `b_phi_hat` is *assigned* to `b_phi`. When the loop runs on the next two lines, it’s modifying the original, not just a copy of the original, i.e. after the first iteration line 8 doesn’t ‘refresh’ `b_phi_hat`, it keeps it at its current value.

What I should have written is:

import numpy as np

class VBFA:
    def __init__(self):
        self.b_phi = np.mat(np.zeros((5,1)))
        #blah blah...
    def update_phi(self):
        self.b_phi_hat = self.b_phi.copy()
        for i in range(5):
            self.b_phi_hat[i] = self.something()

which explicitly makes a copy of the original on line 8, refreshing `b_phi_hat` with every iteration.
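The gotcha in miniature, stripped of the class (plain arrays behave just like np.mat here):

```python
import numpy as np

b = np.zeros(3)

alias = b        # no copy: both names point at the same data
alias[0] = 1.0   # ...so this changes b too

fresh = b.copy() # an independent copy
fresh[1] = 2.0   # b is untouched this time

print(b) # b[0] changed, b[1] didn't
```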

It’s also possible to assume a mean vector for the observed variables, but I’m going to ignore that for a moment for clarity.

Some simple algebra yields $p(y) = \mathcal{N}(y \mid 0, AA^T + \Psi)$. It should now be clear that we are modeling the distribution of $y$ as a Gaussian distribution with limited degrees of freedom.
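That covariance structure is easy to check by simulation: generate observations as loadings times latents plus independent noise, and compare the sample covariance with AAᵀ + Ψ (notation mine, with Ψ diagonal):

```python
import numpy as np

rng = np.random.RandomState(1)
d, q, n = 4, 2, 200000

A = rng.randn(d,q)               # factor loadings
psi = np.diag(rng.rand(d) + 0.1) # diagonal noise covariance

x = rng.randn(n,q)               # latent factors, N(0,I)
noise = rng.randn(n,d) * np.sqrt(np.diag(psi))
y = np.dot(x, A.T) + noise       # zero-mean observations

emp = np.dot(y.T, y)/n           # sample covariance
print(np.allclose(emp, np.dot(A,A.T) + psi, atol=0.2)) # True
```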

**Variational Approach**

The variational approach involves placing conjugate priors over the model parameters, and finding a factorised approximation to the posterior.

The variational approach to FA yields a distinct advantage: by placing an ARD prior over the columns of A, unnecessary components get ‘switched off’, and the corresponding column of A goes to zero. This is dead useful: you don’t need to know the dimension of the latent space beforehand: it just drops out of the model.

**Papers**

There is a super Masters thesis on variational FA here by a chap called Frederik Brink Nielsen.

An immediate extension of factor analysis springs to mind: if the distribution of the data is just a Gaussian, why not have a *mixture* of them? It seems Ghahramani was there first: GIYF

More recently, Zhao and Yu proposed an alteration to the Variational FA model, which apparently achieves a tighter bound by letting some factors of the approximation depend on one another. This apparently makes the model less prone to under-fitting (i.e. it drops factors more easily). Neural Networks Journal

logsumexp = lambda x: np.log(sum([np.exp(xi) for xi in x]))

Which is going to suck when the members of x are small. Small enough that the precision of the 64 bit float you’re holding them in runs out, and they exponentiate to zero (somewhere near -700). Your code is going to barf when it gets to the `np.log` part, and finds it can’t log zero.

One solution is to add a constant to each member of x, so that you don’t work so close to the limits of the precision, and remove the constant later:

def logsumexp(x):
    x = x + 700 # add, rather than +=, to avoid modifying the caller's array
    x = np.sum(np.exp(x))
    return np.log(x) - 700

Okay, so my choice of 700 is a little arbitrary, but that (-700) is where the precision starts to run out, and it works for me. Of course, if your numbers are way smaller than that, you may have a problem.
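The usual fix, I believe, is to shift by the maximum of x instead of a fixed constant, so it works whatever the scale of the numbers:

```python
import numpy as np

def logsumexp(x):
    x = np.asarray(x, dtype=float)
    m = np.max(x) # shift so the biggest entry exponentiates to exactly 1
    return m + np.log(np.sum(np.exp(x - m)))

print(logsumexp([-1000.0, -1000.0])) # about -999.31; the naive version gives -inf
```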

Edit: grammar. And I’m getting used to this whole weblog shenanigan. Oh, and `<code>blah</code>` looks rubbish: I’m going to stop doing that.

- python
- probabilistic models of data
- learning
- teaching

I’m hoping that by storing my thoughts, they’ll begin to make more sense. I may even start writing thoughts on my thoughts, but mostly I’m just going to put them here for safekeeping.
