James Hensman’s Weblog

January 26, 2009

Working with log likelihoods

Filed under: python — jameshensman @ 1:41 pm

Sometimes, when you have a series of numbers representing the log-probability of something, you need to add up the probabilities. Perhaps to normalise them, or perhaps to weight them… whatever. You end up writing (or Mike ends up writing):

logsumexp = lambda x: np.log(sum([np.exp(xi) for xi in x]))

Which is going to suck when the members of x are small. Small enough that the precision of the 64-bit float you’re holding them in runs out, and they exponentiate to zero (somewhere near -700). Your code is going to barf when it gets to the np.log part, and finds it can’t log zero.

One solution is to add a constant to each member of x, so that you don’t work so close to the limits of the precision, and remove the constant later:

def logsumexp(x):
    x = x + 700  # make a shifted copy, rather than mutating the caller's array in place
    x = np.sum(np.exp(x))
    return np.log(x) - 700

Okay, so my choice of 700 is a little arbitrary, but that (-700) is where the precision starts to run out, and it works for me. Of course, if your numbers are way smaller than that, you may have a problem.
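To see the failure mode and the fix side by side, here’s a minimal sketch (the name logsumexp_shifted is mine, not from the post):

```python
import numpy as np

def logsumexp_shifted(x, shift=700.0):
    # the trick above: add a constant before exponentiating, remove it after
    x = np.asarray(x, dtype=np.float64)
    return np.log(np.sum(np.exp(x + shift))) - shift

x = np.array([-750.0, -751.0])
naive = np.log(np.sum(np.exp(x)))  # exp underflows to 0, so this is -inf
shifted = logsumexp_shifted(x)     # about -749.687, the correct answer
```

The true value is -750 + log(1 + e^-1) ≈ -749.687, which the shifted version recovers while the naive sum underflows.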

Edit: grammar. And I’m getting used to this whole weblog shenanigan. Oh, and <code>blah</code> looks rubbish: I'm going to stop doing that.



  1. Is it OK if x is constrained to be 0 \leq x \leq 1?

    Comment by mikedewar — January 26, 2009 @ 2:20 pm | Reply

    • Yep. Then e^x \in [1, e]. More often, though, x are values of the log-likelihood, and as such x \in (-\infty, 0]

      Comment by jameshensman — January 26, 2009 @ 2:51 pm | Reply

  2. Thanks for the tip, James. Here is my new way:

    logsumexpr = lambda x: pylab.log(sum([pylab.exp(xi+700) for xi in x]))-700

    which is essentially what you said, except:
    a) it’s a one liner with the \lambda-calculus and therefore uber geek cred goes to ME
    b) it works when x is a list! Yours only works on numpy arrays

    Incidentally the ‘r’ in logsumexpr is for ROBUST (oh yeah!)

    Comment by mikedewar — January 26, 2009 @ 5:28 pm | Reply

  3. Yikes! This only works for a while. As x gets smaller and smaller (a longer and longer chain gets less and less likely, no matter what), the amount you need to add to x so that the log() doesn’t barf -inf increases. The trouble is, of course, that if you add much more than 700 to the x variable, the exp() barfs inf. So while adding on a constant works for a while, it doesn’t work forever!

    Comment by mikedewar — January 26, 2009 @ 6:22 pm | Reply

  4. Yeah, you’re right. You can write a little bit of code to work out the ‘best’ constant. You’re still screwed if you’ve got a really big range in x: more than about 1400 (I think) and it barfs even if you add a ‘clever’ constant.

    Comment by jameshensman — January 26, 2009 @ 10:15 pm | Reply

  5. Typically the best constant is just -max(x). Shouldn’t matter if the range is large: if x_i / x_j is e^{1400}, then x_j for the purposes of this function can be treated as zero.

    Comment by chas — December 15, 2009 @ 1:49 am | Reply
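The suggestion above is the standard stable form of the trick. A minimal sketch (logsumexp_max is a name made up for illustration):

```python
import numpy as np

def logsumexp_max(x):
    # shift by max(x), per the comment above: the largest term becomes
    # exp(0) = 1, so exp() can never overflow regardless of the range
    x = np.asarray(x, dtype=np.float64)
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))

x = np.array([-1400.0, 0.0])  # a range far too wide for any fixed constant
result = logsumexp_max(x)     # 0.0: the e^{-1400} term is negligible anyway
```

Terms more than ~745 below the maximum underflow to zero inside the sum, but as the comment notes, they would be negligible in the result anyway.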
