## Church: a language for probabilistic computation

*October 23, 2010*

*Posted by Sarah in Uncategorized.*

Tags: AI, computer science

As a novice with some interest in applied areas, it happens all too often that I fall in love with an idea about perception/cognition/data analysis, and find out only later “Oh wait, that wasn’t actually math! And I’d have to do so much background reading in a completely different field to understand it!” I’ve experienced “Oh wait, I’m not an electrical engineer!”, “Oh wait, I’m not a neuroscientist!”, and “Oh wait, I’m not a statistician!”

Well, today is the day I say “Oh wait, I’m not a computer scientist!”

The idea that caught my attention is probabilistic computing. We often want to build a machine that can make predictions and build models (a computer that can diagnose medical symptoms, or predict which borrowers will default, or even, dare we whisper, an AI). This is essentially a Bayesian task: given some data, which probability distributions best explain it? The trouble is, computers are bad at this. For the most part, they're built to do the opposite task: given probability distributions and models, simulate some data. Generating candidate distributions and finding the best one can be prohibitively expensive, because the space of probability distributions is so large. Also, while the cost of evaluating a composition of deterministic functions f(g(x)) is just the cost of f plus the cost of g, composing two conditional distributions P(B|A) and P(C|B) requires marginalizing over the intermediate variable:

P(C|A) = Σ_B P(C|B) P(B|A)

and, done naively, the computational time of such sums grows exponentially rather than linearly as we compose more distributions.
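To make the cost concrete, here is a toy sketch (everything here is invented for illustration, not taken from Church): composing a chain of binary conditional-probability tables by brute-force marginalization means enumerating every assignment of the intermediate variables, so the number of terms doubles with each link added.

```python
import itertools

def compose_exact(links, a):
    """P(last | first = a) for a chain of binary conditional tables,
    computed by brute-force summation over all 2**(k-1) assignments
    of the k-1 intermediate variables."""
    k = len(links)
    result = {0: 0.0, 1: 0.0}
    for mids in itertools.product([0, 1], repeat=k - 1):
        for last in (0, 1):
            seq = (a,) + mids + (last,)   # full path: first, mids..., last
            p = 1.0
            for i, link in enumerate(links):
                p *= link[seq[i]][seq[i + 1]]   # P(seq[i+1] | seq[i])
            result[last] += p
    return result

# Three copies of the same conditional table P(next | prev).
link = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}
dist = compose_exact([link, link, link], 0)
```

(For a simple chain like this, dynamic programming would avoid the blow-up; the exponential cost bites when the dependencies are less neatly ordered.)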

Church, a language developed at MIT, is an attempt to solve this problem. (Apparently it's a practical attempt, because the founders have already started a company, Navia Systems, using this structure to build probabilistic computers.) The idea is this: instead of describing a probability distribution as a deterministic procedure that evaluates the probabilities of different events, represent it as a probabilistic procedure for generating samples from it. That is, a random variable in the program actually behaves like a random variable: repeating a computation will not give the same result each time, because evaluating a random variable doesn't give the same result each time. There's a computational advantage here because it's possible to compose random variables without summing over all possible values.
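The sampling view can be sketched in a few lines of Python (the function names here are my own illustrations, not Church syntax): each conditional distribution is a procedure that returns one sample, and composing a chain costs one call per link rather than a sum over intermediate values.

```python
import random

random.seed(0)

def noisy_copy(bit):
    """Conditional sampler: copy the input bit, flipping it with prob. 0.1."""
    return bit if random.random() > 0.1 else 1 - bit

def compose_by_sampling(samplers, a):
    """Push a single sample through a chain of conditional samplers.
    Cost is one call per link -- no marginalization over intermediates."""
    x = a
    for s in samplers:
        x = s(x)
    return x

# Estimate P(C = 0 | A = 0) for a three-link chain by averaging many runs.
runs = [compose_by_sampling([noisy_copy] * 3, 0) for _ in range(10000)]
estimate = 1 - sum(runs) / len(runs)
```

Each run is cheap; the price you pay is that the answer is a Monte Carlo estimate rather than an exact probability.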

The nice thing about Church (which is based on Lisp, and named after Alonzo Church) is that it allows you to compose practically any query without significantly increasing runtime. “What is the probability of A and B or C and not D given that X and Y and not Z?” and so on.
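One way such compound conditional queries can be answered in this representation is rejection sampling: run the generative program many times and keep only the runs where the condition holds. Here is a minimal sketch with an invented three-variable model (this is my illustration of the general mechanism, not Church's actual inference engine):

```python
import random

random.seed(1)

def model():
    """Invented toy generative model over three boolean variables."""
    x = random.random() < 0.5                     # X: fair coin
    a = random.random() < (0.8 if x else 0.2)     # A depends on X
    b = random.random() < (0.3 if a else 0.6)     # B depends on A
    return x, a, b

def query(predicate, condition, n=100000):
    """Estimate P(predicate | condition) by rejection sampling:
    discard every run where the condition fails."""
    kept = [s for s in (model() for _ in range(n)) if condition(*s)]
    return sum(predicate(*s) for s in kept) / len(kept)

# "What is the probability of A and not B, given X?"
p = query(lambda x, a, b: a and not b, lambda x, a, b: x)
```

Any predicate over the model's variables can be dropped in without changing the model itself, which is the flexibility the post describes.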

The PhD dissertation that introduced Church is where I started, but it’s actually much clearer if you learn about it from the interactive tutorial. The tutorial is really a beautiful thing, well-written to the point of addictiveness. It is completely accessible to just about anyone.

Interesting idea. From skimming the first 30 pages of the thesis, it seems the main idea is that we should use MCMC sampling instead of regular probability calculus, am I right?

I do wonder if MCMC is more efficient than a deterministic approach. For instance, recent results suggest that when the partition function of binary MRFs can't be deterministically approximated in polynomial time, MCMC fails to mix in polynomial time (http://arxiv.org/abs/1005.5584 and refs in the intro).

Yeah, I think you're right: MCMC is how they estimate the posterior probabilities in a reasonable time. I don't know how efficient they are (but the authors also suggest making changes in hardware, replacing logic gates with stochastic logic gates, so maybe that would help?). I'm afraid I'm brand new to this, though.

I like your blog, btw.

Well, if MCMC takes exponential time to mix, then faster logic gates won't really help. Imagine trying to estimate the probability density of a model with probability proportional to

p(x) ∝ exp(t Σ_{(i,j)} x_i x_j),

where x is a {1, -1}-valued vector and the sum is taken over all edges (i, j) in some graph. If the graph is a grid and t is high enough, there will be essentially two likely states: all 1's or all -1's. To move from one to the other, essentially all of the components have to flip together.

Local MCMC algorithms like the Gibbs sampler work by looking at the neighboring x's and updating one x at a time. In a situation with global correlations like the one above, it takes far too many samples for the result to be a good estimate of the true distribution.
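The stuck-in-one-mode behavior described above can be sketched concretely. Below is a single-site Gibbs sampler for the grid model p(x) ∝ exp(t Σ x_i x_j) with x_i in {-1, +1} (an Ising-type model; the grid size and coupling are arbitrary choices for illustration). At large t, a chain started in the all-(+1) state stays near it, because flipping any one spin against its aligned neighbors is exponentially unlikely.

```python
import math
import random

random.seed(2)

def gibbs_sweep(grid, t):
    """One sweep of single-site Gibbs updates over an n x n grid of +/-1 spins."""
    n = len(grid)
    for i in range(n):
        for j in range(n):
            # Sum of the (up to four) neighboring spins.
            s = sum(grid[i2][j2]
                    for i2, j2 in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                    if 0 <= i2 < n and 0 <= j2 < n)
            # Conditional P(x_ij = +1 | neighbors) = 1 / (1 + exp(-2*t*s)).
            grid[i][j] = 1 if random.random() < 1.0 / (1.0 + math.exp(-2.0 * t * s)) else -1

n = 8
grid = [[1] * n for _ in range(n)]   # start in the all-(+1) mode
for _ in range(200):
    gibbs_sweep(grid, t=2.0)
magnetization = sum(map(sum, grid)) / n ** 2
```

After 200 sweeps at t = 2.0 the magnetization stays close to +1: the sampler never visits the equally probable all-(-1) mode, which is exactly the mixing failure being discussed.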

On the other hand, even if MCMC and "classical" probability calculus are theoretically equivalent, people usually come up with good randomized algorithms before deterministic ones. I can think of several probability estimation problems that have fast MCMC algorithms but no deterministic ones (yet).