## Vision without categories?November 17, 2010

Posted by Sarah in Uncategorized.
Tags: , ,

I’d just like to mention that I’ve come across Tomasz Malisiewicz’s blog on machine learning and computer vision, and I’m hooked. You should be too.

There’s the usual panoply of links to interesting papers, but then there’s also Tomasz’s radical idea for reimagining computer vision using a memex instead of a set of categories. He thinks that the “vision problem” will be solved by something much closer to actual AI than is generally considered necessary today. His ideas are informed by Wittgenstein and Vannevar Bush as well as contemporary research. It sounds interesting, to say the least. Then there’s also Tomasz’s stirring (if somewhat intimidating) advice to students and researchers to be Renaissance men and look beyond the A+ and the well-received publication. All in all, very worth reading.

(Sensible Sarah says: “Hey, wait a minute! I thought I was a math student — what’s up with all this vision stuff? And I have a qual to pass in a month!” Sensible Sarah throws up her hands in dismay.)

Posted by Sarah in Uncategorized.
Tags: , , ,

Currently reading this paper by Alex Bronstein et al.

The idea here is to make searchable databases for shapes (2d and 3d objects) in the same way a text engine responds to search queries.

There are two main parts to this task: feature detection, and feature description. Different methods can be employed to define what a feature is; a dense descriptor just selects all the points in the image as descriptors, but there are other possibilities. SIFT looks for local maxima of the discrete image Laplacian that persist through different scales. MSER finds level sets that show the smallest variation of area when traversing the level-set graph.

Feature description aims to produce a “bag of words” from the features, by performing vector quantization on the feature space. Two images can then be compared by comparing their bags of features with plain old Euclidean distance.

There are extra challenges if we want to do this with 3d objects rather than images: for one, conformations of 3d objects (like a body in different poses) are much more complicated than the (usually affine) transformations seen on a given image. Also, while images are typically represented as matrices of pixels, 3d objects can be represented as meshes, point clouds, level sets, etc, so computations have to be portable between formats. Also, 3d objects don’t have a universal system of coordinates.

The feature description process used by the authors is based on heat kernel signatures.
Recall that the heat kernel $K_t(x, z)$ is the fundamental solution of the heat equation
$(\Delta_x + \frac{\partial }{\partial t} )u = 0.$
The heat kernel can also be seen as the transition density function of a Brownian motion.
For any point on the surface, we give its heat kernel signature as
$p(x) = c(x)(K_{t_1}(x, x), K_{t_2}(x, x), \dots K_{t_n}(x, x))$
This is invariant under isometric deformations of the space, since it depends only on the Riemannian metric. It captures local information in a small neighborhood of x for small t, and global information for large t. It can be proven that any continuous map preserving the heat kernel signature must be an isometry. And computing it depends on computing the eigenvalues of the Laplacian, which can be done with various formats of representing 3d shapes.

$K_t(x, x') = \sum_{l = 0}^\infty e^{-\lambda_i t} \phi_l(x) \phi_l(x')$

So, since the eigenvalues of the Laplacian decay exponentially, we can truncate this sum for computation of the heat kernel.
In the discrete case, instead of the Laplacian we would use a discretization of the form

$\hat{\Delta} (f) = \frac{1}{a_i} \sum_j w_{ij} (f_i - f_j)$

After vector quantization to get a bag of features, the authors generalize this notion to a spatiall sensitive bag of features:

$\int_{X \times X} \theta(x) \theta^T(y) K_t(x, y) d\mu(x) d\mu(y)$
giving a matrix that represents the distribution of nearby words.

From there, we can compare these matrices by embedding them into a Hamming Space. (Hamming distances happen to be very quick to compute on modern CPU architectures.

ShapeGoogle was tested against various possible transformations of shapes (isometries, topology changes, holes, noise, scale changes) and performed well, outperforming other methods like Shape DNA, part-based bag of words, and clock matching bag of features.

It’s an interesting project, and I suspect that the ability to search shape databases will prove useful. Here’s a list of current content-based image retrieval search engines: both Google and Bing use these methods (as opposed to only looking at metadata.)

## Partha NiyogiOctober 9, 2010

Posted by Sarah in Uncategorized.
Tags: ,

Sad news: I found out this week that Partha Niyogi passed away. I never met him, but I read his papers and I admired his work.

Geometric Methods and Manifold Learning

Partha Niyogi, Mikhail Belkin

2 videos

Here’s a talk he gave in Chicago last year on manifold learning; it’s a lovely, complete overview of the problem and current algorithms (Isomap, LLE, Laplacian eigenmaps, diffusion distances.) Also at the MLSS conference site are many other talk videos about computational learning — Stephane Mallat, Emmanuel Candes, Guillermo Sapiro, etc. If I had infinite time I would just watch all of them.

## Machine Learning VideosJune 25, 2010

Posted by Sarah in Uncategorized.
Tags: ,