jump to navigation

Shape Google November 15, 2010

Posted by Sarah in Uncategorized.
Tags: , , ,

Currently reading this paper by Alex Bronstein et al.

The idea here is to make searchable databases for shapes (2d and 3d objects) in the same way a text engine responds to search queries.

There are two main parts to this task: feature detection, and feature description. Different methods can be employed to define what a feature is; a dense descriptor just selects all the points in the image as descriptors, but there are other possibilities. SIFT looks for local maxima of the discrete image Laplacian that persist through different scales. MSER finds level sets that show the smallest variation of area when traversing the level-set graph.

Feature description aims to produce a “bag of words” from the features, by performing vector quantization on the feature space. Two images can then be compared by comparing their bags of features with plain old Euclidean distance.

There are extra challenges if we want to do this with 3d objects rather than images: for one, conformations of 3d objects (like a body in different poses) are much more complicated than the (usually affine) transformations seen on a given image. Also, while images are typically represented as matrices of pixels, 3d objects can be represented as meshes, point clouds, level sets, etc, so computations have to be portable between formats. Also, 3d objects don’t have a universal system of coordinates.

The feature description process used by the authors is based on heat kernel signatures.
Recall that the heat kernel K_t(x, z) is the fundamental solution of the heat equation
(\Delta_x + \frac{\partial }{\partial t} )u = 0.
The heat kernel can also be seen as the transition density function of a Brownian motion.
For any point on the surface, we give its heat kernel signature as
p(x) = c(x)(K_{t_1}(x, x), K_{t_2}(x, x), \dots K_{t_n}(x, x))
This is invariant under isometric deformations of the space, since it depends only on the Riemannian metric. It captures local information in a small neighborhood of x for small t, and global information for large t. It can be proven that any continuous map preserving the heat kernel signature must be an isometry. And computing it depends on computing the eigenvalues of the Laplacian, which can be done with various formats of representing 3d shapes.

K_t(x, x') = \sum_{l = 0}^\infty e^{-\lambda_i t} \phi_l(x) \phi_l(x')

So, since the eigenvalues of the Laplacian decay exponentially, we can truncate this sum for computation of the heat kernel.
In the discrete case, instead of the Laplacian we would use a discretization of the form

\hat{\Delta} (f) = \frac{1}{a_i} \sum_j w_{ij} (f_i - f_j)

After vector quantization to get a bag of features, the authors generalize this notion to a spatiall sensitive bag of features:

\int_{X \times X} \theta(x) \theta^T(y) K_t(x, y) d\mu(x) d\mu(y)
giving a matrix that represents the distribution of nearby words.

From there, we can compare these matrices by embedding them into a Hamming Space. (Hamming distances happen to be very quick to compute on modern CPU architectures.

ShapeGoogle was tested against various possible transformations of shapes (isometries, topology changes, holes, noise, scale changes) and performed well, outperforming other methods like Shape DNA, part-based bag of words, and clock matching bag of features.

It’s an interesting project, and I suspect that the ability to search shape databases will prove useful. Here’s a list of current content-based image retrieval search engines: both Google and Bing use these methods (as opposed to only looking at metadata.)


Convergence of the Discrete Laplace-Beltrami Operator June 15, 2010

Posted by Sarah in Uncategorized.
Tags: ,
add a comment

Via the Geomblog, here’s a paper about the convergence of discrete approximations to the Laplace-Beltrami operator.

What is the Laplace-Beltrami operator? It’s a generalization of the Laplacian for use on surfaces. Like the Laplacian, it’s defined as the divergence of the gradient of a function.
But what does this really mean on a surface?
Given a vector field X, the divergence is defined as
div X = \frac{1}{\sqrt{|g|}} \partial_i (\sqrt{|g|}X^i)
in Einstein notation, where g is the metric tensor associated with the surface. (Compare to the ordinary divergence of a vector field, which is the sum of its partial derivatives.)
The gradient is defined as
(grad (f))^i = g^{ij} \frac{\partial f}{\partial x^j}
Combining the definitions,
\Delta f = \frac{1}{\sqrt{|g|}} \partial_i(\sqrt{|g|} g^{ij} \partial_j f)
Euclidean space is a special case where the metric tensor is just the Kronecker delta.

Why do we care about this operator? Well, you can analyze surfaces with it; on the applied end this means it’s important for signal and image processing. (The Laplace-Beltrami operator is essential to diffusion maps, for instance.) The thing is, in computational applications we generally want a discrete version of the operator to converge to the real McCoy. Hence the importance of convergence algorithms.

Taubin’s discrete version of \Delta f is defined as
f(v) = \sum_i \omega_i(f(v_i - f(v))
an averaging operator over the neighbors. This is a weighted graph Laplacian. But this cannot be an approximation to the Laplace-Beltrami operator because it goes to 0 as the mesh becomes finer.

The authors use a different discretization. We assume a triangular mesh on the surface. The local tangiential polygon at a point v is defined to be the polygon formed by the images of all the neighbors upon projection onto the tangent space at v.

A function can be lifted locally to the local tangiential polygon, by defining it at the points on the surface, and making it piecewise linear in the natural way.

After a few lines of manipulation of the Taylor series for a C^2 function on the plane, it follows that
\Delta f(0, 0) = f_{xx}(0,0) + f_{yy}(0, 0)
= \frac{2 \sum_{i = 1}^n \alpha_i (f(x_i, y_i) - f(0, 0))}{\sum_{i = 1}^n \alpha_i x_i^2} + O(r)

Now you can use this as a discrete Laplacian applied to the local lifting function. The authors show this converges to the Laplace-Beltrami operator by using the exponential map from the tangent space into the surface. The Laplace-Beltrami operator can be computed from the second derivatives of any two perpendicular geodesics; let the geodesics be the images under the exponential map of the basis elements of the tangent space.

We use the fact that the normal to a sufficiently fine triangular mesh approximates the normal to the surface according to
N_\Sigma(v) = N_A v + O(r^2)
where r is the size of the mesh.
This shows that the basis for the approximating tangent plane is close to the basis for the true tangent plane with error only O(r^2).
Calculating the approximate Laplacian gives us then that the error is only O(r).
There are also some numerical simulations in the paper giving evidence that this approximation is faster and more accurate than a competing approximation.