##
Random Projections for Manifold Learning
*June 11, 2010*

*Posted by Sarah in Uncategorized.*

Tags: compressed sensing, high dimensional data, isomap, richard baraniuk

add a comment

Tags: compressed sensing, high dimensional data, isomap, richard baraniuk

add a comment

Interesting paper by Chinmay Hegde, Michael Wakin, and Richard Baraniuk on manifold learning.

Given a set of data points which are suspected to lie on a low-dimensional manifold, how do you detect the dimension of the manifold? This method uses projection onto a random subspace of dimension if the high-dimensional space is in and $K$ is the true dimension of the manifold. The pairwise metric structure of the points turns out to be preserved with high accuracy. What this means is that we can do signal processing directly on the sorts of random measurements produced by devices that exploit compressed sensing (like the single-pixel camera, for instance.)

This work relies on the Grassberger-Procaccia algorithm. Given the pairwise distances between data points, this computes the scale-dependent correlation dimension of the data,

where

We can approximate K by fixing $r_1$ and $r_2$ to be the biggest range over which the plot is linear and calculating in that range.

Once K is estimated, there are various techniques to generate a K-dimensional representation of the data points. The key theorem here is that if the manifold has dimension K, volume V, and condition number , then fixing and and letting be a random orthoprojector from and

then with probability greater than ,

That is, this map almost preserves geodesic distances.

A priori, though, we have no way to estimate V and of the manifold — we don't know how big the curvature or volume are — so we need a learning procedure. Start with a small M, and estimate the dimension with Grassberger-Procaccia. Then use Isomap to identify a manifold of that dimension, using random projections. Then we calculate the residual variance (a measure of error) and increment M upwards until the residual variance is small enough for our own arbitrary criterion.

Lots more about Isomap here.