## Multiscale Methods for Detecting Dimensionality

May 2, 2010

Posted by Sarah in Uncategorized.

I’m currently reading this 2009 paper by Anna Little, Yoon-Mo Jung, and Mauro Maggioni, and it’s very lovely.

The idea is this: given a cloud of data, it may well lie on a low-dimensional manifold. But what dimension? We care about this because it indicates how many coordinates are worth looking at. To give an example (not the authors’, but I think it’s illustrative), psychologists like to give the Myers-Briggs personality test, which examines four axes (introversion/extroversion, intuitive/sensing, thinking/feeling, perceiving/judging). But is personality best described by four coordinates? Or is it five, or three? What does the landscape of personality really look like?

If you have a linear mixture model, you can use PCA to find the rank of the data matrix. If the model is noisy, you’re still pretty much okay so long as the noise isn’t overwhelming; that’s the (relatively) well-trod territory of random matrix theory. But estimating the dimension of a nonlinear manifold is harder: PCA tends to severely overestimate dimension. For example, the covariance matrix of a circle has two nonzero eigenvalues, but a circle, of course, is one-dimensional. In general, by choosing a curve that spirals out into many dimensions, it’s possible to construct a one-dimensional manifold with a full-rank covariance matrix. So we need to get more sophisticated.
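You can see the circle example in a few lines of numpy (this is just my illustration, not from the paper): both eigenvalues of the covariance matrix come out equal and nonzero, so PCA reports two dimensions for a one-dimensional manifold.

```python
import numpy as np

# Sample 1000 points from the unit circle: a one-dimensional manifold
# embedded in the plane.
theta = np.linspace(0, 2 * np.pi, 1000, endpoint=False)
X = np.column_stack([np.cos(theta), np.sin(theta)])

# PCA: eigenvalues of the covariance matrix of the data.
cov = np.cov(X, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]
print(eigvals)  # both eigenvalues are ~0.5: PCA "sees" two dimensions
```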

Locally, of course, a curve looks like a line, so if we take a small enough radius around a point, the local covariance reveals that it’s one-dimensional. The radius has to be small enough that curvature doesn’t matter much, but large enough that, for noisy data, the covariance measures the “true” manifold rather than the noise. Since we don’t know in advance how big the radius should be, we take various radii and get singular values from all of them. This is what makes the method “multiscale.”

Try this with a noisy nine-dimensional sphere and you recover the dimensionality rather nicely. At tiny scales there aren’t enough data points and the rank of the covariance matrix is far too low. At larger scales, the top 9 eigenvalues start to separate from the others; at even larger scales, curvature starts to matter and more eigenvalues separate. In between there’s a “sweet spot” giving dimension 9.
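Here’s a toy version of that experiment, shrunk from the paper’s 9-sphere to a noisy 2-sphere so a modest sample size suffices (the setup and the crude “count singular values above a fraction of the largest” cutoff are my own illustrative choices, not the authors’):

```python
import numpy as np

rng = np.random.default_rng(0)

# 20,000 points on the unit 2-sphere in R^3, embedded in R^10, plus noise.
N, D, sigma = 20_000, 10, 0.01
sphere = rng.normal(size=(N, 3))
sphere /= np.linalg.norm(sphere, axis=1, keepdims=True)
X = np.zeros((N, D))
X[:, :3] = sphere
X += sigma * rng.normal(size=(N, D))

def local_singular_values(X, center, r):
    """Singular values (on a std-dev scale) of the points within distance r."""
    nbrs = X[np.linalg.norm(X - center, axis=1) <= r]
    nbrs = nbrs - nbrs.mean(axis=0)
    return np.linalg.svd(nbrs, compute_uv=False) / np.sqrt(len(nbrs))

def estimate_dim(s, rel_gap=0.2):
    # Crude cutoff: count singular values above a fraction of the largest.
    return int(np.sum(s > rel_gap * s[0]))

center = X[0]
for r in (0.1, 0.3, 1.5):
    s = local_singular_values(X, center, r)
    print(f"r={r}: dim estimate {estimate_dim(s)}, top svs {np.round(s[:4], 3)}")
```

At r=0.3 the estimate is 2 (correct); at r=1.5 curvature contributes a third large singular value and the estimate rises to 3; at the tiny radius r=0.1 noise and sparse sampling muddy the picture. That middle radius is the sweet spot.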

The authors go on to do some estimates of noise and find that the size of the noise in the covariance matrix is essentially independent of the ambient dimension.

The algorithm they come up with goes like this:
1. Compute singular values of the local covariance matrix, for each radius r and each point in the data set;
2. Estimate the noise size from the bottom singular values, which do not grow with r;
3. Identify a range of scales where the noise singular values are small compared to the others;
4. Determine which non-noise singular values correspond to curvature and which to tangent directions by checking whether their squares grow linearly or quadratically in r^2 (equivalently, whether the singular values themselves grow like r or like r^2). The effect of curvature is to produce extra non-noise singular values, which is why linear methods tend to overestimate dimension, but these are distinguishable by their faster growth rate.

This turns out to outperform existing algorithms that do not use the multiscale technique.
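The steps above can be sketched in code. This is a hedged simplification on the same toy data (a noisy 2-sphere rather than the paper’s 9-sphere): the noise estimate, the 3-sigma cutoff, and the log-log slope test for linear-vs-quadratic growth are my illustrative stand-ins for the paper’s more careful procedure, and I only look at one center point rather than every point.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: a 2-sphere in R^10 with small additive noise.
N, D, sigma = 20_000, 10, 0.002
sphere = rng.normal(size=(N, 3))
sphere /= np.linalg.norm(sphere, axis=1, keepdims=True)
X = np.zeros((N, D))
X[:, :3] = sphere
X += sigma * rng.normal(size=(N, D))

center = X[0]
radii = np.array([0.2, 0.3, 0.4, 0.5])

# Step 1: singular values of the local neighborhood at every radius.
S = []
for r in radii:
    nbrs = X[np.linalg.norm(X - center, axis=1) <= r]
    nbrs = nbrs - nbrs.mean(axis=0)
    S.append(np.linalg.svd(nbrs, compute_uv=False) / np.sqrt(len(nbrs)))
S = np.array(S)  # shape (len(radii), D), columns sorted largest first

# Step 2: noise level from the bottom singular values, which stay flat in r.
sigma_hat = np.median(S[:, 4:])

# Steps 3-4: among singular values well above the noise floor, fit the
# growth exponent in log-log coordinates: slope ~1 means a tangent
# direction, slope ~2 means a curvature artifact.
dim = 0
for i in range(D):
    if S[:, i].mean() > 3 * sigma_hat:
        slope = np.polyfit(np.log(radii), np.log(S[:, i]), 1)[0]
        if slope < 1.5:  # linear growth in r: genuine tangent direction
            dim += 1
print("estimated intrinsic dimension:", dim)  # → 2
```

The slope test is where the multiscale idea earns its keep: a single radius can’t tell a tangent direction from a curvature direction, but the growth rate across radii can.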