## Johnson-Lindenstrauss and RIP
September 10, 2010

Posted by Sarah in Uncategorized.

Via Nuit Blanche, I found a paper relating the Johnson-Lindenstrauss Theorem to the Restricted Isometry Property (RIP). It caught my eye because the authors are Rachel Ward and Felix Krahmer, whom I actually know! I met Rachel when I was an undergrad and she was a grad student. We taught high school girls together and climbed mountains in Park City, where I also met Felix.

The Johnson-Lindenstrauss Lemma says that any $p$ points in $n$-dimensional Euclidean space can be embedded in $O(\epsilon^{-2} \log p)$ dimensions while distorting the distance between any two points by no more than a factor between $(1 -\epsilon)$ and $(1 + \epsilon)$. So it gives us almost-isometric embeddings of high-dimensional data in lower dimensions.
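A quick numerical sketch of the lemma (the Gaussian projection, the constant 8, and all the sizes here are my own illustrative choices, not anything from the paper): project random points through a scaled Gaussian matrix and check the worst pairwise distortion.

```python
import numpy as np

rng = np.random.default_rng(0)

n, p = 1000, 50                    # ambient dimension, number of points
eps = 0.5
m = int(np.ceil(8 * np.log(p) / eps**2))       # target dimension ~ O(eps^-2 log p)

X = rng.standard_normal((p, n))                # p random points in R^n
A = rng.standard_normal((m, n)) / np.sqrt(m)   # scaled Gaussian projection
Y = X @ A.T

# worst-case pairwise distance distortion |ratio - 1|
worst = 0.0
for i in range(p):
    for j in range(i + 1, p):
        ratio = np.linalg.norm(Y[i] - Y[j]) / np.linalg.norm(X[i] - X[j])
        worst = max(worst, abs(ratio - 1))
print(f"reduced from {n} to {m} dims, worst distortion {worst:.3f}")
```

Even this crude choice of constants lands all $\binom{50}{2}$ pairwise distances well within the $(1 \pm \epsilon)$ window.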

The Restricted Isometry Property, familiar to students of compressed sensing, is the property of a matrix $\Phi$ that $(1-\delta) ||x||_2^2 \le ||\Phi x||_2^2 \le (1 + \delta) ||x||_2^2$
for all sufficiently sparse vectors $x$.
The relevance of this property is that these are the matrices for which $\ell_1$-minimization actually yields the sparsest vector — that is, RIP is a sufficient condition for basis pursuit to work.
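Basis pursuit itself is easy to try out. The standard trick (not specific to this paper; the toy sizes are mine) is to rewrite $\min \|x\|_1$ subject to $\Phi x = y$ as a linear program via $x = u - v$ with $u, v \ge 0$:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
m, n, k = 20, 40, 3                 # measurements, ambient dimension, sparsity

Phi = rng.standard_normal((m, n)) / np.sqrt(m)   # Gaussian matrices satisfy RIP w.h.p.
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
y = Phi @ x_true

# basis pursuit as an LP: minimize sum(u + v) subject to Phi (u - v) = y, u, v >= 0
c = np.ones(2 * n)
A_eq = np.hstack([Phi, -Phi])
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None))
x_hat = res.x[:n] - res.x[n:]
print("recovery error:", np.linalg.norm(x_hat - x_true))
```

With 20 random measurements of a 3-sparse vector in $\mathbb{R}^{40}$, the $\ell_1$ minimizer recovers the sparse vector essentially exactly.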

Now these two concepts involve equations that look a lot alike… and it turns out that they actually are related. The authors show that RIP matrices produce Johnson-Lindenstrauss embeddings. In particular, they produce improved Johnson-Lindenstrauss bounds for special types of matrices with known RIP bounds.

The proof relies upon Rademacher sequences (random sign sequences, uniformly distributed on $\{-1, +1\}$), which I don’t know a lot about, but should probably learn.

[N.B.: corrections from the original post come from Felix.]

## Meyer/Basarab paper
August 18, 2010

Posted by Sarah in Uncategorized.

This is what I’m reading through right now: “A variant on the compressed sensing of Emmanuel Candes,” by Yves Meyer and Basarab Matei. I’m not done with it, but it involves reconstructing a function by sub-Nyquist sampling of its Fourier coefficients, using an $\ell^1$-minimization technique. It’s a generalization of a theorem by Terence Tao about functions on finite fields. Looks to be interesting.

## PCMI (1)
July 1, 2010

Posted by Sarah in Uncategorized.
First, let me get it out of the way: Utah is beautiful. I’ve wanted to go to Utah ever since I read A Primer on Mapping Class Groups and realized that there are people doing cool math out west. It’s amazing up here. Liquid blue sky, sagebrush on the mountains, columbines and lupines and pine martens…
It’s actually so pretty that it tempts me to get too ambitious with the trail running.

Highlights:

Jared Tanner’s course on compressed sensing is, as far as anyone knows, the first set of lectures on this for an undergraduate audience. We showed that sparse recovery is possible for matrices in general position (checking this property is impractical, as it takes exponential time, but it happens that random matrices are in general position with high probability). We went on to deal with Haar and Fourier bases, and we’re now studying basis pursuit and coherence.

Richard Baraniuk gave some graduate lectures on compressed sensing, from more of an application perspective. He gave a visual illustration that I really liked of why $\ell^1$ minimization is a better way to find sparsity than $\ell^2$ minimization — the $\ell^1$ ball is an octahedron, and it’s likely to intersect a line of random angle close to a coordinate axis. The $\ell^2$ ball is a sphere, which is less “concentrated” around the axes, and so is likely to intersect a line of random angle farther from a coordinate axis — that is, less sparse.
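That picture can be checked numerically one dimension of generality up: on a random hyperplane $\{x : \langle a, x \rangle = 1\}$, the minimum-$\ell^2$ point is the dense least-squares solution $a / \|a\|^2$, while the minimum-$\ell^1$ point puts all its weight on the largest coordinate of $a$. A small sketch (the closed forms used here are standard facts, not from the lectures):

```python
import numpy as np

rng = np.random.default_rng(2)
a = rng.standard_normal(5)          # defines the hyperplane {x : a.x = 1}

# minimum l2-norm point: the least-squares solution a / ||a||^2 -- dense
x_l2 = a / np.dot(a, a)

# minimum l1-norm point: all weight on the largest |a_i| -- 1-sparse
i = np.argmax(np.abs(a))
x_l1 = np.zeros(5)
x_l1[i] = 1.0 / a[i]

print("nonzeros of l2 minimizer:", np.count_nonzero(x_l2))
print("nonzeros of l1 minimizer:", np.count_nonzero(x_l1))
```

Both points lie on the hyperplane, but the $\ell^1$ minimizer lands exactly on a coordinate axis — the algebraic version of the octahedron corner touching first.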

I’m now starting a course with Anna Gilbert about sparse approximation — she’s going to take more of the computational complexity approach.

The other nice thing about PCMI is meeting people you’ve only heard of through their research. I went hiking (and then pizza-eating) with Arthur Szlam, whom I knew about because he got his PhD from Yale and does the same kind of computational harmonic analysis that I’d (ideally) like to do. Talking to him was great — he runs at the prodigious rate of about three big mathematical insights per beer.

## Random Projections for Manifold Learning
June 11, 2010

Posted by Sarah in Uncategorized.

Interesting paper by Chinmay Hegde, Michael Wakin, and Richard Baraniuk on manifold learning.
Given a set of data points which are suspected to lie on a low-dimensional manifold, how do you detect the dimension of the manifold? This method uses projection onto a random subspace of dimension $M = CK \log(N)$, where the high-dimensional space is $\mathbb{R}^N$ and $K$ is the true dimension of the manifold. The pairwise metric structure of the points turns out to be preserved with high accuracy. What this means is that we can do signal processing directly on the sorts of random measurements produced by devices that exploit compressed sensing (like the single-pixel camera, for instance).

This work relies on the Grassberger-Procaccia algorithm. Given the pairwise distances between data points, this computes the scale-dependent correlation dimension of the data, $\hat{D}(r_1, r_2) = \frac{\log C_n(r_1) - \log C_n(r_2)}{\log r_1 - \log r_2}$
where $C_n(r) = \frac{1}{n(n-1)} \sum_{i \neq j} I_{\|x_i - x_j\| < r}$ is the fraction of pairs of points within distance $r$ of each other.
We can approximate $K$ by fixing $r_1$ and $r_2$ to span the biggest range over which the plot of $\log C_n(r)$ against $\log r$ is linear and calculating $\hat{D}(r_1, r_2)$ in that range.
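The estimator is only a few lines. A minimal implementation of the correlation sum and the scale-dependent dimension as written above (the sample data and the radii are my own illustrative choices):

```python
import numpy as np

def correlation_sum(X, r):
    """C_n(r): fraction of distinct ordered pairs closer than r."""
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    return (np.sum(D < r) - n) / (n * (n - 1))   # subtract the i == j diagonal

def corr_dim(X, r1, r2):
    """Scale-dependent correlation dimension D-hat(r1, r2)."""
    return (np.log(correlation_sum(X, r1)) - np.log(correlation_sum(X, r2))) \
        / (np.log(r1) - np.log(r2))

rng = np.random.default_rng(3)
# 500 points on a 2-D square embedded in R^10: true manifold dimension K = 2
X = np.zeros((500, 10))
X[:, :2] = rng.uniform(size=(500, 2))
print("estimated dimension:", corr_dim(X, 0.05, 0.2))
```

On this toy data the estimate comes out near 2, slightly below because of boundary effects at the larger radius.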

Once $K$ is estimated, there are various techniques to generate a $K$-dimensional representation of the data points. The key theorem here is that if the manifold has dimension $K$, volume $V$, and condition number $1/\tau$, then fixing $0 < \epsilon < 1$ and $0 < \rho < 1$ and letting $\Phi$ be a random orthoprojector from $\mathbb{R}^N$ to $\mathbb{R}^M$ with $M \ge O\left(\frac{K \log(N V / \tau) \log(\rho^{-1})}{\epsilon^2}\right)$,

then with probability greater than $1 - \rho$, $(1- \epsilon)\sqrt{M/N} \le \frac{d_i(\Phi x, \Phi y)}{d_i(x, y)} \le (1 + \epsilon)\sqrt{M/N}$
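Here is a sketch of the Euclidean-distance version of that statement, with my own illustrative sizes and a circle standing in for the manifold: build a random orthoprojector by orthonormalizing a Gaussian matrix, then compare pairwise distances against the $\sqrt{M/N}$ scaling from the theorem.

```python
import numpy as np

rng = np.random.default_rng(4)
N, M = 200, 30                      # ambient and projected dimensions

# sample points from a 1-D manifold (a circle) embedded in R^N
t = np.linspace(0, 2 * np.pi, 100, endpoint=False)
plane = np.linalg.qr(rng.standard_normal((N, 2)))[0]   # random 2-D plane
X = np.column_stack([np.cos(t), np.sin(t)]) @ plane.T

# random orthoprojector from R^N to R^M (orthonormal rows)
Phi = np.linalg.qr(rng.standard_normal((N, M)))[0].T

# pairwise distance ratios, normalized by the sqrt(M/N) factor
scale = np.sqrt(M / N)
ratios = []
for i in range(len(X)):
    for j in range(i + 1, len(X)):
        d_hi = np.linalg.norm(X[i] - X[j])
        d_lo = np.linalg.norm(Phi @ (X[i] - X[j]))
        ratios.append(d_lo / (scale * d_hi))
print("distance ratios in [%.3f, %.3f]" % (min(ratios), max(ratios)))
```

After dividing out $\sqrt{M/N}$, all the pairwise distance ratios cluster around 1, as the theorem predicts.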

That is, this map almost preserves geodesic distances.

A priori, though, we have no way to estimate V and $\tau$ of the manifold — we don't know how big the curvature or volume are — so we need a learning procedure. Start with a small M, and estimate the dimension with Grassberger-Procaccia. Then use Isomap to identify a manifold of that dimension, using random projections. Then we calculate the residual variance (a measure of error) and increment M upwards until the residual variance is small enough for our own arbitrary criterion.