
Reisefieber June 26, 2010

Posted by Sarah in Uncategorized.

Reisefieber: literally “travel fever,” the mixture of excitement and nerves before leaving on a trip. I love it. I hope I never lose my enthusiasm for getting on a plane.

So, tomorrow I leave for Park City (going to the PCMI summer school). So excited to meet all the image processing people I only know from books and papers.

Technical travel reading: A Wavelet Tour of Signal Processing by Stephane Mallat. This book is incredible. Everything you ever wanted to know about wavelets, frames, basis pursuit, l^1 pursuit, all that good stuff. The style is discursive. The illustrations are ample. The proofs are lovely.

Non-technical travel reading: Anansi Boys by Neil Gaiman.

Too heavy to take but incredible: Vietnam: A History. I’ve been deep in Indochina for the past couple of weeks and it’s strange and fascinating.


Machine Learning Videos June 25, 2010

Posted by Sarah in Uncategorized.

I’m currently watching this Stanford lecture course on machine learning. Very engaging and clear so far — I just learned gradient descent methods.
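To make that concrete, here’s a minimal sketch of gradient descent (my own toy example, not from the course): minimizing f(x) = (x − 3)² by repeatedly stepping against the gradient.

```python
# Gradient descent on a 1-D quadratic, f(x) = (x - 3)**2.
# The gradient is f'(x) = 2*(x - 3); stepping against it converges to x = 3.
def gradient_descent(grad, x0, step=0.1, iters=100):
    x = x0
    for _ in range(iters):
        x -= step * grad(x)
    return x

x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(round(x_min, 4))  # 3.0
```

The step size matters: too large and the iterates diverge, too small and convergence crawls. That tradeoff is exactly what the fancier methods in the course address.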

Atiyah-Macdonald: Modules and Exact Sequences June 23, 2010

Posted by Sarah in Uncategorized.

A module is a generalization of the notion of an ideal. Rings, ideals, quotient rings, and vector spaces are all modules. If A is a ring, then an A-module is an abelian group M on which A acts linearly:
a(x + y) = ax + ay
(a + b)x = ax + bx
(ab)x = a(bx)
1x = x
Submodules are subgroups of M closed under multiplication by elements of A; sums and intersections of submodules are again submodules.

The annihilator of M is the set of all a in A such that aM = 0. A module is called faithful if its annihilator is zero.
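A tiny computational sanity check (my own example, not the book’s): take A = Z/12 and the A-module M = {0, 4, 8} generated by 4. The annihilator comes out to the ideal (3), so M is not faithful.

```python
# Annihilator in a small example: A = Z/12 acting on the module
# M = 4*(Z/12) = {0, 4, 8}.  Ann(M) = { a in A : a*m = 0 for all m in M }.
n = 12
M = {0, 4, 8}
ann = {a for a in range(n) if all((a * m) % n == 0 for m in M)}
print(sorted(ann))  # [0, 3, 6, 9] -- the ideal (3) in Z/12
```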

The direct sum of two modules is the set of ordered pairs (x, y), with componentwise addition and scalar multiplication. An A-module is called free if it’s isomorphic to a direct sum of modules, each isomorphic to A. (A finitely generated A-module is isomorphic to a quotient of A^n for some n.)

Let M be a finitely generated A-module, let a be an ideal of A, and let \phi be an A-module endomorphism of M such that \phi(M) \subset aM. Then \phi satisfies an equation of the form

\phi^n + a_1 \phi^{n-1} + \dots + a_n = 0

with the coefficients a_i \in a.

Let x_1, \dots, x_n be a set of generators of M. Then \phi(x_i) = \sum_j a_{ij} x_j for some a_{ij} \in a. That is,
\sum_j (\delta_{ij} \phi - a_{ij}) x_j = 0.
Left-multiplying by the adjoint of the matrix (\delta_{ij} \phi - a_{ij}), we see that its determinant annihilates each x_i, hence is the zero endomorphism of M. Expanding out the determinant gives us an equation of the desired form.

Exact sequences. Suppose we have a sequence of homomorphisms between modules,
\dots \to M_{i-1} \xrightarrow{f_i} M_i \xrightarrow{f_{i+1}} M_{i+1} \to \dots
This is exact at M_i if Im(f_i) = Ker(f_{i+1}). The sequence is exact if it is exact at every module.
0 \to M' \xrightarrow{f} M is exact iff f is injective;
M \xrightarrow{g} M^* \to 0 is exact iff g is surjective.
A short exact sequence:
0 \to M' \to M \to M^* \to 0.
Every long exact sequence can be split into these.
If the above is a short exact sequence, then
0 \to Hom(M^*, N) \to Hom(M, N) \to Hom(M', N)
is also exact; note that the order reverses, because Hom(-, N) is contravariant.
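A concrete example (mine, not the book’s) makes the reversal visible: apply Hom(−, \mathbb{Z}/2\mathbb{Z}) to a short exact sequence of \mathbb{Z}-modules.

```latex
% 0 --> Z --(x2)--> Z --(mod 2)--> Z/2Z --> 0  is short exact.
0 \longrightarrow \mathbb{Z} \xrightarrow{\ \times 2\ } \mathbb{Z}
  \longrightarrow \mathbb{Z}/2\mathbb{Z} \longrightarrow 0
% Applying Hom(-, Z/2Z) reverses the arrows:
%   0 --> Hom(Z/2Z, Z/2Z) --> Hom(Z, Z/2Z) --> Hom(Z, Z/2Z),
% i.e.  0 --> Z/2Z --> Z/2Z --> Z/2Z, where the last map sends
% phi to phi composed with (x2), which is 2*phi = 0.
% The resulting sequence is exact, but the final map is not surjective:
% Hom(-, N) is only left exact.
```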

Next time I’ll introduce the boundary map and the Snake Lemma.

Atiyah-MacDonald: Ideals June 18, 2010

Posted by Sarah in Uncategorized.

An ideal a of a ring A is a subset which is an additive subgroup and satisfies Aa \subset a. The set of cosets, A/a, forms a ring known as the quotient ring.

A basic, though special, example here is the ring of integers. Every ideal is the set of multiples of a single integer, n. The quotient rings are the integers mod n. The ideal generated by n is denoted (n), and it is called a principal ideal. Because all the ideals in the integers are multiples of a single element, the integers are known as a principal ideal domain.

Notice that (mn) \subset (n). Multiples of six are also multiples of three. An ideal generated by a prime number is not contained in any other ideal (except the whole ring of integers.) This brings us to the notion of prime and maximal ideals.

A prime ideal p in a ring A is an ideal, not equal to (1), such that if xy \in p then x \in p or y \in p. Equivalently, a prime ideal is one where A / p is an integral domain.

A maximal ideal m in a ring A is an ideal, not equal to (1), such that there is no ideal a with m \subset a \subset (1), both inclusions strict. Equivalently, a maximal ideal is one where A / m is a field. (This last is true because A is a field iff the only ideals in A are 0 and (1).)

Maximal ideals are always prime, because a field is always an integral domain.

Every ring (not equal to the zero ring) has at least one maximal ideal. This is a result of Zorn’s lemma.

As a result, every non-unit of A is contained in a maximal ideal.

In a principal ideal domain, every non-zero prime ideal is maximal.
For if (x) \neq 0 is a prime ideal and (x) \subset (y), then x = yz for some z, so yz \in (x); since (x) is prime, y \in (x) or z \in (x). In the first case (y) = (x). In the second, z = tx for some t, so x = yz = ytx; cancelling x gives yt = 1, and so (y) = (1).

The set of nilpotent elements in a ring is called the nilradical and it is an ideal. The product of anything with a nilpotent element is also nilpotent; the sum of nilpotent elements is nilpotent by the binomial theorem.

The nilradical is the intersection of all the prime ideals: if f is nilpotent, then f^n = 0 for some n, so f itself is in every prime ideal. If f is not nilpotent, you can construct a prime ideal that avoids f — take an ideal maximal in the set of ideals a such that no power f^n lies in a. (The product of two elements not in this ideal is not itself in this ideal, so the ideal is prime.)

The Jacobson Radical is the intersection of all the maximal ideals in A. It (obviously) contains the nilradical.
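A quick check in a small ring (my own example): in Z/12 the prime ideals are (2) and (3), so the nilradical should be (2) ∩ (3) = (6), and a direct search for nilpotent elements agrees. (The maximal ideals of Z/12 are also (2) and (3), so here the Jacobson radical coincides with the nilradical.)

```python
# Nilradical of Z/12 two ways: directly (elements with some power = 0),
# and as the intersection of the prime ideals (2) and (3), the primes
# dividing 12.
n = 12
nilpotents = {a for a in range(n)
              if any(pow(a, k, n) == 0 for k in range(1, n + 1))}

def ideal(g):
    return {g * x % n for x in range(n)}

primes_intersection = ideal(2) & ideal(3)
print(sorted(nilpotents), sorted(primes_intersection))  # [0, 6] [0, 6]
```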

You can get new ideals from old by summing them (that gives you a new ideal), intersecting them (also an ideal), or forming the product ab (the set of all finite sums \sum x_i y_i of products of elements x_i \in a, y_i \in b). In the integers, (m) \cap (n) is the ideal generated by the lcm of m and n, and (m)(n) = (mn), the ideal generated by the product of their generators. If the generators are relatively prime, then the intersection and the product are the same ideal.

This is true for general rings as well, as can be proven by induction. The generalization of relative primality is that two ideals a and b are coprime if a + b = (1).
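In the integers this is all easy to watch happen, since everything reduces to gcd and lcm (a toy check, my own):

```python
from math import gcd

# In Z every ideal is principal: (n) = multiples of n. For a = (m), b = (n):
#   a + b = (gcd(m, n)),   a ∩ b = (lcm(m, n)),   ab = (mn).
def lcm(m, n):
    return m * n // gcd(m, n)

m, n = 4, 6
print(gcd(m, n), lcm(m, n), m * n)  # (4)+(6) = (2), (4)∩(6) = (12), (4)(6) = (24)

# Coprime generators: a + b = (1), and then intersection = product.
m, n = 4, 9
print(gcd(m, n) == 1, lcm(m, n) == m * n)  # True True
```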

Experiment: Commutative Algebra June 18, 2010

Posted by Sarah in Uncategorized.

I’m reading Atiyah and MacDonald’s Introduction to Commutative Algebra. So, I thought, how about an experiment: when I finish a chapter, I’ll write a post on it. It might be a good study trick — I like blogging, and it forces me to develop insights and revisit concepts. Uniting the “just for fun” and “serious business” parts of my life just might work.

Stay tuned for a chapter post on ideals!

Convergence of the Discrete Laplace-Beltrami Operator June 15, 2010

Posted by Sarah in Uncategorized.

Via the Geomblog, here’s a paper about the convergence of discrete approximations to the Laplace-Beltrami operator.

What is the Laplace-Beltrami operator? It’s a generalization of the Laplacian for use on surfaces. Like the Laplacian, it’s defined as the divergence of the gradient of a function.
But what does this really mean on a surface?
Given a vector field X, the divergence is defined as
div X = \frac{1}{\sqrt{|g|}} \partial_i (\sqrt{|g|}X^i)
in Einstein notation, where g is the metric tensor associated with the surface. (Compare to the ordinary divergence of a vector field, which is the sum of its partial derivatives.)
The gradient is defined as
(grad (f))^i = g^{ij} \frac{\partial f}{\partial x^j}
Combining the definitions,
\Delta f = \frac{1}{\sqrt{|g|}} \partial_i(\sqrt{|g|} g^{ij} \partial_j f)
Euclidean space is a special case where the metric tensor is just the Kronecker delta.
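As a sanity check on the coordinate formula, here’s a symbolic computation (my own sketch, using sympy) on the unit sphere with metric g = diag(1, sin²θ): the formula recovers Δ cos θ = −2 cos θ, the expected spherical-harmonic eigenvalue −l(l+1) with l = 1.

```python
import sympy as sp

# Laplace-Beltrami operator on the unit sphere, coordinates (theta, phi),
# metric g = diag(1, sin(theta)^2), via the coordinate formula
#   \Delta f = (1/sqrt|g|) d_i ( sqrt|g| g^{ij} d_j f ).
theta, phi = sp.symbols('theta phi')
coords = (theta, phi)
g = sp.Matrix([[1, 0], [0, sp.sin(theta)**2]])
g_inv = g.inv()
sqrt_g = sp.sin(theta)  # sqrt(det g), taking 0 < theta < pi

def laplace_beltrami(f):
    terms = [sp.diff(sqrt_g * sum(g_inv[i, j] * sp.diff(f, coords[j])
                                  for j in range(2)), coords[i])
             for i in range(2)]
    return sp.simplify(sum(terms) / sqrt_g)

print(laplace_beltrami(sp.cos(theta)))  # -2*cos(theta)
```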

Why do we care about this operator? Well, you can analyze surfaces with it; on the applied end this means it’s important for signal and image processing. (The Laplace-Beltrami operator is essential to diffusion maps, for instance.) The thing is, in computational applications we generally want a discrete version of the operator that converges to the real McCoy. Hence the importance of convergence results.

Taubin’s discrete version of \Delta f is defined as
\Delta f(v) = \sum_i w_i (f(v_i) - f(v)),
an averaging operator over the neighbors. This is a weighted graph Laplacian. But this cannot be an approximation to the Laplace-Beltrami operator, because it goes to 0 as the mesh becomes finer.
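For concreteness, here’s a minimal implementation of such a weighted graph Laplacian (my sketch, with uniform weights 1/deg(v), one common choice rather than necessarily Taubin’s):

```python
import numpy as np

# Umbrella operator on a graph: (Lf)(v) = sum_i w_i (f(v_i) - f(v)),
# with uniform weights w_i = 1/deg(v). The "mesh" is a 4-cycle.
neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
f = np.array([0.0, 1.0, 2.0, 1.0])  # a function sampled at the vertices

def umbrella(f, neighbors):
    Lf = np.zeros_like(f)
    for v, nbrs in neighbors.items():
        w = 1.0 / len(nbrs)
        Lf[v] = sum(w * (f[u] - f[v]) for u in nbrs)
    return Lf

print(umbrella(f, neighbors))  # [ 1.  0. -1.  0.]
```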

The authors use a different discretization. We assume a triangular mesh on the surface. The local tangential polygon at a point v is defined to be the polygon formed by the images of all the neighbors upon projection onto the tangent space at v.

A function can be lifted locally to the local tangential polygon, by defining it at the points on the surface and making it piecewise linear in the natural way.

After a few lines of manipulation of the Taylor series for a C^2 function on the plane, it follows that
\Delta f(0, 0) = f_{xx}(0,0) + f_{yy}(0, 0)
= \frac{2 \sum_{i = 1}^n \alpha_i (f(x_i, y_i) - f(0, 0))}{\sum_{i = 1}^n \alpha_i x_i^2} + O(r)

Now you can use this as a discrete Laplacian applied to the local lifting function. The authors show this converges to the Laplace-Beltrami operator by using the exponential map from the tangent space into the surface. The Laplace-Beltrami operator can be computed from the second derivatives of any two perpendicular geodesics; let the geodesics be the images under the exponential map of the basis elements of the tangent space.

We use the fact that the normal to a sufficiently fine triangular mesh approximates the normal to the surface according to
N_\Sigma(v) = N_A(v) + O(r^2)
where r is the size of the mesh.
This shows that the basis for the approximating tangent plane is close to the basis for the true tangent plane with error only O(r^2).
Calculating the approximate Laplacian gives us then that the error is only O(r).
There are also some numerical simulations in the paper giving evidence that this approximation is faster and more accurate than a competing approximation.

It’s that time of year again June 14, 2010

Posted by Sarah in Uncategorized.

I’m by no means the most fanatical of Joyce fans, but yeah, I am one. So it always makes me happy when June 16 rolls around. In preparation, fix yourself a Gorgonzola sandwich and check out Robert Berry’s graphic-novel adaptation of Ulysses. And take a little time to curse the boneheads at Apple for refusing to carry it on any iPad or iPhone applications, due to some pages of (cartoon) nudity. (Um, hasn’t this happened to Ulysses before?)

Persistent homology (and geometry?) June 13, 2010

Posted by Sarah in Uncategorized.

I read this AMS article by Robert Ghrist about persistent homology and I am intrigued.

This is a method for using topological invariants to describe data. In particular, we want to define the homology of a data set representing a point cloud in a high-dimensional Euclidean space. A topologist would replace the point cloud with a simplicial complex — one of the easiest to compute is the Rips complex, whose k-simplices correspond to unordered (k+1)-tuples of points that are pairwise within Euclidean distance \epsilon.

However, the resulting simplicial complex depends on the choice of \epsilon. A very small \epsilon leaves the complex a discrete set, while a very large \epsilon results in the complex being one big simplex. As \epsilon increases, holes in the simplicial complex are born, grow, and die; the picture over time provides a description of the data’s topology.
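The birth and death of connected components (the degree-0 part of the story) is easy to simulate. Here’s a sketch (mine) that computes β₀ of the Rips complex at a given ε with union-find:

```python
import numpy as np

# Number of connected components (beta_0) of the Rips complex at scale eps,
# via union-find: two points are joined when their distance is below eps.
def beta0(points, eps):
    n = len(points)
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i
    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(points[i] - points[j]) < eps:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(n)})

# Two well-separated pairs: beta_0 drops from 4 to 2 to 1 as eps grows.
pts = np.array([[0.0, 0], [1, 0], [10, 0], [11, 0]])
print(beta0(pts, 0.5), beta0(pts, 2.0), beta0(pts, 20.0))  # 4 2 1
```

In barcode language: two of the four degree-0 bars die at ε = 1, one more dies at ε = 9, and one persists forever.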

The “persistence complex” is a sequence of chain complexes C_*^i together with chain maps C_*^i \to C_*^{i+1}, which are inclusions. This is motivated by an increasing sequence of \epsilons and the inclusions from one complex to the next. (The precision here goes from fine to coarse.)

Ghrist introduces the notion of a “barcode” — each “bar” is a generator of the homology group, and the length of the bar is the range of values of \epsilon for which this particular element is a generator of the homology group. A barcode is the persistence analogue of a Betti number.

Now, what I always wondered here is what this has to do with geometry. Consider a finger-shaped projection sticking out of a 2-dimensional surface. At different resolutions, the projection can appear to break off into an island. (This is a practical problem for protein visualization.) This would be an example of a feature that could be captured with persistence homology; but it could also be explained directly by noticing that the tip of the projection is a region of high curvature. Could other persistence homology features be explained by geometrical properties?

This paper by Gunnar Carlsson et al. seems to provide an answer. The authors define the tangent complex of a space, which is the closure of the set of all tangents to points in the space. Then they define the filtered tangent complex: the set of tangent vectors for which the osculating circle has radius larger than some \delta. We have an inclusion between filtered tangent complexes at different values of \delta. (For curves, there is only one osculating circle; for surfaces, there is one in each direction, so the tangent space is filtered based on the maximum between them.)

Then we look at the homology groups of the filtered tangent spaces. This provides a barcode. Such barcodes can, for instance, distinguish a bottle from a glass. (The relationship of the tangent-space barcode to the Rips-complex barcode remains mysterious to me.)

Random Projections for Manifold Learning June 11, 2010

Posted by Sarah in Uncategorized.

Interesting paper by Chinmay Hegde, Michael Wakin, and Richard Baraniuk on manifold learning.
Given a set of data points which are suspected to lie on a low-dimensional manifold, how do you detect the dimension of the manifold? This method uses projection onto a random subspace of dimension M = CK\log(N), where the ambient space is \mathbb{R}^N and K is the true dimension of the manifold. The pairwise metric structure of the points turns out to be preserved with high accuracy. What this means is that we can do signal processing directly on the sorts of random measurements produced by devices that exploit compressed sensing (like the single-pixel camera, for instance).

This work relies on the Grassberger-Procaccia algorithm. Given the pairwise distances between data points, this computes the scale-dependent correlation dimension of the data,
\hat{D}(r_1, r_2) = \frac{\log C_n(r_1) - \log C_n(r_2)}{\log r_1 - \log r_2}
C_n(r) = \frac{1}{n(n-1)} \sum_{i \neq j} I_{||x_i - x_j|| < r}
We can approximate K by fixing r_1 and r_2 to span the biggest range over which the plot of \log C_n(r) against \log r is linear, and calculating \hat{D}(r_1, r_2) over that range.
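Here’s a toy version of the estimate (my own sketch): points sampled from a circle embedded in R³ should give a correlation dimension near 1.

```python
import numpy as np

# Grassberger-Procaccia correlation dimension, estimated from pairwise
# distances of points sampled on a circle in R^3 (true dimension 1).
rng = np.random.default_rng(0)
t = rng.uniform(0, 2 * np.pi, 800)
X = np.c_[np.cos(t), np.sin(t), np.zeros_like(t)]

def C(r, X):
    # fraction of point pairs within distance r of each other
    n = len(X)
    d = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    return (d[np.triu_indices(n, 1)] < r).mean()

r1, r2 = 0.2, 0.5  # a range where log C(r) vs log r is roughly linear
D_hat = (np.log(C(r1, X)) - np.log(C(r2, X))) / (np.log(r1) - np.log(r2))
print(round(D_hat, 2))  # close to 1
```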

Once K is estimated, there are various techniques to generate a K-dimensional representation of the data points. The key theorem here is that if the manifold has dimension K, volume V, and condition number 1/\tau, then fixing 0 < \epsilon < 1 and 0 < \rho < 1 and letting \Phi be a random orthoprojector from \mathbb{R}^N \to \mathbb{R}^M and

M \ge O\left(\frac{K \log(N V \tau^{-1}) \log(\rho^{-1})}{\epsilon^2}\right),

then with probability greater than 1 - \rho,
(1 - \epsilon)\sqrt{M/N} \le \frac{d_i(\Phi x, \Phi y)}{d_i(x, y)} \le (1 + \epsilon)\sqrt{M/N}

That is, this map almost preserves geodesic distances.
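A quick numerical illustration (my sketch, and for plain Euclidean distances of generic vectors rather than geodesic distances on a manifold): project through a random orthoprojector from R¹⁰⁰⁰ to R¹⁰⁰ and compare the distance ratio with √(M/N).

```python
import numpy as np

# A random orthoprojector from R^N to R^M shrinks pairwise distances by
# about sqrt(M/N), up to a small multiplicative error.
rng = np.random.default_rng(1)
N, M = 1000, 100

Q, _ = np.linalg.qr(rng.standard_normal((N, M)))  # N x M, orthonormal columns
Phi = Q.T                                         # M x N orthoprojector

x, y = rng.standard_normal(N), rng.standard_normal(N)
ratio = np.linalg.norm(Phi @ x - Phi @ y) / np.linalg.norm(x - y)
print(round(ratio / np.sqrt(M / N), 2))  # close to 1
```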

A priori, though, we have no way to estimate V and \tau of the manifold — we don't know how big the curvature or volume are — so we need a learning procedure. Start with a small M, and estimate the dimension with Grassberger-Procaccia. Then use Isomap to identify a manifold of that dimension, using random projections. Then we calculate the residual variance (a measure of error) and increment M upwards until the residual variance is small enough for our own arbitrary criterion.

Lots more about Isomap here.

Compressed Sensing and tutorial June 7, 2010

Posted by Sarah in Uncategorized.

I’m going to be going to the Park City Math Institute’s summer program at the end of June. The topic is Image Processing and there are lecture courses on mathematical image processing and compressed sensing, as well as whatever graduate lectures I also have time for. I’m excited about the whole thing.

So, in preparation, I have been reading about compressed sensing, and the nicest and most complete tutorial I could find is this one by Richard Baraniuk (PDF here). Compressed sensing, in short, is how you beat the Shannon/Nyquist sampling bound; you can collect very few measurements and nonetheless reconstruct the signal perfectly. This, to drastically understate the point, is useful. Now I’m tempted to do nothing but read papers. (However, there’s also Serious Business to be done — currently Stein’s Singular Integrals.)
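To see the punchline in action, here’s a toy recovery experiment (my own sketch; it uses greedy orthogonal matching pursuit as the reconstruction algorithm, which is simpler to code than the l1 minimization the tutorial emphasizes): a 3-sparse signal in R²⁰⁰ recovered from 100 random measurements.

```python
import numpy as np

# Toy compressed-sensing demo: recover a K-sparse signal in R^N from
# M < N random linear measurements y = A x, using greedy orthogonal
# matching pursuit (OMP). All sizes here are made up for illustration.
rng = np.random.default_rng(2)
N, M, K = 200, 100, 3

x = np.zeros(N)
support = rng.choice(N, K, replace=False)
x[support] = np.array([1.0, -1.5, 2.0])       # a 3-sparse signal

A = rng.standard_normal((M, N)) / np.sqrt(M)  # random measurement matrix
y = A @ x                                     # M = 100 measurements

# OMP: pick the column most correlated with the residual, then re-fit
# by least squares on the columns chosen so far.
S, r = [], y.copy()
for _ in range(K):
    S.append(int(np.argmax(np.abs(A.T @ r))))
    coef, *_ = np.linalg.lstsq(A[:, S], y, rcond=None)
    r = y - A[:, S] @ coef

x_hat = np.zeros(N)
x_hat[S] = coef
print(np.allclose(x_hat, x))  # exact recovery (w.h.p. at these sizes)
```

With M this much larger than K log N, the greedy method finds the true support with overwhelming probability; l1 minimization is what carries the sharper guarantees discussed in the tutorial.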