## High dimensions and policy *April 28, 2010*

*Posted by Sarah in Uncategorized.*

Tags: speculation

Robin Hanson says this:

> Imagine the space of all policies, where one point in that space is the current status quo policy. To a first approximation, policy insight consists of learning which directions from that point are “up” as opposed to “down.” This space is huge – with thousands or millions of dimensions. And while some dimensions may be more important than others, because those changes are easier to implement or have a larger slope, there are a great many important dimensions.
>
> In practice, however, most policy debate focuses on a few dimensions, such as the abortion rate, the overall tax rate, more versus less regulation, for or against more racial equality, or a pro versus anti US stance. In fact, political scientists Keith Poole and Howard Rosenthal are famous for showing that one can explain 85% of the variation in US Congressional votes by a single underlying dimension, where there are two separated clumps. Most of the remaining variation is explained by one more dimension. Similar results have since been found for many other nations and eras.

This sounds, to me, like the main insight of dimensionality reduction. How do you know you’ve picked a good basis? Is the set of coordinates you choose to measure actually the set of coordinates that most efficiently explain the data?
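One way to make "a good basis" concrete is to ask how much of the total variance each coordinate explains. Here's a minimal sketch using principal component analysis via the SVD. Note that this is PCA, not the NOMINATE scaling Poole and Rosenthal actually use, and the vote matrix is hypothetical synthetic data built from one latent "ideal point" per legislator plus noise (real roll-call votes are yea/nay, not continuous).

```python
import numpy as np

def explained_variance_ratio(data):
    """Fraction of the total variance captured by each principal component."""
    centered = data - data.mean(axis=0)              # remove each column's mean
    s = np.linalg.svd(centered, compute_uv=False)    # singular values, largest first
    var = s ** 2                                     # proportional to variance along each axis
    return var / var.sum()

# Hypothetical synthetic "votes": one latent ideal point per legislator.
rng = np.random.default_rng(0)
n_legislators, n_bills = 200, 50
ideal = rng.normal(size=n_legislators)               # latent position on a single axis
loadings = rng.normal(size=n_bills)                  # how strongly each bill tracks that axis
votes = np.outer(ideal, loadings) + 0.3 * rng.normal(size=(n_legislators, n_bills))

ratios = explained_variance_ratio(votes)
print(f"first dimension: {ratios[0]:.0%}, second: {ratios[1]:.0%}")
```

If the data really have one dominant axis, the first ratio is large and the rest are small; with real data you could compare candidate bases the same way.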

Maybe policy outcomes really are nearly clustered along one axis, and maybe that axis is the Democrat/Republican one. Possibly. But we’d have to check. That’s what rank estimation is for. (See Kritchman and Nadler, who do it with a matrix perturbation approach.)
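Rank estimation asks how many of those coordinates carry signal rather than noise. The sketch below is *not* the Kritchman and Nadler procedure; it is a cruder rule of thumb that assumes i.i.d. noise of known variance and counts sample-covariance eigenvalues above the Marchenko-Pastur bulk edge. The `margin` parameter is a hypothetical safety factor for finite-sample fluctuations, and the data are synthetic.

```python
import numpy as np

def estimate_rank_mp(data, noise_var, margin=0.1):
    """Estimate the number of signal dimensions in `data` (n samples x p variables).

    Assumes i.i.d. noise with known variance `noise_var`. Eigenvalues of the
    sample covariance above noise_var * (1 + sqrt(p/n))**2, inflated by a
    hypothetical safety `margin`, are counted as signal.
    """
    n, p = data.shape
    centered = data - data.mean(axis=0)
    cov = centered.T @ centered / n                  # p x p sample covariance
    eigvals = np.linalg.eigvalsh(cov)                # symmetric matrix -> real eigenvalues
    edge = noise_var * (1 + np.sqrt(p / n)) ** 2     # top of the pure-noise eigenvalue bulk
    return int(np.sum(eigvals > edge * (1 + margin)))

# Hypothetical example: two latent dimensions buried in unit-variance noise.
rng = np.random.default_rng(0)
n, p = 500, 50
signal = rng.normal(size=(n, 2)) @ rng.normal(size=(2, p))
data = signal + rng.normal(size=(n, p))
print(estimate_rank_mp(data, noise_var=1.0))
```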

I’d like to see that particular insight trickle into society more broadly. There *are* objective ways to compare the usefulness of coordinate systems. Put another way, if you want to play Twenty Questions with the universe, some questions are better than others. And there is always a possibility that the ones we’re using aren’t so good.
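The Twenty Questions point can be made quantitative with Shannon entropy. Assuming (hypothetically) that all remaining candidates are equally likely, a yes/no question yields at most one bit, and only when it splits the candidates evenly; a lopsided question yields much less.

```python
import math

def entropy_bits(probs):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def question_information(n_candidates, n_yes):
    """Expected information (bits) from a yes/no question, assuming all
    n_candidates are equally likely and n_yes of them would answer 'yes'."""
    p_yes = n_yes / n_candidates
    return entropy_bits([p_yes, 1 - p_yes])

print(question_information(16, 8))   # an even split
print(question_information(16, 1))   # a lopsided question
```

For 16 candidates, an 8/8 split is worth exactly 1 bit, while a 1/15 split is worth only about 0.34 bits.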

I am not so optimistic about looking for explanations with variation-based methods for high-dimensional data. The reason is that correlation does not imply dependence, especially when dealing with high-dimensional data. One variable can just happen to correlate (perfectly) with a (linear) combination of several other variables, yet they can be totally unrelated (intrinsically independent).
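A quick way to see the problem with sample correlations: with as many independent explanatory variables as observations, least squares can fit a completely unrelated target exactly. In this sketch the sizes and the seed are arbitrary assumptions, and the fit has no intercept.

```python
import numpy as np

def best_fit_correlation(n_obs, n_vars, seed=0):
    """Correlation between a random target and its least-squares fit
    from n_vars random predictors generated independently of it."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_obs, n_vars))      # independent "explanatory" variables
    y = rng.normal(size=n_obs)                # target, independent of X by construction
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ coef                         # best linear combination (no intercept)
    return np.corrcoef(fitted, y)[0, 1]

print(best_fit_correlation(30, 30))   # as many variables as observations
print(best_fit_correlation(30, 3))    # far fewer variables
```

With `n_vars` equal to `n_obs` the returned correlation is 1 up to rounding error; with far fewer variables than observations it is typically much smaller.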

To help people find useful coordinate systems (i.e. intrinsic variables or dimensions), I would try to plot all those variables on a map that somehow shows the topological or cluster structure among them. Multidimensional scaling, for instance, offers a collection of methods to produce such maps.
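A minimal sketch of such a map using classical (Torgerson) multidimensional scaling, one of the methods in that collection. Using 1 - |correlation| as the dissimilarity between two variables is an assumption made for illustration; other dissimilarities would work too.

```python
import numpy as np

def classical_mds(dist, n_dims=2):
    """Classical (Torgerson) MDS: embed points so that Euclidean distances
    approximate the given symmetric distance matrix."""
    n = dist.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n             # centering matrix
    B = -0.5 * J @ (dist ** 2) @ J                  # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)            # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_dims]      # keep the largest n_dims
    scale = np.sqrt(np.clip(eigvals[order], 0, None))  # drop negative (non-Euclidean) parts
    return eigvecs[:, order] * scale

def variable_map(data, n_dims=2):
    """Place each column (variable) of `data` on a map, using the
    hypothetical dissimilarity 1 - |correlation| between variables."""
    corr = np.corrcoef(data, rowvar=False)
    dist = 1.0 - np.abs(corr)
    return classical_mds(dist, n_dims)
```

Each row of the result is one variable's position on the map; variables that move together end up close to each other.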

I think there’s an important point here that I’m missing, so I’m going to have to ask for clarification. How can two variables be both independent and correlated?

I think you mean that the *sample correlation* between observations of several variables can be non-zero while the correlation between the variables in the underlying probability model is zero. (Independent random variables are necessarily uncorrelated:

If X and Y are random variables and E is the expectation operator, then Corr(X,Y) is proportional to Cov(X,Y), so Corr(X,Y)=0 implies Cov(X,Y)=0, and

Cov(X,Y) = E[(X-E[X])(Y-E[Y])] = E[XY] - E[X]E[Y]

Hence

E[XY] = E[X]E[Y]

which is precisely independence.)

Thanks! That makes much more sense.

Sorry, I made a mistake here (you should read the above proof backwards): it’s independence → uncorrelated, and correlated → dependent.

E[XY]=E[X]E[Y] is a consequence of independence, but it does not imply independence.
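A standard textbook counterexample (not from the thread) makes the corrected direction concrete: let X be -1, 0 or 1 with equal probability and Y = X². Then E[XY] = E[X]E[Y], so X and Y are uncorrelated, yet Y is completely determined by X. This sketch checks both facts with exact fractions.

```python
from fractions import Fraction
from itertools import product

# Hypothetical distribution: X is -1, 0 or 1 with equal probability, and Y = X**2.
support = [-1, 0, 1]
p = Fraction(1, 3)

def expect(f):
    """Exact expectation of f(X)."""
    return sum(p * f(x) for x in support)

E_X = expect(lambda x: x)
E_Y = expect(lambda x: x ** 2)
E_XY = expect(lambda x: x * x ** 2)
cov = E_XY - E_X * E_Y                      # Cov(X,Y) = E[XY] - E[X]E[Y]

def p_x(a):
    return p if a in support else Fraction(0)

def p_y(b):
    return sum((p for x in support if x ** 2 == b), Fraction(0))

def p_xy(a, b):
    return p if a in support and a ** 2 == b else Fraction(0)

# Independence requires P(X=a, Y=b) == P(X=a) P(Y=b) for every pair (a, b).
independent = all(p_xy(a, b) == p_x(a) * p_y(b)
                  for a, b in product(support, [0, 1]))

print(cov, independent)   # 0 False
```

For instance, P(X=0, Y=0) = 1/3, but P(X=0)P(Y=0) = 1/9.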

There’s another issue which is relevant here. Some viewpoints are naturally connected because they rest on the same type of moral or ethical argument. For example, it isn’t at all surprising that attitudes about stem cells and abortion are highly correlated.

Exactly. There are a number of things we can analyze: what correlates in political voting; what correlates in non-politicians’ answers to survey questions, which is more what you’re talking about; and what policies are actually correlated in real life (that is, have similar input requirements or similar consequences).

The first, political voting, is basically one-dimensional.

The second, poll results, turns out to be much more complicated. People who call themselves Democrats, in particular, are very ideologically diverse. (A recent OkTrends post is precisely about that.) If you plot poll results on a conventional Nolan chart, you don’t get much of a linear correlation.

The third… I don’t know if anybody has even attempted that problem. It would be very hard. You could restrict attention to, say, congressional budget proposals and try to map them by their substantive, rather than political, similarities along different coordinates. (e.g. a coordinate could be “net taxes/transfers to farmers.”) The dominant coordinates in *that* map might turn out to have nothing to do with our dominant political axes. Or they might; but as far as I know it isn’t known.

The invisible fourth option is what Robin Hanson is talking about — the space of possible policies, not actually implemented policies. That would be a map that included, say, asteroid defense, and could conclude that asteroid defense was a more important issue than gay marriage. The fourth option is what policy really needs. But it’s really hard to imagine implementing it. The third option is something we could actually do.