Principal Components Analysis – Georgia Tech – Machine Learning
Okay, so the first linear transformation algorithm we’re going to discuss is something called Principal Components Analysis. Okay, Michael?>>Sure.>>Now, just to be clear here, deriving Principal Components Analysis and working it all out in its gory detail would take forever, so I’m not going to do that; I’m going to leave that to the reading. But I want people to understand what Principal Components Analysis actually does and what its properties are, okay?>>Yeah.>>Okay, so, Principal Components Analysis has a long history. It’s a particular example of something called an eigenproblem.>>Mm.>>Okay. You either already know what that means or you don’t. And if you don’t, then it means you haven’t read the material that we’ve given you, so I’m going to ask you to do that. But whether you have or have not, let me just give you an idea of what Principal Components Analysis actually does. And I think the easiest way to do that is with a very simple two-dimensional example. So here’s my very simple two-dimensional example. All right, so you see this picture, Michael?>>Yep.>>So this is a bunch of dots sampled from some distribution that happens to lie on a two-dimensional plane like this, okay?>>Yep.>>Now, this is in fact two-dimensional, so we have two features here; we’ll just call them x and y. We could have called them one and two, it doesn’t really matter; this is just the x, y plane. And let me tell you what Principal Components Analysis actually does: it finds directions of maximal variance. Okay, does that make sense?>>Variance of what?>>The variance of the data. So suppose I had to pick a single direction here, project the points onto that direction, onto that vector, and then compute the variance, literally the variance of the points as projected there.
Which direction will be maximal?>>I would think it would be the one that is sort of diagonal. The data kind of blobs along that particular direction.>>Right, that’s exactly right. And to see that, imagine that we projected all of these points onto just the x dimension. That’s the same thing as just taking that particular feature. Well, if we projected all of them down, we would end up with all of our points living in this space. And when we compute the variance, it is going to be something that captures the spread between here and here. Does that make sense?>>Yep.>>Similarly, if we projected the points onto the y axis, which is the equivalent of...>>Those are examples of feature selection.>>Yes, exactly, it’s feature selection. It’s the equivalent of just looking at the second feature here, y. I’m going to end up with a variance that spans this space between here and here. By contrast, if we do what you want to do, Michael, and we pick a direction that is at about 45 degrees, if I drew this right, we would end up projecting points between here and here. Now, it’s not as easy to see in this particular case, but the points, as they get projected onto this line, will have a much higher variance than on either of these two dimensions. And in particular, it turns out that for data like this, which I’ve drawn as an oval whose long axis sits at about 45 degrees, this direction is in fact the one that maximizes variance. So Principal Components Analysis, if it had to find a single dimension, would pick this dimension, because it is the one that maximizes variance. Okay, you got it?>>Sure.>>Okay. Now, what’s the second component that PCA, or Principal Components Analysis, would find? Do you know?>>I don’t know what you mean by second. This is now a direction.
That has high variance.>>Yes.>>No.>>The first one.>>Yes.>>Because it seems like, you know, either the x or the y is pretty high, or something that looks just like that red line but is just a little bit tilted from it would also be very, very high.>>Right, that’s exactly right. So, in fact, Principal Components Analysis has one other constraint that I haven’t told you about. And that constraint is that it finds directions that are mutually orthogonal. So in fact, since there are only two dimensions here, the very next thing that Principal Components Analysis would do is find a direction that is orthogonal, or, as we think about it in two dimensions, perpendicular, to the first component that it found.>>I see. So there really is only one choice at that point.>>That’s right. Or, you know, there are two choices, because it doesn’t matter which direction you pick, in principle.
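As a rough illustration of the discussion above, here is a minimal Python sketch using NumPy. Everything in it is an assumption made for illustration and not part of the lecture: the synthetic oval-shaped data, the spread chosen along each axis, and the helper names `projected_variance` and `principal_components`. It compares the variance of the points projected onto x, onto y, and onto the 45-degree diagonal, then finds the components by eigendecomposition of the covariance matrix, which is one standard way of computing PCA.

```python
import numpy as np


def projected_variance(points, direction):
    """Variance of the points after projecting them onto a direction.

    Hypothetical helper: the direction is normalized to unit length first.
    """
    d = np.asarray(direction, dtype=float)
    d = d / np.linalg.norm(d)
    return np.var(points @ d)


def principal_components(points):
    """Return (variances, directions) sorted from largest to smallest variance.

    Hypothetical helper: directions are the columns of the returned matrix.
    """
    centered = points - points.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    # eigh is meant for symmetric matrices; it returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]


# Assumed synthetic data: an oval stretched along one axis, then rotated 45 degrees.
rng = np.random.default_rng(0)
n = 2000
raw = np.column_stack([rng.normal(0.0, 3.0, n), rng.normal(0.0, 0.5, n)])
theta = np.pi / 4
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
points = raw @ rotation.T

# Projecting onto x or y alone is the feature-selection case from the discussion.
var_x = projected_variance(points, [1.0, 0.0])
var_y = projected_variance(points, [0.0, 1.0])
var_diag = projected_variance(points, [1.0, 1.0])
print("variance along x:       ", var_x)
print("variance along y:       ", var_y)
print("variance along diagonal:", var_diag)

variances, directions = principal_components(points)
pc1, pc2 = directions[:, 0], directions[:, 1]
print("first component: ", pc1)
print("second component:", pc2)
print("dot product of the two components:", pc1 @ pc2)
```

Flipping the sign of either component leaves the projected variance unchanged, which is the sense in which there are two equally good choices for each direction.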