# Principal Components Analysis – Georgia Tech – Machine Learning

Okay, so the first linear transformation algorithm we’re going to discuss is something called Principal Components Analysis. Okay, Michael?>>Sure.>>Now, just to be clear, the amount of time it would take me to derive Principal Components Analysis and work it all out in its gory detail would be forever, so I’m not going to do that; I’m going to leave that to the reading. But I want people to understand what Principal Components Analysis actually does and what its properties are, okay?>>Yeah.>>Okay. Principal Components Analysis has a long history. It’s a particular example of something called an eigenproblem, which you either already know about or you don’t; and if you don’t, it means you haven’t read the material we’ve given you, so I’m going to ask you to do that. But whether you have or haven’t, let me give you an idea of what Principal Components Analysis actually does, and I think the easiest way to do that is with a very simple two-dimensional example. So here’s my very simple two-dimensional example. You see this picture, Michael?>>Yep.>>This is a bunch of dots sampled from some distribution that happens to lie on a two-dimensional plane like this, okay?>>Yep.>>Now, this is in fact two-dimensional, so we have two features here; we’ll just call them x and y. We could have called them one and two, it doesn’t really matter; this is just the x, y plane. Let me tell you what Principal Components Analysis actually does: it finds directions of maximal variance. Does that make sense?>>Variance of what?>>The variance of the data. Suppose I had to pick a single direction here, such that I projected the data onto that dimension, onto that direction, onto that vector, and then computed the variance, literally the variance of the points that were projected onto it.
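The projection step described here can be sketched numerically. This is a minimal illustration, not from the lecture; the sample data, the random seed, and the `variance_along` helper are all made up for the example, assuming NumPy is available:

```python
import numpy as np

# Made-up sample data: points stretched along the 45-degree diagonal,
# with only a small spread across it.
rng = np.random.default_rng(0)
t = rng.normal(0.0, 3.0, 500)      # spread along the diagonal
noise = rng.normal(0.0, 0.5, 500)  # small spread across it
x = (t - noise) / np.sqrt(2)
y = (t + noise) / np.sqrt(2)
points = np.column_stack([x, y])   # shape (500, 2)

def variance_along(points, direction):
    """Variance of the data after projecting onto a unit vector."""
    u = np.asarray(direction, dtype=float)
    u = u / np.linalg.norm(u)
    return np.var(points @ u)

v_x = variance_along(points, [1, 0])  # project onto the x axis
v_y = variance_along(points, [0, 1])  # project onto the y axis
v_d = variance_along(points, [1, 1])  # project onto the 45-degree line

print(v_x, v_y, v_d)  # the diagonal direction has the largest variance
```

“Projecting onto a direction” is just the dot product with a unit vector; the variance of those scalar projections is the quantity PCA compares across directions.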
Which direction will be maximal?>>I would think it would be the one that is sort of diagonal, the one the blob runs along.>>Right, that’s exactly right. And to see that, imagine we projected all of these points onto just the x dimension. That’s the same thing as just taking that particular feature. If we projected all of them down, we would end up with all of our points living in this span, and when we compute the variance, it’s going to be something that captures the distribution between here and here. Does that make sense?>>Yep.>>Similarly, we could project onto the y axis, which is the equivalent of.>>Those are examples of feature selection.>>Yes, exactly, it’s feature selection: the equivalent of just looking at the second feature here, y. I’m going to end up with a variance that spans this space between here and here. By contrast, if we do what you want to do, Michael, and pick a direction at about 45 degrees, if I drew this right, we would end up projecting points between here and here. Now, it’s not as easy to see in this particular case, but the variance of the points as they get projected onto this line will be much higher than on either of these two dimensions. And in particular, it turns out that for data like this, which I’ve drawn as an oval whose long axis is at 45 degrees, this direction is in fact the one that maximizes variance. So if Principal Components Analysis had to find a single dimension, it would pick this one, because it is the one that maximizes variance. Okay, you got it?>>Sure.>>Okay. Now, what’s the second component that PCA, Principal Components Analysis, would find? Do you know?>>I don’t know what you mean by second. This is now a direction.
That has high variance?>>Yes.>>The first one?>>Yes.>>Because it seems like either x or y is pretty high, or something that looks just like that red line but is tilted a little bit from it would also be very, very high.>>Right, that’s exactly right. In fact, Principal Components Analysis has one other constraint that I haven’t told you about: it finds directions that are mutually orthogonal. So since there are only two dimensions here, the very next thing Principal Components Analysis would do is find a direction that is orthogonal (or, as we think about it in two dimensions, perpendicular) to the first component it found.>>I see. So there really only is one choice at that point.>>That’s right. Or, you know, two choices, because it doesn’t matter which way along that line you point, in principle.
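Both ideas from the lecture, the maximal-variance direction and the mutual-orthogonality constraint, fall out of one computation: the eigenvectors of the data’s covariance matrix. A minimal sketch (not from the lecture, with made-up sample data, assuming NumPy):

```python
import numpy as np

# Made-up 2-D sample data with an elongated shape.
rng = np.random.default_rng(1)
data = rng.normal(size=(300, 2)) @ np.array([[3.0, 0.0], [2.0, 1.0]])

# Center the data and form its covariance matrix.
centered = data - data.mean(axis=0)
cov = centered.T @ centered / (len(centered) - 1)

# eigh returns eigenvalues in ascending order for symmetric matrices.
eigvals, eigvecs = np.linalg.eigh(cov)
first_pc = eigvecs[:, -1]   # direction of maximal variance
second_pc = eigvecs[:, -2]  # next-largest variance, orthogonal to the first

# The two components are perpendicular: their dot product is ~0.
print(np.dot(first_pc, second_pc))
```

This is why PCA is called an eigenproblem: the principal components are eigenvectors of the covariance matrix, and the eigenvalues are the variances along them, so orthogonality comes for free from the symmetry of the covariance matrix.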

Thank you so much!

If you want to learn something save yourself 4:22 and look for another video.

Kooool………

They keep on talking but not teaching.

This kind of explanation was just what I was looking for

Not very helpful – should have added little more detail towards the end

not good.

wow, that's a nice transparent hand. lol

could you give a definition of VARIANCE, please?

why so many negatives? this is an excellent explanation.

You spent 1.20 min on nothing in the beginning.

Thanks for the great explanation. I think this is well explained and a great introduction to the reading I need to do.

Awesome video!!! I like the way you communicate your ideas and key points! Looking forward to seeing more!

i think, it is not maximizing the variance, but minimizing it. To get higher variance you can pick any line at a far away distance.

I find this omnipresent Michael sort of hilarious

But how would that help? Why we need the perpendicular line?

Terrible.

Good job!

Less ads in the middle of the video, I can find those in the description if I'm interested.

That was useless.

explaining it in plain English to start with would have been helpful I think. Giving a broad example (without a graph) and then explaining what components mean, you know. But like conversational style. Thanks anyway

Great. I don't think the dialog is irritating.

Depending on what information you need, this is either a good or bad video. For me it worked very well, although they divided their explanation over three videos. The first video is not that much explanatory but after the third, I had a very good image of PCA.

This post helped me a lot.

https://stats.stackexchange.com/a/140579/167453

worthless

wastage of time

What a dickhead, in love with the sound of his own voice. Blathering on and on, killing off whatever interest people had for the subject. Not a good ad for Udacity.

useless

terrible video. So is the Udacity program

This video does not get to the point.

Second video here: https://www.youtube.com/watch?v=_nZUhV-qhZA

Michael? Bring me the beer

"Yep."

What the heck was that? More like Principal conversation analysis

Who's Michael?

In PCA you are maximizing the length of the vector to each of the points from the center point of the data (mean value of each of the two variables you are dealing with), at the same time, MINIMIZING the distance from each point to the line that you drew….you are confusing many people.

What a dumb

here it says that pca finds directions that are mutually orthogonal around 3:56

but the graph suggests that direction towards the increasing way of the variance

but it is supposed to be just the opposite right?

pca is supposed to lower the variance right ?

to ultimately find the principal component?

just correct me if im wrong

Which software/tool did you use to create this nice video? It looks and feels like you are writing on paper. Thanks.

YES, THANK YOU SO SO MUCH

I think what many people are finding very confusing about this video is the lack of clarity on what the variance is measured relative to. Intuitively, I think many people will assume the variance is calculated by measuring distances between the points and the line. At multiple points in the video, when talking about calculating the variance, the pen gestures across the line (perpendicular to it), and red lines are drawn between the dots and the line, strongly implying that that’s what is being measured (it’s not). Ignoring the very misleading gestures, what the video is saying is that PCA is maximizing the variance of the points *after* they are projected onto the line (i.e., the variance along that line). The focus is meant to be on the distance between the dotted lines drawn to the x and y axes vs. the distance between the first two marks drawn on the example red diagonal line (larger due to larger variance). It doesn’t explain that this is essentially equivalent to *minimizing* the sum of squared distances between the points and the line. I think this is why there are lots of comments confused about the definition of variance and whether there’s a mistake and maybe he meant to say/write “minimize.”

When he said that the diagonal direction is the one that maximizes the variance he is WRONG.

It is the one that minimizes the variance, and the orthogonal direction is the one that maximizes it.
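For anyone weighing the maximize-versus-minimize debate in these comments, here is a numerical check (my own sketch with made-up data, assuming NumPy): for centered data, the variance along a unit direction and the mean squared perpendicular distance to that line sum to a constant, so the direction that maximizes the first is exactly the one that minimizes the second.

```python
import numpy as np

# Made-up centered 2-D sample data.
rng = np.random.default_rng(2)
pts = rng.normal(size=(400, 2)) @ np.array([[2.0, 1.0], [0.0, 0.5]])
pts -= pts.mean(axis=0)

# Sweep candidate directions and record, for each one, the variance of
# the projections and the mean squared perpendicular distance.
angles = np.linspace(0, np.pi, 180, endpoint=False)
var_along, perp_err = [], []
for a in angles:
    u = np.array([np.cos(a), np.sin(a)])
    proj = pts @ u
    var_along.append(np.mean(proj**2))
    # squared perpendicular distance = squared length - squared projection
    perp_err.append(np.mean((pts**2).sum(axis=1) - proj**2))

# The same angle maximizes variance along the line and minimizes the
# perpendicular error, because the two quantities sum to a constant.
print(angles[np.argmax(var_along)], angles[np.argmin(perp_err)])
```

So both descriptions pick out the same line; the video’s “maximize variance along the direction” and the commenters’ “minimize distance to the line” are not in conflict.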

I don't know why Michael is convinced.