# Linear Regression Machine Learning (tutorial)

Let’s see, let’s see everybody. I’ll minimize myself. I don’t need to see myself. I need to see you guys. Hello world, it’s Siraj! I am hyped for this live session. Yo, class is in session, everybody. We are about to do some math this

live session, so I’m super excited. But first of all,

let me take some roll call, all right? So, let’s see, Collin,

Brandon, Nil, David, Dakosh, Sebastian, Raj, Spencer,

Naresh, Niko, Clement, hi, guys! Michael, Benjamin, all right. So, that was roll call. Welcome to this live session for the deep learning, intro to deep learning course. Okay. This is going to be so awesome, because I have been

waiting to do some math. Guess what, guys? I bought this pad to write some math on. Okay. I’ve never used this before, so I’m super excited for this. I’m going to show you guys the math behind linear regression. By the end of this video, you guys are going to know, like the back of your hand, how to do linear regression. That includes gradient descent. And guess what? We use gradient descent

all over the place in machine learning. Don’t worry if you don’t know what

that is, I’m going to show it to you. Okay? So, we’re going to deep dive into this. So, we’re going to start

off with a five minute Q&A, like always, and I think we’ve got some

Udacity peeps in the house as well. Drew. Nico, and Max, who’s the other instructor for the course. So, I think you’re here shout out,

say something so people know who you are and, so I’m going to

do my five minute Q&A, like always, and I’m going to answer all the questions

related to me, and my everything, but if you have any Udacity specific

questions, they will answer those, okay? So, let’s start up with

a five minute Q&A, and then we’re going to get right

into the code and that, okay? Do I have to know about

partial derivatives? We are going to do a partial derivative,

but I’ll show you how that works. [BLANK_AUDIO] I had to cut off cutie

pie to catch this. Wow I’m honored, I’m honored. Hey baby girl,

let me see that regression. All right, so that’s not a question. Let’s get some real questions in there,

some quality questions. All right. [BLANK_AUDIO] Would you want to check out

my Vive AI Assistant demo? Sure, yes, post a GitHub link in

the comments of one of my videos, I read all my comments,

I answer all my comments. See, I’m not fake,

you know what I’m saying? I answer all my comments. I’m here for you guys, have I

enrolled in- Is calculus required for linear regression? Yes, a little bit of calculus, but I’m going to go through that. Don’t be afraid of the word calculus. This is actually very intuitive. Can you mention some details about the upcoming as well, looking to predict the genre from- All right. What basic maths will be needed? You’ll need to know basic algebra, okay? And then we’re going to learn the calculus necessary to do this in this video, okay? Are GANs the future? Yes. I mean, the idea behind generative models in general is really exciting, because you can generate things that don’t exist. And that has a lot of potential for

art and culture. GANs can change culture, right? We can generate music. We can generate art. We can generate paintings in

ways that humans couldn’t. Best book to understand the math behind ML? Machine Learning: A Probabilistic Perspective. That’s a pretty good one. Just math, or coding too? Both. Mostly coding. Linear regression versus other classifiers like SGDC? Linear regression is definitely easier. What is no free lunch? It’s a theorem. The no free lunch theorem, at a very high level, is like, well, you can’t make assumptions. You can’t make assumptions whenever you are doing anything related to proving something. When will you do NLP? Yo, I’m going to do so

much NLP in this course. I can’t wait for NLP, it’s coming up. Will you cover GANs? I kind of want to just do GANs

right now, you know what I mean? I’m super excited for

GANs, I will do GANs. Will you give an intuition for why to do gradient descent over- Yes, I will explain that. Linear algebra is the way to go? Yes. What’s the difference between scikit-learn and TFLearn? Scikit-learn and TFLearn, great question. So, TFLearn is a high-level wrapper on top of TensorFlow. It looks very similar to scikit-learn, but TFLearn is only focused on deep neural networks. Scikit-learn has support vector machines and all sorts of other machine learning models, whereas TFLearn has the same brevity, but it focuses only on deep neural networks. Do you prefer Weka? No. When will you start working on Anaconda? I mean, I’ll most likely start using Docker to contain those things. All right, rap for 50K subs, let me rap for 50K subs. And this time,

I’m going to play an instrumental. I’m not going to just rap with that kind

of instrumental, you know what I mean? Don’t be discouraged, rap,

hip hop instrumental on YouTube, whatever it starts playing. Someone say a keyword and

then we’re going to get started. Triumph hip hop instrumental,

what is this about. Let’s go, play it. All right, let me just unplug my mic, so you guys can see this,

where’s the music. I’m going to say something,

you know what I’m saying? [MUSIC] 50k subs. I got 50k subs, man,

my mind is so fresh. I’m looking at this coffee mess,

looks like the best. I got caffeine on my mind,

it takes me so high through the sky. I got a USB 4, my my. I’m going to be writing math today,

like it’s all mine. Online, I see you man, it’s all fine. It’s all writing piece

of equations online. I see you coming back like threw

me your progression, wait. So that was it for the rap. Okay, so, that’s it for the rap. So, now we’re going to get

started with the code, okay? So, let’s go ahead and do this. I’m going to start screen sharing,

and then we’re going to get started. All right, here we go. [SOUND] Here we go, Google Hangouts. All right, and

what does Hangouts want to do? Hangouts wants to screen share. Your entire screen. Share. All right, so I’ll minimize this,

and minimize, and then I’ll move this out of the way,

so I can see what you guys are doing. Okay, and we’re going to code this baby. Okay, I am in the corner here. Let me make sure that what you guys are seeing is what I want you to see. Yes, what you guys are seeing is exactly what I want you to see, perfect. All right. Okay, so here we go. Here’s what we’re going to do, guys. Let me make this bigger. This is big enough, right? So in this lesson,

we’re going to do linear regression. And what is linear regression, right? So linear regression, in this case, and

let me make sure everything’s working. Everybody’s here, live chat’s working,

live video is not working. Okay, so here’s how it goes. So we’re going to do this, okay? So this is going to be

called linear regression. This is linear regression and

let me just show you guys. The best way to explain it is

to show it through visuals, so I’ll show it through visuals,

what exactly we’re going to be doing. And to show you visually,

I will give you a link to this, and I will just show it right here. This is what’s happening. So we have a set of points, and these

points are the test scores of students, and the amount of hours studied, okay? So this is what it looks like. So this, right on the right, this graph

here, these set of points are the set. The x values are the amount

of hours they studied and the y values are the test

scores they got. Okay. And intuitively, to us, there must be some kind of

correlation between these two values. But we want to prove this

programmatically, we want to prove this, I’m sorry, mathematically, we want to

prove that there is a relationship. And how do we prove that

there is a relationship? We draw a line of best fit. So how do we know what that line of best

fit is, or that linear regression is? Well we don’t know, we don’t know. We have to find that, and the way we’re going to find the line

of best fit is using gradient descent. And that process,

that training process looks like this. We’re going to draw a random line,

compute the error for that line. And I’ll talk about how we’re

going to compute that error. And that error value is going to say

how well-fit is this line to the data? And then based on that error,

it’s going to act as a compass. It’s going to tell us, well, how best should you re draw the lines

to be closer to the line invested. And we’ll keep doing that. So, it’ll be like draw a line, compute

error, draw a line, compute error, until eventually the line that we draw is the

optimal line that we should draw, okay? So, that’s at a very high level. But now I’m going to

go into the code and we’re going to talk

about this in detail. All right, so

lets go ahead and start it. [BLANK_AUDIO] So to start off, to start off I’m

going to write my main function, okay? So let me move all this stuff out of the

way, so I’ll get right into the code, all right? I’ll get right into the code. And guys, if people have questions and

I’m not able to answer them because I’m busy doing something,

please help me answer questions. I very much appreciate it. I very much appreciate it. Okay? So let me just start off by

writing the main function. What does the main function do? That’s where the meat of the code goes. Okay, so in the main function, we’ll call a run function, which is where we’re going to store all of our logic. Okay, so let’s write up a run function. The run function is a chance for us to show what we’re doing at a high level. So step one is to collect our data, right? Always in machine learning, we want to collect our data. So we’ll get our data points. And how are we going to collect our data? Well, to collect our data, we have to import the one library that we’re using. I know, guys, we’re using a single library. And that library is NumPy, all right? And we’re going to use this little symbol that means we don’t have to continually say NumPy whenever we call its methods or its functions. Okay, so what is the function we’re going to use from NumPy? I’m sorry, right, main, thank you, main, good call. So, the function we’re going to use from NumPy is genfromtxt(). And what this is going to do is get the data points from our data file. And let me show you guys the data file as well. But basically we’re going to separate it by the commas. Okay, and we’re going to get those points. So what does this,

what does this data look like? Well, let me pull up terminal and show you guys exactly what this data looks like. So it looks like this. Okay? So let me zoom in on this. Zoom, way more. 200 zoom. So these are just the hours studied, on the left side, and then the test scores, for a bunch of students in an intro to computer science class. Okay? The hours studied and the test scores they got. Okay, so
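The loading step described here can be sketched roughly as follows. This is a minimal sketch under assumptions, not the session’s exact code: the real data file is not reproduced, so three made-up rows in an in-memory `StringIO` buffer stand in for it, and NumPy is imported as `np` rather than with the star import mentioned in the session. With a real file you would pass its path to `genfromtxt` instead of the buffer.

```python
from io import StringIO

import numpy as np

# Hypothetical sample rows standing in for the real data file:
# hours studied on the left, test score on the right.
csv_text = StringIO(
    "2.0,50.0\n"
    "3.5,62.5\n"
    "5.0,71.0\n"
)

# genfromtxt splits each line on the delimiter and converts the
# fields to floats, returning a 2-D array with one row per line.
points = np.genfromtxt(csv_text, delimiter=",")

x = points[:, 0]  # hours studied
y = points[:, 1]  # test scores
print(points.shape)
```

Each row of `points` is one (x, y) pair, which is why later code can read `points[i, 0]` and `points[i, 1]`.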

that’s what we’re going to pull. That’s our data set. That’s what we’re going to pull into our points variable. So, points is going to contain a bunch of (x, y) value pairs, where x is the amount of hours studied and y is the test score. Okay? And they’re separated by a comma. Okay, so that’s step one. We’ve done that, and genfromtxt is essentially running two main loops. The first loop converts each line of the file to a sequence of strings. And the second one converts each string to the appropriate data type. Okay, so that’s step one. Now, step two is to define

our hyperparameters. Okay, in machine learning, we have what are called hyperparameters. These are tuning knobs for our model. They are basically the parameters that define how our model is analyzing the data: how fast it’s running through the data, what operations it’s performing on the data. There’s a whole bunch of hyperparameters. Thank you for the feedback. There’s a whole bunch of hyperparameters, and the one we’re going to use is the learning rate. Now, the learning rate is used a lot in machine learning, and it basically defines how fast our model should converge. Convergence means when you get the optimal result, the optimal model, the line of best fit in our case. That is convergence. So how fast should we converge? You might be thinking, well, shouldn’t the learning rate just be a million, if you want to converge super fast? Well, no. Like all hyperparameters, it’s a balance, okay? So if the learning rate is too small, we’re going to get slow convergence. But if it’s too big, then our error function might not decrease, okay? So it might not converge. So, that’s our first hyperparameter. Our next hyperparameter is going

to be the initial value for b and the initial value for m. And what are b and m? Well, what we’re going to do is calculate a line, right? So this looks like y = mx + b, and this is why I said we only need to know basic algebra. This is the slope-intercept formula, okay? All lines follow this formula, where m is the slope, b is the y-intercept, and x and y are the points. Okay, so that’s the line, okay? So, these are our initial slope and our initial y-intercept. They’re going to start off as 0, okay? And then the last hyperparameter is going to be the number of iterations. How much do we want to train this model? Well, we have a very, very small data set. There are only 100 points, okay? And for that, we’re not going to need to iterate a million times or 100,000 times. We’re just going to iterate 1,000 times. Okay? So those are our hyperparameters, and now step three is going to be to train our model. Okay, so the first step is going to be to show

where gradient descent is starting, okay? At b equals placeholder zero, right? And then for m, we’ll say placeholder one. And this is just for us to see the difference here, okay? All right, .format(initial_b, initial_m), and then compute_error_for_line_given_points. So, all right, let me just write this out and I’ll explain. initial_b, initial_m, and then the points. Okay, so what’s happening here? Let’s go over what I just wrote. In this line, we’re going to show the starting b value and the starting m value, so what is our starting y-intercept and what is our starting slope. And what is our starting error? I’m going to show you how we’re going to calculate that error. To get that error, given our b and m values, we have this function here called compute_error_for_line_given_points. It’s going to take the b, the m, and the points, compute the error for that line, and output it. So, that’s going to be

our starting point, okay? And then, now, we’re going to actually perform our gradient descent, and it’s going to give us the optimal slope and the optimal y-intercept. So, for gradient descent, we’re going to call this method, gradient_descent_runner, given the points, given an initial b value, an initial m value, given our learning rate, so this is where we’re going to use all of our hyperparameters, right? Because this is where we’re training our model. And the number of iterations. Those are all the things we need for this, okay? And we’re going to define this function in a second. We’re going to deep dive and define these functions. Okay, so then after we train our model,

well, now we can just print it out, right? So, let me just copy and paste this. So, now, this is not our starting point, this is now our ending point. So, say our ending point, where b is placeholder one, m is two, and the error is three. These numbers just define what we’re going to see at the end: the number of iterations, then b, then m, and then the result of computing the error for the line given the points, given the final b, the final m value, and then our points. Okay, so that is, at a high level,

what’s happening here? So, all I did was print out the initial b and m values, which are nothing, and then the error, and then I computed the gradient descent, and then I printed out the final values. So, I’m about to do this now. Okay, so we haven’t actually done this; now we’re going to do it. So, the first thing I’m going to talk about is how we’re going to compute that error. Let’s write out that first function. What was that first function called? It was called compute_error_for_line_given_points. Okay, and the data set, I’m going to provide that as well, but let’s go ahead and write up this method, okay? So, this is the first step. We’re going to write up this method, compute_error_for_line_given_points. Okay, I’m so

excited to show you guys this, because I get to use my math pad for

a second. Okay, so let me write this out,

okay?, hold on. Okay, here we go. So, let me write this out. Okay, so we’ve got a line here. Man, what a great line that is. Okay, so this is our plot, okay? And, so

we’ve got a bunch of data points here, all over the place, and what we are going to do is draw a random line through the data. We don’t know the line of best fit, so we are going to draw a random line through the data. And then we are going to compute the error of that line, and that error will tell us how good our line is. Okay, so

how do we know how good our line is? Well, what we’re going to do is, for every single data point, we’re going to calculate the distance in y from that point to the line. Okay, so all of these distances, distance one, distance two, distance three, distance four, distance five, distance six, and then you probably have more data points down here, all these distances to the line. And so we’re going to take all those distances and we want to sum them. And so let me show you the equation for

that, okay? So, rather than actually

writing out this equation, like really sloppily, I’m going to

show it to you using this, okay? So, this is the equation. Let me explain what this is. So, we got all those distances, right? We’re going to sum those distances together and get the average of that. But guess what, we’re not just going to sum those values alone, we’re going to square those values. And why are we squaring those values? Because we want them, first of all, to be positive, and the sign of the actual value doesn’t really matter. It’s more about the magnitude of those values, right? And we want to minimize that magnitude over time. So, this is the equation for that. Okay, so let me explain what

the hell this is, okay? So, we’re computing the error. We are computing the error

of our line given m and b. So, given m and b we are going to

compute the error of our line. m is our slope and b is our y-intercept. So, this E-looking thing is called sigma notation. It’s a little weird, so I’m giving you guys a little refresher here. This E thing, we’re going to see it a lot in machine learning. It’s called sigma notation, and basically it’s a way of describing the sum of a set of values, all right? Which is what we’re doing. We’re calculating a sum over a set of points, where the starting point is i equals 1 and the ending point is N, so it’s for every point. Okay, so for every point, you want to calculate the difference in y values. So, it’s y - (mx + b). And why do we say (mx + b)? Because in the line equation, y equals mx + b, right? So, it’s y - (mx + b), which essentially boils down to the actual y minus the y predicted by the line, squared. And then we’re doing that for every single point. And so we’re going to add all of those values together. Okay, and then get the average. And so that’s why there’s the 1/N, because we’re going to get the average of that. And that value is the error. Okay? So at a high level, that is what that is. So, now let’s programmatically

write this out, okay? So, we’re going to start by initializing the error at zero. Okay? So our total error at the start is just going to be zero. We don’t have an error yet, okay? So, then for every point, so for i in range, starting at zero and going for the length of the points, right? So for every data point that we have, we’re going to say, let’s get the x value, so x = points[i, 0]. And then we’re going to get that y value, right? So, I’m just basically programmatically showing what I just talked about mathematically. Right? So, we’ve got the x value, we’ve got the y value. And we want to compute that distance, right? We’re going to do this every single time. Then get the difference, square it, and then add it to the total. Okay, so
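Pulling together the loop described so far and the 1/N average from the equation, the error function can be sketched as below. It computes the mean squared error E(m, b) = (1/N) Σ (y_i - (m x_i + b))². The function name comes from the session; the snake_case variable `total_error` and the three sample points are assumptions made for illustration.

```python
import numpy as np


def compute_error_for_line_given_points(b, m, points):
    """Mean squared error of the line y = m*x + b over the points."""
    total_error = 0.0
    for i in range(len(points)):
        x = points[i, 0]
        y = points[i, 1]
        # Squared vertical distance from the point to the line.
        total_error += (y - (m * x + b)) ** 2
    # Divide by N to turn the sum into an average.
    return total_error / float(len(points))


# Hypothetical points that lie exactly on y = 2x + 1.
points = np.array([[0.0, 1.0], [1.0, 3.0], [2.0, 5.0]])
perfect = compute_error_for_line_given_points(1.0, 2.0, points)
flat = compute_error_for_line_given_points(0.0, 0.0, points)
print(perfect, flat)
```

The line that passes through every point has zero error, while the flat starting line (b = 0, m = 0) has a large one, which is the signal gradient descent will try to shrink.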

here’s the actual equation, right? So, we’re going to do plus-equals, because it’s a summation, and we’re going to programmatically show what I just talked about right here, right? (y - (mx + b)) squared. Okay? And we’re going to get the sum of that. So, (y - (m * x + b)) squared, okay? And we’re going to do that for every point, so this whole iteration loop right here is that equation, okay? Minus the average part. So, that’s going to give us the total value. The last part is to average it. So, we’ll take totalError / float(len(points)). So, we want it to be a float value. And that is the equation right there. Okay, and then get the average. So, this ten-line function just described what I talked about right here in this math equation, okay? We take all the distances between those points and the line, as I showed right here. We square them, we sum them all up, and then we get the average. And that is our error. Okay? And we’re calculating that,

because we want a measure, something to minimize over time. Right? Every time we redraw our line, we want to minimize this error. Because this error basically is a signal, it’s a compass for us. It’s telling us, this is how bad your line is. It needs to get better. You need to make me smaller. I’m really big right now, make me smaller. And that’s what gradient descent does. And I’m going to explain how gradient descent works in a second. But that’s that error function, right? Okay, what was the second function we called? It was called gradient_descent_runner. So, this is our actual gradient descent function. So, now let’s write this out. Okay? This is our second of three methods before we’re done. So, gradient_descent_runner, given a set of points, given a starting value for b, given a starting value for m, given our learning rate, and given our number of iterations. We’re going to use all of these things to perform gradient descent. We’re going to use every single thing. Okay? So, let’s get that starting b and m value, okay? So, the starting value for b, we’re going to set to b. And the starting value for m, we’re going to set to m. Okay? Simple enough. And now,

we’re going to perform gradient descent. What is gradient descent? I cannot wait to explain gradient descent, guys. I found the perfect analogy for gradient descent, and I’m really excited. Okay, before I explain that, let’s just write this part, because the actual math is going to start in the last function that I’m about to write. So, for every single iteration that we defined, we’re going to perform what’s called a gradient step. We’re going to update b and m with the new, more accurate b and m by performing this gradient step, okay? So, b and m, we’re going to get the returned b and m from this gradient step, and, as I’ll explain, this is where the math is happening, given our current b, our current m, given the array of points that we have, and then finally given the learning rate. We’re going to keep stepping to reach that final value of b and m. And guess what? Once this gradient descent is done, we’re going to return that optimal b and m, right? And so that’s what we talked about at the start, right? We returned that optimal b and m from the gradient descent, and then we printed it out, because that optimal b and m value gives us the line of best fit. We plug them into the y = mx + b formula, and it gives us the line of best fit. So, now we’re going to write out the gradient step. And this is gradient mother f-ing descent. Okay, so
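The runner described above can be sketched as follows. The names `gradient_descent_runner` and `step_gradient` and their arguments come from the session; everything else is an assumption for illustration. In particular, the body of `step_gradient` has not been written at this point in the session, so a vectorized stand-in using the standard mean-squared-error derivatives and the usual update rule (subtract the learning rate times the gradient) is included only so the sketch runs on its own. The sample points and hyperparameter values are made up.

```python
import numpy as np


def step_gradient(b_current, m_current, points, learning_rate):
    """One gradient step on the mean squared error of y = m*x + b."""
    n = float(len(points))
    x = points[:, 0]
    y = points[:, 1]
    residual = y - (m_current * x + b_current)
    # Partial derivatives of the error with respect to b and m.
    b_gradient = -(2.0 / n) * residual.sum()
    m_gradient = -(2.0 / n) * (x * residual).sum()
    # Move against the gradient, scaled by the learning rate.
    new_b = b_current - learning_rate * b_gradient
    new_m = m_current - learning_rate * m_gradient
    return new_b, new_m


def gradient_descent_runner(points, starting_b, starting_m,
                            learning_rate, num_iterations):
    """Repeat the gradient step num_iterations times."""
    b = starting_b
    m = starting_m
    for _ in range(num_iterations):
        b, m = step_gradient(b, m, points, learning_rate)
    return b, m


# Hypothetical points that lie exactly on y = 2x + 1.
points = np.array([[0.0, 1.0], [1.0, 3.0], [2.0, 5.0], [3.0, 7.0]])
b, m = gradient_descent_runner(points, 0.0, 0.0, 0.05, 5000)
print(b, m)
```

On this toy data the runner should recover an intercept close to 1 and a slope close to 2. A much larger learning rate would make the updates overshoot and the error grow instead of shrink, which is the balance discussed for that hyperparameter.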

this is how it’s going to go down, okay? step_gradient. So, I’m just going to say, it’s time for the magic, the magic, the greatest, the greatest, okay? That’s how excited I am, I just wrote the greatest twice. Okay, [LAUGH]. So, given our current b and m values, the points, and the learningRate. And this actually isn’t going to help with that, so I’ll delete that. So, here’s our learningRate, okay? Let’s perform gradient descent. So, okay, what is gradient descent? Okay, so let me show you guys this.

How best do I describe this? Let me just show you this image. This is going to help a lot. Okay, so this is a graph. I mean, it’s the same graph, looked at from two different angles. So, let’s look at the one on the left, just to pick one. We have a bunch of b values and a bunch of m values. And then we have that error, right? That error that I just talked about. So, given every single y-intercept we could have and every single m value we could have, what is the error? For every y-intercept and slope pair, what is the error? And so this is a three-dimensional graph, because the error value kind of starts up high, and then it approaches what’s called the local minimum, in our case. The local minimum, which is that smallest point at the very bottom, is where we’re trying to get to. Okay, so, given a set of possible y-intercepts and a set of possible slopes, we want to compute the error for each pair. And if we were to graph the relationship between these three things, it would look like this. Now, it tends to always look very similar to this. In more complex cases we’d have many minima, many little valleys. But what we’re trying to do is get to that point where the error is smallest. And so how do we get that point

where the error is smallest? Well, we’re going to perform what’s called gradient descent to get to that smallest point. And a great analogy for this is a bowl. So, let me just search bowl, okay? It’s kind of like a bowl. It’s like we drop a ball into a bowl, and we want to find that point where the ball stops, that endpoint, the lowest point. That (b, m) value gives our optimal line of best fit. Okay? And the way we’re going to get that is gradient descent. We’re going to descend, right? We’re descending down the bowl using the gradient, and gradient is another word for slope. We’re going to descend down that bowl until we get, through iteration, to that lowest point. And gradient descent is used everywhere in machine learning. Okay? It is like the optimization method for deep neural networks. It’s not that apparent right now. But know this. Know and understand gradient descent like the back of your hand, because it is going to be very useful in the future, okay? So. I don’t know why I’m showing that equation. That was unnecessary. That was the equation for the sum of squared errors that we just talked about, the sum of squared distances. So, how are we going to calculate that gradient descent? Well, now let’s actually do it. So, for our step_gradient function, we’ll start off with an initial gradient value for b and for m. So, the b gradient is going to be zero and the m gradient is going to be zero as well, okay? These are the starting points for our gradients, and gradient means slope. And so the gradient is going to act like a compass, and we’re going to use it to always move downhill. This is what I mean by, once we calculate that error, it’s going to act as a compass for us. It’s going to tell us where we should be going, what direction we should be going, how we should next redraw our line. So for- Okay, someone asked why is

the lowest point the best? The lowest point is the best, because

it is where our error is the smallest. And when our error is the smallest, that’s when we have

the line of best fit. When the error is smallest, that b and m value, those two that we plug into our line equation, are going to give us the line of best fit. So, that’s why we’re calculating the error, okay? So, for i in range(0, len(points)). Okay, so what we’re going to do is iterate through every single point on our scatter plot. Okay, so every single data point that we have, we’re going to collect it. Okay, so we’re going to say, okay, let’s get our first point, right? First point, which gives us an x value and a y value. So, let me also write out a little comment for this. Starting points for our gradients, okay? Now, we’re going to get the direction

with respect to b and m. Now, this is the last part, but

it’s a very, very important part. And this is where calculus

comes into play, okay? So, I’m going to talk about

how we’re doing this. Okay, so let me talk about

what we’re about to do. So, for every single point that we have, we’re going to calculate what’s called the partial derivative, okay? It’s the partial derivative with respect to b and with respect to m, okay? And what that’s going to do is give us a direction to go for both the b value and the m value, right? So, remember, in this graph, we want a direction, right? We want to be going down the gradient. And so on this left-hand side you see this gradient search. The m values and the b values are moving in the direction that they should be, because gradient descent is essentially a search policy. We’re trying to find that minimum error value. Okay? And what we’re going to do to get that is compute the partial derivative with respect to b and m. Okay, let me show you the equation for the partial derivative, okay? The partial derivative is going to be right here. So, this is what the partial derivative does. We call it partial because it’s not telling us the whole story, right? Each one only tells us how the error changes with respect to one variable. We say it’s partial because we’re calculating it separately for both b and m. There are two different derivatives. And, so
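To make the word partial concrete: each partial derivative measures how the error changes when only one of b or m is nudged while the other is held fixed. The sketch below is an illustration under assumptions, not code from the session. It uses made-up points, a made-up current guess for the line, and the standard mean-squared-error derivatives, and compares those formulas with a finite-difference estimate obtained by nudging one parameter at a time. The helper names `mse` and `analytic_gradients` are hypothetical.

```python
import numpy as np


def mse(b, m, points):
    """Mean squared error of the line y = m*x + b."""
    x, y = points[:, 0], points[:, 1]
    return np.mean((y - (m * x + b)) ** 2)


def analytic_gradients(b, m, points):
    """Partial derivatives of the error with respect to b and m."""
    x, y = points[:, 0], points[:, 1]
    n = float(len(points))
    residual = y - (m * x + b)
    d_b = -(2.0 / n) * residual.sum()
    d_m = -(2.0 / n) * (x * residual).sum()
    return d_b, d_m


# Hypothetical data and a hypothetical current guess for the line.
points = np.array([[1.0, 3.0], [2.0, 5.0], [3.0, 8.0]])
b, m, eps = 0.5, 1.0, 1e-6

# Nudge one parameter at a time and watch how the error changes.
numeric_d_b = (mse(b + eps, m, points) - mse(b - eps, m, points)) / (2 * eps)
numeric_d_m = (mse(b, m + eps, points) - mse(b, m - eps, points)) / (2 * eps)

d_b, d_m = analytic_gradients(b, m, points)
print(d_b, numeric_d_b)
print(d_m, numeric_d_m)
```

Both derivatives come out negative for this guess, meaning the error falls as b and m increase, so a downhill step should raise both values.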

it’s going to give us the tangent line. So, it’s going to give us this

line as you see right here, right? See this line,

that line is our direction. And we’re going to use it to

update our g and m values. Okay? So, that’s what that is. And let me also show you the equation

for the partial derivative, because we’re about to write it out. So, here’s what the equation for the

partial derivative with respect to m and b looks like. Okay? They’re two different equations, right? So, let’s talk about the one on top. So, this little curvy thing

that you see up here, that just signifies that this

is a partial derivative. That’s that signifier that

this is a partial derivative. Now, we talked about sigma notation,

right? It’s a summation of values,

right? And that’s what we’re doing. We’re summing the partial derivative for

all of our points, okay? For all of them to compute

that gradient value, okay? And the partial derivative with respect

to m and b is going to look like this. So, let’s write this out, okay? So, the b gradient, so

it’s going to give us two values. So, the b gradient is

going to be plus equals. And then what was it? Let me look at the equation again. 2 over N, so

negative 2 over N, all right? Thanks, good vibes. And then it was y minus, right? And these are the equations,

they are laws. They are beautiful laws,

that always stay the same. And they give us a way of understanding the direction that we want to move in. Okay, so, b_current. Okay so. All right, so then we’ll do the same

thing, and what was the second equation. It looked pretty much the same,

minus it doesn’t have this x, right? The second one doesn’t have this x,

right? So, we’ll say, but it does have this 2 over N. It does have this 2 over N,

and then it does have, let’s see, let’s have this x. It does have (y - ((m_current * x) + b_current)), okay? Okay, so now, we’ve computed

our partial derivatives, right? So, let me one more time show you guys. It’s giving us directions to go for

both b and m. And remember, they’re partial. It’s not telling us the whole story,

it’s telling us what direction should we go for b, and

what direction should we go for m? And it’s going to tell us the direction,

remember the bowl, to get to that bottom point, where that error is

the smallest right here, okay? So, right here where my mouse is,

that point is what we want to get to, and that’s what the partial

derivative is going to help us with. So, once we’ve computed

the partial derivatives, the sum of them with respect to b and m, now we’re going to update our b and

m values, right? So, we’re going to use that

to update our b and m values. And guess what? This is our last step. This is our last step using

this partial derivative. Using our partial derivatives,

right, plural? There are two of them. So, that’s going to give us a new

value for b and m, our updated b and m value. So, we have our current value for

b, whatever it is, that we fed into the step_gradient function

that keeps updating every time. And this is where our learning_rate

comes into play, okay? This is why our learning rate is so

important, because it defines the rate at which we’re updating our b and

m values, right? So, remember that 0.0001, right? And then also our m_current, minus learning_rate times the m gradient. Okay, and

then it’ll return those values. And we’re doing this every time, right? This is new b, and new m,

they’re our final b and m. It’s a step function, where we’re

doing this every iteration, right? We’re doing this for

the number of iterations we had 1000. But it’s going to return a new b and

m value every time. And guess what guys? That’s it for our code. That was it, so

let’s go over what we’ve done. Okay, but actually let me check for

errors, right? Let me check for errors, and

then I’m going to answer more questions, because I really want to make sure you

guys understand how this works, okay? So, let me demo this. So, python demo.py. Oh, N

is not defined. Okay, right, guess what. I didn’t define N. N is the number of points. Length of points. Okay? So, let’s go. Learning rate is not defined. Where? Where is learning rate not defined? Learning rate is not defined. Wait a second. Yeah, right. Learning rate, right. Okay, what else is bad? I’ve got an overflow in double scalars. 14, y minus... Uh-huh, uh-huh, uh-huh. [INAUDIBLE] Okay, so. What’s going on here? Okay, let’s save this. So yeah, it printed out the final, okay

so it got our final value right here. And if we wanted to,

let’s see, hold on a second. If we wanted to,

we got our backup here just in case. So right? So let me blow this up. Like way, way up. Let me just separate it. So this is what our output’s

going to look like. Right. So boom! Just like that. That’s how fast it trained,

in milliseconds. Why? Because our data set is so small. Okay, the data set was so small. Alright, so. That’s what happened, and

after a thousand iterations, we got the optimal b and m values. So, right, we started off with b and

m at 0, we calculated the error for our random line that we drew, and

it was huge. But, eventually, after running

gradient descent, we got the optimal b, the optimal m, and

the lowest error point, which is at the smallest

point in the bowl. And to do that, we used gradient

descent with respect to b and m. Okay, so let me go over one last time

every single thing that we’ve just done, just to really go over it, and then I’ll

do my last five minute Q & A okay. So we start out by collecting

our data set, right. Our data set was a collection

of test scores and the amount of hours studied, right. The x, y values, the test scores and the amount of hours studied,

a two-variable data set. Then we defined our hyperparameters

for our linear regression: our learning rate, which defines

how fast we should learn, our initial b and m values for

the slope equation, y = mx + b, and the number of iterations, 1,000,

because our data set is pretty small. And then we ran gradient descent. So, what did gradient descent look like? Well for every iteration, for a thousand iterations, we computed the

gradients with respect to both b and m. And we did that constantly,

until we got that optimal b and m value. That gives us that line of best fit. Now, how did we compute the gradients? To do that, we said, okay, we’ll have a starting point

of 0 for both of those gradients. Remember, gradient is just

another word for slope. And then we said, okay so for every single point in our scatter plot,

for our data, we’ll compute the partial derivative

with respect to both b and m. And those two values are going

to give us a direction, a sense of direction of

where we want to go. How do we get to that lowest

point in that bowl, right? That three-dimensional graph,

that lowest point, and we use the learning rate to determine

how fast we want to update our b and m values. We got the difference

between the current value, and what we had before, and we return that. So for every point, we did that for

a thousand iterations, okay? And that’s what gave us the output and it looks like, visually,

it looks like this. [SOUND] Right? It’s like up, up, up, up, up,

up, up, up, up, up, up, up. It’s kind of like Wheel of Fortune,

right? It starts off fast, and it gets slower

and slower as it approaches convergence, the word we use when we have the optimal

line of best fit, convergence. See, let me do it one more time. Up, just like that, okay? So that was that, and now I’m going to screen share and

do a last five minute Q and A. Alright, stop screen share. Hi everybody, okay,

let me bring you guys back on screen, do my last five minute Q and

A, ask me anything and yeah. How’s it going everybody? Any questions? I’m open to questions. Where did I use NumPy? It’s at the very top. So, right, what’s the practical

use of linear regression? Great question. Any time we want to find

the relationship between two different variables. And then in more complex

cases there could be more. But, we want to prove mathematically. Right? Math is all about proving things

in a way that is unfalsifiable, that no one can say,

hey, that’s not true. Well I can prove it mathematically. So it’s a way to show the relationship

between two value pairs. So maybe housing prices,

and the time of year right? What is the real estate

market going to look like? Any time intuitively you think there

is a relationship, you can prove it with linear regression, but

really I did this to show gradient descent. That optimization process is very

popular in deep learning, and we’re going to use that in our deep

neural networks for the rest of the course, okay? And, why TensorFlow, from Google? Because it is the best deep learning

library that is out there right now. That’s why. And, of course it would be,

because Google knows what they’re doing. They handle billions and

billions of queries every day. They have to be able to do

machine learning at scale. And, problems, they solve problems that

no one else has even thought of solving. And all of those solutions

are found in TensorFlow. For machine learning or

please think of the eye doctor. You can create a classifier to

classify between different types of disorder that you see in an x-ray. That’s going to augment doctors at

first, but eventually replace them. How about fitting a quadratic

curve instead of a straight line? We could do that as well. I’m going to provide the data set and

the code. I can talk slower, sure. How to find the optimal learning rate? That’s a great question. There are several methods of doing that,

but that’s great intuition. Sometimes we can use machine learning to

find the optimal hyper-parameters, so it’s kind of like machine learning for

machine learning, but we’ll talk about that later. This is the first course

where he does calculus? I’ll do more of that in the future,

I’m going to keep doing calculus, okay? Two more questions then

we’re good to go, two more. How would you recommend me

to start machine learning? Watch this series. And watch my Learn Python for

Data Science series, watch my Intro to TensorFlow series, watch my

Machine Learning for Hackers series. Watch my videos. Why is your Udacity course so expensive? I didn’t decide the price, guys. I tried to get it low. It’s whatever. You get paid graders for that, okay? And grading is not cheap,

okay human graders. But look all the videos are going to be

released here on my channel all right. So I’m here for you guys, okay? I’m trying to grow my brand. I’m trying to grow myself,

Siraj Raval, okay? This is the end, okay? So that’s it for the questions. And all right, so for now, I’ve gotta shoot a findings scene for my next video. What? Yeah, so, thanks for watching. [SOUND] Love you guys. I’ll post the link in the comments

right when I’m done alright? The video description. I’ll post the GitHub link, and

then the data set, everything. So don’t go to the descriptions

within the hour, okay? Bye! Okay.
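
The walkthrough above can be sketched in plain Python as follows. This is a minimal sketch, not the code from the video: the names `compute_error`, `step_gradient` and `gradient_descent_runner`, the four data points, and the learning rate of 0.05 are all assumptions chosen so the example runs quickly on its own (the session itself uses 0.0001 on a test-score data set).

```python
def compute_error(b, m, points):
    """Mean squared error of the line y = m*x + b over all points."""
    total = 0.0
    for x, y in points:
        total += (y - (m * x + b)) ** 2
    return total / len(points)


def step_gradient(b_current, m_current, points, learning_rate):
    """One gradient descent step: sum the partial derivatives, then update."""
    b_gradient = 0.0
    m_gradient = 0.0
    n = float(len(points))
    for x, y in points:
        residual = y - (m_current * x + b_current)
        # Both partial derivatives carry a leading minus sign.
        b_gradient += -(2 / n) * residual
        m_gradient += -(2 / n) * x * residual
    # Move against the gradient, scaled by the learning rate.
    new_b = b_current - learning_rate * b_gradient
    new_m = m_current - learning_rate * m_gradient
    return new_b, new_m


def gradient_descent_runner(points, initial_b, initial_m, learning_rate, num_iterations):
    """Repeat the step function for a fixed number of iterations."""
    b, m = initial_b, initial_m
    for _ in range(num_iterations):
        b, m = step_gradient(b, m, points, learning_rate)
    return b, m


# Hypothetical (x, y) pairs that lie exactly on y = 2x + 1.
points = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]
b, m = gradient_descent_runner(points, 0.0, 0.0, learning_rate=0.05, num_iterations=1000)
print("b = {0}, m = {1}, error = {2}".format(b, m, compute_error(b, m, points)))
```

On this toy data the run should end close to b = 1 and m = 2. On real data with larger x values, a learning rate this big would likely overflow, which is why a much smaller value is used in the session.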

Hey Man!!.. You are doing a GRRRRRRRRRRRRRRRRRR888888888888888888888 JOBB!!!!!!!!…. All these Stuffs For Free……HATSS OFF!! I'm a CSE undergrad. from India & I'll praise u more on next comments :P…. Power to You Bro!!! btw… do you live in India??

P.S. I haven't commented ever on Youtube… Its very Hard for any1 to get me commenting….. Not sayin that m a gr8 person or so.. But ur Gr8 Work made me do this… Awesome Man Awesome!!!!!

Dear Siraj,

Why is it necessary to compute the partial derivate of the error function no. of points times?

It has been used in step_gradient function.

Thanks!

Does the product moment correlation coefficient 'r' have to be greater than a certain value? In order to actually find the line of regression?

I fucking love this guy. Also, is gradient descent essentially an optimization problem?

I don't understand the details, but it's nice to see the general 5-step format of creating Python code from linear regression. Thank you for showing how real-world data is used in programming, and for explaining how it's a smaller piece of building steps to the outcome, so eventually the outcome can just be used in machine and deep learning (in other videos). Thanks Siraj!

Hello Siraj:

I can't understand the 'learning_rate'. What is it used for, and how do you decide its value?

Hi Siraj, this was amazing, thank you!

I have a doubt about partial derivatives. At 35:30, the formula doesn't include the summation of 2/N, right? (Since its before the Sigma notation)

But at 37:42, you seem to have summed 2/N in both. So is there something I'm missing?
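
A possible way to see why both forms agree, assuming the error being minimised is the mean of the squared residuals (the on-screen formula is not reproduced in the transcript, so this is a reconstruction): a constant factor such as 2/N can be moved inside or outside a sum without changing the result.

```latex
E(m, b) = \frac{1}{N} \sum_{i=1}^{N} \bigl(y_i - (m x_i + b)\bigr)^2

\frac{\partial E}{\partial m}
  = \frac{2}{N} \sum_{i=1}^{N} -x_i \bigl(y_i - (m x_i + b)\bigr)
  = \sum_{i=1}^{N} -\frac{2}{N}\, x_i \bigl(y_i - (m x_i + b)\bigr)

\frac{\partial E}{\partial b}
  = \frac{2}{N} \sum_{i=1}^{N} -\bigl(y_i - (m x_i + b)\bigr)
  = \sum_{i=1}^{N} -\frac{2}{N} \bigl(y_i - (m x_i + b)\bigr)
```

The code adds the right-hand form one point at a time inside the loop, so the 2/N appears on every term, but the total equals the left-hand form with 2/N written once in front of the sigma.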

i tried to run it on python3 and received a lot of syntax errors, someone can help me?

apparently, its all about the print function, but i dont get it since i pick the code from here

Audio has a noticeable amount of echo, makes it hard to focus, is it the same when using udacity?

Super grateful for these videos. It's much better than reading page upon page!

Where do I get the links for all the datasets?

Hi +Siraj Raval can i ask how do you put the python tensorflow in website if I wanted to launch at website?

Hi Siraj, lovin your videos so far. Very informative, direct to the point and fun to watch.

I'd like to ask something though which I'm having hard time composing the right question so I tried to rephrase it 3x, I hope you get what I'm asking.

1. What did we prove for finding the line of best fit for the data in the demo?

2. I mean how would you explain the result of the training process?

3. How do you explain the relationship of amount of hour study vs the test score?

That is something that is not clear to me.

the rap god!

you missed the '-' sign for your m gradient thats why you got an overflow right?

GUYS HIT LIKE FOR SIRAJ..

Siraj you are the best man teaching ML, AI n stuff practically I have come across…Please do not stop at any point no matter what …You are inspiration for people trying to learn these things ..

Hey siraj if you read this please reply comment !

I guess I must not know Python syntax well enough (I've used it, but am no expert at it yet). In the print 'starting gradient descent…' line, what are the {0}, {1}, {2}, and where are they coming from? I think they might be arguments going into the function, but I don't see any arguments in the run() function signature.

Thanks!
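
On the `{0}`, `{1}`, `{2}` question: these are most likely positional placeholders for Python's `str.format`, which are filled from the arguments passed to `.format(...)` on the string itself, not from the arguments of `run()`. A small sketch, with made-up values standing in for whatever the demo script prints:

```python
# Hypothetical starting values, only for illustration.
initial_b = 0
initial_m = 0
initial_error = 100.5

# {0}, {1}, {2} are replaced by the 1st, 2nd and 3rd arguments of .format().
message = "Starting gradient descent at b = {0}, m = {1}, error = {2}".format(
    initial_b, initial_m, initial_error
)
print(message)
```

This prints `Starting gradient descent at b = 0, m = 0, error = 100.5`.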

For some reason, the code seems a lot easier to read to me than the equation. Combined with the intuition offered, it's no problem. However, those print lines seem quite cryptic.

We should make some kind of simulation with a few dots on the graph, where you can move the line around by hand and watch the iterations of the sum equation with the totals on the side.

It might not be immediately obvious what that graph was showing. If we filled out the graph of the error based on the simulation, then it would be so intuitive that people could see it.

Thank you for sharing.

Great introduction Love u

Hey Siraj you are really an inspiration.

Can you please guide me how can we use deep belief networks for regression problems? As most of the examples given online are for classification-/mnist data sets ,

Nice course Siraj u explain hard topics fast and make it sound easy + u give a practical demo but to learn ML, I feel we use your course as a summary or a recap to a course by Andrew NG, i know its boring there but that should be a pace to learn something new. In here every second of have ton of info

I got my final b and m values. How do I plot my final line that cuts the dataset best?

Hey I Appreciate you're Time !! thanks I've become Very into this!!

Thanks!!!

Siraj , First of all this is dope, Amazed how you do it ?and What all things you have gone through for it .Finally, I am really great fan of your work man keep making such great content.

I really liked this .But how to write code for linear regression if more columns like y~x1+x2+x3+x4…..

starts at 8:00

Hello Siraj, where did u get the file data.csv file from? did u write it on your own?

Cant we use regressor of the LinearRegression class from sklearn package?

I made my own linear regression algorithm, and found that just calculating the partial derivative by the slope between two very close points worked well. I'm not sure exactly what your equation does, but doesn't it use the same logic? After all, we don't have the literal equation to take the derivative of.
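
The two approaches should agree. The error function is a known formula in b and m, so it can be differentiated by hand, and the slope between two very close points approximates that same derivative numerically. A rough sketch comparing them, where the function names and the sample points are assumptions made for illustration:

```python
def mse(b, m, points):
    """Mean squared error of the line y = m*x + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in points) / len(points)


def analytic_gradient(b, m, points):
    """Partial derivatives of the error, written out by hand."""
    n = len(points)
    grad_b = sum(-(2 / n) * (y - (m * x + b)) for x, y in points)
    grad_m = sum(-(2 / n) * x * (y - (m * x + b)) for x, y in points)
    return grad_b, grad_m


def numeric_gradient(b, m, points, h=1e-6):
    """Central differences: the slope between two very close points."""
    grad_b = (mse(b + h, m, points) - mse(b - h, m, points)) / (2 * h)
    grad_m = (mse(b, m + h, points) - mse(b, m - h, points)) / (2 * h)
    return grad_b, grad_m


# Made-up sample points.
points = [(1.0, 2.0), (2.0, 4.5), (3.0, 5.5), (4.0, 8.0)]
analytic = analytic_gradient(0.5, 1.0, points)
numeric = numeric_gradient(0.5, 1.0, points)
print("analytic:", analytic)
print("numeric: ", numeric)
```

The numeric version needs two extra error evaluations per parameter on every step, which is one reason the hand-derived formula is usually preferred when it is available.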

How is OLS approach efficient than the gradient desent approach ?

Hi Siraj, for linear regression what will happen if we pass on the initial m amd b values from OLS regression and then apply gradient descent keeping the learning rate as in the example, aklso how do we narrow in on the value of learning rate and number of iterations

Where does the formula on 35:50 come from? Any additional material to understand that part?

it Starts at 8:04

How accurate is this? I run the same data through excel and this what I got, y = 1.322x+7.991. but in the code you demonstrated, those values are varying so much. How can b value vary so much? I something I'm missing here? someone, please help me to understand this.
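
Excel's trendline is presumably a closed-form least-squares fit, which jumps straight to the minimum, while gradient descent with a tiny learning rate and a fixed number of iterations may stop well before the intercept has settled. A minimal sketch of the closed-form calculation for comparison, with `least_squares_fit` and the sample points being hypothetical:

```python
def least_squares_fit(points):
    """Closed-form slope and intercept for simple linear regression."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    m = sxy / sxx
    b = mean_y - m * mean_x
    return b, m


# Made-up points on y = 2x + 1.
b, m = least_squares_fit([(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)])
print("b = {0}, m = {1}".format(b, m))
```

Running gradient descent for more iterations should move its b and m toward the values this formula gives.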

Hello to you Siraj, I appreciate enormously what you do, I would like to know the knowledge to acquire to be able to follow a course on deep learning

Linear Regression: The Easier Way

https://medium.com/@sagarsharma4244/linear-regression-the-easier-way-6f941aa471ea

Thanks siraj for every thing. Could you provide some more insights to how to determine correct learning rate & Number of iterations. that will really helpful & you are awesome.

Can you give a code for multiple linear regression 🤗
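
One way the same idea might extend to several input columns is to keep one weight per column plus a bias, and apply the same partial-derivative pattern to each weight. This is only a sketch under assumptions: the function names, the made-up data and the learning rate are not from the video.

```python
def predict(weights, bias, features):
    """Prediction for one row: bias + w1*x1 + w2*x2 + ..."""
    return bias + sum(w * x for w, x in zip(weights, features))


def step_gradient(weights, bias, rows, targets, learning_rate):
    """One gradient descent step over all rows."""
    n = len(rows)
    weight_grads = [0.0] * len(weights)
    bias_grad = 0.0
    for features, y in zip(rows, targets):
        residual = y - predict(weights, bias, features)
        bias_grad += -(2 / n) * residual
        for j, x in enumerate(features):
            weight_grads[j] += -(2 / n) * x * residual
    new_weights = [w - learning_rate * g for w, g in zip(weights, weight_grads)]
    new_bias = bias - learning_rate * bias_grad
    return new_weights, new_bias


def fit(rows, targets, learning_rate=0.05, num_iterations=2000):
    """Start from zeros and repeat the step a fixed number of times."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(num_iterations):
        weights, bias = step_gradient(weights, bias, rows, targets, learning_rate)
    return weights, bias


# Made-up data generated from y = 1 + 2*x1 + 3*x2.
rows = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
targets = [1.0, 3.0, 4.0, 6.0, 8.0]
weights, bias = fit(rows, targets)
print("weights = {0}, bias = {1}".format(weights, bias))
```

With real columns on very different scales, the features would likely need rescaling first, or the learning rate would have to be much smaller.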

Siraj get a proper mic, voice is not that clear. Thank you so much for the video!

This is definitely the right implementation of gradient descent. However, you didn't include any vectorization in your implementation, which is crucial for optimal numerical calculations. Therefore, I don't agree with this being "the right way" of doing linear regression with gradient descent.

Thanks for taking the time to show another way of doing gradient descent in python. The video is ok.

P.S.: I didn't see the "perfect analogy." I just saw an average explanation. But is good to see that you love yourself, lol. Very respectable. Regards.

where is the live class held?

I am working on regression and tried to code your lecture example. but i found the following error, i couldn't solve it. C:\Python36\pro>regression.py

File "C:\Python36\pro\regression.py", line 43

^

SyntaxError: unexpected EOF while parsing , Nothing is written on line 43, my code ends on line 42

Why is it ok to literally do a video based on Matt Nedrich's article ? It's clearly obvious you were looking at his article on your second monitor as you were typing the code and when you talked.

How do we visualize the scenario where the problem has more than one feature for example in the problem you stated, the number of features mentioned is 1,i.e. the number of hours a student studies with the corresponding y values(the marks scored by the student). What if we add another feature say #Number_of_books_referred, how do we visualize this scenario?

So now value of y will be depended on #Hours_Studied and #Number_of_books_referred. Can we still visualize this in 2d?

##this is data of dataframe

## 2104 399900

##0 1600 329900

##1 2400 369000

##2 1416 232000

##3 3000 539900

##4 1985 299900

##5 1534 314900

##6 1427 198999

##7 1380 212000

##8 1494 242500

##9 1940 239999

##10 2000 347000

##11 1890 329999

##12 4478 699900

##13 1268 259900

##14 2300 449900

##15 1320 299900

##16 1236 199900

##17 2609 499998

##18 3031 599000

##19 1767 252900

##20 1888 255000

##21 1604 242900

##22 1962 259900

##23 3890 573900

##24 1100 249900

##25 1458 464500

##26 2526 469000

##27 2200 475000

##28 2637 299900

##29 1839 349900

##30 1000 169900

##31 2040 314900

##32 3137 579900

##33 1811 285900

##34 1437 249900

##35 1239 229900

##36 2132 345000

##37 4215 549000

##38 2162 287000

##39 1664 368500

##40 2238 329900

##41 2567 314000

##42 1200 299000

##43 852 179900

##44 1852 299900

##45 1203 239500

import numpy as np
import pandas as pd

df=pd.read_csv('DataSet.txt',sep=',')
m=df.iloc[:,0].size
print("length of data {0}".format(m))
learningrate=0.1
print(df)

def error_calc(df,slope,intercept):
    err=0
    for i in range(0,m):
        x=df.iloc[i,0]
        y=df.iloc[i,1]
        err+=((intercept+slope*x)-y)**2
    return (err/float((2*m)))

def gradient(df,init_slope,init_intercept,learningrate):
    new_slope,new_intercept=0,0
    for i in range(0,m):
        x=df.iloc[i,0]
        y=df.iloc[i,1]
        new_intercept+=-(2/m)*(y-(init_slope*x+init_intercept))
        new_slope+=-(2/m)*x*(y-(init_slope*x+init_intercept))
    new_slope=init_slope-learningrate*new_slope
    new_intercept=init_intercept-learningrate*new_intercept
    print(new_slope,new_intercept)
    return [new_slope,new_intercept]

def run_gradient(df,init_slope,init_intercept,learningrate,n):
    slope=init_slope
    intercept=init_intercept
    for i in range(0,n):
        slope,intercept=gradient(df,slope,intercept,learningrate)
        err=error_calc(df,slope,intercept)
        print("error = ",err)
        if(err<10):
            break
    return [slope,intercept]

def run():
    more_run=1
    while more_run>0:
        slope,intercept=0,0
        n=int(input("enter the number of iterations"))
        print("initial values slope= {0}, gradient={1},error={2}".format(slope,intercept,error_calc(df,slope,intercept)))
        slope,intercept=run_gradient(df,slope,intercept,learningrate,n)
        print('final values slope= {0}, gradient={1},error={2}'.format(slope,intercept,error_calc(df,slope,intercept)))
        predict=1
        while predict >0:
            predict=int(input("enter greater if want to predict"))
            size_house=int(input("enter the size"))
            pred_price=intercept+slope*size_house
            print("price of house of size = {0} will be = {1}".format(size_house,pred_price))
        more_run=int(input("enter greater than 0 for moreiteration"))

if __name__=='__main__':
    run()

********i think i have made some mistake ,and not able to get correct result. would you please have a look at it,that will be very helpfull.

thanks in advance.

*******i can not get output

Process finished with exit code 0

i am running this program in my python but can not get out put

The negative sign is missing at line 45 for m_gradient which is why the error is going to inf. Both the partial equations have negative signs in front of them.
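
The effect of the dropped minus sign can be reproduced on toy data. The sketch below is an illustration under assumptions (made-up points, a hypothetical `m_sign` switch, 50 iterations), not the code from the video: with the correct sign the error shrinks, and with the sign flipped on the m gradient each step pushes m away from the minimum, so the error grows rapidly.

```python
def mse(b, m, points):
    """Mean squared error of the line y = m*x + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in points) / len(points)


def step(b, m, points, learning_rate, m_sign):
    """One update. m_sign=-1.0 is the correct derivative, +1.0 drops the minus."""
    n = len(points)
    b_gradient = sum(-(2 / n) * (y - (m * x + b)) for x, y in points)
    m_gradient = sum(m_sign * (2 / n) * x * (y - (m * x + b)) for x, y in points)
    return b - learning_rate * b_gradient, m - learning_rate * m_gradient


def run(points, m_sign, iterations=50, learning_rate=0.01):
    """Run a few steps from b = m = 0 and report the final error."""
    b, m = 0.0, 0.0
    for _ in range(iterations):
        b, m = step(b, m, points, learning_rate, m_sign)
    return mse(b, m, points)


# Made-up points on y = 2x + 1.
points = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]
correct_error = run(points, m_sign=-1.0)
wrong_error = run(points, m_sign=+1.0)
print("correct sign: ", correct_error)
print("missing minus:", wrong_error)
```

Given enough iterations, the growing error in the second case would eventually exceed what a float can hold, which matches the overflow warning reported in the comments.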

Your best video Siraj, extremelly clear

Bro, someone else's voice and your own face, wow, what a combination

Dude you are awesome mad impulsive person. You are totally in depth with what ever you are doing on that moment. I love that.

I wrote a blog on understanding linear regression using just SymPy, a symbolic mathematics library in python. It was shared on [Hacker News](https://news.ycombinator.com/item?id=16199436) where it garnered a fair bit of attention. Here is a link to the blog:

https://safwanahmad.github.io/2018/01/21/Linear-Regression-A-Tale-of-a-Transform.html

Please leave your comments on the page via Disqus.

Thanks man. Learnt a lot.

Hey Siral. I am a big fan of you

This was nice(perfect) revision lecture for me…..Siraj bro u r upgrading very fast👍..like me😄

I will be really thankful if u can make few videos on realistic data and few kaggle problems….. Because iris and other stuff have only limited things and features. PLEASE please PLEASE

The error function is being used just to print in this case?

I expected to multiply it, by the learning rate, then by the gradient to update weights.

The chance of you responding to this comment = the chance of a random person /7billions of existing human check the infos in the 2nd and above Google search result tabs.

Hi Siraj I am new to the world of computer science.

i just wanted to ask u that instead of using the algorithm of gradient descent i.e doing the partial derivative of error function w.r.t B and M, instead can't we store error value generated at every B and M value and than sort out point of minimum error and hence get optimum B and M value.
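
Storing the error for many (b, m) pairs and picking the smallest is a brute-force grid search. It can work for two parameters when a sensible range is known in advance, as in this sketch, where the candidate values, the data and the `grid_search` helper are all assumptions made for illustration:

```python
def mse(b, m, points):
    """Mean squared error of the line y = m*x + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in points) / len(points)


def grid_search(points, b_values, m_values):
    """Try every (b, m) pair and keep the one with the lowest error."""
    best = None
    for b in b_values:
        for m in m_values:
            error = mse(b, m, points)
            if best is None or error < best[0]:
                best = (error, b, m)
    return best


# Made-up points on y = 2x + 1.
points = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]
candidates = [i * 0.5 for i in range(0, 11)]  # 0.0, 0.5, ..., 5.0
error, b, m = grid_search(points, candidates, candidates)
print("b = {0}, m = {1}, error = {2}".format(b, m, error))
```

This already takes 11 x 11 = 121 error evaluations for a coarse grid, and the count multiplies with every extra parameter, so it does not scale to models with many weights. Gradient descent avoids that by using the slope to decide where to look next.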

I came here after the first two weeks of Andrew Ng's machine learning course. It's soo cool to see something you have been learning about for two weeks to happen. Can't wait to implement it myself with my own data sets.

Nice flows playa, and thanks for the tutorials

What IDE do u use Siraj??

when i set initial b and m equal to 1 each, the final b and m are different . why does it happen?

good one

Better than Udacity. Thanks.

at 38:50 when we calculate the gradient: why do we subtract it from our current_b and current_m? I mean why are we not adding for example?

It was a great tutorial by you siraj…very helpful.

I am getting this error.can someone kindly help?……' index 2 is out of bounds for axis 1 with size 2'.

Sir please upload video on bgboost

as Fabian Becker said,

Please don't say you've found the optimal m/b. You could be hitting local minima or simply not have done enough iterations. Gradient descent is very vulnerable to fitness landscapes that are non-linear.

is there any algo that can help me reaching the global optima

Siraj you dumb ass, use sklearn, much better…

Thanks, Buddy. Taking this video as a reference I am able to do the same coding in R. Thanks again.

what is the industry method of implementing linear regression ? or else every time we have code like above to realize the linear regression?

I know it's been a long time since this video was uploaded, but i really want to know something.

If the correlation of the data is negative (negative m, decreasing slope), should this still work?

I tried it and my slope isn't fitting the data points at all, my m value is positive, it's like the slope was inverted.

Sorry if I couldn't express myself, English is not my first language.

By the way, great video, learned a lot in 40 minutes.

Ah i wish you made this in R ;(

This is the only decent explanation ive found lmao rip me

This is amazing, that Siraj for great tutorial

I've one question, after calculation gradient des, and mean square error, I plotted m, y, and cost by matplotlib, it worked but is there any better library to plot graphs. MAtplotlib is not the easiest one to understand from its documentation.

Hey Siraj, I know this video is quite old, but I have a question. How did you solve this error:

RuntimeWarning: overflow encountered in double_scalars

totalError += (y - (m * x + b))**2

In the video it occurs at about 41:11.

yeahhh… u look cute, i like to watch your video but (apologize) your hand movements makes it irritating could u please…… no offense……

so can we use linear regression algorithm in a prediction model?

I m getting runtime warning : overflow encountered in double_scalars for calculating new_b and new_m and values are giving nan value as output… can you help me with this ?

at 41:21, backup.py was ran to get the output. original python file is showing error as infinity with overflow encountered in double_scalar. How to solve the runtime warning and get correct error, otherthan inf value ?

content/stream starts at 7:20

is perpendicular distance is calculated from line to point for error calculation?

Check out my implementation of gradient descent in python for multivariate as well as univariate linear regression. Please star the repository if you like it.

Kudos to Siraj Sir for giving me inspiration to extend this optimisation algorithm to work with multiple features in your data.

https://github.com/umangjpatel/KaggleKingsCountyHousing/blob/master/MultipleLinearRegression.ipynb

The data.csv file is saved in the same folder as the python file correct?

Hi Siraj,

I am a new subscriber, could you also assist me doing the same in "R"?

Where is the dataset he used?

Isn't "error rate" 112 kind of high? I thought the error rate was supposed to be as small as possible?

Siraj, I like your video…

but the gradient and the Y intercept do not make sense when I tried plotting them on graph

This was a really helpful video!! Knowing how to implement gradient descent from scratch is one of the most fundamental things for neural nets.

Why do we use gradient descent to get the for minimum error cant we just store the errors and the values and find where the error is minimum ?

Hey Siraj, I have a question regarding the formula you used for calculating least squares (I doubt this question will be answered, but i'll give it a go):

Question :

Why are you squaring the difference when you could've just taken the absolute value of each consecutive term? Is the use of squaring more prevalent due to the fact it makes distinguishing outliers obvious (since if the squared result is above a certain threshold, it shall be considered massive)?

Also, is this squaring and not taking absolute value related to variance and standard deviation, because the same concept holds true there (though formula might be different, but it's quite similar if you think about it)?

Would love to hear back from you, however hard it may be.

Thanks,

Qasim Wani

In computing the error , why do we have to choose Average ? Can't we choose Minimum ?

In the coursera course of Andrew Ng in calculating the gradient descent there is no minus sign is taken and here there is a neg sign of the partial derivatives..so I can't get it why the neg sign is needed ,the gradient itself would be negative..please clarify

when ur code dont work use backup.py xD

Thank you Siraj for helping me understand Linear regression. I have question for Siraj and everyone. please do well to answer me. thanks in advance.

I want to perform a logistic regression. I was asked to use state and political party and vote gotten as my independent variable and make a prediction whether a political party wins or loses. I have 36 states in my country and i want to use 3 dominant parties i want to use as a case study. my problem is how the layout of these data will be; I am unable to resolve party been in a separate column unless I take one political party and take one state and do the prediction explicitly and then move on to another.

Please i really needs you guys help to resolve these issue. Thanks in advance.

Auto ML in linear regression : https://www.youtube.com/watch?v=ANY9dstzM5k