# Mathematics of Machine Learning

Hello world, it's Siraj, and today's topic is the mathematics of machine learning. Is it necessary to know math for machine learning? Absolutely: machine learning is math. It's all math. In this video I'm going to help you understand, by example, why we use math in machine learning.

Machine learning is all about creating an algorithm that can learn from data to make a prediction. That prediction could be what an object in a picture is, what the next price for gasoline might be in a certain country, or what the best combination of drugs to cure a certain disease might be. Machine learning is built on mathematical prerequisites, and sometimes it feels like learning them might be a bit overwhelming. But it isn't, as long as you understand why they're used; then machine learning becomes a lot more fun. You could have a full-time job doing machine learning and not know a single thing about the math behind the functions you're using, but that's no fun, is it? You want to know why something works, and why one model is better than another.

Machine learning is powered by the diamond of statistics, calculus, linear algebra, and probability. Statistics is at the core of everything. Calculus tells us how to learn and optimize our model. Linear algebra makes running these algorithms feasible on massive data sets. And probability helps predict the likelihood of an event occurring.

So let's start from scratch with an interesting problem: predicting the price of an apartment in an up-and-coming neighborhood in New York City. Let's say Harlem. Shout-out to Harlem, yo, West Side represent, okay? Let's say that all we'll know when we eventually make a prediction is the price per square foot of a given apartment. That's the only marker
we'll use to predict the price of the apartment as a whole. Luckily for us, we've got a data set of apartments with two columns: in the first column, the price per square foot of an apartment; in the second, the price of the apartment as a whole. There's got to be some kind of correlation here, and if we build a predictive model we can learn what that correlation is, so that in the future, if all we're given is the price per square foot of an apartment, we can predict its price.

If we were to graph out this data, with the x-axis measuring the price per square foot and the y-axis measuring the price of the apartment, it would be a scatter plot. Ideally we could find a line that comes as close as possible to as many data points as possible, and then we could just plug some input data into our line and out comes the prediction. Poof.

In mathematics, the field of statistics acts as a collection of techniques that extract useful information from data. It's a tool for creating understanding from a set of numbers. Statistical inference is the process of making a prediction about a larger population of data based on a smaller sample, as in: what can we infer about a population's parameters based on a sample statistic? Sounds pretty similar to what we're trying to do right now, right? Since we're trying to create a line, we'll use a statistical inference technique called linear regression. It allows us to summarize and study the relationship between two variables: one variable, x, is regarded as the independent variable, and the other, y, is regarded as the dependent variable. The way we represent linear regression is with the equation y = mx + b, where y is the prediction, x is the input, b is the point where the line intersects the y-axis, and m is the slope of the line. We already know what the x value is, and y is our prediction. All that's missing are m and b.
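The y = mx + b model above can be sketched in a few lines of Python. The slope and intercept values here are made-up placeholders for illustration, not fitted values:

```python
def predict_price(price_per_sqft, m, b):
    """Linear model: predicted apartment price = m * x + b."""
    return m * price_per_sqft + b

# Made-up example values, purely for illustration:
m = 900.0     # slope: extra dollars of total price per $1 of price-per-sqft
b = 50_000.0  # intercept: baseline price when price-per-sqft is 0

print(predict_price(1200.0, m, b))  # one prediction for a $1200/sqft apartment
```

The whole fitting problem the video walks through is just finding good values for those two arguments.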
With m and b in hand we'd have a full equation: plug and play, easy prediction. But the question is, how do we get those two values? A naive way would be to just try out a bunch of different values over and over, plot the line each time, and use our eyes to estimate how well the line we draw fits. But that doesn't seem efficient, does it? We do know there exist some ideal values for m and b such that the line drawn using those values would be the best fit for our data set.

Let's say we did have a bunch of time on our hands and decided to try out a bunch of guessed values for m and b. We'd need some way of measuring how good our guesses are, so we'll need what's called an error function. An error function tells us how far off the actual y values are from our predicted values. There are lots of different statistical error functions out there, but let's try a simple one called least squares. This is what it looks like: E(m, b) = Σᵢ (yᵢ - (m·xᵢ + b))². We make an apartment price prediction for each of our data points using our guessed line, then use this function to check against the actual apartment prices: it subtracts each predicted value from the actual value and squares each of those differences. The sigma, that capital-E-looking symbol, denotes that we do this not just for one data point but for every single one; we have n data points, to be specific. The sum is our total error value.

We can create a three-dimensional graph now. The x-axis and y-axis hold all the potential m and b values respectively, and on the z-axis sits the error value for every single combination of m and b. If we were to actually graph this out, it would look like a bowl-like shape; cup it firmly in your hand like a nice bowl. The point at the bottom of the bowl is the smallest error value.
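The least-squares error function described above, sketched in Python; the tiny data set here is invented for illustration:

```python
def squared_error(data, m, b):
    """Sum of squared differences between actual and predicted prices.

    data: list of (price_per_sqft, actual_price) pairs.
    """
    total = 0.0
    for x, y in data:
        predicted = m * x + b
        total += (y - predicted) ** 2
    return total

# Invented mini data set: (price per square foot, apartment price)
data = [(800, 760_000), (1000, 950_000), (1200, 1_130_000)]

# A guess with a lower total error fits the data better:
print(squared_error(data, m=900, b=50_000))
print(squared_error(data, m=100, b=0))
```

Comparing the two printed values is exactly the "try values and measure" game the video describes, minus the eyeballing.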
That point corresponds to our ideal m and b values, the ones that give us the line of best fit. But how do we actually find it? Now we need to borrow from another math discipline: calculus, the study of change. Calculus gives us an optimization technique called gradient descent that helps us discover the minimum value iteratively. It uses the error for a given data point to compute what's called the gradient with respect to each unknown variable, and we use those gradients to update our two variables. Then we move on to the next data point and repeat the process over and over and over again. Slowly, like a ball rolling down a bowl, we find our minimum value. See, calculus helps us find the direction of change: in what direction should we change the unknown variables m and b in our function such that its prediction is more optimal, aka the error is smallest?

But apartment prices don't just depend on the price per square foot, right? Also relevant are features like the number of bedrooms, the number of bathrooms, and the average price of homes within a mile. If we factored in those features as well, our regression "line" would look more like a plane in a higher-dimensional space. There are now multiple variables to consider, so we can call it a multivariate regression problem. The branch of math concerned with the study of multivariate spaces and the linear transformations between them is called linear algebra. It gives us a set of operations we can perform on groups of numbers known as matrices. Our training set now becomes an n × i matrix of n samples that each have i features, and instead of a single variable with one weight, each of the features gets its own weight.

So that's an example of how three of the four main branches of math used in machine learning come into play. But what about the fourth, probability? Let's scratch the pricing example.
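The point-by-point gradient-descent loop described above can be sketched like this in Python. The toy data set, learning rate, and epoch count are all invented for illustration; real data would need tuning:

```python
def gradient_descent(data, learning_rate=1e-7, epochs=1000):
    """Fit y = m*x + b by per-point gradient descent on squared error."""
    m, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in data:
            error = y - (m * x + b)
            # Gradients of this point's squared error w.r.t. m and b
            # are -2*x*error and -2*error; step in the opposite direction.
            m -= learning_rate * (-2 * x * error)
            b -= learning_rate * (-2 * error)
    return m, b

# Invented data lying roughly on a straight line:
data = [(800, 770_000), (1000, 950_000), (1200, 1_130_000)]
m, b = gradient_descent(data)
print(m, b)  # m settles near the best-fit slope; b moves very slowly here
```

Note the learning rate: too large and the ball overshoots the bowl and diverges; too small and it barely rolls. That trade-off is why the step size matters so much in practice.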
This time we want to predict whether or not an apartment is in prime condition; we want to classify an apartment with the probability of it being prime or not prime. Probability is the measure of the likelihood of something, and we can use a probabilistic technique called logistic regression to help us here, since this time our data is categorical, as in it has different categories or classes. Instead of predicting a value, we're predicting the probability of an occurrence, and since a probability stays between 0 and 1, we can't use an infinitely stretching line. We're left with a threshold: past some point x, we are more likely than not looking at a prime apartment. We'll use an S-shaped curve given by the sigmoid function to model this. Once we optimize our function, we plug in input data and out comes a probabilistic class value, just like that.

So, to summarize: machine learning consists mainly of statistics, calculus, linear algebra, and probability theory. Statistics tells us what our goal is, calculus tells us how to optimize toward it, linear algebra makes executing algorithms feasible on massive data sets, and probability helps predict the likelihood of a certain outcome.

This week's coding challenge is to create a logistic regression model from scratch in Python on an interesting data set. GitHub links go in the comments section, and winners will be announced in a week. Please subscribe for more programming videos, and for now I've got to build. Thanks for watching!
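As a starting point for the coding challenge, here is a minimal logistic-regression sketch in Python. The tiny data set and the hyperparameters are invented for illustration only:

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(data, learning_rate=0.1, epochs=1000):
    """Fit P(prime | x) = sigmoid(w*x + b) by per-point gradient descent."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, label in data:
            p = sigmoid(w * x + b)
            # Gradient of the cross-entropy loss for this point.
            w -= learning_rate * (p - label) * x
            b -= learning_rate * (p - label)
    return w, b

# Invented data: (condition score, 1 = prime, 0 = not prime)
data = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]
w, b = train_logistic(data)
print(sigmoid(w * 2.0 + b))  # probability that a score-2.0 apartment is prime
```

The S-curve does the work: large positive scores push the output toward 1 (prime), large negative scores toward 0, with the threshold sitting where the curve crosses 0.5.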