Using Machine Learning for Predicting NFL Games | Data Dialogs 2016

September 2, 2019 By Stanley Isaacs


thank you a lot it’s great to be here
it’s great to meet the students and the faculty for the program and hopefully
it’s a fun talk so this is a standard quote that gets thrown around a lot
you know the NFL is a very unpredictable league where you know a lot of times the
better team wins but generally speaking lots of crazy stuff happens I don’t know
how many people are like avid football fans or minor fans but even if you’re
not this talk is kind of designed to be showing like how you can use machine
learning for this particular problem but it’s also designed to show you that
machine learning is something you can actually use right most of the time when we talk
about machine learning Netflix is giving you recommendations Google is searching
things for you Amazon wants you to buy things but you know it feels very other
people are doing it for you like what if you actually have a real life problem
yourself right for me my problem was I’m in a fantasy football league with my
brothers-in-law and I want to win right so and I’ve … I was doing it without
machine learning for years I was just thinking like oh I’ll pick this team
I’ll pick that team and every year I was doing it I’m like I could probably do
better I should get the computer to make some picks for me and I waited waited
many years and then finally I’m like I’m just gonna do it and so this is sort of
like an example for you guys also that you know you can actually use this stuff
in like real life so this is joint work with a good friend of mine we kind of
came up with the idea together we coded it together you know he teaches at
another University in a financial engineering program also honorable
mention to my 12 year old daughter she also helps with this process she
helps make picks she insists on taking some cut of the winnings fifty dollars is a lot for like a
twelve-year-old so she’s pretty happy anyways okay so how do fantasy football
leagues generally work right there’s all different varieties of leagues this one
that we’re looking at today is like a relatively simple league we’re not
picking individual players but just like let’s go back one step you know
basically as a football fan or a sports fan you watch every week and you think
you like know better the coach should have done this or that team should have
won or you know like you think you know better and then also there’s like all
the people on TV they’re talking like this should happen or that should happen
this team’s gonna win like is that all really true the other thing that you
know we have like some market information in the sense that you know
before the games like there is a betting line you can go to Las Vegas you can bet
on these games and you know you can see this thing called the point spread which
is like the amount that a certain team is supposed to beat another team by like
that probably encapsulates like a lot of information right … people are you
know sort of irrational people always vote for their own team but you imagine
like lots of people voting and they’re actually putting their real money at
stake they’re not gonna on average make stupid decisions so one of the
ideas for this league is that you know we start with the point spreads as like
a simple way to get started and see if we can really like do any better than
that using machine learning techniques okay so how does our league work our
league is like I said relatively simple it’s called you know a pick’em league
meaning that on average every week there’s like 16 games sometimes there’s
14 on bye weeks and stuff like that but you’re supposed to rather than worry
about the point spreads you’re just supposed to pick who wins so for example
you know whatever like I forgot all the games this week but I think Denver
was playing like you know Oakland this week right and you’re supposed to pick
outright you’re not supposed to worry about who’s favored and who’s not
favored you just pick who you think is a winner and then the way you kind of like
accumulate points in this league is you have to assign points like 16 all the
way down to one and if you get your top pick right then you get 16 points if
you get your second top pick right you get 15 points if you miss your 14th pick
you get zero points for that one and then the way you
win this league is you accumulate points over the course of the year and the
person with the most points at the end of the year wins the league
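
A small sketch of the scoring rule just described may make it concrete. The function name `score_week`, the data layout, and the example results are assumptions made for illustration; they are not the league's actual software or real game outcomes.

```python
def score_week(picks, winners):
    """Return the points earned in one week of a pick'em league.

    picks   -- list of (team, confidence_points) pairs, one per game
    winners -- set of teams that actually won
    A correct pick earns its confidence points; a wrong pick earns zero.
    """
    return sum(points for team, points in picks if team in winners)


# Hypothetical week with three games shown (a full week has up to 16).
picks = [("Indianapolis", 16), ("Denver", 15), ("Seattle", 14)]
winners = {"Indianapolis", "Seattle"}  # made-up results

print(score_week(picks, winners))  # 30
```

Summing this weekly score over the season gives the total that decides the league.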
generally speaking you can win in an individual week right by just going
crazy picking all the right upsets and getting
everything just right but on average if you’re gonna win over the course of the
season it’s gonna be better to be steady and consistent and just like not make
mistakes so let’s just look a little bit of like how this looks when I go to the
website and this is how I make my picks I don’t even remember this was probably
like a long time ago but this is just kind of how it works you know you pick
these two teams this week I you know the model or I or whoever like decided that
Indianapolis was the top pick and so we assigned like 16 weight to them 15 right
to them and then you go and you just enter your picks and it kind of locks it
in and then you’re competing against everybody else okay so what are the
various strategies right like so one thing I kind of already mentioned and
alluded to is like let’s pick the simplest strategy that requires like no
brain power well I mean truly no brain power would just be to guess
randomly right but that’s not what we’re trying to do so we take exactly what you
know Las Vegas is telling us and we basically take the team that’s like the
highest spread so if a certain team is favored to win by ten and that’s the
highest that week we put them at sixteen and then the next team maybe they’re only
favored to win by seven so we put them second and then we go down the line and
order them in that … sort of way and then we have like some various tie
breakers like if two teams are both you know favored to win by four you know we
just pick the ones that are like a home team or like if there’s still a tie then
we pick okay which one’s got the better record but on average like those
little differences like don’t make much of a difference on the other hand you
could just do it just ad hoc based right you could not care about what Las Vegas
says you could just do your own thing you could look at the win-loss records
of the teams you could look at are they playing a good team are they
playing an away game or a home game are they playing a division game or a non
division game so this is a little nuanced depending on how familiar you are with the NFL basically the league is broken up
into you know like six divisions or I think it’s like eight divisions nowadays
and basically you play all the teams in your division multiple times during the
season as opposed to not you don’t play everybody so you have a much more
familiar relationship with the teams in your division you play them much more
there’s more heated rivalries there’s more competition you tend to play a
little bit different and then you know other things you could look into you could
look into injury reports you could just have personal preference intuition for
example I know my brother-in-law is a giant Steelers fan he kind of can’t
physically bet against them even though they might be like you know favored to
lose so but you know but he’ll … he’ll pick them but he’ll put them at
the bottom like for one point even if it means that he’s picking the wrong thing
but you know like I’m not going to do that I want the machine to tell me
what’s the right thing to do um but the other thing to remember is like ideally
aside from the personal preference and intuition part ideally the point-spread
encapsulates a lot of what’s out there in the world if some major player got
injured the point spreads will reflect that if some team doesn’t play well on
you know artificial turf or you know in bad weather like it should encapsulate that
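
The spread-guessing strategy described earlier (rank the favorites by the size of the spread, break ties by home team and then by better record) could be sketched roughly as follows. The `Game` fields, the function name `spread_picks`, and the sample games are hypothetical and only meant to illustrate the ordering logic.

```python
from dataclasses import dataclass


@dataclass
class Game:
    favorite: str            # team favored by the betting line
    underdog: str
    spread: float            # points the favorite is expected to win by
    favorite_is_home: bool
    favorite_win_pct: float  # favorite's win-loss record as a fraction


def spread_picks(games, top_points=16):
    """Always pick the favorite; the biggest spread gets the most points.

    Ties on spread go to the home favorite first, then the better record.
    """
    ordered = sorted(
        games,
        key=lambda g: (g.spread, g.favorite_is_home, g.favorite_win_pct),
        reverse=True,
    )
    return [(g.favorite, top_points - i) for i, g in enumerate(ordered)]


# Made-up games: two favorites tied at 4 points, one at home and one away.
games = [
    Game("Team C", "Team Y", 4.0, False, 0.80),
    Game("Team A", "Team W", 10.0, True, 0.60),
    Game("Team D", "Team Z", 4.0, True, 0.50),
    Game("Team B", "Team X", 7.0, False, 0.70),
]
for team, points in spread_picks(games):
    print(points, team)
```

With these inputs the home favorite "Team D" is ranked above "Team C" even though its record is worse, because the home tie-breaker is applied first.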
so our data set is actually relatively clean so if we just look back like
historically at this league like you know what happens it turns out that this
spread guessing strategy which you know I say like requires no brain power wins
… straight up will win this league half the time and you know so I just
kind of compiled the years the winning score of like whoever won that year and
then what this spread method you know using some back testing would
have gotten us and you can see like you know basically four out of the eight
years that I looked at it just using you know no smartness no machines no
intuition you would have won this league so now you’ve already
set a pretty high bar right because all these people maybe
like 50 people in the league they’re all doing their best to try to win and you
know this really simple method that requires no guessing is already kind of
outperforming them so how can you do better so this is where I decided to
give my machine learning project a try and see if I
could do better so just some machine learning basics we’re gonna use
a technique called supervised learning supervised learning is where you give
the computer some training data you give it what we call features which are
like the known you know variables and then you actually give it a known result
and then the computer extracts like a model out of that and then using that
model it can now predict what’s going to happen with new examples that it’s never
seen before and how good your model is is basically
how well did you train the model okay so one quick thing I think we’ve all seen
linear regression before this is not what we’re gonna use linear regression
is good for predicting you know some Y variable when you have a bunch of X
variables and you know we’ve all done this before we’ve minimized things but
that’s not necessarily going to help us with this problem because we’re trying
to predict wins and losses on the other hand a technique called logistic
regression is good for classifying things so this is like you know some
sort of I think this is from the Coursera machine learning course but
basically you’re trying to discriminate between two people and you can see this
blue line is what’s called the decision boundary and you have people that are
these yellow dots and you have those black pluses and the decision boundary
more or less does a good job of discriminating between the two but it
doesn’t get everything quite right and maybe that’s okay because you know you
can’t expect that your machine learning algorithm like will get it a hundred
percent right and you’re willing to live sort of with what Ed mentioned is like
some googliness right like it’s okay it doesn’t have to be perfect and in fact
if you had drawn the perfect line that like you know just discriminates between
the you know the two different data sets here you get into a problem area that’s
called overfitting like you fit your data exactly but you don’t actually
you’re not very good at making predictions you’re only good at
memorizing what happened in the past so you don’t want to get into that
problem so when you’re doing logistic regression because you’re basically
doing like some sort of a binary classification you want to use a
function that like helps you sort stuff out so you can see this the standard
thing that goes into a logistic regression is this thing called
a sigmoid function it has like a nice feature that it’s like smooth and then
if you’re above 0.5 you know you very quickly go up to 1
and then if you’re sorry if you’re above zero you very quickly go up to 1 if
you’re below zero you very quickly go down to zero and the basically the the
answer you get is like related to the probability of the confidence in that
pick so if your probability is closer to 0.99 and you have a very high
probability if your probability is like close to 0.1 then you’re closer to zero
so you have a very low probability of well low probability of being classified
as a 1 you have a very high probability of being classified as a zero and this
works for us because in the end what we’re trying to do is we’re trying to
classify based on our you know history of these NFL games like did the team
that was favored win the game that they were supposed to win or not
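
For reference, the sigmoid function mentioned above is 1 / (1 + e^(-z)). A minimal sketch of how it turns a raw score into a probability and then into a 0/1 class is given here; the function names and the 0.5 threshold parameter are illustrative choices.

```python
import math


def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def classify(z, threshold=0.5):
    """Return 1 when the sigmoid output reaches the threshold, else 0."""
    return 1 if sigmoid(z) >= threshold else 0


# Inputs above zero move quickly toward 1, inputs below zero toward 0.
for z in (-4.0, -1.0, 0.0, 1.0, 4.0):
    print(f"z={z:+.1f}  probability={sigmoid(z):.3f}  class={classify(z)}")
```

An input of exactly zero gives a probability of 0.5, which is why 0.5 is the natural cut-off between the two classes.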
now we’re getting a little bit closer to solving our problem so in the simplest
form like the logistic regression has a set of inputs called features and
it has a single output for a binary classifier and in our case we have to
figure out what are the relevant features that I want to include in the
model and I have to also think about like exactly like carefully like what am
I going to get the computer to predict because I want that probability to be
meaningful for when I go to like make my picks like 1 through 16 so what are the
things that we picked we picked a very small number of features we didn’t look
at a ton of data we just looked at your current year’s and last year’s win loss
record we looked at what week of the season it is because let’s say you have
a hundred percent winning record that’s … much different if you’re 1 and 0 than if you’re 10 and 0 where it’s much more meaningful
also we look to see if it was a home game because there’s a clear advantage to
playing at home and we look to see if it’s a division game because this is
one of the things that I’ve kind of noticed over the years that like two
teams in general will play each other well and in these division games even if
you’re like an underdog you tend to play much much better against your division
opponents because of the familiarity and especially at home and I don’t
know how many people are like fans but all sorts of crazy stuff happens you
know like the Jets for example are not good but they’ll beat New England at
home and you know nobody’s surprised and then the spread this is also like one of
the key pieces of information that goes into it and the idea here is that we’re
using the spread and we’re using these other features to sort of augment the
model to see if we can do better and then the binary classifier the final
thing we’re trying to predict is did the team that was favored did they win the
game or not so just zeros and ones okay so it’s a data science talk we’re gonna do
some Python so we use Python which has a really nice machine learning
package I’m sure if you’re taking the machine learning course you’ve run into
what’s called scikit-learn it’s actually pretty straightforward like the actual …
like Ed said like 80% of this work was getting the data formatted correctly so
that it could actually do three lines of code right literally three lines of code
you have X’s which are your features you have Y which is your known result the class label and you
fit the model you score the model and you predict and that’s it
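
The three steps named here (fit, score, predict) might look roughly like this in scikit-learn. This is a sketch on synthetic stand-in data, not the speaker's notebook: the random features and labels are made up, and only `LogisticRegression` with its `fit`, `score`, `predict`, and `predict_proba` methods comes from the library.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data: 200 "games" with 5 made-up feature columns.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
# Made-up labels: 1 = favored team won, loosely driven by the first column.
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

model = LogisticRegression()
model.fit(X, y)                               # fit the model
accuracy = model.score(X, y)                  # score it (mean accuracy)
labels = model.predict(X[:3])                 # predict 0/1 classes
win_prob = model.predict_proba(X[:3])[:, 1]   # probability of class 1

print(f"training accuracy: {accuracy:.2f}")
print("predicted classes:", labels)
print("probability favored team wins:", np.round(win_prob, 2))
```

The probability column from `predict_proba` is the piece that matters for ranking picks, since a plain `predict` only returns the zeros and ones.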
and all the other stuff I’m going to show you is the 80% which like goes into
making sure that like the ones and the zeros and the numbers all look good
together alright so and then how do we do this in Python there’s these things
called IPython notebooks you know normally on a weekly basis I have like
just scripts that run automatically and you know spit out the right answer but
when I’m doing what’s called exploratory data analysis or looking at results and
trying to visualize results we try to use these notebooks so let’s see if this
doesn’t break completely … oh look at that nice use of technology all right so
here’s my notebook we’ll go through it relatively quickly there’s you know
first of all there are just some like setups you import some directories you
import some packages turn off warnings let’s see here so I’m not going to run
it live because I’m sure if I try to do that it would
break but I did run it just not too long ago so you should believe me it’s not
completely canned okay so first of all we have some reference data … here’s
the teams what league they’re in what division they’re in this is important
aside from the historical data the next thing we’re going to do is we’re
going to define what we call the test and training sets so anytime you’re
doing machine learning you want to you’re trying to make predictions and
you’re trying to see how good your predictions are so you don’t want to
validate your model on data that it has already memorized so you want to hold out
some data that you haven’t seen before and then you want to see how good your
model works on that luckily for us because we have like a lot of historical
data I can basically train the model on let’s say and what we chose to do is
like pick three years of data so let’s say we took the data from 2008 9 and 10
and then we predict what we think would have happened in 2011 and since 2011 has
passed already we can test to see if our model was any
good or not and so this is how we tested the model but this is actually live
where I’m gonna show you like what we do on a weekly basis to make the
predictions for this week so right now the test year is 2016 we don’t know
what’s gonna happen we want to predict for 2016 and we’re gonna train based on
these three years 2013 through 15 and we kind of messed with different ideas of
how many years to use like five years seemed like a good idea but ended
up being too much you know it like incorporated information that was a little
too old one season was like not enough information to get the statistics like
kind of robust and then the other thing I would like to remind you is that this is
mostly like a fun project and you know you guys can ask like a ton of questions
like did I do this and did I do that and we thought about some things and we
didn’t think about others but I think the idea is that you know you can you
know use this as a starting point in your explorations using machine learning
and see how far you want to go but I’m happy for suggestions though because I
do want the model to get better ok so this is the part that this is like the
80% basically getting all the training data you read in all the games you like
look at the records of the teams you have to compute all these like metrics
for you know who’s in what division who won
who lost and then so I do it for the training set I do it for the test set
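
The three-seasons-then-predict-the-next split described above can be sketched in a few lines. The helper names and the `"season"` key on each game record are assumptions for illustration; the real notebook's data layout is not shown in the talk.

```python
def training_years(test_year, window=3):
    """Seasons used for training: the `window` seasons before `test_year`."""
    return list(range(test_year - window, test_year))


def split_by_season(games, test_year, window=3):
    """Split game records into a training set and a held-out test season."""
    train_years = set(training_years(test_year, window))
    train = [g for g in games if g["season"] in train_years]
    test = [g for g in games if g["season"] == test_year]
    return train, test


# Hypothetical records, one placeholder game per season from 2008 to 2016.
games = [{"season": season, "week": 1} for season in range(2008, 2017)]

train, test = split_by_season(games, test_year=2016)
print(sorted({g["season"] for g in train}))  # [2013, 2014, 2015]
print(training_years(2011))                  # [2008, 2009, 2010]
```

Calling the same split with each past season as `test_year` gives the kind of rolling back test where the model never sees the season it is asked to predict.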
not that exciting okay so right before I’m about to send
in the data to the model like what does it look like so the computer doesn’t
care whether Baltimore’s playing Pittsburgh it’s just a name to it
right so the things that the computer cares about are the features that I
talked about so this is what the features look like the favored record
this is the first week of the season so clearly your … your current record
for everybody is 0% and then and this is why the previous year’s record is
somewhat important because the first game of the season who knows who’s gonna
win there’s just the spread right but hopefully if like the Super Bowl winner
is playing you know somebody who was like terrible last year that’s some
indication of you know who might be better so we have the previous record we
have which week of the season you’re at we have the line we take the absolute value
of it because we have another field here that says favored home game so that
automatically accounts for the minus sign or the plus sign as to who might
be favored and there’s this flag for a division game and then this is the
classifier output it’s not that exciting it’s just zeros and ones in that week did the
favored team win so I send this all to the scikit-learn classifier and it’s pretty
straightforward this is all wrapped so that we can you know run this over and
over again but what I showed you before about running the classifier and
predicting it inside it really is like just those three lines so we set up the
classifier and then we can predict week 9 which is the week that just happened
so we’re gonna look to see what happens and then we kind of look at like what
does the prediction data look like and so basically what’s happening is you
know these were the games this last week and I ranked them by the probability that
the particular team would win and so the nice thing here is like not only does it
tell me that if I’m above 50% that’s telling me that the favorite team should
win and there’s only one upset pick this week turned out it didn’t work but
everything that’s above 50% should be that the favorite team wins and this
also gives me a way to rank the teams between 16 all the way down to one
there’s also only 14 games this week so it goes 16 down to three and
so this is what I need to do in order to enter my picks into the system
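
Turning the predicted probabilities into picks, as just described, could look something like this sketch. The function name and tuple layout are assumptions, and apart from the 44 percent Pittsburgh figure quoted later in the talk the probabilities are made up.

```python
def make_picks(predictions, top_points=16):
    """Rank games by the probability that the favored team wins.

    predictions -- list of (favorite, underdog, p_favorite_wins)
    Above 50% the favorite is picked; below 50% the underdog is picked
    (an upset pick). Points run from `top_points` downward in rank order.
    """
    ordered = sorted(predictions, key=lambda p: p[2], reverse=True)
    picks = []
    for rank, (favorite, underdog, p_fav) in enumerate(ordered):
        team = favorite if p_fav >= 0.5 else underdog
        picks.append((team, top_points - rank))
    return picks


predictions = [
    ("Team A", "Team B", 0.81),        # hypothetical
    ("Pittsburgh", "Baltimore", 0.44), # figure quoted in the talk
    ("Team C", "Team D", 0.62),        # hypothetical
]
for team, points in make_picks(predictions):
    print(points, team)
```

With 14 games in a week the same loop hands out 16 down to 3, and an upset pick naturally lands at the bottom because its favorite has the lowest win probability.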
you can just see like and that’s pretty much it like we can see what the model
would have predicted and so let’s just jump back to basically the other thing here we’ll present and then we’ll show a few results now that we showed how
we use this okay so back testing so we trained over multiple sets of three year
periods and like looking forward like another year and we look to see how the
spread strategy would have done against the person who won the league that year
and we also extrapolated how like the machine learning strategy would have
done that year and it looks pretty good and this is back testing so we just have
to remember that like back testing’s like never as good as forward testing I don’t
know if anyone’s ever traded on Wall Street at a hedge fund you have all
these great ideas you’re gonna make money you try to put it in action in
real life it doesn’t work but you know but still you have to do your back
testing and you have to convince yourself that you went through some
like reasonable amount of you know effort to make sure that you think the
strategy is gonna work going forward and then you tweak it along the way as
things break or you come up with more information one thing that I’ll mention
is like I keep referring to this moderate strategy over here there’s a
bunch of like different ways that you could actually make the picks in this
particular league one particular way which I call the conservative strategy
is to just always pick the favorite regardless but then you only like use
the numbers to kind of reshuffle the order so that would be very similar to
the spread strategy it would just kind of change the order of some of them the
other thing is to actually pick the predicted team so for example I don’t know if you remember at the bottom it said Pittsburgh had a 44 percent chance of
winning which means Baltimore the underdog is predicted to win so
we’re gonna actually pick Baltimore to win but we’re gonna put
them at the bottom of the pile just because it’s an upset well the other
thing we could do which I call the aggressive strategy
is to figure out what’s the relation to the point five because like what if
Baltimore what if the probability of Pittsburgh winning was zero right that
means it’s a hundred percent chance that Baltimore is gonna win so then I should
actually take Baltimore and put it way at the top at sixteen but you know we
did some back testing on that and it turned out that the aggressive strategy
tends to have like a very high standard deviation it like wins some years like
by one hundred and forty points and it loses other years by a hundred and forty
points and so you know in an effort to be you know a little bit more
conservative and to see if we could like win more consistently we decided to pick
this moderately conservative strategy and then live testing right live
testing like how does it work is it any good at all or not so 2014 was the first
year that we ran the strategy the spread strategy actually won that year my
daughter was happy because she’s the one who puts in the picks for the spread
strategy because she’s pretty sure that that’s the best one the moderate
strategy did not do so well this year that year last year was pretty ideal the
moderate strategy came in first place and the spread strategy came in third
place and the second person was just barely above the spread strategy so um
that was actually kind of nice and it was like a little bit of validation of
the model and how it works and we were happy to see that happen
and then currently we’re not doing so hot but I will say that because it’s
like a slow and steady strategy like about two-thirds of the way through the
season is like when it really kind of like starts to build up and like the
consistency of it starts to like outperform the people that are just like
making random guesses on a weekly basis so hopefully good things will happen and
and then just you know depending on how much football you watch on Sundays and
Monday nights this is what we had picked for this current week and you can see
that the spread strategy which is the far corner if you see favored win they
only got two wrong whereas the algorithm with the moderate strategy actually got
three wrong because it wrongly picked the upset of Baltimore over Pittsburgh
and Pittsburgh actually won so um and then the Seattle game
which is why I’m wearing a Seattle t-shirt is gonna happen tonight and
they’re predicted to win I’m not necessarily a fan but it’s fun to root
for the algorithm … and that’s all I believe we have a little time for questions thanks very much we’ve got a hand right
up there at the back straight away if anyone’s got a mic we’ll go to the back
thank you hi thanks so much for your presentation one immediate question I
have is football to me doesn’t seem I’m a sports fan I like a lot of different
sports and if I were gonna do something like this football would not be first on
my list because of the very limited number of games
that’s like seems like at least one thing that would make this a little less
conducive your training sets and dev sets and all that just can’t be as
large you know baseball yeah so did that go into it I mean are you just a huge
football fan like what are the contributors no I’m like a big
sports fan all around and the idea is to sort of like use this as like a starting
point and then we definitely want to look into like baseball and even like I
don’t know Pro Cycling you know it’s like one of my favorite things you know
it doesn’t seem like a team sport but it is so yeah I agree like with baseball
there’s definitely a lot more like 162 games over the course of the season lots
of individual player statistics and then the other thing you know this was just
literally to get started to win this particular league but even like you can
imagine starting to look at player statistics and how do you do like
a player team oriented fantasy league but yeah it’s certainly a good point not
limited to football at all thanks okay we’ve got let’s go right across at the
end we’re just down to one mic at the moment yeah
okay gentleman in the white shirt there and then we’ll and then you can pass it
back for the next question after that thank you hello thank you for the
presentation we can hear you yeah it’s good yes so my question is that it
sounds like your model relies a lot on the Vegas spread mm-hmm I was thinking
um why do you use the Vegas spread and have you thought about another place to get it say
with I don’t know the predictions from 538 for instance instead of the Vegas spread yeah so that’s a good point so I think the thing is like what
538 does is they do some version of this right they do something else that is
also like model driven right so I think one of the lessons that I got from the
years of working on Wall Street is there’s like market information right so
538 is model information and there’s market information and there’s a
difference between what the bank says is the valuation of a security according to
the model and what the market says and companies have gone bankrupt and credit
crises have happened so the idea was to take market information and I think
what we do is we look at 538 to see what they’re predicting I think even like
Bing like Microsoft’s search engine if you just type in like NFL games like
they give like a probability of winning and we’ve kind of like matched up ours
you know to see like oh like are we totally off-base what are they doing
we don’t really know like what they’re doing but
it must be something along these lines but you know probably they’re using like
a much larger richer data set to kind of you know pull in this is meant to be
like relatively simple like literally like five inputs into the model and see
if we could like you know do something that’s interesting and effective but yeah
great question hi my question is can you talk a little
bit about why you picked logistic regression versus any other classifier yeah so one of the reasons I picked so the thing is one of the things I didn’t show here is that we
do like to run a third strategy and … using like support vector
machines and that one also is like a really high volatility and so we haven’t
gotten that to work so there is like certain amount of like what’s the word
like machine learning know-how and like really understanding like some of those
like algorithms like much more like theoretical basis logistic regression I
feel like is the simplest to understand because of the binary classifier and
because we’re doing a binary output for example the support vector machines when
it gives the probability it’s not usually visualizable in this like
sigmoid function oriented way so I think for the illustrative purposes like
logistic regression is also like a great idea but we are trying to test like
other you know decision trees and random forests and see if like something would
do better or not but that being said like so far over the years the logistic
regression has actually performed the best in terms of like going up against
live competition okay thank you okay we’ve got a hand right at the back there
in the corner yes mic’s coming to you thanks sorry since it’s a dialogue I
feel obligated to talk for a minute not ask you a question is that fair kind of
trying to tie what Ed talked about and what you talked about together with the
question of like why start at a domain that doesn’t have that much data not
sports but this particular sport I’d like to actually tell you that I think
statistics and machine learning started with small data not big data and I know
it’s a very good thing to always think about big data as the challenge like
processing the data and doing all that it’s much more intensive when you have
big data but the challenge with small data is actually a very important one I
know sports may not be like life-threatening moments
and we may not think about it as important I personally don’t think
about it as important as other things in life but I think that this
raises actually a really good point in data more data is better than no data
and it’s a really big important topic there’s a lot of domains that don’t have
big data and are very very important to tackle and the algorithms not all of
them but a lot of them especially like the stuff that you were talking about
are applicable and you should not shy away from them just because they’re
small smaller data sets so I really do I’m going to talk a little bit about
agriculture where life isn’t as pretty with big data sometimes and so I’m
really a big advocate of taking something with small data and showcasing
it trying it and even failing and learning from it so I appreciate the
effort to go not into baseball which I personally dislike so thank you for trying to tackle something different … I’ve got a mic so I’m going to talk so we’re going on the same line of thinking do you think
adding features to those additional five those initial five adding additional
features would improve the quality of your predictions given your experience
or do you feel like the simplest method is the best given the success of the
spread relatively so I think definitely like more data would kind of enhance the
model but the nice thing about machine learning models is like you don’t
presuppose like so I put in this thing like for the division games that I think
is important right but then when you actually run the model it kind of spits
back out at you like what is the relative weight of that factor and if it
thought it was useless it would be zero or you know another thing you
can do with what’s called feature engineering in machine learning is like
you can take out that feature and see if your training accuracy goes up or
the results are more stable so we kind of did that with at least the features
that we picked and then but we are trying to figure out like how to add
more data but it is also an 80% problem right like you know it’s just like work
and and we just haven’t gotten there but I think definitely like there’s lots of
other data in the world where you can look at like defensive statistics and
offensive statistics you know do you care about
like individual like players and injuries and stuff like that like if a
star quarterback is like not playing like is that the difference I guess the
thing is like the whole point of this exercise is that the spread is
already telling you something and can the model uncover like does a home game
mean more than what the spread is already telling you right because
generally speaking you kind of hear anecdotally that like all things being
equal the home team has like a three-point advantage in the spread
right so it’s already baked in so it’s only efficient market I thought what’s
that like the efficient market hypothesis right exactly
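
The take-out-a-feature check described above can be sketched in a few lines. This is a minimal illustration, not the speaker's code: the games are synthetic (generated so that only the spread drives the outcome), the feature names `spread`, `home`, and `division` are hypothetical stand-ins, a hand-rolled logistic regression replaces scikit-learn so the sketch has no dependencies, and it scores on held-out games rather than the training accuracy mentioned in the talk.

```python
import math
import random

FEATURES = ["spread", "home", "division"]  # hypothetical feature names


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_games(n, seed):
    """Synthetic games (not real NFL data): only the spread drives the outcome."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        spread = rng.uniform(0.0, 14.0)    # points by which one team is favored
        home = rng.choice([0.0, 1.0])      # 1.0 if the favorite plays at home
        division = rng.choice([0.0, 1.0])  # 1.0 if it is a division game
        rows.append([spread, home, division])
        labels.append(1 if rng.random() < sigmoid(0.2 * spread) else 0)
    return rows, labels


def fit_logistic(rows, labels, steps=300, lr=0.05):
    """Batch gradient descent for logistic regression; returns (weights, bias)."""
    n, k = len(rows), len(rows[0])
    w, b = [0.0] * k, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * k, 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) - y
            grad_b += err
            for j in range(k):
                grad_w[j] += err * x[j]
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(model, rows, labels):
    """Fraction of games where the predicted winner matches the label."""
    w, b = model
    hits = 0
    for x, y in zip(rows, labels):
        pred = 1 if sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) >= 0.5 else 0
        hits += pred == y
    return hits / len(rows)


def keep_columns(rows, names):
    """Return rows restricted to the named feature columns."""
    idx = [FEATURES.index(name) for name in names]
    return [[row[i] for i in idx] for row in rows]


def ablation(train, test):
    """Held-out accuracy with all features, then with each feature removed."""
    (x_tr, y_tr), (x_te, y_te) = train, test
    results = {"all": accuracy(fit_logistic(x_tr, y_tr), x_te, y_te)}
    for dropped in FEATURES:
        kept = [name for name in FEATURES if name != dropped]
        model = fit_logistic(keep_columns(x_tr, kept), y_tr)
        results[dropped] = accuracy(model, keep_columns(x_te, kept), y_te)
    return results


results = ablation(make_games(400, seed=0), make_games(200, seed=1))
for name, acc in results.items():
    print(f"{name:>8}: held-out accuracy {acc:.3f}")
```

Each row of the output is the accuracy after removing the named feature (`all` removes nothing). A feature whose removal leaves the score roughly unchanged is probably not adding anything beyond what the remaining features, the spread in particular, already carry.
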
so it’s only like can the model discover something that it’s not taking into
account as much as it should or overdoing something and so yeah that’s like
an open question like who knows which things will be relevant or not but it’s
definitely we’re trying with other data sets volunteers are welcome … we have
time for a couple more questions let’s take yeah the mic is getting passed to
you there and then we’ll go to the back for the final question so has anyone in
your league become more data-driven as a result of you doing this project I’m not
sure they all know I’m doing it I even tell them that the person who like wins
the league just straight up uses spreads but I think the thing is everybody like
it’s a fun thing right so there is like some I think what happens with a
lot of people is they start with the spreads and then they make tweaks like
they just kind of adjust based on personal preference and intuition and
what they happen to know so I think maybe some people are doing that a
little bit more but it still seems to be just kind of like a fun thing let me see
if I can outwit the algorithm and that’s not to say the algorithm is amazing all the
time it does pick bizarre things and you know even I like question it and I’m
like who knows but you know we just go with it and we root for the algorithm yeah
question at the back there yeah all right so when you were talking about
the spread because there’s a lot of interesting information baked into it
have you ever actually thought of trying to predict the spread and see what kind
of inputs actually go into the spread itself and try to understand that a
little bit more deeply so we haven’t tried to predict the spread but one
thought that we had was to take out the spread and see if like it could like
rank you know independent of the spread right so that would be like an
interesting thing because then you’re almost like recreating the spread or
doing like an agnostic thing where you’re not taking this like market
information so that’s like one thing we thought of we haven’t tried to predict
the spread that’s an interesting idea I guess what you’d have to do is
you’d have to take these probabilities and like map them historically to what
that means right like is a 90% probability a 10-point spread or not you
know then you would do some linear regression probably yeah
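
The mapping idea just described, pairing the model's win probabilities with historical spreads and fitting a line, could look roughly like the sketch below. The `history` pairs are invented for illustration and are not real NFL or betting data, so the fitted numbers say nothing about whether a 90% probability really corresponds to a 10-point spread. A straight line is also only the simplest choice; the true relationship may well be curved.

```python
# Hypothetical (model win probability, closing spread) pairs; made up for illustration.
history = [(0.55, 1.5), (0.62, 3.0), (0.70, 4.5), (0.78, 7.0), (0.85, 9.0), (0.92, 12.5)]


def fit_line(pairs):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


intercept, slope = fit_line(history)


def implied_spread(probability):
    """Spread the fitted line associates with a model win probability."""
    return intercept + slope * probability


print(f"spread ~= {intercept:.2f} + {slope:.2f} * probability")
print(f"implied spread at 90%: {implied_spread(0.90):.1f} points")
```

With real history in place of the invented pairs, comparing `implied_spread(p)` against the actual spread for a game would show where the model and the market disagree.
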
seems like it’d be fun yeah and the other thing I was thinking of have you
thought of incorporating because we have these beautiful simulators that we’ve
been building for 12 years now as in Madden 2016 and actually running a
bunch of games on that and then adding that as another feature the one thing I have thought of though
so this is like obviously like writing a bunch of Python code using scikit-learn
etc etc there are starting to be like drag-and-drop machine learning tools for
the non programmer I think like Microsoft has something there’s
something called BigML there’s this new thing I just ran across the other day
called Orange I forgot what it was called Orange something or other but
basically there are tools for the data savvy person but not necessarily like a
Python programmer to like pull in your data
you know kind of do a little bit of cleansing do a little bit of that 80
percent and you know say that these are the features this is the variable and
predict for me and I actually tried running this on BigML and it more or
less gives like the same answer that you know the model was giving
short of like not knowing what the tiebreakers were so that was kind of
cool I thought that it was like like this is almost achievable for the masses
or my brother-in-law if he wanted to you know great Amit thanks so much for
sharing your fantasy football work with us