Let’s Write a Pipeline – Machine Learning Recipes #4
November 29, 2019
By Stanley Isaacs

100 Comments

chakree ten says: September 21, 2016 at 5:54 am
hello, this is what i got while running it on my sublime text 3:
from sklearn.cross_Validation import train_test_Split
ImportError: No module named cross_Validation

Gerald Reiner says: October 3, 2016 at 12:34 am
you should just do this. google IO is fine. This seems more important

ach kh says: November 28, 2016 at 5:48 pm
Anyone know where i can get some test datasets in a csv format? learning purposes (im not a machine btw)

Olee _ says: December 8, 2016 at 3:19 pm
Thank you for the great video, but my OCD is going on TILT with that lowercase 'y' variable lol

Nikolay Klimchuk says: January 3, 2017 at 10:40 pm
What is this guy doing with his hands? Swimming? I feel dizzy…

Vadim Borisov says: January 9, 2017 at 1:02 pm
can not understand why you are using Python 2.x? why not Python 3.x?

Vadim Borisov says: January 9, 2017 at 1:15 pm
I built the network for the hardest data set, thanks for the link!

זאב ג. says: January 23, 2017 at 5:29 pm
hi, great series!! can you publish the code used?

Abhishek Murali says: February 1, 2017 at 5:41 pm
I just had 1 question: When we split the data using the split command, we basically make the first 75 the training set and the remaining 75 the testing set. However, if we don't give training data for the 3rd label, how is it classifying that as well? Or am I interpreting the split function wrongly? Great videos though. As a beginner, these are really helping me.

Devender Shekhawat says: February 10, 2017 at 5:42 am
what if the splitting method takes all the data related to one flower (in the iris example) and assigns it to the test data? can we select the order/randomness?

Raghunandan Kavi says: February 12, 2017 at 7:23 pm
sklearn.cross_validation is deprecated. need to change to sklearn.model_selection. Using an IDE is always better than typing code in Notepad

MW says: February 12, 2017 at 9:50 pm
I'm loving this series

Ajinkya Jumbad says: February 15, 2017 at 6:07 pm
I have great trouble keeping up with the syntax, any suggestions?

Ashish Kumar says: March 5, 2017 at 7:03 pm
sklearn.cross_validation will be obsolete soon.

Mārtiņš Mālmanis says: March 11, 2017 at 9:27 am
Something doesn't work for me – some DeprecationWarning appears… What should I do in this case? Here is a print screen: https://www.screencast.com/t/jwEiWyAGa6V

Manan Mehta says: March 12, 2017 at 8:41 pm
For line number 9 I am getting invalid syntax in Spyder IDE. Please help

Dennis Nicholson says: March 21, 2017 at 7:16 pm
Well done, thank you.

Jean Lee says: April 9, 2017 at 12:05 am
I am really moved by this lecture, thanks for the great videos~

Turkey Sandwich says: April 14, 2017 at 12:15 pm
10/10 quality production. Teacher speaks clearly and is easy to understand / enthusiastic. Content is well organized and can be followed each step of the way. I found this series to be the most valuable guide on machine learning on Youtube at the moment. All of the code has worked for me in Python 3.4 as well.

Varuna says: April 15, 2017 at 1:14 pm
Strange smiling after every sentence. Why do big tech company employees seem like cult members?
akbar alam says: April 17, 2017 at 7:20 pm
great explanation … you are a rockstar

huy nguyen says: April 29, 2017 at 4:10 am
could u make a video about Neural Networks? it's hard to understand this concept

kunal pawar says: May 7, 2017 at 7:13 am
i love this guy

Emilio Duarte says: May 15, 2017 at 6:44 pm
note that cross_validation has changed to model_selection

Akshat giri says: May 25, 2017 at 5:30 am
You are the bomb.

Gandluri Sai Kishan 13BCE1039 says: June 4, 2017 at 8:29 am
please help me out, i am getting this error:
File "<ipython-input-116-f9c2da5a35bb>", line 6, in <module>
x = iris.data
AttributeError: 'function' object has no attribute 'data'

Tiago says: June 6, 2017 at 11:18 am
I have created a github repo with all of the code for all of the recipes of this series. I've used Python 3 for all recipes. I've also updated all of the libraries and have added some things to the code here and there. Check it out: https://github.com/TheCoinTosser/MachineLearningGoogleSeries

Fistro Man says: June 10, 2017 at 5:50 pm
About features in knn: The features are finite. So you can create all combinations of them, and then see if we drop one, whether the results change too much… ok ok it could be a lot of computing power, but it is the machine deciding by its own rules, no human interaction. This is only good if some of your features are good; if all are bad it doesn't solve anything 🙂

CookingAndJava says: June 11, 2017 at 6:52 pm
Notice here, our accuracy was over 9000

JH C says: June 13, 2017 at 4:02 pm
where can I get the source code?

丰存翰 says: June 15, 2017 at 12:02 pm
It's a good lesson, thank u

常Bright says: June 22, 2017 at 9:04 am
My major is Statistics and I want to apply for a PhD position in Statistics. But after seeing this series, I have changed my mind!

Rex Asabor says: June 23, 2017 at 12:41 am
Would we select the classifier with the most accuracy after we test? Also, after we test, shouldn't we feed in the testing data too, to increase accuracy?

Akash Mishra says: June 28, 2017 at 9:39 pm
This is Awesome

Its Neroli says: June 30, 2017 at 2:44 pm
Why is this a voiceover ( a random chat

Mayank Gupta says: July 12, 2017 at 8:02 pm
Hi All, I created a nicely formatted repository containing the code from this video, but updated to work with new packages. https://github.com/officialgupta/MachineLearningRecipes Like this so people can see it!

Suharsh Tyagi says: August 2, 2017 at 8:46 pm
This is goood

Uygar Yılmaz says: August 16, 2017 at 7:47 pm
3 seconds from here 5:16

Michelle Elodie says: August 17, 2017 at 7:14 pm
Awesome series, straight to the point and very clear. Keep going!

TimePass says: September 8, 2017 at 11:20 pm
+Josh Gordon Hey, I am getting a "ValueError: too many values to unpack" error on executing. I have tried using model_selection instead of cross_validation, and still the same error pops up. Can you help me out?

Manveer Singh says: September 13, 2017 at 11:54 am
When you moved that line making red come to the right. I must say it was the magical moment!

Bagus Sulistyo says: September 19, 2017 at 4:52 pm
Thanks Josh…, this helped me to understand machine learning basically 🙂

cihangir mercan says: September 22, 2017 at 9:41 pm
this one is good

[email protected] [email protected] says: October 2, 2017 at 2:20 pm
Doing it in Python 3? Don't want to pause the video and write? Find the code here: https://github.com/akanshajainn/Machine-Learning—Google-Developers

TatTvamAsi says: October 9, 2017 at 3:01 pm
OMG, I finally see a reason for learning math in high school. I'm so happy I took the time to learn about the equation of a line and finding slopes.
XD

ulti72 says: October 17, 2017 at 10:56 am
does sklearn automatically classify what is data and what is target? if so, can we change the target?

Anshul Sharma says: November 12, 2017 at 12:05 pm
cross_validation will be deprecated soon, we can use the model_selection module now.

Matthew Jewell says: November 24, 2017 at 11:40 am
Why do we use a capital X and lowercase y?

Robin Dong says: December 29, 2017 at 4:18 am
great video. thanks.

Akadehmix says: January 16, 2018 at 4:28 pm
If anyone is watching this when cross_validation becomes deprecated, replace cross_validation with model_selection. The classes and functions should work the same, as they are being refactored and moved to this namespace.

prashant vaishla says: January 20, 2018 at 3:47 am
There are lots of different classifier algorithms available, but how can one select a suitable algorithm for classification? What should be the criteria for the selection of a classification algorithm?

Jose Ney Gandica Cardenas says: January 23, 2018 at 10:14 pm
Excellent, thanks!! you just opened my mind about machine learning… I was stuck on the concept

Tomás Seeber says: February 4, 2018 at 12:06 pm
Body language tells this guy licks shoes to climb. Disgusting.

Dr. Rizz says: February 14, 2018 at 10:52 am
Went through it all… Where is the pipeline? 🙂

Omar Salim says: February 18, 2018 at 2:40 pm
what if the new dot is neither red nor green? How can the classifier recognize that, and return the value 'false' instead of a wrong prediction? I'm working on a face recognition project and I'm using this sklearn library … any ideas how i can recognize a face that is not in the training data? thanks

Jalal Bahmed says: February 21, 2018 at 11:58 pm
Excellent series, very helpful.

sudhindra srinivas says: February 22, 2018 at 4:02 am
Very well presented!

M15H4 says: March 19, 2018 at 7:14 pm
if you have trouble executing this…
1) make sure you have "sklearn.model_selection" instead of "sklearn.cross_validation"
2) If your dataset is undefined, check spelling. Uppercase X and lowercase y are used throughout this example

abhishek gowlikar says: March 20, 2018 at 10:31 am
Great work Josh, keep it up, a pretty gift to the world from google.

Steven Kuo says: March 25, 2018 at 12:42 pm
Really, only an Edu genius can make up something like that. Thanks mate.

Alpesh Patel says: March 25, 2018 at 8:04 pm
Great Videos. Keep it up!

anierenimmay says: March 30, 2018 at 6:29 pm
I find myself wondering if he is a real person

Xitiz Shrestha says: March 31, 2018 at 10:23 am
What we did in a whole year project is in this video, lol

Pranav Desai says: April 17, 2018 at 4:43 am
Can we achieve more accuracy, or probably even 100% accuracy, by making the classifier more complex or giving it more parameters? Example: we could classify the dots (more random) better by having a more complex function such as a cubic or a bi-quadratic one, right?

Denise Dias says: April 29, 2018 at 12:42 am
👌👌👌

karan jakhar says: May 5, 2018 at 9:33 pm
amazing video, very well explained. Thank you Sir.

nebulousJames12345 says: May 13, 2018 at 8:00 pm
I put this on at night and slept to 12 in the afternoon. I put it back on 3 hours after I woke up and fell asleep again for 2 hours

The travel of time says: May 16, 2018 at 1:50 am
the code @3:21 doesn't work unless i also include: from sklearn import neighbors

aman mishra says: May 18, 2018 at 11:08 am
replace sklearn.cross_validation with sklearn.model_selection, as cross_validation has been deprecated.

aayush singla says: May 24, 2018 at 8:15 pm
These 7-8 min videos are better than hours of "so called" tutorials. watching in 2018.
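[Editor's note] Several commenters in this thread point out the same fix: the sklearn.cross_validation module was deprecated in scikit-learn 0.18 and later removed, so train_test_split now has to be imported from sklearn.model_selection. A minimal sketch of the split-train-test flow from the video, updated for Python 3 and a modern scikit-learn (the classifier, test_size, and random_state here are illustrative choices, not the video's exact code):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split  # was sklearn.cross_validation
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

iris = load_iris()
X, y = iris.data, iris.target  # features (capital X) and labels (lowercase y)

# Hold out half of the 150 examples for testing; train_test_split shuffles
# first, so all three iris species appear in both halves.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)

clf = KNeighborsClassifier()
clf.fit(X_train, y_train)
predictions = clf.predict(X_test)
print(accuracy_score(y_test, predictions))  # Python 3: print() needs parentheses
```

Note the four-way unpacking: the function returns training features, testing features, training labels, and testing labels, in that order.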
Olaseni Odebiyi says: May 30, 2018 at 9:20 pm
from sklearn.cross_validation import train_test_split
/anaconda3/lib/python3.6/site-packages/sklearn/cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20. "This module will be removed in 0.20.", DeprecationWarning)

Joyjit Chatterjee says: July 4, 2018 at 3:30 pm
Great. Here is my code for classifying the Iris Flower dataset using the Random Forest Classifier~
from sklearn.datasets import load_iris
iris = load_iris()
X = iris.data
Y = iris.target
from sklearn import ensemble
from sklearn.cross_validation import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.5)
clf = ensemble.RandomForestClassifier()
clf.fit(X_train, Y_train)
predictions = clf.predict(X_test)
print("Using Random Forest Classifier, Predictions are:")
print(predictions)
from sklearn.metrics import accuracy_score
print("Accuracy Score in percent is:")
score = accuracy_score(predictions, Y_test)
print(score * 100)

dario27 says: July 9, 2018 at 4:13 pm
If you get the deprecation warning, simply replace:
from sklearn.cross_validation import train_test_split
with
from sklearn.model_selection import train_test_split

Utof says: July 11, 2018 at 9:36 am
finally i understand machine learning 😍😍😍

AD ForKnowledge says: July 30, 2018 at 3:41 am
Very nice videos, I liked them all..!!! 🙂 The way you are presenting the example triggered me to learn Python.. U made it look simple 🙂 I am an android developer and have total interest in machine learning…. 🙂 Thanks for the good content… 🙂

jack vicky says: August 4, 2018 at 1:44 pm
Simply Awesome Video. Thanks much. Love ML

Xavier X says: August 5, 2018 at 1:01 pm
aaaaaaaaaaaaah! ridiculous pace. hint: watch these videos at 0.5 speed or slower. press pause frequently to digest what's going on.

Rasul Turganov says: August 22, 2018 at 11:21 pm
Thanks a lot, Josh and Google Developers for these awesome episodes. Finally, I've understood ML. You're the best!

ashwini tayade says: August 23, 2018 at 7:50 pm
The video was so incomplete

Alex Senchenko says: September 5, 2018 at 9:42 am
Google's translator doesn't know what Scikit is in the subtitles)

Ahmed Alhisaie says: September 16, 2018 at 4:26 am
this is a very good tutorial, Thanks a lot Josh

Abdullah Aghazadah says: October 1, 2018 at 3:23 am
Here is a quick summary of the video:
– scikit-learn has a handy function for splitting data sets into a training and a testing set
– it's sklearn.model_selection.train_test_split(data_set_features, data_set_labels, test_fraction)
– this function will return 1) training_features 2) testing_features 3) training_labels and 4) testing_labels
– i.e. it returns a tuple of 4 elements
– note, the test_fraction argument specifies the fraction of the data you want to use for testing
– so if you put 0.5, it means you want to use half the data for testing (and the other half for training, obviously)
– recall that the .predict() method returns a list of predictions for the list of examples you pass it
– you can use sklearn.metrics.accuracy_score(test_labels, predicted_labels) to essentially compare two lists of labels
– supervised learning is also known as function approximation, because ultimately what you are doing is finding a function that matches your training examples well
– you start with some general form of the function (e.g. y = mx + b) and then you tune the parameters such that it best describes your training examples (i.e. change m and b until you get a line that best splits your data)
Key thing to take away from the video: Supervised learning is just function approximation. You start with a general function and then tweak the parameters of the function based on your training examples until your function describes the training data well.

Robin Dong says: October 18, 2018 at 8:51 pm
Josh, you are not only knowledgeable about all this ML, but also an outstanding instructor. You simplified all these complicated methods. Can't thank you enough.

JoRouss says: November 16, 2018 at 5:05 am
Why is "X" capital and not "y"?

RPGtogether says: February 23, 2019 at 9:58 am
Are you an ML?

Constantin Philippou says: February 24, 2019 at 7:54 am
great great videos!!

Marudhu Paandian Krishna Kumar says: April 9, 2019 at 3:20 pm
the content is great but requires an update

LadyWinter says: April 10, 2019 at 10:29 am
from sklearn.model_selection import train_test_split

Ahmed Lachtar says: April 29, 2019 at 10:55 pm
extremely helpful and simple (y) (y)

Roger Datt says: May 21, 2019 at 3:16 am
Android 5.0 questions is def spam

Rupatai Lichode says: May 31, 2019 at 2:43 pm
Thanks, excellent series, good work 🙂

Mithilesh Thakkar says: June 18, 2019 at 7:10 am
I am getting this error while doing the accuracy check: accuracy_score() missing 1 required positional argument: 'y_pred'. may someone help me sort this out?

ananthoju sriharsha says: June 30, 2019 at 4:52 am
i thought this video was about pipeline in sklearn WTF

Ananya Pandey says: July 12, 2019 at 7:50 am
did I miss something? Where is the pipeline?

Fennec Besixdouze says: July 14, 2019 at 4:01 am
By the way, the cross_validation module has been renamed model_selection. Lesson 0: learn to go read the documentation of the modules you use. Stuff changes constantly.
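[Editor's note] The "accuracy_score() missing 1 required positional argument: 'y_pred'" error reported above means the function was called with only one argument; it needs two lists, the true labels first and the predictions second. A small sketch of the correct call (the label values here are made up for illustration):

```python
from sklearn.metrics import accuracy_score

y_true = [0, 1, 2, 2, 1, 0]  # hypothetical ground-truth labels
y_pred = [0, 1, 2, 1, 1, 0]  # hypothetical classifier predictions

# accuracy_score takes two positional arguments: y_true, then y_pred.
# It returns the fraction of positions where the two lists agree.
print(accuracy_score(y_true, y_pred))
```

Calling accuracy_score(predictions) alone, or forgetting to pass the test labels, triggers exactly that TypeError.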
A Random Programmer says: August 25, 2019 at 10:42 pm
print predictions should be print(predictions)

ANUBHAV SOOD says: November 1, 2019 at 12:51 pm
He did not even import the pipeline library, forget using it. The title of the video is misleading

lavagod says: November 1, 2019 at 8:44 pm
Thanks

최윤미 says: November 3, 2019 at 7:48 am
helo

Kishen Sharma says: November 9, 2019 at 9:20 pm
This has nothing to do with pipelines