Heroes of Deep Learning: Andrew Ng interviews Andrej Karpathy

November 16, 2019 · By Stanley Isaacs


Andrew Ng: Welcome, Andrej. I'm really glad you could join me today.

Andrej Karpathy: Yeah, thank you for having me.

Andrew Ng: A lot of people already know your work in deep learning, but not everyone knows your personal story. So let's start by having you tell us: how did you end up doing all this work in deep learning?

Andrej Karpathy: Absolutely. I think my first exposure to deep learning was when I was an undergraduate at the University of Toronto. Geoff Hinton was there, teaching a class on deep learning, and at the time that meant restricted Boltzmann machines trained on MNIST digits. I really liked the way Geoff talked about training the network, almost like the mind of the network; the terms he used gave me the sense of something magical happening when the network was training on those digits. That was my first exposure to it, although I didn't get into it in a lot of detail at the time.

Later, when I was doing my master's degree at the University of British Columbia, I took a machine learning class with Nando de Freitas, and that was the first time I dug deeper into these networks. What was interesting is that I was very interested in artificial intelligence, so I took classes in it, but a lot of what I was seeing there was just not satisfying: it was a lot of depth-first search, breadth-first search, alpha-beta pruning, all these things. I was not satisfied by that. And then I saw neural networks for the first time, in machine learning, a term that is more technical and not as well known; most people talk about artificial intelligence, and machine learning was more of a technical term, I would almost say.
So I was dissatisfied with artificial intelligence, and when I saw machine learning, I thought: this is the AI that I want to spend time on. This is what's really interesting, and that's what took me in this direction. It's almost a new computing paradigm, I would say, because normally humans write code, but here the optimization writes the code. You create an input/output specification, you have a lot of examples of it, and then the optimization writes the code, and sometimes it can write code better than you. I thought that was a very new way of thinking about programming, and that's what intrigued me about it.

Andrew Ng: Through your work, one of the things you've come to be known for is that you are now the human benchmark for the ImageNet image classification competition. How did that come about?

Andrej Karpathy: Basically, the ImageNet challenge is sometimes compared to the World Cup of computer vision, so a lot of people care about this benchmark, and the error rate kept going down over time. But it was not obvious to me where a human would be on that scale. I had done a similar smaller-scale experiment on the CIFAR-10 dataset earlier: I just looked at those 32-by-32 images and tried to classify them myself. At the time there were only ten categories, so it was fairly simple to create an interface for it, and I think I had an error rate of about 6% on that. Then, based on what I was seeing and how hard the task was, I predicted that the lowest achievable error rate would be, okay, I can't remember the exact numbers.
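The "optimization writes the code" framing above can be made concrete with a toy example. This is an editorial sketch, not code from the interview: we hand over only input/output examples, and gradient descent discovers the two parameters that implement the rule.

```python
# Toy illustration of "the optimization writes the code": we never
# hand-code the rule y = 3x + 1, we only supply input/output examples
# and let gradient descent discover the parameters.
examples = [(x, 3 * x + 1) for x in range(-5, 6)]

w, b = 0.0, 0.0  # the learned "program" is just these two numbers
lr = 0.01
for _ in range(5000):
    # gradient of mean squared error over the whole example set
    gw = gb = 0.0
    for x, y in examples:
        err = (w * x + b) - y
        gw += 2 * err * x / len(examples)
        gb += 2 * err / len(examples)
    w -= lr * gw
    b -= lr * gb
# w and b now approximate 3 and 1
```

The same loop, scaled up to millions of parameters and examples, is the paradigm Karpathy is describing: the specification is the dataset, and optimization does the programming.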
I think I guessed something like 10%, and we're now down to 3% or 2%, something crazy. So that was my first fun experiment with a human baseline, and I thought it was really important, for the same reasons you point out in some of your lectures: you really want that number so you understand how well humans are doing, and can compare machine learning algorithms to it. For ImageNet, there seemed to be a discrepancy between how important the benchmark was, how much focus there was on getting a lower number, and how little we understood about how humans do on it. So I created a JavaScript interface where I showed myself the images. The problem with ImageNet is that you don't have just ten categories, you have a thousand, so it was almost a UI challenge: obviously I can't remember a thousand categories, so how do I make it fair? I listed out all the categories and gave myself examples of each, and for each image I crawled through the thousand categories, trying to see, based on the examples for each category, what the image might be. I thought it was an extremely instructive exercise by itself. I had not understood that something like a third of ImageNet is dogs, dog species, so it was interesting to see that the networks spend a huge amount of capacity caring about dogs; I think a third of their performance comes from dogs. So this was something that I did for maybe a week or two; I put everything else on hold.
I thought it was a very fun exercise, and I got a number in the end. Then I decided that one person was not enough: I wanted multiple other people, so I tried to organize within the lab to get others to do the same thing. But people are not as willing to contribute a week or two of pretty painstaking work, sitting down for five hours trying to figure out which dog breed this is, so I was not able to get enough data in that respect. But we got at least some approximate performance, which I thought was fun. Then this got picked up, which wasn't obvious to me at the time; I just wanted to know the number, but it became a thing, and people really liked the fact that it happened. Now I'm jokingly referred to as the reference human, which of course is hilarious to me.

Andrew Ng: Were you surprised when software, deep nets, finally surpassed your performance?

Andrej Karpathy: Absolutely. Sometimes it's really hard to see what's in the image: it's just a tiny blob, the black dog is obviously somewhere in there, and I'm guessing between twenty categories while the network just gets it, and I don't understand how that comes about. So there's something superhuman to it. I think the network is extremely good at the fine statistics of fur types and textures, and in that respect I was not surprised that it could measure those fine statistics better across lots of images. In other cases I was surprised, because some of the images required you to read: it's just a bottle, and you can't see what it is, but the label actually tells you in text. As a human I can read it, and that's fine, but the network would have to learn to read to identify the object, because it wasn't obvious from the image alone.
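The bookkeeping behind the human baselines described above is simple; the hard part was the interface and the patience. A hypothetical sketch (the function name and data are illustrative, not from Karpathy's actual tooling):

```python
# Hypothetical sketch of the scoring behind a human baseline like the
# CIFAR-10 / ImageNet experiments described above: collect one label
# per image, then report the top-1 error against ground truth.
def top1_error(predicted_labels, true_labels):
    """Fraction of images where the chosen label differs from ground truth."""
    assert len(predicted_labels) == len(true_labels)
    wrong = sum(p != t for p, t in zip(predicted_labels, true_labels))
    return wrong / len(true_labels)

# e.g. 3 mistakes over 50 images -> 6% error, roughly the number
# Karpathy reports for himself on CIFAR-10
human = ["cat", "dog", "ship"] * 16 + ["cat", "cat"]
truth = ["cat", "dog", "ship"] * 16 + ["cat", "cat"]
truth[0], truth[10], truth[20] = "frog", "frog", "frog"
err = top1_error(human, truth)
```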
Andrew Ng: One of the things you've become well known for, and that the deep learning community is grateful to you for, is teaching the CS231n class and putting it online. Tell me how that came about.

Andrej Karpathy: Absolutely. I felt very strongly that this technology was transformative and that a lot of people wanted to use it. It's somewhat like a hammer, and I was in a position to hand out this hammer to a lot of people, and I just found that very compelling. It's not necessarily advisable from the perspective of a PhD student, because you're putting your research on hold; this became like a hundred and twenty percent of my time. I taught the class twice, and each time it was maybe four months spent basically entirely on the class, so it's not super advisable from that perspective. But it was basically the highlight of my PhD, not even anything related to research; teaching the class was definitely the highlight. Just seeing the students, the fact that they were really excited: it was a very different class. Normally you're taught things that were discovered in the 1800s or something like that, but we could come to class and say, look, there's this paper from a week ago, or even from yesterday, with new results. The undergraduates and the other students really enjoyed that aspect of the class, and the fact that they could actually understand it. This is not nuclear physics or rocket science.
You need to know calculus and linear algebra, and then you can actually understand everything that happens under the hood. So I think the fact that it's so powerful, and the fact that it keeps changing on a daily basis, made people feel like they were on the forefront of something big, and that's why people really enjoyed that class.

Andrew Ng: And you've really helped a lot of people and handed out a lot of hammers. As someone who has been doing deep learning for quite some time now, while the field is evolving rapidly, I'd be curious to hear: how has your own thinking, your understanding of deep learning, changed over these years?

Andrej Karpathy: When I was seeing restricted Boltzmann machines for the first time, trained on digits, it wasn't obvious to me how this technology was going to be used, or how big a deal it would be. Also, when I started working on computer vision, convolutional networks were around, but they were not something that much of the computer vision community used. The perception was that this works for small cases but would never scale to large images, and that was just extremely incorrect. Basically, I'm surprised by how general the technology is and how good the results are; that was my biggest surprise, I would say. And it's not only that it worked so well on, say, ImageNet. The other thing that no one saw coming, or at least I certainly did not see coming, is that you can take these pretrained networks and transfer them: you can fine-tune them on arbitrary other tasks. You're not just solving ImageNet, needing millions of examples; the network also happens to be a very general feature extractor. That's a second insight that I think fewer people saw coming. There were these papers that said, here are all the things that
people have been working on in computer vision: scene classification, action recognition, object recognition, place attributes, and so on, and people were just crushing each task by fine-tuning the network. That to me was very surprising.

Andrew Ng: And somehow, I guess, supervised learning gets most of the press, and even though fine-tuning and transfer learning are actually working very well, people seem to talk less about that for some reason.

Andrej Karpathy: Right, exactly. What has not worked as well are some of the hopes around unsupervised learning, which I think is really why a lot of researchers got into the field around 2007 and so on. The promise there has still not been delivered, and I find that also surprising: the supervised learning part worked so well, while unsupervised learning is still in a state where it's not at all obvious how it's going to be used or how it's going to work, even though a lot of people are still deep believers, to use the term, in this area.

Andrew Ng: I know you're one of the people who has been thinking a lot about the long-term future of AI. Do you want to share your thoughts on that?

Andrej Karpathy: I spent the last maybe year and a half at OpenAI thinking a lot about these topics, and it seems to me like the field will split into two trajectories. One will be applied AI, which is just making these neural networks, training them, mostly with supervised learning, potentially unsupervised learning, and getting better, say, image recognizers or something like that. The other will be the artificial general intelligence direction, which is: how do you get
neural networks that are an entire dynamical system, that think and speak and can do everything a human can do, that are intelligent in that way. What's been interesting is that, for example, in computer vision, the way we approached it in the beginning was, I think, wrong: we tried to break it down into different parts. We said, okay, humans recognize people, humans recognize scenes, humans recognize objects, so we're going to do everything that humans do, carving out all these different areas, and once we have all those pieces we'll figure out how to put them together. I think that was the wrong approach, and we've seen how that played out historically. I think something similar is likely going on at the higher level with AI. People are asking, well, okay:
people plan, people do experiments to figure out how the world works, people talk to other people so we need language, and we try to decompose everything by function, accomplish each piece, and then put it all together into some kind of brain. I just think that's an incorrect approach. What I've been a much bigger fan of is not decomposing that way, but having a single neural network that is a complete dynamical system, so that you're always working with a full agent. Then the question is: how do you create objectives such that, when you optimize over the weights that make up that brain, you get intelligent behaviour out? That's something I've been thinking about a lot at OpenAI. There are a lot of different ways people have thought about approaching this problem: for example, going in a supervised learning direction. I have this essay online, well, it's not an essay, it's a short story that I wrote, and the short story tries to cover a hypothetical world of what it might
look like if the way we approach AGI is just scaling up supervised learning, which we know works. That gets into something that looks like Amazon Mechanical Turk, where people ssh into lots of robot bodies and perform tasks, and then we train on that as a supervised learning dataset to imitate humans, and what that might look like, and so on. Then there are other directions: unsupervised learning, ideas from algorithmic information theory like AIXI, or from artificial life, things that look more like artificial evolution. That's where I spend my time and what I think a lot about. I think I have the correct answer, but I'm not going to reveal it here.

Andrew Ng: At least we can learn more by reading your blog posts.

Andrej Karpathy: Yeah, absolutely.

Andrew Ng: So you've already given out a lot of hammers, and today there are a lot of people still wanting to enter the field of AI and deep learning. For people in that position, what advice do you have for them?

Andrej Karpathy: Absolutely. When people talk to me about CS231n and why they thought it was a very useful course, what I keep hearing again and again is that people appreciate that we went all the way down to the low-level details. They were not working with a library; they saw the raw code, they saw how everything was implemented, and they implemented chunks of it themselves. Going all the way down and understanding everything beneath you is really important: don't abstract things away; you need a full understanding of the whole stack. That's also where I learned the most myself when I was learning this stuff. Implementing it myself from scratch was the most important piece, the one that gave me the best bang for the buck in terms of understanding. So I wrote my own library, it's called convnetjs; it was written in JavaScript and implements convolutional neural networks. That was my way of learning about backpropagation. So that's
something I keep advising people: don't start with TensorFlow or something else. You can work with a framework once you have written something yourself at the lowest level of detail, so that you understand everything beneath you. Then it's fine to use these frameworks that abstract some of it away, because you know what's under the hood. That's what has helped me the most, it's what people appreciate the most when they take CS231n, and it's what I would advise a lot of people.

Andrew Ng: Rather than treating it as a black box?

Andrej Karpathy: Yeah, yeah, where it's just some kind of sequence of layers, and I know that when I add some dropout layers it makes it work better. That's not what you want. In that case you're not going to be able to debug effectively, or improve your models effectively.

Andrew Ng: I'm really glad that the deeplearning.ai courses started off with many weeks of Python programming first.

Andrej Karpathy: Yeah, good.

Andrew Ng: Thank you very much for sharing your insights and advice. You're already a hero to many people in the deep learning world, so I'm really glad, really grateful, that you could join us here today.

Andrej Karpathy: Yes, thank you for having me.
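The "implement it yourself from scratch" advice above is concrete enough to act on in an afternoon. As an editorial sketch in that spirit (not code from the interview; convnetjs itself is JavaScript), here is a minimal scalar autograd engine: each value remembers how it was produced, and backward() applies the chain rule in reverse topological order, which is the heart of backpropagation.

```python
# Minimal scalar autograd in the spirit of the from-scratch exercise
# Karpathy describes. Each Value records its parents and the local
# derivatives d(self)/d(parent); backward() applies the chain rule
# in reverse topological order.
class Value:
    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._local_grads = local_grads

    def __add__(self, other):
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other),
                     (other.data, self.data))

    def backward(self):
        # topological order guarantees each node's gradient is complete
        # before it is pushed back to its parents
        topo, seen = [], set()
        def build(v):
            if id(v) not in seen:
                seen.add(id(v))
                for p in v._parents:
                    build(p)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for node in reversed(topo):
            for parent, local in zip(node._parents, node._local_grads):
                parent.grad += local * node.grad

# check against calculus: for f = x*x + x*y at x=3, y=2,
# df/dx = 2x + y = 8 and df/dy = x = 3
x, y = Value(3.0), Value(2.0)
f = x * x + x * y
f.backward()
```

Adding more operations (tanh, exp, matrix layers) on top of this core is exactly the exercise that turns a framework user into someone who knows what's under the hood.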