MIT 6.S094: Introduction to Deep Learning and Self-Driving Cars


November 20, 2019 By Stanley Isaacs


Alright. Hello everybody. Hopefully you can hear me well. Yes? Yes. Great! So, welcome to Course 6.S094, Deep Learning for Self-Driving Cars. We will introduce to you the methods of deep learning, of deep neural networks, using the guiding case study of building self-driving cars. My name is Lex Fridman. You get to listen to me for a majority of these lectures, and I am part of an amazing team with some brilliant TAs. Would you say brilliant? (CHUCKLES) Dan Brown. You guys want to stand up? They’re in the front row. William Angell, Spencer Dodd, and all the way in the back, the smartest and the tallest person I know, Benedikt Jenik.
What you see there on the left of the slide is a visualization of one of the two projects, one of the two simulations, games, that we’ll get to go through. We use it as a way to teach you about deep reinforcement learning, but also as a way to excite you, by challenging you to compete against others, if you wish to, for a special prize yet to be announced. Super secret prize. So you can reach me and the TAs at [email protected] if you have any questions about the tutorials, about the lecture, about anything at all. The website cars.mit.edu has the lecture content and code tutorials. Again, like today, the lecture slides for today are already up in PDF form. The slides themselves, if you want to see them, just e-mail me, but they’re over a gigabyte in size because they’re very heavy in videos, so I’m just posting the PDFs. And there will be lecture videos available a few days after the lectures are given. So speaking of which, there is a camera in the back. This is being videotaped and recorded, but for the most part the camera is just on the speaker. So you shouldn’t have to worry. If that kind of thing worries you, then you could sit on the periphery of the classroom, or maybe I suggest sunglasses and a mustache, a fake mustache, would be a good idea. There is a competition for the game that you see on the left.
I’ll describe exactly what’s involved. In order to get credit for the course, you have to design a neural network that drives the car just above the speed limit, sixty-five miles an hour. But if you want to win, you need to go a little faster than that.
So who is this class for? You may be new to programming, new to machine learning, new to robotics, or you’re an expert in those fields but want to go back to the basics. So what you will learn is an overview of deep reinforcement learning, of convolutional neural networks, recurrent neural networks, and how these methods can help improve each of the components of autonomous driving: perception, visual perception, localization, mapping, control, planning, and the detection of driver state.
Okay, two projects. Code-named “DeepTraffic” is the first one. In this particular formulation of it, there are seven lanes. It’s a top view. It looks like a game, but I assure you it’s very serious. The agent in red, the car in red, is being controlled by a neural network, and we’ll explain how you can control and design the various aspects, the various parameters of this neural network, and it learns in the browser. So we’re using ConvNetJS, which is a library that is programmed by Andrej Karpathy in JavaScript. So amazingly, we live in a world where you can train a neural network in your browser in a matter of minutes. And we’ll talk about how to do that. The reason we did this is so that there are very few requirements to get you up and started with neural networks. So in order to complete this project for the course, you don’t need any requirements except to have a Chrome browser. And to win the competition, you don’t need anything except the Chrome browser.
The second project, code-named “DeepTesla”, or “Tesla”, uses data from a Tesla vehicle of the forward roadway and uses end-to-end learning: taking the image and putting it into a convolutional neural network that directly maps, as a regressor, to a steering angle. So all it takes is a single image, and it predicts a steering angle for the car. We have data for the car itself, and you get to build a neural network that tries to do better, tries to steer better than, or at least as well as, the car.
Okay. Let’s get started with the question, with the thing that we understand so poorly at this time because it’s so shrouded in mystery, but it fascinates many of us. And that is the question of: “What is intelligence?” This is from a March 1996 Time magazine. And the question, “Can machines think?”, is answered below with, “They already do.” So what, if anything, is special about the human mind? It’s a good question for 1996, a good question for 2016, 2017 now, and the future. And there are two ways to ask that question. One is the special-purpose version. Can an artificial intelligence system achieve a well-defined, specifically, formally defined finite set of goals? And this little diagram is from a book that got me into artificial intelligence as a bright-eyed high school student, “Artificial Intelligence: A Modern Approach”. This is a beautifully simple diagram of a system. It exists in an environment. It has a set of sensors that do the perception. It takes those sensor readings in. It does something magical. There’s a question mark there. And with a set of effectors it acts in the world, manipulates objects in that world. So that’s special purpose. Under this formulation, as long as the environment is formally defined, well defined, as long as the set of goals is well defined, as long as the set of actions, the sensors, and the ways that the perception carries itself out are well defined, we have good algorithms, which we’ll talk about, that can optimize for those goals.
The question is, if we inch along this path, will we get closer to the general formulation, to the general-purpose version of what artificial intelligence is? Can it achieve a poorly defined, unconstrained set of goals, with an unconstrained, poorly defined set of actions, and unconstrained, poorly defined utility functions, rewards? This is what human life is about. This is what we do pretty well most days: exist in an undefined world, full of uncertainty.
So, okay. We can separate tasks into three different categories. First, formal tasks. This is the easiest. It doesn’t seem so, it didn’t seem so at the birth of artificial intelligence, but that’s in fact true if you think about it. The easiest is the formal tasks: playing board games, theorem proving, all the kinds of mathematical logic problems that can be formally defined. Then there are the expert tasks. So this is where a lot of the exciting breakthroughs have been happening, where machine learning methods, data-driven methods, can help aid or improve on the performance of our human experts. This means medical diagnosis, hardware design, scheduling. And then there is the thing that we take for granted. The trivial thing. The thing that we do so easily every day when we wake up in the morning. The mundane tasks of everyday speech, of written language, of visual perception, of walking, which, as we’ll talk about in today’s lecture, is a fascinatingly difficult task, and object manipulation.
So the question that we’re asking here, before we talk about deep learning, before we talk about the specific methods, is this: we really want to dig in and try to see what it is about driving. How difficult is driving? Is it more like chess, which you see on the left there, where we can formally define a set of lanes, a set of actions, and formulate it as a set of five actions? You can change your lane, you can avoid obstacles. You can formally define an obstacle. You can formally define the rules of the road.
Or is there something about natural language, something similar to everyday conversation, about driving that requires a much higher degree of reasoning, of communication, of learning, of existing in this under-actuated space? Is it a lot more than just left lane, right lane, speed up, slow down?
So let’s look at it as a chess game. Here are the chess pieces. What are the sensors we get to work with on an autonomous vehicle? And we’ll get a lot more in-depth on this, especially with the guest speakers who built many of these. There are the range sensors: radar, lidar. They give you information about the obstacles in the environment. They help localize the obstacles in the environment. There’s the visible light camera and stereo vision, which gives you texture information, which helps you figure out not just where the obstacles are but what they are, helps to classify those, helps to understand their subtle movements. Then there is the information about the vehicle itself, about the trajectory and the movement of the vehicle, that comes from the GPS and IMU sensors. And there is the rich state of the vehicle itself. What is it doing? What are all the individual systems doing? That comes from the CAN network. And one of the less studied sensors, but fascinating to us on the research side, is audio. The sounds of the road that provide the rich context of a wet road. The sound that a road makes when it has stopped raining but it’s still wet. The screeching tires and honking. These are all fascinating signals as well. And the focus of the research in our group, the thing that’s really much under-investigated, is the internal-facing sensors. The driver, sensing the state of the driver. Where are they looking? Are they sleepy? The emotional state. Are they in the seat at all? That comes from the visual information and the audio information. More than that. Here are the tasks.
If you were to break into modules the task of what it means to build a self-driving vehicle: first, you want to know where you are. Where am I? Localization and mapping. You want to map the external environment, figure out where all the different obstacles are, where all the entities are, and use that estimate of the environment to then figure out where I am, where the robot is. Then there is scene understanding. It’s understanding not just the positional aspects of the external environment and the dynamics of it, but also what those entities are. Is it a car? Is it a pedestrian? Is it a bird? Then there is movement planning. Once you have figured out, to the best of your abilities, your position and the position of other entities in this world, it’s figuring out a trajectory through that world. And finally, once you’ve figured out how to move about safely and effectively through the world, it’s figuring out what the human that’s on board is doing, because, as I will talk about, the path to a self-driving vehicle (and hence our focus on Tesla) may go through semi-autonomous vehicles, where the vehicle must not only drive itself but effectively hand over control from the car to the human and back.
OK, quick history. Well, there’s a lot of fun stuff from the eighties and nineties, but the big breakthroughs came in the second DARPA Grand Challenge with Stanford’s “Stanley”, when they won the competition. One of five cars that finished. This was an incredible accomplishment in a desert race. A fully autonomous vehicle was able to complete the race in record time. Then came the DARPA Urban Challenge in 2007, where the task was no longer a race through the desert but through an urban environment, and CMU’s “Boss”, with GM, won that race. And a lot of that work led directly to the acceptance of the idea and to large, major industry players taking on the challenge of building these vehicles. Google, now “Waymo”, with its self-driving car. Tesla with its “Autopilot” system and now “Autopilot 2” system.
Uber with its testing in Pittsburgh. And there are many other companies, including one of the speakers for this course, from nuTonomy, that are driving the wonderful streets of Boston.
OK. So let’s take a step back. If we think about the accomplishments in the DARPA Challenge, and if you look at the accomplishments of the Google self-driving car, it essentially boils the world down into a chess game. It uses incredibly accurate sensors to build a three-dimensional map of the world, localize itself effectively in that world, and move about that world in a very well-defined way. Now, the open question is: if driving is more like a conversation, like a natural language conversation, how hard is it to pass the Turing Test? The Turing Test, as the popular current formulation goes, is: can a computer be mistaken for a human being more than thirty percent of the time? When a human is talking behind a veil, having a conversation with either a computer or a human, can they mistake the other side of that conversation for being a human when it’s in fact a computer? And the way you would build a system that successfully passes the Turing Test is: first, the natural language processing part, to enable it to communicate successfully, so generate language and interpret language. Then you represent knowledge, the state of the conversation as it’s transferred over time. And the last piece, and this is the hard piece, is the automated reasoning. Can we teach machine learning methods to reason? That is something that will propagate through our discussion because, as I will talk about with the various methods, the various deep learning methods, neural networks are good at learning from data, but there is not yet a good mechanism for reasoning. Now, reasoning could be just something that we tell ourselves we do to feel special, to feel like we’re better than machines. Reasoning may be something as simple as learning from data.
We just need a larger network. Or there could be a totally different mechanism required, and we’ll talk about the possibilities there. Yes. (Inaudible question from one of the attendees) No, it’s very difficult to find these kinds of situations in the United States. So the question was, for this video, is it in the United States or not? I believe it’s in Tokyo. So India, as are a few European countries, is much more towards the direction of natural language versus chess. In the United States, generally speaking, we follow rules more concretely. The quality of roads is better. The marking on the roads is better. So there are fewer requirements there. (Inaudible question from one of the attendees) These cars are driving on one side? I see. I just- Okay, you’re right. It is because, yeah- So, but it’s certainly not the United States. I spent quite a bit of time googling, trying to find one in the United States, and it is difficult.
So let’s talk about the recent breakthroughs in machine learning. What is at the core of those breakthroughs is neural networks, which have been around for a long time, and I will talk about what has changed. What are the cool new things, what hasn’t changed, and what are the possibilities? But first: a neuron, crudely, is a computational building block of the brain. I know there are a few folks here, neuroscience folks; this is hardly a model. It is mostly an inspiration, and so the human neuron has inspired the artificial neuron, the computational building block of a neural network, of an artificial neural network. I have to give you some context. These neurons, for both artificial and human brains, are interconnected. In the human brain there are, I believe, about 10,000 outgoing connections from every neuron on average, and they’re interconnected to each other. The largest current artificial neural network, as far as I’m aware, has 10 billion of those connections. Synapses. Our human brain, to the best estimate that I’m aware of, has 10,000X that.
So one hundred to one thousand trillion synapses.
Now what is an artificial neuron? That is the building block of a neural network. It takes a set of inputs. It puts a weight on each of those inputs, sums them together, applies a bias value on each neuron, and uses an activation function that takes as its input that sum plus the bias and squishes it together to produce a zero-to-one signal. And this allows a single neuron to take a few inputs and produce an output, a classification for example, a zero or a one. And then, as we’ll talk about, it can simply serve as a linear classifier, so it can draw a line. It can learn to draw a line, like what you see here, between the blue dots and the yellow dots. And that’s exactly what we’ll do in the iPython Notebook that I’ll talk about. But the basic algorithm is: you initialize the weights on the inputs and you compute the output. You perform this previous operation I talked about, sum up and compute the output. And if the output does not match the ground truth, the expected output, the output it should produce, the weights are punished accordingly, and we’ll talk through a little bit of the math of that. And this process is repeated until the perceptron does not make any more mistakes.
Now here’s the amazing thing about neural networks. There are several, and I’ll talk about them. One, on the mathematical side, is the universality of neural networks. With just a single layer, if you stack neurons together in a single hidden layer, with the inputs on the left, the outputs on the right, and in the middle a single hidden layer, it can closely approximate any function. Any function. So this is an incredible property: with a single layer, any function you could think of. And you could think of driving as a function. It takes as its input the world outside, and as output the control of the vehicle. There exists a neural network out there that can drive perfectly. It’s a fascinating mathematical fact.
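As a rough illustration of the perceptron procedure described above (initialize the weights, compute the output, adjust the weights when the output disagrees with the ground truth, repeat until there are no mistakes), here is a minimal sketch in plain Python. It is not the course notebook: the step activation, the learning rate `lr`, the function names, and the toy "blue" and "yellow" points are all assumptions made up for the example.

```python
import random

def step(z):
    """Threshold activation: squash the weighted sum into a 0 or a 1."""
    return 1 if z >= 0 else 0

def predict(weights, bias, inputs):
    """Weight each input, sum, add the bias, then apply the activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return step(z)

def train_perceptron(samples, labels, lr=0.1, max_epochs=1000, seed=0):
    """Repeat over the data until the perceptron makes no more mistakes."""
    rng = random.Random(seed)
    weights = [rng.uniform(-0.5, 0.5) for _ in samples[0]]  # initialize weights
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            error = y - predict(weights, bias, x)  # 0 when the output matches
            if error != 0:
                mistakes += 1
                # Nudge each weight in the direction that reduces the error.
                weights = [w + lr * error * xi for w, xi in zip(weights, x)]
                bias += lr * error
        if mistakes == 0:
            break
    return weights, bias

# Hypothetical toy data: two clusters that a straight line can separate.
blue = [(1.0, 1.0), (2.0, 1.5), (1.5, 2.0)]
yellow = [(4.0, 4.5), (5.0, 4.0), (4.5, 5.0)]
samples = blue + yellow
labels = [0, 0, 0, 1, 1, 1]

weights, bias = train_perceptron(samples, labels)
predictions = [predict(weights, bias, x) for x in samples]
```

This loop only stops making mistakes when the two classes can be separated by a line, which is why the toy data was chosen that way.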
So we can think of these functions, then, as special-purpose functions, special-purpose intelligence. You can take, say, as input the number of bedrooms, the square feet, the type of neighborhood. Those are the three inputs. It passes those values through to the hidden layer. And then, in one more step, it produces the final price estimate for the house or for the residence. And we can teach a network to do this pretty well in a supervised way. This is supervised learning. You provide a lot of examples where you know the number of bedrooms, the square feet, the type of neighborhood, and then you also know the final price of the house or the residence. And then you can, as I’ll talk about, through a process of backpropagation, teach these networks to make this prediction pretty well.
Now, some of the exciting breakthroughs recently have been in general-purpose intelligence. This is from Andrej Karpathy, who is now at OpenAI. I would like to take a moment here to try to explain how amazing this is. This is a game of “pong”. If you’re not familiar with “pong”, there are two paddles, and you’re trying to bounce the ball back in such a way that it prevents the other guy from bouncing the ball back at you. The artificial intelligence agent is on the right in green, and up top is the score, 8-1. Now this network takes about three days to train on a regular computer. What is this network doing? It’s called the Policy Network. The input is the raw pixels. They’re slightly processed, and you also take the difference between two frames, but it’s basically the raw pixel information. That’s the input. There are a few hidden layers, and the output is a single probability of moving up. That’s it. That’s the whole system, and what it’s doing is, it learns. At any one moment, you don’t know what the right thing to do is. Is it to move up? Is it to move down? You only know what the right thing to do is by the fact that eventually you win or lose the game.
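To make the shape of that policy network concrete, here is a small sketch in plain Python: a frame difference goes in, one hidden layer processes it, and a single probability of moving up comes out. This is an illustration under assumptions, not the network from the slide: the layer sizes, the ReLU and sigmoid choices, the random (untrained) weights, and every function name are made up for the example, and the action is sampled from the probability here, where a fixed threshold would be the alternative.

```python
import math
import random

def sigmoid(z):
    """Squash any real number into the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-z))

def init_policy(n_pixels, n_hidden, rng):
    """Random, untrained weights for one hidden layer and one output unit."""
    w1 = [[rng.gauss(0.0, 1.0 / math.sqrt(n_pixels)) for _ in range(n_pixels)]
          for _ in range(n_hidden)]
    w2 = [rng.gauss(0.0, 1.0 / math.sqrt(n_hidden)) for _ in range(n_hidden)]
    return w1, w2

def prob_up(policy, frame, prev_frame):
    """Forward pass: frame difference -> hidden layer -> P(move up)."""
    w1, w2 = policy
    diff = [a - b for a, b in zip(frame, prev_frame)]  # captures motion
    hidden = [max(0.0, sum(w * x for w, x in zip(row, diff))) for row in w1]
    return sigmoid(sum(w * h for w, h in zip(w2, hidden)))

def choose_action(p_up, rng):
    """Sample an action from the output probability (an assumption here)."""
    return "UP" if rng.random() < p_up else "DOWN"

rng = random.Random(0)
n_pixels = 16  # hypothetical tiny 4x4 "screen", flattened
policy = init_policy(n_pixels, n_hidden=8, rng=rng)
prev_frame = [0.0] * n_pixels
frame = [0.0] * n_pixels
frame[5] = 1.0  # the ball shows up at one pixel

p = prob_up(policy, frame, prev_frame)
action = choose_action(p, rng)
```

With untrained weights the probability is essentially arbitrary; learning consists of adjusting `w1` and `w2` so that actions followed by wins become more likely.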
So the amazing thing here is, there’s no supervised learning. There’s no universal fact about any one state being good or bad, or any one action being good or bad in that state. Instead, you punish or reward every single action you took, every single action you took for an entire game, based on the result. So no matter what you did, if you won the game, the end justifies the means. If you won the game, every action you took, every state-action pair, gets rewarded. If you lost the game, it gets punished. And with this process, with only two hundred thousand games where the system just simulates the games, it can learn to beat the computer. This system knows nothing about “pong”, nothing about games. This is general intelligence. Except for the fact that it’s just a game of “pong”. And I will talk about how this can be extended further, why this is so promising, and why we should proceed with caution. So again, there’s a set of actions you take, up, down, up, down, based on the output of the network. There’s a threshold: given the probability of moving up, you move up or down based on the output of the network. And you have a set of states, and every single state-action pair is rewarded if there’s a win and punished if there’s a loss. When you go home, think about how amazing that is, and if you don’t understand why that’s amazing, spend some time on it. It’s incredible.
(Inaudible question from one of the attendees) Sure, sure thing. The question was: “What is supervised learning? What is unsupervised learning? What’s the difference?” So, when people talk about machine learning, they mean supervised learning most of the time. Supervised learning is learning from data, learning from examples, when you have a set of inputs and a set of outputs that you know are correct, which is called the Ground Truth.
So you need those examples, a large amount of them, to train any of the machine learning algorithms, so that they learn to then generalize to future examples. Actually, there’s a third one, called Reinforcement Learning, where the Ground Truth is sparse. The information about when something is good or not, the ground truth, only arrives every once in a while, at the end of the game. Not every single frame. And unsupervised learning is when you have no information about whether the outputs are correct or incorrect. And the excitement of the deep learning community is unsupervised learning, but it has achieved no major breakthroughs at this point. I’ll talk about what the future of deep learning is, and a lot of the people that are working in the field are excited by it. But right now, any interesting accomplishment has to do with supervised learning.
(Partially inaudible question from one of the attendees) And the wrong one is just has the [00:33:29] (Inaudible) solution like looking at the philosophy. So basically, the reinforcement learning here is learning from somebody who has certain hopes and how can that be guaranteed that it would generalize to somebody else? So the question was this: the green paddle learns to play this game successfully against this one specific brown paddle, operating under specific kinds of rules. How do we know it can generalize to other games, other things? And it can’t. But the mechanism by which it learns generalizes. So as long as you let it play, as long as you let it play long enough in whatever world you want it to succeed in, it will use the same approach to learn to succeed in that world. The problem is that this works for worlds you can simulate well. Unfortunately, one of the big challenges of neural networks is that they’re not currently efficient learners. We need a lot of data to learn anything. Human beings often need one example, and they learn very efficiently from that one example.
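The sparse ground truth described above, where the only signal is a win or a loss at the end of the game, can be sketched as follows: every state-action pair in a game receives the same reward, decided by the final result. This is a deliberately simplified, table-based illustration in plain Python, not the policy-gradient update used for the Pong agent; the state names, the `lr` step size, and the function names are hypothetical.

```python
from collections import defaultdict

def label_episode(state_action_pairs, won):
    """Give every (state, action) pair in one game the same end-of-game reward."""
    reward = 1.0 if won else -1.0
    return [(state, action, reward) for state, action in state_action_pairs]

def update_preferences(preferences, labeled_pairs, lr=0.1):
    """Push each pair's score up after a win and down after a loss."""
    for state, action, reward in labeled_pairs:
        preferences[(state, action)] += lr * reward

# Hypothetical games, with states reduced to short descriptive strings.
game_won = [("ball_high", "UP"), ("ball_low", "DOWN")]
game_lost = [("ball_high", "DOWN")]

preferences = defaultdict(float)
update_preferences(preferences, label_episode(game_won, won=True))
update_preferences(preferences, label_episode(game_lost, won=False))

# After many games, the action with the higher score in a state is preferred.
best_when_high = max(["UP", "DOWN"],
                     key=lambda a: preferences[("ball_high", a)])
```

A single game says very little, since good and bad moves inside it get the same label; the signal only emerges when the scores are averaged over a very large number of games.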
And again, I’ll talk about that as well. It’s a good question.
So, the drawbacks of neural networks. If you think about the way a human being would approach this game, this game of “pong”, they would only need a simple set of instructions. You’re in control of a paddle and you can move it up and down. And your task is to bounce the ball past the other player, controlled by AI. Now the human being may not win the game right away, but they would immediately understand the game and would be able to play it well enough to pretty quickly learn to beat the game. But they would need to have a concept of control, what it means to control a paddle. They need to have a concept of a paddle, need to have a concept of moving up and down, and of a ball and bouncing. They have to have at least a loose concept of real-world physics, so that they can then project that real-world physics onto the two-dimensional world. All of these concepts are concepts that you come to the table with. That’s knowledge. And the way you transfer that knowledge from your previous experience, from childhood to now, when you come to this game, that something is called reasoning. Whatever reasoning means. And the question is whether, through this same kind of process, you can see the entire world as a game of “pong”, and reasoning is simply the ability to simulate that game in your mind and learn very efficiently, much more efficiently than in 200,000 iterations.
The other challenge of deep neural networks, and machine learning broadly, is that you need big data. They’re inefficient learners, as I said. And that data also needs to be supervised data. You need to have Ground Truth, which is very costly to annotate. A human being looking at a particular image, for example, and labeling it as a cat or a dog, whatever object is in the image, is very costly. And particularly for neural networks, there are a lot of parameters to tune. There are a lot of hyper-parameters.
You need to figure out the network structure first. How does this network look? How many layers? How many hidden nodes? What type of activation function for each node? There are a lot of hyper-parameters there, and then once you’ve built your network, there are parameters for how you teach that network. There’s the learning rate, the loss function, the mini-batch size, the number of training iterations, gradient update smoothing, and even selecting the optimizer with which you solve the various differential equations involved. It’s a topic of many research papers, certainly it’s rich enough for research papers, but it’s also really challenging. It means you can’t just plop the network down and have it solve the problem generally. And defining a good loss function, or in the case of “pong” or games, a good reward function, is difficult.
So here’s a game. This is a recent result from OpenAI, teaching a network to play the game of CoastRunners. And in CoastRunners you’re in a boat, and the task is to go around the track and successfully complete a race against the other people you’re racing against. Now this network is an optimal one. And what it has figured out is that in the game, it actually gets a lot of points for collecting certain objects along the path. So you see it has figured out to go in a circle and collect those green turbo things. And what it has figured out is that you don’t need to complete the game to earn the reward. And despite being on fire and hitting the wall and going through this whole process, it has actually achieved at least a local optimum, given the reward function of maximizing the number of points. And so it has figured out a way to earn a higher reward while ignoring the implied bigger-picture goal of finishing the race, which we as humans understand much better. This raises, for self-driving cars, ethical questions. Besides other quick questions.
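A toy calculation can show why the boat prefers circling to finishing. The sketch below, in plain Python, compares the points earned by two fixed strategies under a points-only reward. Every number in it (the finish bonus, the turbo points, the lap and loop lengths, the episode length) is invented for the illustration and has nothing to do with the actual game's scoring.

```python
def points_for_finishing(steps, lap_length=50, finish_bonus=100):
    """Drive straight to the finish line: a one-time bonus, then the race ends."""
    return finish_bonus if steps >= lap_length else 0

def points_for_looping(steps, loop_length=4, turbo_points=10):
    """Circle forever, picking up a respawning turbo item on every loop."""
    return (steps // loop_length) * turbo_points

episode_steps = 200  # hypothetical episode length

finish_score = points_for_finishing(episode_steps)
loop_score = points_for_looping(episode_steps)

# An agent that only maximizes points picks whichever strategy scores higher,
# even though the designer's intended goal was to finish the race.
preferred = "loop" if loop_score > finish_score else "finish"
```

Under these made-up numbers the looping strategy earns 500 points against 100 for finishing, so a pure point-maximizer never completes the race. The reward function, not the learner, is what encodes the mistake.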
(CHUCKLING) We could watch this for hours, and it will do that for hours, and that’s the point: it’s hard to teach, it’s hard to encode, the formally defined utility function under which an intelligent system needs to operate. And that’s made obvious even in a simple game. And so what is – Yup, question. (Inaudible question from one of the attendees) So the question was: “What’s an example of a local optimum for an autonomous car, similar to CoastRunners? What would be the example in the real world for an autonomous vehicle?” And it’s a touchy subject. But it would certainly have to involve the choices we make under near-crashes and crashes. The choices a car makes about what to avoid. For example, if there’s a crash imminent and there’s no way you can stop to prevent the crash, do you keep the driver safe or do you keep the other people safe? And even if you don’t choose to acknowledge it, even if it’s only in the data and the learning that you do, there’s an implied reward function there. And we need to be aware of what that reward function is, because the system may find something. Until we actually see it, we won’t know it. Once we see it, we realize that, oh, that was a bad design, and that’s the scary thing. It’s hard to know ahead of time what that is.
So the recent breakthroughs from deep learning came from several factors. First is the compute, Moore’s Law. CPUs are getting faster, a hundred times faster, every decade. Then there are GPUs. The ability to train neural networks on GPUs, and now ASICs, has created a lot of capabilities in terms of energy efficiency and being able to train larger networks more efficiently. Then, in the 21st century, there’s digitized data. There are larger data sets of digital data, and now that data is becoming more organized. It’s not just vaguely available data out there on the internet; it’s actual organized data sets like ImageNet. Certainly for natural language there are large data sets.
There are the algorithm innovations: backprop (backpropagation), convolutional neural networks, LSTMs. All these different architectures for dealing with specific types of domains and tasks. Then there is the huge one, infrastructure, on the software and the hardware side. There’s Git, the ability to share software in an open-source way. There are pieces of software that make robotics and make machine learning easier: ROS, TensorFlow. There is Amazon Mechanical Turk, which allows for efficient, cheap annotation of large-scale data sets. There’s AWS and cloud hosting, machine learning hosting of the data and the compute. And then there’s the financial backing of large companies: Google, Facebook, Amazon. But really, nothing has changed. There really have not been any significant breakthroughs. Convolutional networks have been around since the 90s, neural networks have been around since the 60s. There have been a few improvements in terms of methodology, but the compute has really been the workhorse. The ability to do the hundredfold improvement every decade holds promise, and the question is whether, for that reasoning thing I talked about, all you need is a larger network. That is the open question.
Some terms for deep learning. First of all, deep learning is a PR term for neural networks. It is a term for utilizing deep neural networks, neural networks that have many layers. It is a symbolic term for the newly gained capabilities that compute has brought us, that training on GPUs has brought us. So deep learning is a subset of machine learning. There are many other methods that are still effective. The terms that will come up in this class are, first of all, Multilayer Perceptron (MLP), Deep Neural Networks (DNN), Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM) networks, Convolutional Neural Networks (CNN, or ConvNet), and Deep Belief Networks. And the operations that will come up are convolution, pooling, activation functions, and backpropagation.
Yes, you’ve got a question? (Inaudible question from one of the attendees) So the question was, what is the purpose of the different layers in a neural network? What is the need for one configuration versus another? So with a neural network having several layers, the only thing you have an understanding of is the inputs and the outputs. You don’t have a good understanding of what each layer does. They are mysterious things, neural networks. So I’ll talk about how, with every layer, it forms a higher-level, a higher-order representation of the input. So it’s not like the first layer does localization, the second layer does path planning, the third layer does navigation, how you get from here to Florida. Or maybe it does, but we don’t know. So we’re beginning to visualize neural networks for simple tasks, like for ImageNet, classifying cats versus dogs. We can tell what the thing is that the first layer does, the second layer, the third layer, and we’ll look at that. But for driving, where as the input you provide just the images and as the output the steering, it’s still unclear what gets learned, partially because we don’t have neural networks that drive successfully yet. (Points to a member of the class) (Inaudible question) So the question was, does a neural network generate layers over time, like does it grow? That’s one of the challenges, that a neural network is pre-defined. The architecture, the number of nodes, the number of layers.
That’s all fixed. Unlike the human brain, where neurons die and are born all the time. A neural network is pre-specified, that’s it. That’s all you get, and if you want to change that, you have to change it and then retrain everything. So it’s fixed. So what I encourage you to do is proceed with caution, because there’s this feeling when you first teach a network, with very little effort, how to do some amazing task like classifying a face versus non-face, or your face versus other faces, or cats versus dogs. It’s an incredible feeling. And then there’s definitely this feeling that I’m an expert, but what you realize is we don’t actually understand how it works. And getting it to perform well for more generalized tasks, for larger-scale data sets, for more useful applications, requires a lot of hyper-parameter tuning. Figuring out how to tweak little things here and there, and still in the end, you don’t understand why it works so damn well. So deep learning, these deep neural network architectures, is representation learning. This is the difference from traditional machine learning methods. Take, for example, a task where an image is the input. The input to the network here is on the bottom, the output is up on top, and the input is a single image of a person in this case. And so the input, specifically, is all the pixels in that image. RGB, the different colors of the pixels in the image. And over time, what a network does is build a multi-resolutional representation of this data. The first layer learns the concept of edges, for example. The second layer starts to learn compositions of those edges: corners, contours. Then it starts to learn about object parts. And finally, it actually provides a label for the entities that are in the input. And this is the difference from traditional machine learning methods, where concepts like edges and corners and contours are manually pre-specified by human beings, human experts, for that particular domain.
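To make the contrast concrete, here is a minimal sketch of what a manually pre-specified edge feature can look like: a Sobel kernel slid across a tiny grayscale image. The helper `correlate2d` and the toy image are hypothetical and written only for this illustration. In a convolutional network, the numbers inside the kernel would be learned from data rather than typed in by a human expert.

```python
def correlate2d(image, kernel):
    """Valid-mode 2D cross-correlation (the operation deep learning
    libraries usually call "convolution")."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Hand-specified vertical-edge detector (the horizontal Sobel kernel).
SOBEL_X = [
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
]

# Toy 4x6 grayscale image: dark left half, bright right half.
image = [[0, 0, 0, 9, 9, 9] for _ in range(4)]

response = correlate2d(image, SOBEL_X)
for row in response:
    print(row)  # large values mark where the dark-to-bright edge sits
```

The response is zero over the flat regions and large only in the columns that straddle the edge, which is the kind of "edge concept" the first layer is described as learning on its own.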
And representation matters, because figuring out a line in the Cartesian coordinates of this particular data set, where you want to design a machine learning system that tells the difference between green triangles and blue circles, is difficult. There is no line that separates them cleanly. And if you were to ask a human being, a human expert in the field, to try to draw that line, they would probably do a Ph.D. on it and still not succeed. But a neural network can automatically figure out how to remap that input into polar coordinates, where the representation is such that it’s an easily, linearly separable data set. And so, deep learning is a subset of representation learning, which is a subset of machine learning, which is a key subset of artificial intelligence. Now, because of this, because of its ability to compute an arbitrary number of features that are at the core of the representation, if you are trying to detect a cat in an image, you’re not specifying 215 specific features of cat ears and whiskers and so on that a human expert would specify. You allow a neural network to discover tens of thousands of such features. Maybe for cats you are an expert, but for a lot of objects you may never be able to sufficiently provide the features that will successfully be used for identifying the object. And so, this kind of representation learning, one, is easy in the sense that all you have to provide is inputs and outputs. All you need to provide is a data set you care about without [00:53:39] features. And two, because of its ability to construct arbitrarily sized representations, deep neural networks are hungry for data. The more data we give them, the more they are able to learn about this particular data set. So let’s look at some applications. First, some cool things that deep neural networks have been able to accomplish up to this point. Let me go through them. First, the basic one.
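Before moving on to the applications, the coordinate remapping described above can be sketched in a few lines. This is an illustration under stated assumptions, not the data from the slide: `make_ring_data` generates a made-up data set with one class near the origin and the other on a ring around it, and the polar transform is applied by hand here, whereas the point in the lecture is that a network can learn such a remapping itself. Thresholding on `x` stands in for "drawing a line" in Cartesian coordinates.

```python
import math
import random


def make_ring_data(n_per_class=100, seed=0):
    """Hypothetical toy data: class 0 near the origin, class 1 on a ring."""
    rng = random.Random(seed)
    points, labels = [], []
    for label, (r_lo, r_hi) in enumerate([(0.0, 1.0), (2.0, 3.0)]):
        for _ in range(n_per_class):
            r = rng.uniform(r_lo, r_hi)
            theta = rng.uniform(0.0, 2.0 * math.pi)
            points.append((r * math.cos(theta), r * math.sin(theta)))
            labels.append(label)
    return points, labels


def to_polar(x, y):
    """Map Cartesian (x, y) to polar (radius, angle)."""
    return math.hypot(x, y), math.atan2(y, x)


def threshold_accuracy(values, labels, threshold):
    """Accuracy of the rule: predict class 1 when value > threshold."""
    correct = sum((v > threshold) == bool(l) for v, l in zip(values, labels))
    return correct / len(labels)


points, labels = make_ring_data()

# Cartesian view: try many vertical lines x = t and keep the best one.
xs = [x for x, _ in points]
acc_cartesian = max(threshold_accuracy(xs, labels, t / 10) for t in range(-30, 31))

# Polar view: a single threshold on the radius separates the classes.
radii = [to_polar(x, y)[0] for x, y in points]
acc_polar = threshold_accuracy(radii, labels, 1.5)

print(f"best vertical line in Cartesian coordinates: {acc_cartesian:.2f}")
print(f"radius threshold in polar coordinates:       {acc_polar:.2f}")
```

Because the toy classes occupy disjoint radius ranges, the radius threshold classifies every point correctly, while no vertical line does; by symmetry the same holds for a line at any other angle.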
ImageNet is a famous data set and a competition for classification and localization, where the task is, given an image, to identify the five most likely things in that image and which one is the most likely, and you have to do so correctly. So on the right, there’s an image of a leopard, and you have to correctly classify that it is in fact a leopard. So they’re able to do this pretty well: given a specific image, determine that it’s a leopard. What’s shown here on the x-axis is years, and on the y-axis is error in classification. So starting from 2012 on the left with AlexNet, and up to today, the error, which was 16% with AlexNet and 40% before then with traditional methods, has decreased to