# But what is a Neural Network? | Deep learning, chapter 1

This is a three. It’s sloppily written and rendered at an extremely low resolution of 28 by 28 pixels, but your brain has no trouble recognizing it as a three. I want you to take a moment to appreciate how crazy it is that brains can do this so effortlessly. I mean, this, this, and this are also recognizable as threes, even though the specific values of each pixel are very different from one image to the next. The particular light-sensitive cells in your eye that are firing when you see this three are very different from the ones firing when you see this three. But something in that crazy-smart visual cortex of yours resolves these as representing the same idea, while at the same time recognizing other images as their own distinct ideas.

But if I told you, “Hey, sit down and write for me a program that takes in a grid of 28 by 28 pixels like this and outputs a single number between 0 and 9, telling you what it thinks the digit is,” well, the task goes from comically trivial to dauntingly difficult.

Unless you’ve been living under a rock, I think I hardly need to motivate the relevance and importance of machine learning and neural networks to the present and to the future. But what I want to do here is show you what a neural network actually is, assuming no background, and to help visualize what it’s doing, not as a buzzword but as a piece of math. My hope is just that you come away feeling like the structure itself is motivated, and that you know what it means when you read or hear about a neural network, quote-unquote, “learning.” This video is just going to be devoted to the structure component of that, and the following one is going to tackle learning.

What we’re going to do is put together a neural network that can learn to recognize handwritten digits. This is a somewhat classic example for introducing the topic, and I’m happy to stick with the status quo here, because at the end of the two videos I want to point you to a couple of good resources where you can learn more, and where you can download the code that does this and play with it on your own computer.

There are many, many variants of neural networks, and in recent years there’s been sort of a boom in research towards these variants. But in these two introductory videos, you and I are just going to look at the simplest plain-vanilla form with no added frills. This is kind of a necessary prerequisite for understanding any of the more powerful modern variants, and trust me, it still has plenty of complexity for us to wrap our minds around. But even in this simplest form, it can learn to recognize handwritten digits, which is a pretty cool thing for a computer to be able to do. And at the same time, you’ll see how it does fall short of a couple of hopes that we might have for it.

As the name suggests, neural networks are inspired by the brain, but let’s break that down. What are the neurons, and in what sense are they linked together? Right now, when I say neuron, all I want you to think about is a thing that holds a number, specifically a number between 0 and 1. It’s really not more than that. For example, the network starts with a bunch of neurons corresponding to each of the 28 times 28 pixels of the input image, which is 784 neurons in total. Each one of these holds a number that represents the grayscale value of the corresponding pixel, ranging from 0 for black pixels up to 1 for white pixels. This number inside the neuron is called its activation, and the image you might have in mind here is that each neuron is lit up when its activation is a high number. So all of these 784 neurons make up the first layer of our network.

Now, jumping over to the last layer, this has ten neurons, each representing one of the digits. The activation in these neurons, again some number between 0 and 1, represents how much the system thinks that a given image corresponds with a given digit. There are also a couple of layers in between called the hidden layers, which for the time being should just be a giant question mark for how on earth this process of recognizing digits is going to be handled. In this network I chose two hidden layers, each one with 16 neurons, and admittedly that’s kind of an arbitrary choice. To be honest, I chose two layers based on how I want to motivate the structure in just a moment, and 16, well, that was just a nice number to fit on the screen. In practice, there is a lot of room for experimenting with the specific structure here.

The way the network operates, activations in one layer determine the activations of the next layer. And of course, the heart of the network as an information-processing mechanism comes down to exactly how those activations from one layer bring about activations in the next layer. It’s meant to be loosely analogous to how, in biological networks of neurons, some groups of neurons firing cause certain others to fire.

Now, the network I’m showing here has already been trained to recognize digits, and let me show you what I mean by that. It means that if you feed in an image, lighting up all 784 neurons of the input layer according to the brightness of each pixel in the image, that pattern of activations causes some very specific pattern in the next layer, which causes some pattern in the one after it, which finally gives some pattern in the output layer. And the brightest neuron of that output layer is the network’s choice, so to speak, for what digit this image represents.

And before jumping into the math for how one layer influences the next, or how training works, let’s just talk about why it’s even reasonable to expect a layered structure like this to behave intelligently. What are we expecting here? What is the best hope for what those middle layers might be doing?
Well, when you or I recognize digits, we piece together various components. A 9 has a loop up top and a line on the right. An 8 also has a loop up top, but it’s paired with another loop down low. A 4 basically breaks down into three specific lines, and things like that. Now, in a perfect world, we might hope that each neuron in the second-to-last layer corresponds with one of these subcomponents, that anytime you feed in an image with, say, a loop up top, like a 9 or an 8, there’s some specific neuron whose activation is going to be close to 1. And I don’t mean this specific loop of pixels; the hope would be that any generally loopy pattern towards the top sets off this neuron. That way, going from the third layer to the last one just requires learning which combination of subcomponents corresponds to which digits.

Of course, that just kicks the problem down the road, because how would you recognize these subcomponents, or even learn what the right subcomponents should be? And I still haven’t even talked about how one layer influences the next, but run with me on this one for a moment. Recognizing a loop can also break down into subproblems. One reasonable way to do this would be to first recognize the various little edges that make it up. Similarly, a long line, like the kind you might see in the digits 1 or 4 or 7, well, that’s really just a long edge, or maybe you think of it as a certain pattern of several smaller edges. So maybe our hope is that each neuron in the second layer of the network corresponds with the various relevant little edges. Maybe when an image like this one comes in, it lights up all of the neurons associated with around eight to ten specific little edges, which in turn lights up the neurons associated with the upper loop and a long vertical line, and those light up the neuron associated with a 9.

Whether or not this is what our final network actually does is another question, one that I’ll come back to once we see how to train the network. But this is a hope that we might have, a sort of goal with the layered structure like this. Moreover, you can imagine how being able to detect edges and patterns like this would be really useful for other image recognition tasks. And even beyond image recognition, there are all sorts of intelligent things you might want to do that break down into layers of abstraction. Parsing speech, for example, involves taking raw audio and picking out distinct sounds, which combine to make certain syllables, which combine to form words, which combine to make up phrases and more abstract thoughts, etc.

But getting back to how any of this actually works, picture yourself right now designing how exactly the activations in one layer might determine the activations in the next. The goal is to have some mechanism that could conceivably combine pixels into edges, or edges into patterns, or patterns into digits. And to zoom in on one very specific example, let’s say the hope is for one particular neuron in the second layer to pick up on whether or not the image has an edge in this region here. The question at hand is: what parameters should the network have? What dials and knobs should you be able to tweak so that it’s expressive enough to potentially capture this pattern, or any other pixel pattern, or the pattern that several edges can make a loop, and other such things?

Well, what we’ll do is assign a weight to each one of the connections between our neuron and the neurons from the first layer. These weights are just numbers. Then take all those activations from the first layer and compute their weighted sum according to these weights. I find it helpful to think of these weights as being organized into a little grid of their own, and I’m going to use green pixels to indicate positive weights and red pixels to indicate negative weights, where the brightness of that pixel is some loose depiction of the weight’s value.
Now, if we made the weights associated with almost all of the pixels zero, except for some positive weights in this region that we care about, then taking the weighted sum of all the pixel values really just amounts to adding up the values of the pixels just in the region that we care about. And if you really want it to pick up on whether there’s an edge here, what you might do is have some negative weights associated with the surrounding pixels. Then the sum is largest when those middle pixels are bright but the surrounding pixels are darker.

When you compute a weighted sum like this, you might come out with any number, but for this network what we want is for activations to be some value between 0 and 1. So a common thing to do is to pump this weighted sum into some function that squishes the real number line into the range between 0 and 1. A common function that does this is called the sigmoid function, also known as a logistic curve. Basically, very negative inputs end up close to 0, very positive inputs end up close to 1, and it just steadily increases around the input 0. So the activation of the neuron here is basically a measure of how positive the relevant weighted sum is.

But maybe it’s not that you want the neuron to light up when the weighted sum is bigger than 0. Maybe you only want it to be active when the sum is bigger than, say, 10. That is, you want some bias for it to be inactive. What we’ll do then is just add in some other number, like negative 10, to this weighted sum before plugging it through the sigmoid squishification function. That additional number is called the bias. So the weights tell you what pixel pattern this neuron in the second layer is picking up on, and the bias tells you how high the weighted sum needs to be before the neuron starts getting meaningfully active.

And that is just one neuron. Every other neuron in this layer is going to be connected to all 784 pixel neurons from the first layer, and each one of those 784 connections has its own weight associated with it. Also, each one has some bias, some other number that you add on to the weighted sum before squishing it with the sigmoid. And that’s a lot to think about. With this hidden layer of 16 neurons, that’s a total of 784 times 16 weights, along with 16 biases. And all of that is just the connections from the first layer to the second. The connections between the other layers also have a bunch of weights and biases associated with them. All said and done, this network has almost exactly 13,000 total weights and biases, 13,000 knobs and dials that can be tweaked and turned to make this network behave in different ways.

So when we talk about learning, what that’s referring to is getting the computer to find a valid setting for all of these many, many numbers so that it’ll actually solve the problem at hand. One thought experiment that is at once fun and kind of horrifying is to imagine sitting down and setting all of these weights and biases by hand, purposefully tweaking the numbers so that the second layer picks up on edges, the third layer picks up on patterns, etc. I personally find this satisfying rather than just treating the network as a total black box, because when the network doesn’t perform the way you anticipate, if you’ve built up a little bit of a relationship with what those weights and biases actually mean, you have a starting place for experimenting with how to change the structure to improve. Or when the network does work, but not for the reasons you might expect, digging into what the weights and biases are doing is a good way to challenge your assumptions and really expose the full space of possible solutions.

By the way, the actual function here is a little cumbersome to write down, don’t you think? So let me show you a more notationally compact way that these connections are represented.
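As a rough illustration of the single-neuron computation described above (weighted sum, plus bias, squished by a sigmoid), here is a minimal sketch in plain Python. The five-pixel "image", the specific weights, and the bias of -2 are assumptions chosen to keep the numbers easy to follow; they are not values from the video, and `neuron_activation` is a hypothetical helper name.

```python
import math


def sigmoid(z: float) -> float:
    """Squish any real number into the range between 0 and 1.

    Written in two branches so that very negative inputs do not overflow
    math.exp; both branches compute the same logistic function.
    """
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def neuron_activation(weights, activations, bias):
    """Weighted sum of the previous layer's activations, plus a bias, then sigmoid."""
    weighted_sum = sum(w * a for w, a in zip(weights, activations))
    return sigmoid(weighted_sum + bias)


# Toy stand-in for the 784 pixels: a strip of five pixels (illustrative only).
# Positive weights on the region we care about, negative weights around it.
weights = [-1.0, 1.0, 1.0, 1.0, -1.0]
bias = -2.0  # the weighted sum must exceed 2 before the activation passes 0.5

edge_image = [0.0, 1.0, 1.0, 1.0, 0.0]  # bright middle, dark surroundings
flat_image = [1.0, 1.0, 1.0, 1.0, 1.0]  # uniformly bright, no edge

print(neuron_activation(weights, edge_image, bias))  # sigmoid(3 - 2), about 0.731
print(neuron_activation(weights, flat_image, bias))  # sigmoid(1 - 2), about 0.269
```

The negative surround weights are what make the neuron prefer the edge over a uniformly bright strip: the bright surrounding pixels subtract from the sum, so the flat image ends up below the threshold set by the bias.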
This is how you’d see it if you choose to read up more about neural networks. Organize all of the activations from one layer into a column as a vector. Then organize all of the weights as a matrix, where each row of that matrix corresponds to the connections between one layer and a particular neuron in the next layer. What that means is that taking the weighted sum of the activations in the first layer according to these weights corresponds to one of the terms in the matrix-vector product of everything we have on the left here.

By the way, so much of machine learning just comes down to having a good grasp of linear algebra. So for any of you who want a nice visual understanding of matrices and what matrix-vector multiplication means, take a look at the series I did on linear algebra, especially chapter three.

Back to our expression. Instead of talking about adding the bias to each one of these values independently, we represent it by organizing all those biases into a vector and adding the entire vector to the previous matrix-vector product. Then, as a final step, I’ll wrap a sigmoid around the outside here. What that’s supposed to represent is that you’re going to apply the sigmoid function to each specific component of the resulting vector inside. So once you write down this weight matrix and these vectors as their own symbols, you can communicate the full transition of activations from one layer to the next in an extremely tight and neat little expression. This makes the relevant code both a lot simpler and a lot faster, since many libraries optimize the heck out of matrix multiplication.

Remember how earlier I said these neurons are simply things that hold numbers? Well, of course, the specific numbers that they hold depend on the image you feed in. So it’s actually more accurate to think of each neuron as a function, one that takes in the outputs of all the neurons in the previous layer and spits out a number between 0 and 1. Really, the entire network is just a function, one that takes in 784 numbers as an input and spits out ten numbers as an output. It’s an absurdly complicated function, one that involves 13,000 parameters in the form of these weights and biases that pick up on certain patterns, and which involves iterating many matrix-vector products and the sigmoid squishification function. But it’s just a function nonetheless. And in a way, it’s kind of reassuring that it looks complicated. I mean, if it were any simpler, what hope would we have that it could take on the challenge of recognizing digits?

And how does it take on that challenge? How does this network learn the appropriate weights and biases just by looking at data? That’s what I’ll show in the next video, and I’ll also dig a little more into what this particular network we are seeing is really doing.

Now is the point, I suppose, I should say subscribe to stay notified about when that video or any new videos come out. But realistically, most of you don’t actually receive notifications from YouTube, do you?
Maybe more honestly I should say subscribe so that the neural networks that underlie YouTube’s recommendation algorithm are primed to believe that you want to see content from this channel get recommended to you. Anyway, stay posted for more.

Thank you very much to everyone supporting these videos on Patreon. I’ve been a little slow to progress in the probability series this summer, but I’m jumping back into it after this project, so patrons, you can look out for updates there.

To close things off here, I have with me Lisha Li, who did her PhD work on the theoretical side of deep learning and who currently works at a venture capital firm called Amplify Partners, who kindly provided some of the funding for this video. So Lisha, one thing I think we should quickly bring up is this sigmoid function. As I understand it, early networks used this to squish the relevant weighted sum into that interval between zero and one, you know, kind of motivated by this biological analogy of neurons either being inactive or active.

(Lisha) – Exactly.

(3B1B) – But relatively few modern networks actually use sigmoid anymore. That’s kind of old school, right?

(Lisha) – Yeah, or rather, ReLU seems to be much easier to train.

(3B1B) – And ReLU stands for rectified linear unit?

(Lisha) – Yes, it’s this kind of function where you’re just taking a max of 0 and a, where a is given by what you were explaining in the video. What this was sort of motivated from, I think, was partially a biological analogy with how neurons would either be activated or not. So if it passes a certain threshold, it would be the identity function, but if it did not, then it would just not be activated, so it would be zero. So it’s kind of a simplification. Using sigmoids didn’t help training, or it was very difficult to train at some point, and people just tried ReLU, and it happened to work very well for these incredibly deep neural networks.
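The "max of 0 and a" that Lisha describes can be written in one line. The sketch below puts it next to the sigmoid for comparison, taking `a` to be the weighted sum plus bias as she indicates; the sample input values are arbitrary and chosen only for illustration.

```python
import math


def relu(a: float) -> float:
    """Rectified linear unit: zero below the threshold, identity above it."""
    return max(0.0, a)


def sigmoid(a: float) -> float:
    """Logistic curve, shown only for comparison with ReLU."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


# Arbitrary sample inputs standing in for "weighted sum plus bias".
for a in (-3.0, -0.5, 0.0, 0.5, 3.0):
    print(f"a = {a:5.1f}   relu = {relu(a):4.1f}   sigmoid = {sigmoid(a):.3f}")
```

One visible difference is that ReLU's output is not confined to the range between 0 and 1, so a network using it no longer has activations that read directly as "how lit up" a neuron is on that scale.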

(3B1B) – All right, thank you, Lisha.

For background, Amplify Partners is an early-stage VC that invests in technical founders building the next generation of companies focused on the applications of AI. If you or someone that you know has ever thought about starting a company someday, or if you’re working on an early-stage one right now, the Amplify folks would love to hear from you. They even set up a specific email for this video, [email protected], so feel free to reach out to them through that.
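To recap the structure described in the transcript, the whole network can be sketched as a function from 784 numbers to 10 numbers, where each layer computes sigmoid(W a + b). This is a hedged illustration, not the code the video refers to: the weights here are random (so the "prediction" is meaningless until trained), the names `init_network`, `forward`, and `count_parameters` are hypothetical, and the matrix-vector product is spelled out in plain Python where a real implementation would more likely call an optimized linear algebra library.

```python
import math
import random

LAYER_SIZES = [784, 16, 16, 10]  # input pixels, two hidden layers, ten digits


def sigmoid(z: float) -> float:
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def init_network(sizes, seed=0):
    """Random, untrained weights and biases: one (W, b) pair per layer transition.

    W has one row per neuron in the next layer and one column per neuron in
    the current layer; b has one entry per neuron in the next layer.
    """
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        W = [[rng.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n_out)]
        layers.append((W, b))
    return layers


def forward(layers, a):
    """Apply a' = sigmoid(W a + b) once per layer transition."""
    for W, b in layers:
        a = [sigmoid(sum(w * x for w, x in zip(row, a)) + b_j)
             for row, b_j in zip(W, b)]
    return a


def count_parameters(sizes):
    """Weights (n_in * n_out) plus biases (n_out) for every transition."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))


network = init_network(LAYER_SIZES)
rng = random.Random(1)
image = [rng.random() for _ in range(784)]  # fake grayscale values in [0, 1)

output = forward(network, image)
digit = max(range(10), key=lambda d: output[d])  # the "brightest" output neuron

print(count_parameters(LAYER_SIZES))  # 784*16 + 16*16 + 16*10 + (16 + 16 + 10) = 13002
print(len(output), digit)
```

The parameter count adds the weights of each transition rather than multiplying them together, because each weight belongs to exactly one connection between two adjacent layers. Note also that each bias vector has as many entries as there are neurons in the layer being computed, matching the number of rows of W.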

I watched this video a while ago it's very good. Having watched a lot of other videos on your channel I think you could do very well explaining neural networks as differentiable tensor operations more generally

Dude, this explanation is facking amazing, thanks for it)))

Hmm.

how is it possible that I can lie in my bed on a Sunday and am presented with mind-boggling cutting edge knowledge told by an incredibly soothing voice in a world class manner on a 2K screen of a pocket supercomputer basically for free

the way this woman talks hurts my ears

Very interesting.

thx

If bias differ between each layer, does that increase dimensionality? Are biases trained as well?

I'm a 14 years old non english fluent and i understood everything. Is it normal? Is there anyone here like me?

You are the best! And Squishification lol 😂

so neural network is actually a function and choosing right values for its 13,000 parameters ?

and for each set of problems these parameter values will be changed?

Please make full lectures on deep learning we all need

Thank you 🙂 This video really helped me alot!

it should be w(1, n) in the matrix 2nd row by the way and w(k, 0), w(k, 1), .., w(k,n) in order to dot with nx1. Thanks, great video, will send it to anyone interested in NNs and deep learning

Great Video <3

Explain in a very intuitive way and I love this 😀

PS. I wondered why (12:30) : (784×16)+(16×16)+(16×10)+(16+16+10) why not (784x16x16x10)+(16+16+10) ? sorry for my dumb question

This just made it all click for me! Thank you!

I don't know what the fuck did I stumble to. But ill study the shit out of this field and see where it takes me.

Could I use a neural network like this for time series prediction?

so if its linear algebra, its like graphics processing, which is why they use GPUs ?? nice!

Can You Please Create a Whole Series on Math behind All Machine Learning Algorithms. we really need this.

Why are there n bias terms? If each bias is associated with a neuron in the hidden layer, and there are k neurons in the hidden layer, shouldn't there be k biases?

anybody knows how to download these amazing videos?

I'm Japanese. But, I can understand because your videos have Japanese subtitles. Thank you very much( ^ ω ^ )

How long should it usually take to train this relatively simple AI?

14:46 it should be b0-bk vector

I was mind-blown when my prof showed us in the lecture that there is only a 0.34% error rate (or so; it was a very low number) for recognizing these handwritten numbers. So many numbers look similar in messy handwriting that even I have problems distinguishing a messy 1 from a messy 7 or o

I remember watching this awhile ago, and after watching this again after watching VSauce's video on this on Minefield, (It's free now btw) this makes a lot of sense now and makes it seem a lot nicer.

I am a musician and artist. I am GED 6 months of technical college experience. I want to tell you I'm starting to grasp this little by little. I want to learn this. I have been catapulted into this yearning /craving for learning about machine learning via meditation and some. B+ Psilocybin Cubensis experiences. With that said, thank you for sharing your talents and work. I will continue to watch these over and over. Why? I think b/c I'm a creative person and this is creativity. Do we grasp the importance of creativity? In all it's malleable mediums. An individual or group passionate creative process then transcends time and space to connect with the recipient becomming part of their life. Our lives! Traveling some unseen conduit. Landing in our lap's like heart ache and hand grenade's. Art, music and ALL forms of creativity outdate religion, language and government. It's universally personal and the sinew that binds humanity. #ThIsIsNoTThEaLgOrItHiM on YT the Wellrose Hummingbird playlist for my music and slippery.sliding.slope on IG for art. Which I've been using images of neurological images, sound wave images etc to layer my art. Here's a song

Everything is Aligned https://youtu.be/Ftiw0rUq4es

And a song I wrote about nature and forgiveness

https://youtu.be/oWFk98pY-e8

🌬️🍄🧠⤵️🕳️. 🌬️ ⤴️🕳️👥

✨. ✨

✨. ✨

✨🌬️📡⤴️

3:35 take 10 seconds to appreciate how awesome this animation is

The Korean subtitles stop showing from 10:50 onward. Please fix this soon.

14:42 there is a mistake: bn -> bk

This is an amazing explanation that helped me a lot… Thanks from a german engineer!!!

so i’m colorblind (deuteranopia, occurs in around 10% of white males), so the green and red colors weren’t really distinguishable for me around the ~10 minute mark. just something to think about in the future when making videos that need color

Thanks for simplifying this 🙂

👎 Thumbs down from me from the super annoying background piano while you are talking. Unfortunately, you are not the only one. There are videos about physics which also use this stupid background elevator-style music when someone is talking. WTF. Whoever artist thinks this is necessary, curse you for ruining an otherwise great video.

👎 Do you also have a version of this video without the pointless piano noise? It's absolutely useless.

Q = Quantum Mechanics

Q = Quinn Michaels

Q = Quantum

Machine learning = Rahula Club Follow instructions. If you want truth it's out there.<– (Added)

#tyler

#sirisys

#alice

#brainknuckles

#maytheswartzbewithyou

thank you for the tutorials

Man, you are the best teacher I have ever come across.

10:15

Very nice work. Although I can't claim to know the subject, I liked your way of explaining. But let me tell you that all these programs are dangerous for the future. It's useful, but that isn't real life; we are heading toward chaos in the near future, with AI being improved constantly.

PS: Watch the film Enigma, it's a good film. Besides, as you can see in that film, algorithms used to serve to defend us and to predict; now they are going to serve to control the world, to have ultimate power like "God," with eyes and ears everywhere and almost even beyond, because all these calculations even serve to anticipate. I needed to write all this.

So what you’re telling me is that those “Are You A Robot?” captchas are useless now

Me: Can't read my own handwritten numbers

Neural network: I'll help you out

hi! what happened to this teasered video series on probabilities? 😉

You are the best: you taught unteachable stuff in 19 minutes

Question: 14:49: should vector [b0,…,bn] (n-dimensional) not be [b0,….,bk] (k-dimensional)?

Welp i successfully created the most complicated random number generator i've ever seen.

15:00 should the bias vector not be of length k? (k x n)(n x 1) = (k x 1) vector

Thank you very much for explaining such a complex topic.

Great video. Very good visualization

Hey Mr. Grant, I am a great fan of yours. I've always found your videos so informative and clear. I want to connect with you, whether you are on Facebook or WhatsApp. Please e-mail me if you are seeing this comment.

email id- [email protected]

well i found this lecture lot better than that of Andrew NG… much clearer and easier to understand.

I ain't gonna lie, you, kind sir, are doing an awesome job! 🙂 Thank you!!!!

Wow!

It is a good video series that gives a lot of background and a rough understanding, but it is not possible to program a neural network from this.

I read some of the articles in the description and wrote a simple net in Rust converting 4 bit binary numbers into decimal numbers displayed on a 7-segment display with my son. ( https://github.com/etok414/simple_nn )

To help me explaining the mathematical details to my son (and as an exercise in Latex) I wrote a 6 page note about backpropagation.

The note treats the simplest possible network (2 inputs, 2 neurons and 2 outputs) in painstaking detail.

Hope it'll help someone:

https://github.com/etok414/simple_nn/blob/master/latex/Back_propagation.pdf

Brilliant animation

The video has a lot of unnecessary information. It would be nice if you can cut them off.

in 15:07 shouldn't the b matrix be like : [b0 b1 … bk]? instead of bn?

Best explanation of the concept of deep neural networks in the history of humanity. Kudos!

Good job and very helpful

wonderful explanation!

This video is amazing. Thanks a lot. You managed to show it better in 20 minutes than MIT videos in 40.

if RELU is simpler and quicker than Sigmoid so why are we learning sigmoid function in this video?

Hi great video and highly informative. I am currently studying Neural Networks and have written my own Neural Network in Java. However, I wish to take it to the next level and implement image recognition.

My question is: how can I obtain the grayscale value from a single pixel?

Stuff like this is a great insight to how the human brain can have things like optical illusions.

You're my HERO!

Are you a god am I in the heaven

Or this some high tech mit lecture going on. YouTube give him applause 👏 👏

You remind me of why I loved learning

now I'm glad I took calculus

When you weight between 0 and 1, I'm seeing an analogy to quantum theory. Are there models that make a difference on discreteness of weights? Is a "quantized" weight the very definition of the artificial in artifical/cylon intelligence opposed to natural/human intelligence?

Your videos are so good and educative that it makes me feel bad not paying you for having the right to watch them.

It is getting very deep and boring….

Start was good though, idea is also good.

Please don't have music in background, sometimes it's distracting.

Wish Youtube had multiple parallel audio tracks that users can choose to turn on/off this background music.

best video to understand neural net on youtube

I am amazed for how much room there is for education to be improved O_O

You have such a charming way of teaching. Good job on the vids!

Would e^x/(e^x+1) work instead of the sigmoid function?

Good series… Very well explained…

Is there a way to use this process to recognize an algorithm? So if I feed it numbers every week, eventually a pattern will be shown, correct?

Awesome video 6666666666

Your whole work is so helpful and inspiring, and I'm not exaggerating when I say it is valuable for all of mankind. The huge number of clicks you generate for your genre shows how outstanding your videos and your explanation skills are. Thank you!

Stunningly made video.

I am unable to understand the logic of assigning negative weights to the surrounding pixels. 9:58. I will be grateful if someone can help me with this. 🙂

3blue1brown, you are beyond awesome. So thankful. 🙂

Holy shit. This tutorial is bloody amazing. Everything was explained so clear and intuitively, i understood everything.

Absolutely amazing video, such skills of the teaching!

So it comes down to: feed > element breakdown (simplification) > comparison > meaning????

Mind blown!

Just Think how my Mind neurons have suffered in networking of this video😂

I love all your work, thank you so much for this video. The "visual cortex" is an antiquated idea. Brain "areas" have been replaced by networks: vision is thought to be a product of at least 305 connections among 32 different parts of the brain (Van Essen '91) (and this doesn't even dive into the enactivism/embodied cognition aspect of vision).

opens the video

writes in notebook- oh,okay! neuron is thing that holds a numbers.

15 mins later…

thank you very much you are doing a good job , i don't know why some people without any mathematic or computer science background dislike such a great job!!!!

2:14 video based neural network already exists. its called autoencoder and used for deepfakes ( i read it on wiki)

seriously, this is the first time i find that ML makes sense! you are amazing

Have you really written the neural network's code or it's just an animation?

The real question is…can it read a doctor's handwriting?!

And amazing to know this is what our brain does constantly and subconsciously…the human brain is amazing…appreciate it

In minute 14:50, shouldn't the b's vector have indexes from 0 to k?