Machine Learning in JavaScript (TensorFlow Dev Summit 2018)

November 17, 2019 · By Stanley Isaacs


♪ (music) ♪ (applause) Hi, everyone. Thanks for coming today. – My name is Daniel.
– My name is Nikhil. We’re from the Google Brain team, and today we’re delighted
to talk about JavaScript. So, Python has been one of the mainstream languages for scientific computing, and it has been for a while; there are a lot of tools and libraries built around it. But is that where it ends? We’re here today to convince you that JavaScript
and the browser have a lot to offer. And TensorFlow Playground
is a great example of that. I’m curious– how many people
have seen TensorFlow Playground before? Oh, wow! Quite a few. I’m very glad. So, for those of you that haven’t seen it, you can check it out after our talk
at playground.tensorflow.org. It is an in-browser visualization
of a small neural network, and it shows, in real time, all the internals
of the network as it’s training. This was a lot of fun to make, and it has been a huge educational success: we’ve been getting emails from high schools and universities that have been using it to teach students about machine learning. After we launched Playground, we were wondering,
“Why was it so successful?” And we think one big reason
was because it was in the browser. And the browser is this unique platform
where the things you build, you can share
with anyone with just a link. And those people that open your app
don’t have to install any drivers or any software, it just works. Another thing is the browser
is highly interactive, and so the user is going to be engaged
with whatever you’re building. Another big thing is that browsers– we didn’t take advantage
of this in the Playground, but browsers have access to sensors, like the microphone and the camera
and the accelerometer. All of these sensors
are behind standardized APIs that work on all browsers. And the last thing,
the most important thing, is the data that comes from these sensors
doesn’t ever have to leave the client. You don’t have to upload anything
to the server, which preserves privacy. Now, the playground we built is powered by a small neural network: 300 lines of vanilla JavaScript that we wrote as a one-off library. It doesn’t scale, it’s just simple for-loops, and it wasn’t engineered to be reusable. But it was clear to us that if we were going to open the door for people to merge machine learning and the browser, we had to build a library. And we did: we released deeplearn.js, a JavaScript library that is GPU-accelerated via WebGL, a browser standard for rendering 3D graphics, which we repurpose to do linear algebra. deeplearn.js lets you both run inference and train entirely in the browser.
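For a sense of what that one-off library looked like, a dense-layer forward pass really can be a few for-loops. This is a hypothetical miniature in vanilla JavaScript, not the Playground’s actual source:

```javascript
// A dense layer as plain for-loops, in the spirit of the Playground's
// one-off library: out[j] = tanh(sum_i input[i] * weights[i][j] + biases[j]).
function dense(input, weights, biases) {
  const out = [];
  for (let j = 0; j < biases.length; j++) {
    let sum = biases[j];
    for (let i = 0; i < input.length; i++) {
      sum += input[i] * weights[i][j];
    }
    out.push(Math.tanh(sum)); // nonlinearity
  }
  return out;
}

// A tiny two-layer network: 2 inputs -> 2 hidden units -> 1 output.
const hidden = dense([1, -1], [[0.5, -0.5], [0.25, 0.75]], [0, 0]);
const output = dense(hidden, [[1], [1]], [0]);
console.log(output.length); // 1
```

This is exactly the style that doesn’t scale: every new layer type, and the GPU path, needs its own hand-written loops, which is what a real library abstracts away.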
We saw incredible momentum: the community took deeplearn.js and existing models in Python, imported them into the browser, and built fun, interactive things with them. One example is style transfer. Another person imported a character-level RNN and built a novel interface that lets you explore all the different possible endings of a sentence, all generated by the model in real time. Another example is a generative model of fonts; there was a blog post about this. The person who built it let users explore the hidden dimensions, the interesting dimensions in the embedding space, and you can see how they relate to the boldness and slant of the font. And there were even educational examples,
like Teachable Machine, a fun little game that taught people how computer-vision models work by letting them interact directly with the webcam. Now, all the examples I showed you point to the incredible momentum we have with deeplearn.js. And building on that momentum, we’re very excited today to announce that deeplearn.js is joining the TensorFlow family. With that, we are releasing a new ecosystem of libraries and tools for machine learning with JavaScript, called TensorFlow.js. Now, before we get into the details, I want to go over three main use cases of how you can use TensorFlow.js today, with the tools and libraries we’re releasing. So, one use case is that you can write models
directly in the browser, and this has huge
educational implications– think of the playground
that I just showed. A second, major use case is that you can take a pre-existing model, pretrained in Python, and with a script import it into the browser to do inference. And a related use case: the same model you import for inference, you can retrain, potentially with private data that comes from the browser’s sensors, in the browser itself. Now, to give you more of a schematic view: we have the browser, which utilizes WebGL
to do fast linear algebra. On top of it, TensorFlow.js has two sets of APIs. The Ops API, which used to be deeplearn.js: we worked hard to align its API with TensorFlow Python, and it is powered by an automatic differentiation library built analogously to Eager mode. And on top of that, we have a high-level Layers API that lets you use best practices and high-level building blocks to write models. What I’m also very excited to announce today is that we’re releasing tools that can take an existing Keras model or TensorFlow SavedModel and import it automatically
for execution in the browser. Now, to show you an example of our API, let’s go over a small program that tries to learn the coefficients of a quadratic function; the coefficients we’re trying to learn from data are a, b, and c. We have our import of tf from TensorFlow.js; for those of you that don’t know, this is a standard ES6 import in JavaScript, very common. We have our three tensors a, b, and c, marked as variables, which means they are mutable and the optimizer can change them. We have our f(x) function that does the polynomial computation. You can see a familiar API here, like tf.add and tf.square, just like TensorFlow. In addition to that API, we also have a chaining API, which allows you to call these operations on tensors themselves; this leads to more readable code that is closer to how we write math, and chaining is very popular in the JavaScript world. So that’s the feed-forward part of the model. Now, for the training part, we need a loss function; here the loss is just the mean squared error between the prediction and the label. We have our optimizer, an sgd optimizer, and we train the model: we call optimizer.minimize for some number of epochs. And here I want to emphasize, for those of you that have used TF Eager before, or saw the talk before ours, Alex’s talk: the API in TensorFlow.js is aligned with the Eager API in Python.
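Running the tfjs program itself requires the library, but the computation it performs, gradient descent on the mean squared error over a, b, and c, can be sketched in plain JavaScript, with hand-derived gradients standing in for the automatic differentiation:

```javascript
// Fit y = a*x^2 + b*x + c by gradient descent on mean-squared error.
// Hand-derived gradients stand in for TensorFlow.js's autodiff.
const trueA = 2, trueB = 1, trueC = 3;
const xs = [], ys = [];
for (let i = 0; i <= 20; i++) {
  const x = -1 + (2 * i) / 20; // 21 points in [-1, 1]
  xs.push(x);
  ys.push(trueA * x * x + trueB * x + trueC);
}

let a = 0, b = 0, c = 0; // initial guesses
const lr = 0.2;          // learning rate
for (let step = 0; step < 5000; step++) {
  let da = 0, db = 0, dc = 0;
  for (let i = 0; i < xs.length; i++) {
    const err = a * xs[i] * xs[i] + b * xs[i] + c - ys[i];
    da += (2 / xs.length) * err * xs[i] * xs[i]; // d(MSE)/da
    db += (2 / xs.length) * err * xs[i];         // d(MSE)/db
    dc += (2 / xs.length) * err;                 // d(MSE)/dc
  }
  a -= lr * da; b -= lr * db; c -= lr * dc;      // gradient-descent update
}
console.log(a.toFixed(2), b.toFixed(2), c.toFixed(2)); // 2.00 1.00 3.00
```

With tfjs, the gradient lines disappear: you write only the loss, and optimizer.minimize differentiates it for you.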
Now, clearly, that’s not how most people write machine learning, because those low-level linear-algebra ops can be quite verbose. For that, we have our Layers API. To show you an example, we’re going to build a recurrent neural network that learns to sum two numbers. The complicated part is that the input, like 90+10, is fed in character by character. The neural network has to maintain an internal state with an LSTM cell; that state then gets passed into a decoder, and the decoder has to output 100, character by character. So it’s a sequence-to-sequence model.
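Feeding 90+10 in character by character means each character first becomes an index into a small vocabulary. A hypothetical sketch of that encoding and decoding step (the vocabulary and padding scheme here are illustrative, not the demo’s actual ones):

```javascript
// Map each character of an addition problem to a vocabulary index,
// the form in which a sequence model consumes it one step at a time.
const vocab = "0123456789+ "; // digits, plus sign, space for padding
const charToIndex = {};
for (let i = 0; i < vocab.length; i++) charToIndex[vocab[i]] = i;

function encode(text, maxLen) {
  const padded = text.padEnd(maxLen, " "); // right-pad to a fixed length
  return [...padded].map((ch) => charToIndex[ch]);
}
function decode(indices) {
  return indices.map((i) => vocab[i]).join("").trim();
}

console.log(encode("90+10", 7)); // [ 9, 0, 10, 1, 0, 11, 11 ]
console.log(decode([1, 0, 0]));  // "100"
```

The encoder consumes one index per time step; the decoder emits indices that decode back to the answer string.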
This may sound a little complicated, but with the Layers API it’s not that many lines of code. We have our import of tf from TensorFlow.js. We have our sequential model, which just means it’s a stack of layers; for those of you familiar with tf.layers in Python, or with Keras, this API will look very familiar. The first two layers are the encoder, the last three layers are the decoder, and that’s our model. We then compile it with a loss, an optimizer, and a metric we want to monitor, like accuracy, and we call model.fit with our data. Now, what I want to point out here is the await keyword. model.fit is an asynchronous call, because in practice it can take about 30 or 40 seconds in a browser, and in those 30 or 40 seconds you don’t want the main UI thread of the browser to be locked. This is why you get a callback with a history object after it’s done; in between, the GPU is going to do the work.
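The await pattern is ordinary JavaScript: the long call returns a Promise immediately, so the UI thread stays free, and await resumes once the history object is ready. A minimal sketch with a stand-in for model.fit (fakeFit is hypothetical, not a tfjs API):

```javascript
// Stand-in for model.fit: returns a Promise immediately so the caller
// is not blocked, and resolves later with a history object.
function fakeFit(epochs) {
  return new Promise((resolve) => {
    const history = { loss: [] };
    for (let e = 0; e < epochs; e++) history.loss.push(1 / (e + 1)); // fake decreasing loss
    setTimeout(() => resolve(history), 10); // simulate the GPU finishing later
  });
}

async function train() {
  const history = await fakeFit(3); // suspends here without blocking the thread
  return history;
}

const pending = train();
console.log(pending instanceof Promise); // true: train() returned immediately
```

Anything else the page does, rendering, input handling, keeps running while the Promise is pending.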
Now, the code I showed you is for when you want to write models directly in the browser. But, as I said before, a major use case, even with deeplearn.js, was people importing models that were pretrained, just to do inference in the browser. And before we jump into the details of that, I want to show you a fun little game that our friends at Google Brand Studio built, which takes advantage of a pretrained model automatically ported into the browser. The game is called
Emoji Scavenger Hunt. And the way it works: I’m going to show you a real demo here with the phone. It’s in the browser. Let me see, can I see here? So you can see I have a Chrome browser opened up on a Pixel phone; you can see the URL at the top. The game uses the webcam and shows me an emoji, and then I have some number of seconds to find the real version of that emoji before the time runs out. So, before we play, Nikhil here is going to help me identify
the objects that this game asks. – You ready?
– (Nikhil) I’m ready. (Daniel) Alright, let’s go. (countdown beeping) – Alright. Watch. Do you have a watch?
– (Nikhil) I have a watch. (phone) Did I spot a velvet? Go on. – (Daniel) Whoo! Yay! We got that!
– (Nikhil) Awesome. (Daniel) Let’s see what our next item is. (countdown beeping) – (Daniel) Shoe.
– (Nikhil) Shoe. (Daniel) You got
to help me out here, buddy. –(phone) Did I spot a–
– (Daniel) Oh, yeah! –(phone) You found shoe.
– (Daniel) Whoo! We got the shoe! (Nikhil) Alright, what’s next? (countdown beeping) (Daniel) Right, it wants a banana. (Nikhil) A banana? Does anyone have a–
this guy’s got a banana! – (Daniel) Hold, what!
– (Nikhil) This guy’s got a banana! – (Daniel) Come over here.
(phone) Am I seeing a wall?– (Daniel) Yay! Alright!
– (Nikhil) Alright! – (Daniel) Look at us!
– (Nikhil) I’m ready. We’re going to have a high score here. – (Daniel) Beer!
(phone) Could that be a hat? (Nikhil) It’s 10:30
in the morning, Daniel! –(phone) Did I spot a velvet?
– (Nikhil) Let’s get back to the talk. – (Daniel) Alright. (chuckles)
(phone) I think I saw a milk can. (Nikhil) Alright, so I’m going to jump
into the technical details of how we actually built that game. The clicker? Yep. So, what we did was train a model in TensorFlow to be an object recognizer for the Scavenger Hunt game. We chose about 400 different classes that would be reasonable for a game like this, you know, watches and bananas and beer. We used the TensorFlow for Poets codelab, in which you essentially take a pretrained MobileNet model. If you don’t know what MobileNet is, it’s a state-of-the-art computer-vision model for edge devices. So what we effectively did was take that model and retrain it for these classes.
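Retraining the top of a pretrained network is the essence of that codelab. As an illustration only (nothing here is MobileNet or the codelab’s code), here is transfer learning in miniature: a frozen feature extractor with a new logistic-regression head trained on top:

```javascript
// Transfer learning in miniature: a frozen feature extractor (standing in
// for MobileNet's convolutional layers) plus a tiny trainable classifier.
function embed(x) { // frozen "backbone": fixed, never trained
  return [Math.tanh(x[0] - x[1]), Math.tanh(x[0] + x[1])];
}

// New head: logistic regression trained with gradient descent.
let w = [0, 0], bias = 0;
const data = [
  { x: [2, 0], label: 1 }, { x: [3, 1], label: 1 }, // class 1, say "banana"
  { x: [0, 2], label: 0 }, { x: [1, 3], label: 0 }, // class 0, say "watch"
];
const sigmoid = (z) => 1 / (1 + Math.exp(-z));

for (let step = 0; step < 2000; step++) {
  for (const { x, label } of data) {
    const f = embed(x); // features come from the frozen backbone
    const p = sigmoid(w[0] * f[0] + w[1] * f[1] + bias);
    const g = p - label; // gradient of log-loss w.r.t. the logit
    w[0] -= 0.1 * g * f[0];
    w[1] -= 0.1 * g * f[1];
    bias -= 0.1 * g;
  }
}

const predict = (x) =>
  sigmoid(w[0] * embed(x)[0] + w[1] * embed(x)[1] + bias) > 0.5 ? 1 : 0;
console.log(predict([2.5, 0.5]), predict([0.5, 2.5])); // 1 0
```

Only the small head is trained; the expensive backbone is reused as-is, which is why retraining for 400 new classes is cheap.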
Now we have an object detector in the Python world. How do we actually get it into the browser? Well, we provide a set of tools today that help you do exactly that. Once it’s in, you skin the game, you make the computer talk, and all that kind of fun stuff. Let’s jump into how
we actually convert that model. So, as Daniel mentioned earlier today, we support two types of models: TensorFlow SavedModels, for which we have a converter, and Keras models, for which we also have a converter. You define your model and save it as a SavedModel; this is the standard way to do that. Similarly, this is the code you would use for Keras [inaudible]. The next piece is converting it for the web. Today, we’re releasing a pip package, tensorflowjs; you can install it from there. It includes a script that lets you point to your TensorFlow SavedModel and to an output directory, and that output directory is where the static build artifacts for the web will go. Keras is the same exact flow: point to your HDF5 input
and you have an output directory where those [build artifacts] will go. Now you statically host those
on your website somewhere, you know, just simple static hosting. And on the JavaScript side, we provide
an API that lets you load that model. So this is what it looks like
for a TensorFlow. In the TensorFlow SavedModel
you’ll notice that it’s a frozen model, we don’t right now support
continuing training of this model. While in the Keras case,
we actually let you continue training, and we’re working hard to keep
these APIs aligned in the future. Okay, so under the cover,
what are we actually doing? We do some graph optimization, which essentially means we prune out nodes that you don’t need to make the prediction; you don’t need those on the web. We optimize the weights for browser autocaching: we pack and shard them into 4MB chunks, which helps your browser be quick the next time your page loads. Today, we support about 90 of the most commonly used TensorFlow ops, and we’re working very hard to support more, like control-flow ops. And we support 32 of the most commonly used Keras layers today. As I mentioned, we let you continue training Keras models, and we let you do evaluation as well as make predictions from that model.
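The pruning step is essentially graph reachability: walk backwards from the prediction output and keep only its ancestors. A hypothetical sketch, with the graph as a map from each node to its inputs:

```javascript
// Keep only the nodes the output actually depends on: walk the graph
// backwards from the output and drop everything unreachable.
function prune(graph, output) {
  const keep = new Set();
  const stack = [output];
  while (stack.length > 0) {
    const node = stack.pop();
    if (keep.has(node)) continue;
    keep.add(node);
    for (const input of graph[node] || []) stack.push(input); // visit inputs
  }
  return keep;
}

// A toy graph: the "summary" op feeds nothing the prediction needs.
const graph = {
  input: [],
  weights: [],
  matmul: ["input", "weights"],
  softmax: ["matmul"],
  summary: ["matmul"], // training-time logging, not needed for inference
};
console.log([...prune(graph, "softmax")].sort()); // [ 'input', 'matmul', 'softmax', 'weights' ]
```

Training-only nodes like summaries and optimizer state fall away, which shrinks what the browser has to download and execute.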
Okay, so obviously there’s a lot you can do just by porting your models to the web for inference. But since the beginning of deeplearn.js, we’ve made it a high priority to make sure that you can also train directly in the browser. This opens up the door for education and interactive tools, like we saw with the Playground, and it lets you train with data that never leaves your client, which is huge for privacy. To show off what you can do with something like this, we’ve built another little game. The goal of the game
is to play Pac-Man with your webcam. Now, Daniel’s going to be my helper here; he is much, much better at this game than I am, for some reason. Just say, “Hi!” So, there are three phases of the game. Phase one: we collect frames from the webcam and associate them with the classes up, down, left, and right. Daniel moves his head up, down, left, and right, and simply plays the game like that. You’ll notice that as he’s collecting frames, he’s moving around a little bit; this helps the model see different angles for each class and generalize a little better. After he’s done collecting frames, we go and train our model. We’re not actually training from scratch when we hit that Train button: we take a pretrained MobileNet again, ported to the web, and do a retraining phase with that local data, using the Layers API, in the browser. Do you want to press that Train button? Alright, our loss is going down; looks like we’re learning something. That’s great. So, as soon as we press that Play button, we’re going
to make predictions from the webcam; those predictions get plugged into the controls, and they control the Pac-Man game. Ready? Alright, so you can see in the bottom right it’s highlighting the class it thinks it sees, and as Daniel moves his head around, you’ll see the class change. And he’s off. (laughter) So… (chuckling) all of this code is online, and you can go fork it; we invite you to do so. And obviously, this is just a game, but you can imagine other types of applications, like a browser extension that lets you control the page for accessibility purposes. So again, all this code is online. Please go fork it, play,
and make something else with it. – Okay, Daniel. I know this is fun.
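Under the hood, wiring the predictions into the controls is just an argmax over the four class probabilities each frame. A hypothetical sketch:

```javascript
// Pick the control whose predicted probability is highest (argmax),
// as a webcam-driven game loop would do once per frame.
const CONTROLS = ["up", "down", "left", "right"];

function pickControl(probs) {
  let best = 0;
  for (let i = 1; i < probs.length; i++) {
    if (probs[i] > probs[best]) best = i;
  }
  return CONTROLS[best];
}

console.log(pickControl([0.1, 0.2, 0.6, 0.1])); // left
```

A browser extension for accessibility would do the same thing, only routing the chosen class to page navigation instead of Pac-Man.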
– (Daniel) I gotta? Alright. (laughter) (applause) You’ve got to get back to the talk. Okay, so let’s talk a little bit about performance. What we’re looking at here is a benchmark of MobileNet 1.0 running with TensorFlow: classic TensorFlow, not TensorFlow.js. I want to point out that we’re using a batch size of 1. This is important because we’re thinking about it in the context of an interactive application, maybe this Pac-Man game where you feed in webcam data: you want to know what the prediction time is for one example; you can’t really batch things. On the top row, we’re looking
at TensorFlow with CUDA running on a GTX 1080; this is a beefy machine. It’s about three milliseconds. I want to point out that the shorter the bar, the faster it is, clearly. On the second row we have TensorFlow CPU, running with AVX-512, and this is actually on one of these MacBook Pros here: it’s about 60 milliseconds for that frame. Where does TensorFlow.js fit into this picture? Well, it depends. If you’re running on a beefy GTX 1080, we’re actually getting about 11 milliseconds for one pass through this MobileNet model, which is pretty good if you think about it in the context of an interactive game. On the laptop that we just showed the game with, we’re getting about 100 milliseconds for that inference pass through MobileNet. That’s still pretty good; you can build a whole interactive game with something that’s running
at 100 milliseconds. So the web is only going to get
faster and faster. There’s a whole new set
of standards coming, like WebGPU, that will really push the boundary
for these kinds of things. But the browser still has its limitations: you can only really get access to the GPU through WebGL. So, how do we scale beyond that? How do we scale beyond the limitations
that we have in the browser? There’s a whole ecosystem
of server-side JavaScript tools usingnode.jsthat we would love
to take advantage of. So today, I’m really happy to tell you
that we’re working onnode.jsbindings to the TensorFlow C API. What that means is you’ll be able to write
that same low-level ops API with the Eager mode we saw with the polynomial example, or the high-level Layers API
which we saw for the Pac-Man example, and bind to TensorFlow C, and run head lists in your TensorFlow
running with CUDA installed. Eventually, that also means we can run
with a TPU on a backend– that same JS code. So these bindings
are underactive developments, so stay tuned for more. Alright, so let’s recap some of the things
that we launched today and that we talked about. We talked about the low-level Ops API, which does hardware-accelerated linear algebra, as well as Eager-mode differentiation for autograd; this was previously known as deeplearn.js, and we’re rebranding it today. We released the high-level Layers API: this is the Keras-inspired API that mirrors TensorFlow Layers, and we saw examples of it with the addition RNN and with the Pac-Man demo. We also showed you how you can import TensorFlow SavedModels and Keras models for prediction and retraining in the browser. We have also released a bunch of demos
and examples on GitHub. These are not the only ones; there’s a whole repository of different examples that can get you started, and they have live links so you can go and poke around and play. So I invite you to go do that. So we really want
to see you get involved in this project. We have a bunch of links here: js.tensorflow.org is our official website. All of the links, everything we’ve talked about, is there: there are tutorials, there’s documentation, etc. Our code is obviously open-sourced under tensorflow/tfjs, so I invite you to go play there too. And we also started a community mailing list today; that’s the short link here. The community mailing list is for people to post demos, ask questions, and that kind of thing. So this project
was not just Daniel and myself, this was a larger team effort between many of our amazing
colleagues at Google. So we want to thank them. We also want to thank all of the amazing open-source
contributors for deeplearn.js. And we’re really excited
to build the next chapter of machine learning
and JavaScript with you. – Thank you
– (Daniel) Thank you. (applause) ♪ (music) ♪