Machine Learning Methods – Computerphile

November 22, 2019 By Stanley Isaacs


Well, today I want to talk about data mining, which is what I’m really interested in. I want to explain a little bit about the inner workings of data mining, a few of the terms that you might have heard when you read the first lecture or the first book. I want to talk about supervised learning and unsupervised learning, what exactly these things are, and then I want to get on to something new, semi-supervised learning, and also what the research is at the moment in this area. It’s called machine learning. That’s the sort of applied artificial [intelligence]. Machine learning: if you get data, you want to mine the data. Broadly there are two categories of methods for how this works. So if [I] could pull up my prop. Yes, I’ve carefully prepared. Here are some items of data that I have brought along.

The first method that maybe I should explain is unsupervised learning, because it’s perhaps the easier one. It’s called unsupervised learning because we don’t have any examples that are labeled, so it’s unlabeled learning. Yeah, I guess the idea is that a supervisor knows the answer, and we don’t have anybody who knows the answer. So we get the data to begin with and we don’t really know anything about it. Obviously we know the attributes, we know the values, but we don’t know what categories they are in. Let’s say that’s the problem. So unsupervised learning very often is just a sorting of the data. You get your first data item and you put it somewhere, and then comes another data item and you basically go (let’s do colors): is this similar or is this different? Now, this is quite different, so we put it there. And then comes another data item.
Oh, is this similar or is it different? It’s a little bit similar to the yellow ones, so we’ll put it a little bit closer to the yellow one. Then comes another data item, and this is obviously quite similar to the yellow one, so we put it closer to here. And so over time you get all these data items in, and they might end up looking maybe a bit like that. So what have I done? I’ve done a sorting of the data, and the approach I’ve used is based on similarity measures. These unsupervised methods all use a similarity measure; in this case I’ve done it by color.

The other way these methods usually work is to start out by asking: how many groups would you like your data to be in? How many clusters would you like? So let’s say you want three clusters. Well, then maybe the solution might look like this: clustered by color into three clusters. If there had been four clusters, maybe the solution would have looked like this. And if there were two clusters, it might even have looked like this. So you might ask: OK, so what’s the data mining in the sorting of the data? Well, once we’ve sorted the data in this way, we can of course have a look at what ended up together. Maybe these things have ended up together, and maybe now we can say: oh, these are the light colors.
These are the dark colors, and we certainly have two groups. I mean, we wouldn’t normally sort color cubes. You would sometimes sort patients: are they really ill, or are they very ill? That sort of thing we could sort like this. Now, most of the unsupervised methods work exactly like I described; they work by sorting. The difference is in how they measure the difference between things. Is it a statistical similarity? Is it an algebraic similarity? Is it some metric measure? You can imagine there are so many ways you can measure the difference between things.

Unsupervised learning is quite a simple way of doing it. I mean, the algorithms are quite quick, but it’s not as powerful as other methods. What’s the problem with it? One of the problems is actually quite straightforward. Let’s say we end up with this solution. Well, is this a good solution, or is it not a good solution? It’s really hard to evaluate, because we obviously don’t know about the data. So we’re looking at it going: well, it looks OK, but maybe not. And very often what happens is that if you look at the data one way, it looks like a good solution. But now I do my reveal: we turn the data a bit, and suddenly we have another angle on the data, and actually now it’s a mess. They’re not really sorted very well at all, are they?
That’s often what happens with unsupervised learning: you sort them in one way and they look quite good, but then you look at the data differently and actually this hasn’t quite worked, and it’s not so great. The other downside with unsupervised learning is that the algorithms really only work when you tell them how many groups you want the data to be in: two groups, three groups, four groups. For some problems you might know this. Maybe you have, like I say, ill patients and healthy patients, and you know there are two groups. But very often how many groups you have is the whole question, so you can’t really use these methods that well. If you want to know some technical terms: K-means, for example, is a classic unsupervised method that’s very popular. So if you look it up, you’ll learn a bit more about it.

Now, the second way of doing learning would be the supervised way. We said unsupervised, so there must be a supervised way. Here the difference is that you have some data which has answers attached to it already, so you can really learn from this data. A classic way of doing it is, well, neural networks, which are one of the best-known ones. How does that work? OK, well.
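As a concrete aside on the K-means method mentioned above, here is a minimal sketch in plain Python. It is an illustration only, not the speaker's code: the `kmeans` function, the one-dimensional "brightness" values standing in for the colored cubes, and the fixed random seed are all assumptions made for the example. It shows the two points from the talk: items are grouped purely by similarity (here, distance between brightness values), and the number of clusters `k` has to be supplied up front.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Group 1-D values into k clusters; returns (centers, assignments)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k of the data items
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each item joins its most similar (nearest) center.
        assignments = [
            min(range(k), key=lambda c: abs(p - centers[c])) for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = sum(members) / len(members)
    return centers, assignments

# Hypothetical "brightness" of ten colored cubes: 0 = dark, 1 = light.
brightness = [0.05, 0.10, 0.15, 0.20, 0.45, 0.50, 0.55, 0.85, 0.90, 0.95]

# The same data sorted into two, three, and four clusters.
for k in (2, 3, 4):
    centers, groups = kmeans(brightness, k)
    print(k, "clusters:", groups)
```

Running it with different values of `k` gives different groupings of the same items, and nothing in the output says which of them is the "right" one, which is the evaluation problem described above.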
So we have some data again, and this time let’s say we want to do something a bit different. We just want to sort them into light colors and dark colors, for example. What happens is I get my data in, and somebody has already labeled the data for me. They’ve said these are light colors, these are dark colors, so we already know the answer for this data. We don’t know it for some other data, but we know it for this. This is our training data. And now I’m going to do neural network learning. The first data item comes in and it goes here. The next data item comes in and goes here. I keep doing this, and maybe I end up with something that looks like this. And now, of course, I can assess the quality of the solution and go: oh well, algorithm, you’ve done OK, but you haven’t done really well, because these two should be over there and this one should really be there. Fix the function a bit and do it again. [OK] Back, and we might end up like this. It’s like: OK, that was better, but you’ve still got one wrong. Fix the function again and do it again. This is called a back-propagation neural network. We’ll do this again and again, and of course if you do this long enough, eventually the algorithm will learn the perfect function for how to sort things. Then the idea is that a new data item comes along and goes through the same function, and because the function is now perfect, it will end up in exactly the right place, no problem.

So it’s supervised because we have labels, and because [of] the labels we can assess the quality. Neural networks are the classic way of doing this. In general, supervised learning is very powerful, because as long as we have enough data with enough labels, we can always learn the function, and then it should work really well. But, well, there wouldn’t be research if we were finished with it, so there’s obviously a problem with this as well. The problem is that it can lead to overfitting. What does overfitting mean?
It means, like, tight jeans, you know? No, not that. It means that you put too much emphasis on getting the function right; you make it too right. So the function is absolutely perfect. In fact it’s so perfect, it’s brittle. It’s just not good anymore. What happens is a new data item comes along, one that you’ve not seen before. I’ve got one. The unsupervised method wouldn’t have a problem with this, because it just goes by similarity. It would go: it’s kind of a light color, it’ll probably end up here. But a supervised method has never seen this color before, and the function goes, what do I do with this? And it, pfft, breaks, or it just puts it at a random place, like maybe here. So supervised learning is really good, but if you overdo it, then you’ve overfitted, and the problem is that you actually make the system worse again. You’ve made it brittle. The other downside of supervised learning is that you must actually have enough data with labels. For some problems you have it and that’s fine, but for some problems you don’t really have it.

So let’s talk about a practical problem that I was working on. I was working with doctors in a hospital, clinicians who look after colon cancer patients, and they took many years to collect the data of about 500 patients. It’s classic medical data: we’ve got age, critical medical history, we’ve got genetic values, blood values, and so on. The patients get diagnosed into different categories of illness, some more serious, some less serious, and the doctors wanted some help with this categorization. The most serious cases and the least serious cases are quite clear, but it’s this whole group in the middle, and I wanted to see whether we can split them a bit better. So we were working on this with them, and this is a classic problem. In that case there were 500 patients that were already categorized by what category of illness they were in, so a supervised approach was really good, because we could learn from those
500 and build up a picture, and as long as we’re careful not to overdo it, we’ll be fine. But then what actually happened, and this leads me on to what my research is at the moment, is that not all of the 500 patients had all the labels, because some of the technology has been changing over the years. There are more modern things now that they didn’t have ten years ago, so for the last 50 patients they had some additional labels that they didn’t have for all the others. So we were talking about what to do with this.

There’s a method called semi-supervised learning, which is what the research is on. Can we take the best of both worlds and maybe combine them a bit? What if you’ve got a few labels? It’s not enough to learn perfectly, but maybe we can do something. So what we’ve done is a semi-supervised method, and it’s kind of a mixture of the two. You get your data, and let’s just say we want to split it into light and dark colors; that’s basically our more serious patients and our less serious patients. You might end up sorting the data something like that. Because it’s an unsupervised approach first of all, we don’t know exactly how good this is. But for some of the data items we have a label, and we can look up what the number on them is. And because for some of them we have a label, we can now ask: are all the ones that have the same label, or a similar label, in the same group? So suddenly we can assess the quality of this. We don’t have a label for all of these, but we have a label for some of them. Are they in the same group? Yes, the same labels are in the same group. Yeah, that looks like a good solution.

Semi-supervised learning is probably the future, because as data sets get bigger and bigger, you don’t have labels for everything anymore, because nobody has time to label everything and computers can’t really label things very well. So you’ll have the experts labeling a few things, and semi-supervised learning will
be where this is going. But then the next step really would be to have it interactive; that would be even better. So that’s what we’re working on right now. It’s called man-in-the-loop or human-in-the-loop learning, where you maybe have no labels at all, or maybe just very few, and then you do some sorting of the data and then ask the expert: has the sorting gone well? Has it not gone well? What about this one item, what label would you give it? So it’s a bit interactive, and I think that’ll be much better, because then, you know, there is [more] in real time, and the latest developments can also come in, tacit knowledge that you might not even have in the data.

So that’s like spot checking?

Yeah, exactly, it’s like spot checking, but then putting that knowledge back into the algorithm, so the algorithm can learn from it again, and it sort of reinforces a bit.
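The semi-supervised check described above (sort the data without labels first, then use the few labels that exist to ask whether items with the same label ended up in the same group) can be sketched roughly as follows. This is a hedged illustration, not the method used in the research: the `labeled_purity` function, the cluster ids, and the patient labels are all hypothetical. It computes the fraction of labeled items that agree with the majority label of their group.

```python
from collections import Counter

def labeled_purity(clusters, labels):
    """Fraction of labeled items that share their cluster's majority label.

    clusters: cluster id per item (from any unsupervised method).
    labels:   expert label per item, or None where no label exists.
    """
    by_cluster = {}
    for cluster, label in zip(clusters, labels):
        if label is not None:
            by_cluster.setdefault(cluster, Counter())[label] += 1
    labeled = sum(sum(counts.values()) for counts in by_cluster.values())
    if labeled == 0:
        return None  # no labels at all: the grouping cannot be checked
    agreeing = sum(max(counts.values()) for counts in by_cluster.values())
    return agreeing / labeled

# Hypothetical result of an unsupervised sort of ten patients into two groups.
clusters = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
# Only four patients carry an expert label; the rest are unlabeled (None).
labels = ["less serious", None, None, "less serious", None,
          None, "more serious", None, None, "more serious"]

print(labeled_purity(clusters, labels))  # 1.0: same labels share a group

# A different grouping that mixes the labeled patients across both groups.
mixed = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
print(labeled_purity(mixed, labels))  # 0.5
```

A score like this only says that each group is consistent with the labels it contains; it does not by itself show that one label stays within a single group, so in practice it would be one check among several. In a human-in-the-loop setting, the `None` entries are where an expert could be asked for a label and the score recomputed.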