Full presentation: Demystifying AI, David Hoyle at the Eye News Symposium 2020.

As was said, I used to be a group leader
at the University of Manchester in the biology/biological science department,
also the medical school but I now work in the commercial sector for a company
called dunnhumby. I work as a research data scientist but I still do some
lecturing on the ophthalmology master’s course at the University of Manchester,
so hence my interest in ophthalmology. What I’m actually gonna be talking about
is AI, specifically talking about: Demystifying AI, trying to remove some of
the hype. I’ll be sort of explaining some of the technical details about how some
of these AI algorithms actually work, but don’t worry, there’s absolutely zero maths,
there’s one mathematical symbol in the entire talk and I’m not even going to
refer to it. Right so let’s start off. What I’m gonna sort of talk about,
broadly is, I’m gonna talk about AI. More specifically I’m going to talk
about machine learning because most of what you see in the press actually
around AI stories is actually machine learning; it’s a subfield of AI.
So I’ll go into the different types of AI. I’ll explain a bit about how some of these AI
models, these machine learning models, are trained, that’s the technical
language we use for sort of building the models, and then I’m going to talk
specifically about neural networks because neural networks are a very
specific kind of machine learning model that’s been used heavily in image
analysis and has been applied to analysis of retinal images and then
finally I’ll just wrap up with a bit of an overview of some resources that are out there, the typical kind of resources you use when doing machine learning. So let’s start with AI. What is it about? So why are you here?
I don’t mean that in an existential sense I mean you’re probably here because
you’ve heard something about AI and you want to know a bit more. More than likely
you might have heard some sort of news story possibly about self-driving cars
or even worse some stories about killer robots and how they’re going to take
over the world and replace humans or perhaps, a bit more specific to this
profession, about AI and medicine and how you’re all out of a job. I mean hopefully
Tariq’s talk this morning might well have sort of reassured you on that
aspect and I’m essentially going to
do the same. The one thing all these stories have in common is an awful lot
of hype and that’s what I want to sort of unpack a bit with the rest
of my talk. So we have the headlines. The reality behind these headlines is a lot
simpler, perhaps a lot less sexy and a lot less interesting, but the hype is
worrying, so worrying in fact that even
researchers in the field now are beginning to get concerned about how the
image of AI is being portrayed to the general public. So you get talks like
this from a professor of computer science at Princeton sort of giving guidance on
how to spot the snake oil from the non-snake oil in AI and part of the reason
for that is a lot of what is presented as AI is in fact just a
very simple computation. When you lift the hood and find out what the
culprit is underneath, it turns out that it’s just a relatively simple
amount of computation, and that’s what I want to get across today: a lot of the AI
that you’ll read about is a simple mathematical computation. Part of the
problem with how AI has been presented, that mismatch between the fact
that a lot of AI is relatively simple computation and the way the press
presents it as something much stronger or more radical, is that there are
actually two general types of AI that we normally refer to.
We have what’s known as weak AI and strong AI. So strong AI or general intelligence (AGI) is where
we’re trying to use computers to build systems that mimic the whole range of
cognitive functions that a human has. In other words things like the
ability to reason, the ability to adapt to many many tasks with limited
information, the ability to reason by analogy. Obviously, a very laudable aim,
but most of the examples you see in the press in those stories are actually
examples of weak AI or narrow intelligence and these are situations
where we’ve designed a computer system to learn to do just one specific task. Typically it might be something like
make some sort of prediction or make a diagnosis based
upon an image input. That’s a very different thing: doing weak AI or
narrow AI, focusing on one task, is a very different thing from being able to
sort of display general intelligence. So when you see these stories about
computers or Google being able to predict breast cancer more accurately
than doctors or detect retinopathy more accurately than
doctors you’re actually seeing examples of weak AI. It gets presented in the
press essentially as strong AI: computers are now going to take
over the world. Very much not the case. Okay. You can’t take a system which is
designed for one task and easily drop it in and expect it to do very general
tasks. That is a capability that essentially only humans have at the
moment. So let’s just sort of recap a bit. We started off with this general AI
field and what we’ve said is actually the bits of AI you see in the press are
weak AI, they’re very narrow AI. So we’ve
reduced down our scope, but it’s actually slightly even narrower than that. Most of
the examples you see in the press, as I said, most of the stories, are
actually stories about a subfield of AI called machine learning
and actually most of the stories the big stories, the sexy parts of machine
learning that you read about are examples of deep learning. So most of the
stories are actually focused on one small narrow bit of AI. So what I’m
actually gonna talk about for the rest of this talk is just this part. I’m not
even going to talk about general AI, I’m not even going to be talking about
the broader weak AI. I’m gonna be talking mainly about machine learning and
explaining that, because that is what is behind the stories that you’ll be
reading about. Okay so let’s delve into a bit of machine learning.
What is machine learning? In a very loose definition machine
learning is an attempt to use historical data in order to be able to make some
predictions about future events or future scenarios, future outcomes and the
way we do it is we start with some historical data we go through a process
of training to build a mathematical model. So here I’ve got an example of
some data I’ve got some x values and some y values each little black dot
represents a data point my machine learning model is actually just a
mathematical function that tries to describe the main patterns in that data
set. In this case the pattern in the data set is relatively simplistic. It’s just a
general trend upwards. So in this case my machine learning model is
simply just a curve and the process of training that machine
learning model is to try and adjust the curve to get the best description of
the data. So in this case the red line, you know, it’s not a particularly good fit throughout that data set so I haven’t got a particularly good model there. The blue one isn’t very good either but through a process of iteration I can adjust that. I can use the historical
data to come up with a better model. If you’re wondering what this symbol here
is that looks like an oil drum, it’s not; computer scientists use that
symbol to represent a database or a data store. So that just means a collection of
historical data, in other words it just means the data points here, and what
we’ll do is we will use some mathematical process to just adjust that
line and ultimately we may come up with something like that, the green line. It’s
a much better description of that data set. It picks out the general behavior in
the data set, it’s picking out the main trend. That’s what we would refer to as a
trained model. Okay. So what I can do now I’ve got that trained model is I can
take a new data point, for example, I could take a new image and the inputs
into the model might be the individual intensities at the individual pixels or
it might be some sort of summary quantity that I have distilled from that
image, so for example, in this case we’ve only got one input we’ve got one axis so
I might measure some particular aspect from that image and I would read off on
the x-axis, see what my model says and just read across. So the trained
model, once I’ve gone through that training process, once I’ve gone through
that process of adjusting the parameter of the model, which in this case is just how quickly it goes up,
that allows me to make predictions. Okay. The reason computer scientists get
excited by this is, well, it looks essentially like learning:
we’ve learnt from data, we’ve learned a useful functionality there to be able to
make predictions. The other thing that we can do is we can automate the whole
process. Which means we can make that process of training virtually
automatic and we can make the process of prediction virtually automatic, so when
we have situations where we have got lots of new data points, possibly tens
of thousands, where we need to make new predictions, we can
automate that whole process of making those predictions and make it happen
very fast. So we can do things at a very large scale. Okay. I know what you’re
thinking: you’re looking at this and going, I mean, I came for a talk on AI
and you’re just showing me a simple curve through the dataset. Surely AI, surely
machine learning, is a lot more complicated than that, surely
there are big conceptual differences between training a neural network and
just fitting a simple trendline through a dataset. Well not really, not in
principle. Yes when we come to more complicated data sets we may have more
inputs so for example here I’ve got an example with two inputs. The reason I
haven’t drawn an even more complicated example with more than three inputs is
humans aren’t very good at visualizing things in more than about two or three
dimensions, so I’ve just got a 2D example there. But let’s go
with this 2D example. I may have much more complicated patterns in my
data, a much more complex variation in my data, but still the principles of
training a machine learning model are just the same: I have a mathematical
function that is possibly capable of describing or following that variation
in the data and I just adjust the parameters in my model to make sure that
that surface goes through my data. Sometimes the data sets we have to work
with have lots of variation, they are very complicated, in which case we need
mathematical functions which are capable of having lots of oscillations, lots of
complexity to them. Sometimes we work with datasets which are very simple and
have very little variation but still the principle is always the same, we take
some historical data, we take a mathematical function and we adjust the
parameters of that mathematical function so the shape if you like the slope the
variation in that mathematical function passes through the data. That’s the
process of training any machine learning model. Once we’ve got that we’ve got a
trained model we can start to use it for prediction. Okay. So I’m going to delve a
bit more in detail into this machine learning. There are actually a number of
different types of machine learning. Broadly you have what are known as
unsupervised learning, supervised learning and reinforcement learning.
Actually a lot of this sort of interest in machine learning over the
last couple of years has actually been around this reinforcement learning but
I’m not going to explain any of it today because it’s a bit more of a difficult
topic to explain and in the interest of time I’m just gonna leave it alone so
I’m just going to sort of explain more about the unsupervised and supervised.
When we do things like image classification, when we build models to
make diagnoses from images, it’s actually the supervised learning that we’re
actually talking about, but let’s start with the unsupervised learning. So
unsupervised learning, we have data, so for example, I might have a number of
samples, maybe taken from patients, each row representing a sample
and I might have a number of variables number of things I’ve measured about
each patient. These are called features in machine learning, just think
of them as variables. So for each patient I’ve got four numbers represented by the
four columns. In unsupervised learning we’re essentially trying to discover
structure or patterns that are already in this data, that are present in this
data, so for example, I might have this data set here but if I look at this data
set in a particular orientation I might be able to see that it splits naturally
into two groupings, two distinct clusters. If the clusters aren’t there I won’t see
them but if they are there then these unsupervised machine learning algorithms
will find me the best orientation within the data in order to actually reveal
that distinct grouping, that structure that is there. Now that might be
useful to me because that immediately tells me, wait one minute, I’ve got two groups
here, they’re very separated from each other, and that might be an indication of
some sort of stratification in my samples. So assuming I’ve got some useful
variables, some useful features that I measured, some, you know, useful
biomarkers, clinical features that are actually extracted from my
images, then this tells me okay there are two distinct patterns of
patients or samples. This may lead to differences in diagnosis, this may lead
to differences in prognosis, I don’t know, but certainly there are two distinct
groups. It also allows me to simplify the data a bit. So I started off with four
dimensional data, we’ve got four measurements on each patient, four
values, but when we orientate the data in the right way, we can see that actually
it separates into two different groups in a nice two dimensional plane, in a
two dimensional plot. So what the algorithm has done for me is actually
discovered two new variables, two new features, which describe all the
interesting variation in the data. I don’t need four numbers to describe
these patients, I only need two and that’s what the algorithm will do for me.
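As a rough illustration of discovering groups without any labels, here is a minimal Python sketch. It uses k-means clustering, which is one common unsupervised method and is an assumption made for this sketch, since the talk does not name a specific algorithm. The four-feature patient values and all function names are invented purely for illustration.

```python
import math

def distance(a, b):
    """Straight-line (Euclidean) distance between two samples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean_point(points):
    """Feature-by-feature average of a list of samples."""
    return [sum(column) / len(points) for column in zip(*points)]

def two_group_k_means(samples, max_iterations=20):
    """Split samples into two groups; returns a 0/1 label per sample."""
    # Initial guess: the first sample and the sample farthest from it.
    centre_a = samples[0]
    centre_b = max(samples, key=lambda s: distance(s, centre_a))
    labels = None
    for _ in range(max_iterations):
        new_labels = [
            0 if distance(s, centre_a) <= distance(s, centre_b) else 1
            for s in samples
        ]
        if new_labels == labels:  # nothing changed, so we have settled
            break
        labels = new_labels
        group_a = [s for s, lab in zip(samples, labels) if lab == 0]
        group_b = [s for s, lab in zip(samples, labels) if lab == 1]
        if not group_a or not group_b:  # degenerate case: only one group
            break
        centre_a, centre_b = mean_point(group_a), mean_point(group_b)
    return labels

# Hypothetical data: six patients (rows), four measurements each (columns).
samples = [
    [1.0, 2.0, 1.5, 0.5],
    [1.2, 1.8, 1.4, 0.7],
    [0.9, 2.1, 1.6, 0.4],
    [5.0, 6.1, 5.5, 4.8],
    [5.2, 5.9, 5.4, 5.1],
    [4.9, 6.0, 5.6, 5.0],
]
labels = two_group_k_means(samples)
print(labels)
```

No labels are supplied anywhere in this sketch; the grouping comes only from the measurements themselves. Reducing four measurements to two new variables, as described above, would need a different technique (principal component analysis is a typical choice) and is not shown here.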
It will discover what those two new variables are. Okay. That’s unsupervised learning. We can actually apply a lot of these ideas to
non-tabular data. So even if you have data which isn’t in a table form, so
if you have things like text data, so data from say electronic health records,
as long as we have some way of comparing say one sentence to another, then we can
do things like apply these clustering algorithms to identify similar documents,
similar concepts, similar terminologies, similar topics. So again we can
actually apply a lot of these algorithms to what we call
unstructured data. Okay, so let’s move on to supervised learning. Exactly the same
data set as I had before but now I’ve got some additional information. I’ve got
a label associated with each row. I’ve got information about whether that row
is orange or blue. Now that may not sound very useful information, knowing whether
a data point is orange or blue. It’s a binary label. That could equally be 0 or 1, or it
could be no diabetic retinopathy versus diabetic retinopathy. It’s a
classification but once I’ve been given that information,
the machine learning algorithm, supervised machine learning algorithm
will try and learn that mapping from the features to the actual outcome. It’ll try
and work out which of those variables put you in the blue group or in the
orange group. Why is that useful? Well obviously once you’ve worked out the
mathematical rule to, you know, go from those numbers those input measurements
to a grouping, essentially you can make that diagnosis. You can predict what
group someone’s going to be in. Okay, the reason we call it supervised
learning is because we’ve got this extra information, these labels in that
historical data, the algorithm is being guided by those labels. We’ve
taken the input data and we’re trying to find what mathematical function
helps us separate oranges from blues because we know some, because we’ve got
these historic examples of people who were in the blue group and people who
are in the orange group, that information that those historic examples are guiding
us. Okay, so that’s what supervised learning means. It’s being guided by the
previous historical examples of, you know, these people over here were blue, these
people over here were orange. Contrast that with the unsupervised learning: in
the unsupervised learning, we just had the numerical values here we had no
additional information on top of that we had no labels to sort of guide us to the
structure we just discovered the structure which was there. So supervised
learning is about being guided by some additional information, by the labeling of
each data point. Again, as I said, once we’ve built the model we can use it to
make predictions about new data points, new patients that we haven’t seen before. The other form of supervised learning
that you may come across is where instead of trying to learn
a mapping from features to an outcome we are learning a mapping from features to
a value. In other words something continuous rather than a categorization,
you know, blue, orange or yes/no or diabetic retinopathy, no diabetic
retinopathy. We’re trying to actually learn a mapping to some continuous
value, something which can vary over a range of values. So in this case
what we’re trying to learn is represented by this additional column
called the response variable but essentially the principle is exactly the
same as in the classification case. We are using that extra
information the response variable to learn the mathematical function and the
mapping. What kind of things do we use to learn that mapping? Well anything that’s
capable of fitting some sort of function to the data points, and there are lots of
different mathematical functions you can use. If the variation in your data is
relatively simple, then you might use something like a linear model, that you
might be familiar with from classical statistics. So if you’ve ever used
something like Excel, where you’ve fitted a trendline, a linear trend
through your data, then you’re doing something like supervised
learning, you’re doing regression on your data set. Where we’ve got a lot of
variation in our data, where our data might be going up and down and
displaying a lot of complexity, then we might want to use a different
mathematical function, something which is much more flexible and maybe
much more generic. Neural networks are very good at this, they are very flexible
and capable of being applied to lots of different kinds of problems. One of the
downsides of that flexibility is sometimes neural networks can appear to
fit patterns in the data which aren’t really there, they are actually fitting
to noise. So when you’ve got this extra flexibility sometimes you have to be
very careful in how you use it. So I mentioned at the very beginning we go
through this process of training neural networks and machine learning algorithms.
I’m just going to explain how this training process actually works.
It’s an iterative one and it’s also a very computationally intensive one,
most of the time, but what I mean by iterative is we start with some sort of
guess and then we try and improve it. So what I’ve got here is another toy
example, again each point representing a data point, and the red line represents
my initial guess for my machine learning model. In this case my model is, again,
extremely simple, it’s just a straight line that may look like it’s a pretty
good initial guess but I can possibly you know, I might be able to improve upon
that initial guess but how do I go about that process of improvement. Well the
first thing I do is I’ve got to come up with a number which tells me how good my
guess is, and the way we come up with a number for how good this line is through the data set is, well, we can see the model has done pretty badly on this point: the model predicts
about there, the data is down there, and I just measure the difference. That’s the
error that model is currently making for that data
point. To come up with a single number of how good that model is for the whole
dataset all I do is just add up all the errors across all the data points.
Well not quite. What I actually do is add up
the squares of the errors, because if I added this error, which is negative, to this
error here, I would get pluses and minuses canceling each other out and I could have a model which came up with what
looked like zero error but actually passed nowhere near any of the data.
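The cancellation problem can be seen in a few lines of Python. This is a minimal sketch with made-up data points and a hypothetical one-parameter model (a straight line through the origin, where the only parameter is the slope); none of the names or values come from the talk.

```python
def predict(slope, x):
    """A one-parameter model: a straight line through the origin."""
    return slope * x

def total_signed_error(slope, xs, ys):
    """Add up the raw errors; positives and negatives can cancel out."""
    return sum(y - predict(slope, x) for x, y in zip(xs, ys))

def total_squared_error(slope, xs, ys):
    """Add up the squared errors; every miss counts as a positive amount."""
    return sum((y - predict(slope, x)) ** 2 for x, y in zip(xs, ys))

# Invented data: one point well above a flat line, one well below it.
xs = [1.0, 2.0]
ys = [2.0, -2.0]
flat_line_slope = 0.0

signed = total_signed_error(flat_line_slope, xs, ys)
squared = total_squared_error(flat_line_slope, xs, ys)
print(signed)   # the two errors cancel exactly
print(squared)  # the squared version still reports a large miss
```

Here the flat line misses both points by the same amount in opposite directions, so the plain sum reports no error at all while the sum of squares does not.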
So that’s why we actually add up the squares of the errors, but ultimately
adding up across the whole dataset allows me to come up with a single
number that tells me how well that red line describes that data set. So the
process of improvement goes something like this. I start with some initial
guess, so for example, I might start with the green line as my initial guess for
my model and you can see it’s pretty poor, it misses most of the
data points, so it’s got a high total error, it’s way off. Or I might start my initial guess here with a blue line, again
equally bad, and through the process of adjusting the parameters of this model,
and in this case there’s only one parameter, just the slope of that model,
I eventually might end up with the red model and that one has almost got the
lowest possible error, notice it’s not quite zero but basically the process of
training is the process of adjusting the parameters in my model, in this case the
slope, until I get the minimal total error. Okay. To find how to get down to
that minimum point, well I just follow the slope of this curve all the way down.
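Following the slope of the error curve downhill, a procedure commonly called gradient descent, can be sketched in Python as below. This is only an illustration under stated assumptions: the model is a hypothetical straight line through the origin with a single slope parameter, the data values are invented, and the step size (`learning_rate`) and number of steps are arbitrary choices rather than anything given in the talk.

```python
def total_squared_error(slope, xs, ys):
    """Single number saying how badly the line y = slope * x fits the data."""
    return sum((y - slope * x) ** 2 for x, y in zip(xs, ys))

def error_gradient(slope, xs, ys):
    """Slope of the total-error curve at the current parameter value."""
    return sum(-2.0 * x * (y - slope * x) for x, y in zip(xs, ys))

def fit_slope(xs, ys, initial_slope=0.0, learning_rate=0.01, steps=200):
    """Start from a guess and repeatedly take a small step downhill."""
    slope = initial_slope
    for _ in range(steps):
        slope -= learning_rate * error_gradient(slope, xs, ys)
    return slope

# Invented historical data with a general upward trend.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

fitted_slope = fit_slope(xs, ys)
print(fitted_slope)                               # close to 1.99
print(total_squared_error(0.0, xs, ys))           # error of the initial guess
print(total_squared_error(fitted_slope, xs, ys))  # much smaller, but not zero
```

As in the talk, the error at the end is small but not exactly zero, because no single straight line passes through every point.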
This is known as gradient descent and virtually every machine learning
algorithm uses some form of gradient descent to try, you know, in its process
of training. What I’ve described there that process of starting with, taking
a mathematical function, defining some sort of total error and then
adjusting the parameters in the model until I get to the minimum of that total
error. That’s a very, very generic process. In
fact it’s so generic I can apply it, as I have said, to training
any machine learning model that I want to, and consequently you
can even apply this approach to training neural networks. In fact this is how
effectively neural networks are trained. There’s a lot more, if you like, technical
details and rules of thumb that go into training a neural network but
essentially this is what we are doing. So what looks like complicated processes
that you read in the news stories about AI learning you know learning how to
replace clinicians in diagnosing breast cancer, diagnosing, you know,
retinopathy. No, it’s a very simple process it’s a mathematical
function with some process for adjusting parameters in that mathematical function
until we get a small total error on some historical data set. That is what you’re
actually reading about behind those headlines. As I mentioned,
this model training process can be applied to neural networks so I’m now
going to actually dive a bit more in detail to what actually a neural network
is. Anybody come across a neural network before, used one, other than Tariq?
Okay. So neural networks are a particular machine
learning algorithm. Again they just take inputs and give us a number, an output,
at the end. The reason they’re called neural networks is they were inspired by
the study of biological neurons. So not surprisingly when the sort of people who
started the field of AI and machine learning started, you know, trying
to understand how you arrive at intelligence, they looked at, you know,
what was going on in the human brain and tried to mimic the structures in the
human brain because they thought there’s something about that structure that obviously leads to intelligence. So machine learning and AI researchers came up with
these network structures. Initially they called them artificial neural networks (ANNs). Nowadays because they are so ubiquitous
in computer science and they’ve got so much attention virtually no one
actually uses the word artificial in front of them. So you just have people
referring to them as neural networks. Sometimes that can get confusing,
particularly when you’re talking in a conference about using an
artificial neural network to study some structures, some data that’s
been taken from, you know, from brain studies because then you’re
talking about a neural network and no one actually knows
whether you’re talking about a real neural network or an artificial neural network.
I’m gonna be talking about artificial neural networks and an artificial neural
network is just a mapping from some numerical values, some input values to
some numerical output value. So the inputs here, that are coming in here, they
might be the intensities that are at every pixel in an image. So in
Tariq’s talk this morning he mentioned about what an actual piece of
machine learning sees. It sees a number at every pixel, the intensity of that
pixel. So at input one we would have the intensity at pixel number one at input
two we might have the intensity, the value, intensity value at pixel number two
or it might be that your inputs are actually something else. They might be
some numbers that you’ve derived from an image or they might be information about
your patients, they might be some sort of demographic data, so for example, input
number one might be the age of the patient you’re treating, input number two
might be a binary variable yes/no indicating whether that patient has
received some particular treatment, you know, an injection or particular drug but
in all cases the inputs are numerical values, and although the
overall mathematical transformation going from here to here is a complex one,
it is actually built up from a series of very very simple steps. So let’s take
this, what we call a node, here. What’s going on here mathematically is this
node takes all the values that are coming in these various input
points and just adds them up. So if these were pixel intensities this node would
essentially be adding up all the intensities from all the pixels
in an image. It actually adds them up with some sort of weighting, so it’s taking
a weighted sum. Once it’s got that sum, it just applies a very simple mathematical
transformation, say for example, it puts it into a sigmoid function, which
essentially squashes that sum down slightly and that gives us the output
value for this node. At the second node here we’re doing exactly the same. We’re
summing up all the input values with some weights, different weights in this
case, the weights take different values, and we get an output value once
we’ve put it through that nonlinear transformation, that sigmoid squashing.
So we get a slightly different value out here and then we just repeat the process.
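The weighted-sum-then-squash calculation at each node can be written out in a few lines of Python. This is a minimal sketch under stated assumptions: the input values, the weights and the network shape (three inputs, two hidden nodes, one output node) are all invented for illustration, and practical networks usually also add a constant offset at each node, which is left out here because the talk does not mention it.

```python
import math

def sigmoid(z):
    """Squash any number into the range between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def node_output(inputs, weights):
    """One node: take a weighted sum of the inputs, then squash it."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return sigmoid(weighted_sum)

def layer_output(inputs, weight_rows):
    """One layer: every node sees the same inputs but has its own weights."""
    return [node_output(inputs, weights) for weights in weight_rows]

# Hypothetical inputs, for example three pixel intensities.
inputs = [0.2, 0.8, 0.5]

# Hypothetical weights: one row of weights per node.
hidden_weights = [
    [0.5, -0.3, 0.8],   # first hidden node
    [-0.6, 0.9, 0.1],   # second hidden node
]
output_weights = [
    [1.0, -1.0],        # single output node, fed by the two hidden nodes
]

# Feed-forward: read the network left to right, one layer at a time.
hidden_values = layer_output(inputs, hidden_weights)
prediction = layer_output(hidden_values, output_weights)[0]
print(hidden_values, prediction)
```

Training such a network means adjusting the numbers in `hidden_weights` and `output_weights`, in the same spirit as adjusting the slope of the straight-line model earlier.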
So this node here will take all the inputs from all the values coming out of
all these nodes and it’s a feed-forward process, so you read these networks
left-to-right. That’s all a neural network is
doing. At every stage the mathematical calculation is literally just: add the
things together, do some very small minor transformation. That’s what a network, a
neural network is composed of. When we’ve got lots of these layers, so when you’ve
got lots of these hidden layers it becomes very very, if you like, deep and
it’s called a deep neural network. Nothing particularly dramatic in the
name. As I said, going from left to right
is a mathematical transformation, just like our example where we were adjusting
the slope of our model. What we adjust in this case, our parameters in this case,
are the weights that feed into these sums. So those are the
parameters we adjust in order to alter the shape of the surface that
this mathematical function is capable of producing. Okay. I want to
unpack that neural network a bit more. Okay.
Suppose I had a very simple mathematical transformation, suppose I was looking at a very
simple model. Let’s take our straight line example again. If we had our
straight line model our transformation would literally be take the inputs, add
them together. That’s what you do when you fit a simple linear
regression: when you’re fitting straight lines through data you are just
adding the inputs up together. So if we were trying to represent our
simple model, our simple linear model, in this network structure, we wouldn’t
have these hidden layers. We would just have literally connections straight from
the inputs to some output. That’s what a simple model would look like. So let’s
now just focus on the last layer of that network. We’ve just got a simple
connection from this node, so from the value here to that output value. So
what that means is that last layer is actually just a very simple model but
instead of a very simple model almost like a linear regression acting upon the
original input, it’s acting upon some new values, it’s acting upon the
things that have come in here. So what this is telling us is that last part of
that neural network is actually just a very crude simple model, you know, the
sort of thing you could build in Excel. The rest of the neural network
which preceded it, the early parts of it, are computing new features, new
variables that enable this simple part of the calculation to actually be
effective, to work well. So the later part of the network is doing a very simple
calculation. The early parts of the network are learning new features that
the simple part, this latter part, can actually use. So the early
parts of the neural network are learning new features, new things, which are going
to be useful. So that gives us a hint of where neural networks are useful and
when they are useful. Neural networks are very good where you have lots of data
and you know the information in the data is relevant but you don’t
necessarily know how to combine the information in the data. So
when you’re working on a particular study, as Tariq mentioned this morning,
probably from domain knowledge from your years of training you will have an idea
about what’s relevant to the particular condition you’re trying to study. So you
might be, you know, looking at diabetic retinopathy and
you’ve got an idea, you know, about what features, what’s relevant in a
particular image. There are going to be some situations
you’re trying to study where you know that the information is there
but you haven’t got that domain knowledge, there isn’t necessarily an extensive
literature around looking at the most relevant things about the image for the
actual problem under study, and this is where neural networks are good.
Neural networks are good at working out what those new features should actually be
but they do require a lot of data in order to do it. So neural networks are
great for things like this. Yeah I know, you came to a talk on AI, and
you never thought you’d be seeing pictures of cats and dogs. Okay. Problems
like this, trying to work out whether that
is an image of a cat or whether that is an image of a dog,
it’s actually a problem that was studied quite intensively by computer
scientists in machine learning. Probably for a number of
reasons, first of all, if you’re needing a lot of data to train a machine learning
model well on the internet there’s an awful lot of pictures of cats and dogs.
The other reason is it just represents a very simple problem and, as with
anything when you’re doing research, you start with simple problems. Now you and I
can look at the picture on the left-hand side and go, that’s a picture of a cat.
I’m assuming no one doubts that. Okay, but suppose I
picked on someone and said, right, what is it about that image that makes that a
cat? You’ve got all the pixels, you’ve got all the information,
but what is it about the precise intensity values, the
precise RGB values in each pixel, that makes that a cat? No one could
tell me the mapping between the pixel values and catness. You
might get: okay, it’s got fur. Well, the dog’s got fur. It’s got two eyes. Well, the
dog’s got two eyes. Okay, it looks cute. Well, define cuteness. But we know
all the information is in there. All we need to do is have lots of
examples where we’ve got all the pixel values, put those into a neural network, and then
let the neural network work out what the mapping is from the pixel
values to something being cat-like. Okay. So as I said, it’s obvious it’s a cat, the
information is present, working out the mapping is difficult, but provided we’ve got
lots of training examples, a big historical data set, then the neural
network can do that. Okay, so neural networks are very, very good at image
classification tasks. Specifically, there’s a very specific type of neural
network which has been very successful at these image
classification tasks. They are called convolutional neural networks or, more
often than not, deep convolutional neural networks, and I’ve got an example here of
the structure of a deep convolutional neural network. The deep
part of the deep convolutional neural network, as I said, refers to the
fact that there are lots of layers, lots of stages, to this neural network, and the
neural network is divided into two parts. Most of the neural network is actually
deriving new features, so it’s working
out that mapping from, if you like, the original pixels to what something
looks like, which of those pixels
are going to be useful. The last part of it, the last layers of the neural network,
just do the classification bit. So at this point we’ve worked out the new
variables which are going to be useful, and from here to here we
take those new variables and we essentially work out: is it going to be a
cat or is it going to be a dog? So essentially most of the neural network is
actually just computing the useful features; it’s
not doing the classification. The classification is just done by a couple
of layers of that neural network. The fact that these neural networks are
very good at image classification has in part driven this explosion of interest
in deep learning neural networks and, not surprisingly, they’ve been applied to
the analysis of retinal images. So I’m actually going to go into some of the
details about how they get applied to the analysis of retinal images, but to
start that I’m going to have to talk
about something called transfer learning. Okay.
So transfer learning is where we take what we have learned when studying one
task and try to use that information when we are studying a smaller task. So
it might be that we’ve trained some neural network, a deep convolutional
neural network, on a very large image database. As I said, most of the neural
network will actually be just deriving features; the last part of the neural
network is only used for actually making the prediction, for making the
classification. It’s the early part, this first part, that is deriving things which
are useful for the analysis of any image. What that means is, if I train this
network here on a large database of images, I can take that neural network
and I know anything that comes out of this part is still going to be
useful for other image analysis problems. So that means when I
want to study a different task, task B, all I have to
do is retrain just this small bit of it. Because I’m retraining a smaller bit,
I’ve got fewer parameters to work out, so I need a lot less data to work with.
So whilst this problem might have taken something like a million images to train,
solving task B might only take something like 10,000 to 20,000 images to train,
because I am freezing this part of the network, because this part of the network
is about deriving features which are useful in any image classification task,
and this is what we mean by transfer learning. We are transferring some of the
learnings we derived here to our new problem.
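
As a rough illustration of the freezing idea described above, here is a minimal Python sketch. It is not something shown in the talk. The frozen weights are random stand-ins for a network pre-trained on task A, and the function names (`extract_features`, `make_task_b`, `train_head`), the toy sizes and the rule used to label task B are all assumptions made up for the example. The point is only that the feature part is never updated, while the small classifier at the end is the only bit that gets fitted.

```python
import math
import random

random.seed(0)

N_PIXELS = 16    # size of a toy "image" (hypothetical)
N_FEATURES = 4   # number of derived features (hypothetical)

# Stand-in for the early layers already trained on the big task A.
# Random values here; in practice they would come from a pre-trained network.
FROZEN_WEIGHTS = [[random.gauss(0.0, 0.25) for _ in range(N_PIXELS)]
                  for _ in range(N_FEATURES)]


def extract_features(pixels):
    """Frozen part: turn raw pixel values into derived features."""
    return [max(0.0, sum(w * p for w, p in zip(row, pixels)))
            for row in FROZEN_WEIGHTS]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(head, features):
    """Trainable part: a small classifier on top of the derived features."""
    z = head["bias"] + sum(w * f for w, f in zip(head["weights"], features))
    return sigmoid(z)


def make_task_b(n_examples):
    """Toy task B. Assumption: the label depends only on the derived features."""
    data = []
    for _ in range(n_examples):
        pixels = [random.gauss(0.0, 1.0) for _ in range(N_PIXELS)]
        features = extract_features(pixels)  # computed once, since this part is frozen
        label = 1 if features[0] > features[1] else 0
        data.append((features, label))
    return data


def train_head(data, epochs=100, learning_rate=0.2):
    """Fit only the head by gradient descent; FROZEN_WEIGHTS is never updated."""
    head = {"weights": [0.0] * N_FEATURES, "bias": 0.0}
    for _ in range(epochs):
        for features, label in data:
            error = predict(head, features) - label
            head["weights"] = [w - learning_rate * error * f
                               for w, f in zip(head["weights"], features)]
            head["bias"] -= learning_rate * error
    return head


task_b = make_task_b(200)
head = train_head(task_b)

correct = sum((predict(head, f) > 0.5) == (y == 1) for f, y in task_b)
accuracy = correct / len(task_b)
frozen_count = N_FEATURES * N_PIXELS   # parameters left untouched
trainable_count = N_FEATURES + 1       # head weights plus bias
print(f"frozen parameters: {frozen_count}, retrained parameters: {trainable_count}")
print(f"training accuracy on task B: {accuracy:.2f}")
```

In this toy setup only 5 numbers are fitted for task B while the 64 frozen ones are reused as they are, which is the reason the second task can get by with far fewer examples. In a real deep convolutional network the frozen part would hold vastly more parameters, and the sizes here are not meant to be representative.
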
Okay. There are a couple of these really big pre-trained deep convolutional neural
networks which are publicly available. People like Google have trained deep
convolutional neural networks on large image databases which are publicly
available, and then Google has made the parameter values, if you like, of the
neural network available for general use. So Inception v3 is one example; there’s
another common pre-trained neural network available from an academic group
at the University of Oxford. Unsurprisingly, both of those have been used in the
analysis of retinal images. So people have taken those starting neural
networks, which were trained on things like pictures of cats, dogs, buildings,
bananas, pickup trucks, mountains etc., and they’ve used those
networks as starting points to train on, you know, OCT images etc. to try
and predict whether someone’s going to have diabetic retinopathy. So you’ve
taken a network that was trained on a very general task, an image classification
task still, and then adapted it to a retinal problem. I’m not aware of any
group that has built a deep convolutional neural network from
scratch on a large database of retinal images and made that available for
public use. So you may say, well, why were these groups starting off
with a neural network trained on pictures of cats, dogs, cars and
mountains etc.? That’s because those were the only neural networks trained on
large data sets which were available. These are the ones which are publicly available. Right. I’m
going to just touch on a couple of things. Tariq mentioned in his talk that you
don’t actually need to program if you want to get into machine learning and AI.
No, you don’t. If you just want to use the output, you
know, the output from a diagnostic system, no, you don’t need to program. If you want
to start building your own models, yes, you probably do need to start
programming to some degree. There are a number of tools which do what’s called
AutoML, automatic machine learning, which supposedly enable you to just tip your
data in, press a button and you get your machine learning model trained up at
the end of it. They are good to a certain degree. Some of them are proprietary,
some you have to pay for, some of them are free to use. Personally, I find that
you still need to understand something about what is going on. If you don’t,
you’re going to run into problems. So ultimately you’re going to have to
learn how to program to some degree if you really want to get under the hood
with some of these details, and the two main programming languages used are
Python and R. R is a statistical language which a lot of academic
statisticians use. Right. In the interest of time, I’m gonna just skip to my
last couple of slides. The common question I get
approached with when dealing with AI and machine learning is:
How do we build a neural network for this problem? How do we use
ML, machine learning, for this problem? Well, wrong question. The question
you should always be asking is, should I use ML for this problem? Or rather, what
you should think about is, if I built this system, how am I going to use the
outputs of it? Are the outputs of that machine learning model going to be any
use? Because ultimately the outputs of your machine learning model are
just part of some wider process, and it might be that, with the things that happen
to the outputs of the machine learning model, by the time it comes to the end
outcome the benefits of the extra accuracy of your machine learning model
have been completely lost. So let’s say you were trying to, I don’t know,
understand a care pathway or understand the impact at, say, NHS trust level, and
you’re doing that by building models making predictions at an individual
patient level. Well, extra accuracy at patient level won’t necessarily transfer
into an extra level of understanding at NHS trust level. So think about what is
the thing you’re actually trying to understand and what you’re going to do
with the outputs of that machine learning model. So the first question
should always be: What am I going to do with the outputs? Do I need something
like a neural network here, or could I get away with something simpler
and maybe a bit more robust? Okay. So finally, the one thing I
want you to go away with: AI is not magic! Artificial intelligence is not magic.
Typically what you’re seeing are examples of weak AI, and usually it’s
actually just machine learning. It’s the fitting of some sort of mathematical
surface to some historical data, and the learning is not learning in the
human sense. As I said, it’s actually just fitting that mathematical
function, some sort of trend line, to the data. Machine learning is
data and compute hungry. It’s very intensive; you need a lot of
data for most examples, so that’s why we have to make use of things like transfer
learning, because often we don’t have big enough datasets to actually train the
kinds of machine learning models that we’d like to. There have been some
successes in machine learning, as I said, in image classification and also things
like reinforcement learning, but just because we are making successful strides
in the application of neural networks to image classification does not mean we’re
making huge strides in general artificial intelligence. We are
not going to be replacing humans any time soon simply because we’ve learned
how to classify cats and dogs or we’ve learned how to pick out features in
a mammogram. Okay, and with that, finally, I will say
thank you. Any questions?
