Faculty Colloquium: Dr. Rao S. Govindaraju


August 25, 2019 | By Stanley Isaacs


– Good morning everybody. Welcome to our continuing colloquium series recognizing our senior faculty. Today, this is a program that was
developed a few years ago to, again, have our senior,
namely full professors, who have been in rank for
more than seven years, to have a chance to present their work, talk about their experiences,
talk about the way they got to where they are. And then actually once they
present this colloquium, they get a chance to talk
to the department head. Of course in this case, you know. But the dean for sure, and talk about the next seven years. And so, today, we have the great pleasure of having Professor Govindaraju, who is our department head in Civil Engineering. So, he got his PhD in 1989 from the University of California, Davis. And before he came to Purdue, he worked as an assistant
and associate professor at Kansas State. He joined Purdue in 1997, and currently he’s the
Bowen Engineering Head of Civil Engineering and the Christopher B. and Susan S. Burke Professor of Civil Engineering. His primary areas of research include surface and subsurface hydrology, contaminant transport, watershed hydrology, and statistical hydrology. And I think I will stop right there before I go any further. And so, we're really excited to hear what he has to share with us today. G.S. – All right. Thank you very much Claude, and thank you all for being here. I think we have been to
several of these symposia. I have attended all the ones that civil faculty were presenting, and including some from
outside departments. So it’s been a good learning experience. One other thing that Claude points out is typically after this presentation, the person has a conversation
with the department head. So Claude, I promise
I’ll have a very stern conversation with myself. (laughs) This actually gives me a chance to sort of marshal my
thoughts, as Claude mentioned, see what I have been doing, take stock of where I am. And also perhaps try
and do some forecasting as to what I think I’ll
be doing in the future. I would like to actually also
mention that in this talk, even though as faculty,
we do a lot of things which fall in the category of discovery, of learning and engagement, or research, teaching, and service. I have focused more on the research part primarily because that is where almost all of my scholarship is. That is where I have done research, published papers and so on. I also perhaps should put the
word experiments under quotes to indicate that this is not
just physical experiments, but also we’re talking
about numerical experiments, theoretical experiments and so on. So, as Claude mentioned, my broad areas of research interest are surface and subsurface hydrology, contaminant transport, and related topics, but I'll be focusing mostly on surface and subsurface hydrology in this talk. My research drivers, the things I think about, are typically stochastic processes. I look at scaling behavior, variability and spatial heterogeneity, uncertainties and risk. So some of the topics that I will talk about are borrowed from these areas. And I'll present a mix of experimental, theoretical, and numerical
work in some of these areas. And as I’m doing this, I’ll perhaps take some examples with some of the graduate
students who have worked with me, and I’ll try and recognize
them along the way. Okay. I also want to first
start by acknowledging some of the funding agencies, not all. I have been fortunate to have my research supported by
a diverse set of agencies, including some international
funding agencies. Even though I was very instrumental in preparing proposals and so on, the funding that finally came to me was for travel and for conducting experiments there. It did not per se support graduate students or my summer support, but it did provide me
with lots of opportunity to do very interesting work. So I wanted to recognize them as well. So let me start with something
on pore scale mechanics. Those of you who work in porous media will perhaps recognize this. This is Jack Chan, one of my master’s and PhD students,
way back, a very bright chap. And some of the things that he was doing were essentially using mathematical morphological operations and image analysis techniques. For those of you in the geomatics area, you'll be familiar with this: when we look at images, we can do operations like erosion and dilation to essentially extract features from images. And he was essentially doing this to study pore scale properties in porous media.
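As a rough illustration of the kind of morphological operation being described, here is a minimal sketch in Python (assuming numpy and scipy are available; the synthetic image and sphere radius are purely illustrative, not the data or code from this study):

```python
# Illustrative sketch only: morphological erosion and dilation on a binary
# pore image, in the spirit of the operations described above. Assumes numpy
# and scipy; the synthetic image is not data from the study.
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)
pore_space = rng.random((64, 64, 64)) > 0.6   # True = pore (void), False = solid

def ball(radius):
    """Spherical structuring element of the given voxel radius."""
    grid = np.indices((2 * radius + 1,) * 3) - radius
    return (grid ** 2).sum(axis=0) <= radius ** 2

# Erosion strips pore voxels within `radius` of the solid; dilation grows the
# surviving regions back out. The combination (an opening) keeps only pore
# regions wide enough to admit the structuring element.
eroded = ndimage.binary_erosion(pore_space, structure=ball(2))
opened = ndimage.binary_dilation(eroded, structure=ball(2))

print("porosity:", pore_space.mean(), "after opening:", opened.mean())
```

Repeating the opening with larger radii is one simple way to probe how much of the pore space survives at a given length scale.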
And this is an example of essentially one cube, just a 3D image of voxels containing a single pore. And he's looking at what
happens at different pressures, how water enters the pore and how water comes out of the pore. And the red portion that you
see is what is the air phase. So if you have air and water in a pore, at different water
pressures air will actually go and invade the pore space. So it's all saturated first, and then it's draining; that means air is slowly entering at different pressures, as you can see, so the value of d indicates pressure, how air enters that particular single pore. And the reverse process, wetting: as the water pressure increases, water will basically come and invade the pore and eventually drive the air out. But even with a single pore you are able to demonstrate the hysteresis effect, sometimes called the ink
bottle effect in hydrology. So this is for a single
pore, at a pore scale. What Jack then did was
essentially constructed a porous medium in the
computer, it’s an image, it’s a 3D image, where
essentially all the pore space is conceptualized as intersecting spheres. So any pore space that you get, I can approximate it as closely
as I want using spheres. And this is just one image of that, so the blue portion that you see is essentially the pore space, okay? And then, using again
image analysis techniques, what we were able to do
is show how a soil that is completely saturated will
slowly be invaded by air and become unsaturated. And therefore this allows
us to take care of not only ink bottle effects, but also how well the pores are connected and so on. Which is something that we
have not been able to do in previous methods, okay? So what you are seeing is
at different pressures, how basically air goes and
invades the pore space. And the reverse, when
wetting is happening, how water will enter back and drive the air out of the pore space. So these are fairly
fundamental things that we need for understanding sub-surface hydrology. So this is all in images first, right? So then, what we did was, how do we go from these
images to actually talking about soil properties? So as an example, this
is an image that is taken of a loam soil, an SEM image (you can also take MRI images). Using the actual image and then the theory we developed, we were able to produce what is called a soil water retention curve, which tells us what the water content is in the soil at different water pressures; it's a very fundamental property. These symbols that you see, these are actually
experimental measurements. What you see are the different competing theories, but for these theories the prediction is based on fitting the data that you have, whereas the theory we developed with Jack actually uses the image and predicts what this should be. These experiments are fairly long and painstaking: producing a soil water retention curve with a pressure plate apparatus can take as much as six months to a year for one single graph like this. And from there we can
actually also predict what the hydraulic properties
of the soil will be at different levels of saturation. And what you see, of course: these are the symbols, this is our theory, and translating from here to here, the other curves tell you what the existing theories would show you. So you can argue, well
G.S. you chose one example where your theory worked very well, so I want to say that
Jack actually did this for 120 different soils. So for 120 different
soils, basically using stochastic theories of
either impenetrable spheres or fully penetrable spheres
we can actually predict, based on bulk properties
like the soil porosity, and the interfacial surface area, which are easily measurable, what soil properties should be. So this is… – [Audience Member] Professor,
what’s the difference? It seems like the same. – Yes. – [Audience Member]
Impenetrable and penetrable. – That’s right. So not much different. They are different models
but they’re trying to address the same problem. They’re a different way, or a
different conceptualization, of what the pore scale should be like. So they should be giving similar results. Fully penetrable spheres, however, give you a much more realistic representation of the pore space. So the model of the soil that I had shown you, that was a fully penetrable spheres model. So basically from pore
scale, how do we get to the bulk properties of soil?
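For readers who want a concrete picture of the "fully penetrable spheres" idea, this is a small hedged sketch (synthetic and illustrative only; the real analysis used measured bulk properties and far more careful statistics):

```python
# Rough, illustrative sketch of a "fully penetrable spheres" medium: the pore
# space is taken as the union of randomly placed, freely overlapping spheres
# on a voxel grid, and porosity is read off as the pore-voxel fraction.
import numpy as np

rng = np.random.default_rng(7)
n = 64
grid = np.indices((n, n, n)).astype(float)
pore = np.zeros((n, n, n), dtype=bool)

for _ in range(150):                      # spheres may overlap freely
    center = rng.uniform(0, n, size=3)
    radius = rng.uniform(2.0, 5.0)
    dist2 = sum((grid[i] - center[i]) ** 2 for i in range(3))
    pore |= dist2 <= radius ** 2

print("porosity of the synthetic medium:", pore.mean())
```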
So from the pore scale, let me move to another sort of experiment, which I have been doing with my colleagues in Italy. And this is perhaps one of my most longstanding collaborations. These are essentially what you would call lab scale experiments, lab bench scale. So this is about one and a
half meters by 75cm by 75cm. So we create sandboxes,
very well-controlled, as homogeneous as possible,
to essentially understand when rain occurs on soil what
is happening to the water, and how we can quantify it. So we have various places where we can collect subsurface water as it is running down, and probes going to different depths that measure soil water content. And we also did experiments where we grew grass on that surface to examine different effects. This is Emily Anderson,
she was a master's student; in fact she was here sometime last week to re-visit campus and talk
about some of her experiences. So in her master's work, and with some other students, we did a lot of work over here. These are my colleagues in Italy, faculty members, staff members and so on. But I have spent a lot of
time over there, you know, doing these experiments. So the lab scale. So some of the results that we look at are essentially when the
rainfall event occurs how much surface water we are
getting and when rain stops how surface water essentially vanishes. What is happening in deep flow? That means as water enters
the soil we are able to collect it from below. And this is what the behavior
of that water looks like. How does the water content
change within the sandbox, to sort of understand rainfall
and run-off properties; and if we grow grass on the surface, we will have a very strong deep flow component, a smaller surface flow component, and again the water content is being measured in the soil. So we did over 100 experiments like this. We analyzed them, and one of the things that we found was that our existing theories of infiltration, which we think we understand very well, are not adequate to explain some simple behavior over slopes. Okay, so some fundamental questions concern the mechanism of unexpectedly long recessions. So for clay soils, essentially
once rainfall stopped we still collected quite
a lot of surface water, but as all theories would
say once the rain stops within a couple of seconds you should not be collecting any surface water. We also found that our existing theories do not explain very well what happens when water is moving
over a sloping surface. So all our celebrated
theories that we have, none of them were actually
explaining the data very well. So this is something that
we are still working on, and still trying to explain this. Some other experiments that we conducted, or we continue to conduct in Italy, through my collaborations
are at a larger scale: from the lab scale we now move to what we would call a small plot or field scale. So this is a nine meter by nine meter area where we can do simulated rainfall experiments. We also do natural rainfall experiments. We essentially collect surface flows, and we also have soil moisture probes which tell us what the water content is beneath the soil. We are also able to
catch the deep drainage and work with that. So a much larger scale. And Richa was a PhD student. So one of the things that
Richa was interested in, where she used some of this information, is the problem of scaling. Scaling, I must tell you,
means different things to different people. So in this context, what
we are looking at is say this is the plan view of
the soil that we just saw. Nine meters by nine meters. We know that soil hydraulic
properties tend to be highly variable in space. So this is just a conceptualization,
these different colors show the amount of variability in this supposedly homogeneous soil. The saturated hydraulic
conductivity typically varies a lot. And trying to essentially
determine water movement over such a heterogeneous
surface for a rainfall event is fairly complex. So if we want to run a numerical model, it would take a lot of
effort, a couple of days on a very powerful computer
to do one rainfall experiment, because you have to model
the heterogeneous soil. So in the scaling work, what she is looking at is a problem which we are frequently facing: how is the water content or moisture content of the soil surface changing with time? Suppose we knew how the saturated hydraulic conductivity varies, and let's say M1, M2, M3 are three locations where you have soil moisture probes, where you are measuring the water content. For a rainfall event, we essentially have data which show how the water content changes with time at these three locations. And because they are spatially variable, the way that they change in
time will be very different. So the idea of scaling is: if we have this kind of information, are we able to use the physics that we know to collapse all this into one reference curve? And, having the reference curve, are we able to then determine, at some unmeasured location, knowing just the saturated conductivity at that location, how the water content will change with time, just from this reference? Then we would not have to solve the full surface flow equations, which are extremely complex. So that is scaling.
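To make the scaling idea concrete, here is a deliberately simplified sketch; the rescaling rule below (time stretched by Ks / Ks_ref) is an assumption chosen for illustration, not the similarity theory actually developed in this work:

```python
# Simplified sketch of the scaling idea: collapse the response onto a
# reference curve, then predict at an unmeasured location from its Ks alone.
import numpy as np

def predict_at(t, reference_t, reference_theta, Ks_new, Ks_ref):
    """Predict the moisture response at an unmeasured location from its Ks."""
    return np.interp(t * Ks_new / Ks_ref, reference_t, reference_theta)

t = np.linspace(0.0, 10.0, 200)              # hours, illustrative
Ks_ref = 1.0                                  # reference conductivity
reference_theta = 0.45 - 0.25 * np.exp(-t)    # collapsed reference drying curve

Ks_new = 2.3                                  # Ks at the unmeasured point
theta_pred = predict_at(t, t, reference_theta, Ks_new, Ks_ref)
print(theta_pred[:5])
```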
And I made the problem a little simpler than it sounds; she did that, she went back to that field. And this is what essentially
her scaling results look like. So at three locations in that field we had measurements of surface waters, soil moisture content. And then what she does is
she uses the measurements at let’s say two locations
to essentially predict what it would be at the third location. So the symbols are
essentially the measurements, and the green line is the scaling model. So, similarly, if she’s
trying to predict over here, she uses the measurements obtained at these two locations to predict what it is at a third location. If you are able to do
that, it’s a huge savings, because, as I said, trying
to do this numerically is a huge challenge. And when she’s doing these predictions, what you see is a scatter
plot which shows, for several different events, how well we were able to predict this quantity just through scaling relationships. So this is more like a plot
scale kind of analysis. Then, she further worked on the problem of aggregation and disaggregation, which is also fairly important for us. The problem of aggregation
essentially says, well if I have these three measurements, and I have a scaling theory, can I predict what the field-scale average
soil moisture would be? Because at that scale
that is of interest to us. And what this shows is, for different kinds of cases, if you have numerical results we can use those and see how well the average is predicted by our scaling approach. The reverse problem is: if I'm
given the field-scale average, can I then use that to
predict what is happening to soil moisture at any given location. Because this is the problem we face, when we do remote sensing, we
are sensing a very large area and getting one average value for that, for ground truth, however,
we go and measure at a point. So how do we reconcile
between these two scales. So when she does this
disaggregation with her theory, you can see that we struggle,
we do fine but once rain stops it’s very difficult to get those to agree. So disaggregation; that
means given the average to predict individual behavior, that is a much harder problem. So this was you know, more like
lab scale results and so on. Let me move to watershed scales. So for watershed scales,
I’m going to start with talking about Latif Kalin’s work. Latif was also a PhD student here. He was looking at a very
interesting problem. So this is essentially a
map view of a watershed. A watershed is an area where, essentially, when rain falls, the stream network funnels all the water and moves it downstream. So when we do management strategies we look at watershed scales. And watersheds are divided
into sub-watersheds, and within each sub-watershed we assume that properties are homogeneous. And then we try to model the behavior. What we typically have
to contend with is that our measurements are usually made just at the watershed
outlets, where we are measuring flow and how much sediment is going by. A very important problem for us is, with this measurement, let's say it is sediment we are measuring, where is it originating in the watershed? So can we do that back calculation? This inverse problem tends to be very difficult to deal with. So if I use one rainfall event,
so this is a rainfall event where we had rainfall and then we had sediment essentially come out. What you see are the model
results, which is a solid line, the circles are the
experimental observations. And then this is another event, and this is another event, and so on. So if I use each event
and try to use this data to figure out where the
sediment would have originated, with each different event
we get a different answer as to which area was
contributing to sediment. So that is the nature of
the experimental data; we have the fact that our models are not perfectly accurate, and of course the inverse problem tends
to be an ill-posed problem. So what essentially we
conceptualized with him is we will treat the
sediment-generating potential of each of these sub-areas
as a random variable, because that’s the only way we could get our minds around this problem. And so with each experiment,
the values of, let's say, the erosion potential that we get in each of these sub-areas are one value this random variable is taking, one realization. Once you are able to do that, then if we have many of these events, we have many samples of these random variables, and then we can use statistical methods to compare how significantly different one area is from the other in terms of its sediment-generating potential.
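A minimal sketch of that statistical comparison might look like the following (synthetic numbers; the lognormal samples and the particular test are illustrative assumptions, not results from the study):

```python
# Illustrative sketch: the erosion potential back-calculated for a
# sub-watershed in each rainfall event is treated as one realization of a
# random variable, and two sub-areas are compared with a nonparametric test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_events = 60                                  # many rainfall events are needed

subarea_A = rng.lognormal(mean=1.0, sigma=0.5, size=n_events)
subarea_B = rng.lognormal(mean=1.4, sigma=0.5, size=n_events)

u_stat, p_value = stats.mannwhitneyu(subarea_A, subarea_B,
                                     alternative="two-sided")
print(f"median A = {np.median(subarea_A):.2f}, "
      f"median B = {np.median(subarea_B):.2f}, p = {p_value:.4f}")
```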
So we basically had to re-think the problem, think outside the box to be able to address it, but we do need many rainfall events to do this well. So a lot of data is needed, and that's usually a challenge for us. Mazdak Arabi, another PhD student, also working on watershed-scale problems, was doing optimization studies. So when we do water
quality in stream networks, we are very much
concerned about, you know, what is the status of the watershed, what is the health of the watershed, are water quality standards being met? So for sediment, 20
milligrams per liter let’s say is the concentration that
EPA or some other body says is what should be acceptable. If you go beyond that we
are violating a standard. One way to address these kinds of problems is we essentially have
best management practices that we place in the watershed, either on the land areas or in the stream. And these essentially help reduce the load that is coming out of the watershed. And we have various options for these: grassed waterways, wetlands, parallel terraces and so on. So one of the things Mazdak
did was essentially use optimization methods to say
how they should be placed in the watershed to obtain best results. Best results either in
terms for a given cost how to distribute them to obtain
the lowest concentrations, or if we want to meet a
concentration standard, how to essentially place the BMPs
to achieve those standards. And again, so very large
optimization problems, also fairly challenging. So some of the questions that we have been interested in are listed over here: what role do best management practices play, how do we use water quality data to assess the overall health of
a watershed and so on. So this is another
basically sub-watershed, and this shows you what the land use is. And he used fairly advanced techniques, like generalized likelihood uncertainty estimation, regionalized sensitivity analysis, and tree-structured density estimation, and these were fairly advanced concepts for the time that he was working on these problems. But what they would allow you to do is take this very large optimization problem and give managers an indication of what kind of practice should be placed where in the watershed to achieve best results. But still, a fairly complicated problem.
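As a toy illustration of the placement problem (not the actual optimization used in this work, which involved far larger search spaces and watershed models), one can pose it as choosing practices under a budget:

```python
# Toy sketch with synthetic numbers: choose where to put practices, under a
# budget, to maximize the load reduction seen at the outlet (brute force).
from itertools import combinations

# (sub-watershed, load reduction if treated, cost) -- illustrative values
candidates = [("W1", 12.0, 3.0), ("W2", 8.0, 2.0), ("W3", 15.0, 5.0),
              ("W4", 5.0, 1.0), ("W5", 10.0, 4.0)]
budget = 7.0

best_plan, best_reduction = (), 0.0
for k in range(len(candidates) + 1):
    for plan in combinations(candidates, k):
        cost = sum(c for _, _, c in plan)
        reduction = sum(r for _, r, _ in plan)
        if cost <= budget and reduction > best_reduction:
            best_plan, best_reduction = plan, reduction

print("treat:", [name for name, _, _ in best_plan],
      "expected reduction:", best_reduction)
```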
Then I want to talk a little bit about a larger scale than watersheds: regional scale, state level, country level. And here, this is Shivam Tripathi, another of our PhD students;
his main focus was essentially how to engage uncertainty that
we have with measurements. And this is a very
important problem for us. In this case he basically was working with the latent variable approach
in a Bayesian framework, using graphical models such
as Hidden Markov models, we will talk about this. And the idea is encapsulated
over here: a measurement is always an approximation or estimate of the measurand. We measure something; if we know our instrument well enough, we also have a measurement error that comes along with it. A lot of the time we leave the measurement error alone. And this is particularly a problem in hydrology. So I'll give a couple of examples. So sea surface temperatures,
so those of you who are into global circulation models
and how we do forecasting and so on, one of the primary inputs to all these large
models that work on this is sea surface temperature. So El Niño, La Niña, they’re all based on sea surface temperature values. So sea surface temperature
is a very important boundary condition, it influences atmospheric
variability and so on. It is used for long range
climatic forecasting and general circulation models,
in climate change studies. And what this picture is
trying to show you is not the sea surface temperature
but the uncertainty associated with the sea
surface temperatures. So over time, sea surface
temperature data have been measured or estimated using
remote sensing platforms, through ships passing through,
they measure temperature, buoys that are placed in water, they gather temperature data. And this is showing you
four snapshots in time, May 1850, 1900, 1950, 2000. What you can see is how
the density of measurements has changed with time, but what this is also showing is the variability that we have, the standard deviation, how much error is associated with each of these sea surface temperature estimates. So we have this information, but currently none of the GCMs use that. None of the models use that; it's too complex a problem. Similarly, if I look at other data sets, we did quite a bit of work over India. So this is essentially rainfall data, so that data, when it's generated,
when it’s made available, the error or signal-to-noise
ratio is also provided to us, but nobody uses that information, people feel it’s too complicated to deal with the uncertainty information. So this was essentially, or
has been, and continues to be, one of the focus areas for us. What we did was develop models which would explicitly account for this uncertainty. And these are graphical pictures. So in very simple terms, with graphical models we have essentially an x variable, we can sketch out the model, we have the model error. We use Bayesian non-linear principal component analysis, or noisy principal component analysis; principal component analysis is used to reduce dimensionality. Or this is RVM and VNRVM, relevance vector machines and, again, variational noisy relevance vector machines. These are essentially for regression, and BNC, Bayesian noisy correlation, to essentially do correlation studies. I guess the important thing
with each of these is, with the variables that we are measuring we associate an error, but we
assume that we know the error. And then how do we implement it? So these are actually fairly
standard things that we do in statistical hydrology; in fact all of us do it. When we do correlation, you fit something for y in terms of x; how many of us actually use the error information in x, if it is available, or the error information in y, if that is available? If you did have that information, your strategy for correlation would change a lot.
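A small sketch of that general point, using an ordinary weighted fit rather than the RVM/VNRVM models discussed in the talk (the sinc example and the error levels are illustrative assumptions):

```python
# Minimal sketch: when each y measurement comes with a known error sigma,
# weighting the fit by 1/sigma changes the reconstruction compared with
# simply ignoring the errors.
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(-10, 10, 60)
true_y = np.sinc(x / np.pi)                    # sin(x)/x, the "truth"

sigma = rng.uniform(0.05, 0.5, size=x.size)    # known per-point error levels
y_obs = true_y + rng.normal(0.0, sigma)        # noisy measurements

degree = 9
unweighted = np.polynomial.polynomial.polyfit(x, y_obs, degree)
weighted = np.polynomial.polynomial.polyfit(x, y_obs, degree, w=1.0 / sigma)

for name, coeffs in [("ignoring errors", unweighted), ("using errors", weighted)]:
    fit = np.polynomial.polynomial.polyval(x, coeffs)
    print(name, "RMSE vs truth:", np.sqrt(np.mean((fit - true_y) ** 2)))
```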
So I'll show you some examples of how this works. In machine learning, if you want to do something, they give you benchmark data sets. You have to show your algorithm,
how well it performs on these benchmark data sets. So, for example, this is the sinc function; the sinc function looks like this, the solid blue line. What is provided to us, so
that is what we are trying to reconstruct, if you will, what is provided to us are these symbols. So these are the measurements, and they come with a lot of error. But this is the original
function that they were supposed to represent. What we do know is the measurement value and the error associated with it. If I use relevance vector machine, which was state-of-the-art,
it’s a very good technique, this black line is what
I would reconstruct from these error measurements
as the true signal. But, if I can use the variational noisy relevance vector machine, which now incorporates the
error in this data explicitly, then this green line is essentially what I would reconstruct. So the fact that I’m
given error information helps me greatly in
reconstructing the series. So if I have missing values
and so on I can do very well. So these are benchmark sets. This is another benchmark set
that we have to worry about when we deal with data. This is the actual data,
this is the image that I would be trying to reconstruct. What I am given is noisy
and incomplete data, not only are there errors,
there are gaps in the data. I need to fill this to be
able to do my analysis. So probabilistic principal
components, DINEOF, regularized EM, these were
the state-of-the-art models. And this shows you, if
I apply these methods, how well I can reconstruct this image. But if I’m able to incorporate
the error information which is provided, then the
method that we came up with reconstructed much, much better. Another example of a benchmark data set is when we use dimensionality reduction. So this is essentially a
data set that was created by essentially giving 100 examples. The data is supposed to
have, it's 20 dimensional. So there are 20 dimensions in this direction, but it has only five independent vectors. But it has noise, and so when we do data reduction we want to be able to extract this. If I use standard
principal components, or probabilistic principal components, this is what I extract. If I use Bayesian noisy
principal components, we get those exact five,
only those five vectors back. But that’s because none
of these methods would ingest the uncertainty information. So we basically leave it behind, and I think we can do so much better. So let’s look at some actual data sets. So this is essentially over India, this is the All India
Summer Monsoon Region, GCMs will provide you
data on all these grids, and GCMs also do all sorts
of ensemble averaging, which means many GCMs are run and somehow their average is taken. Computationally very intensive. And this is our state
of the art, right now. So if we do that, this is
essentially trying to show you how well it works. This is time in years, this
is the rainfall anomaly, and the box plot and the spread is essentially from the ensemble, the observed values are
essentially the crosses. So even the GCM ensembles
we don’t do all that well. If we use relevance vector machines, we don’t do great but
we are better than GCMs. And that is reasonably well-known. GCMs are still very complicated, difficult to do prediction with those. Some other examples, if
I’m trying to forecast what is happening let’s say for All India Summer Monsoon
for the month of May, our existing methods would
give this as the forecast, so where this red line is observation, the blue is the mean of our prediction and this gives you an idea
of what the spread is. With more advanced methods you get perhaps a slight improvement. The table below shows you
what the error statistics are. I should also point out that
when we go into testing phase our performance is actually not great. It’s pretty weak. But really that is our prediction skill, with the best methods possible. Ganesh is sitting right here. He works in Hidden Markov models. So you know when we talk to our
phone when you talk to Siri, the speech recognition
software is actually a Hidden Markov model, or it used to be a Hidden Markov model. Now they have more advanced
deep learning techniques like long short-term memory units and so on. But HMMs were used. The way we use them is: we actually observe, let's say, rainfall as a time series. We want to predict droughts, or we want to be able to characterize droughts. So we treat droughts as hidden states, not observed. What we are observing is rainfall, the hidden states are droughts, and then we use the Hidden Markov model to essentially characterize these drought states and do a probabilistic classification.
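As a hedged sketch of that setup, with synthetic rainfall and the hmmlearn library standing in for the model that was actually developed in this work:

```python
# Sketch: rainfall is the observed series, drought states are hidden, and the
# state posteriors give a probabilistic classification for each time step.
import numpy as np
from hmmlearn.hmm import GaussianHMM

rng = np.random.default_rng(3)
rainfall = np.concatenate([rng.normal(1.0, 0.3, 120),    # wet spell
                           rng.normal(-1.2, 0.3, 60),    # dry spell
                           rng.normal(0.0, 0.3, 120)]).reshape(-1, 1)

# Three hidden states, intended to correspond to wet / normal / drought
model = GaussianHMM(n_components=3, covariance_type="diag",
                    n_iter=200, random_state=0)
model.fit(rainfall)

posteriors = model.predict_proba(rainfall)     # P(state | observations)
print("state probabilities for the last month:", np.round(posteriors[-1], 2))
```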
What probabilistic classification says is: if I look at my phone, it says 20% chance of rain tomorrow, and I make a decision, should
I get an umbrella or not, should I wear a coat or not? For droughts, that is not the case. You go to the US drought monitor, it says you’ll have a D2 drought. D2 drought is a drought of
a certain level of severity. D4 is a very severe drought. But it doesn't tell you anything about what percentage chance, it just says D2. They could be off by a wide margin, but you have no way of knowing that. A 20% chance of rain at
least gives you an idea of what to do with it, if
you are just going to say it may rain tomorrow,
what are you going to do with that information? So probabilistic classification
helps us with that. And this essentially shows
you a little about the model. And this is just an
example of how it differs from the standard techniques. So what you are seeing
over here is essentially that’s the rainfall series,
in both cases, the blue line. This graph is essentially on a probability scale, and it shows, for each different year, that the standard method would give you one drought classification. So basically over here, for this year, it's moderate, and that's with probability one. Whereas if we use a
probabilistic classification at each time, the height
of the bar tells you what probability will
belong to each class. So your prediction may be: well, we are in moderate drought with this percentage, it could be a severe drought with this percentage, or it could be a mild drought with this percentage. That is much more graded information, which watershed managers can then use to divert resources more confidently. It also shows you the differences that you are going to get between the two methods, because they come from different ideas. If the precipitation is very low, we should actually be thinking of a very extreme drought, which the standard method may not be able to capture. So there are some nuances that we deal with. Similarly, let's say we
are trying to predict extreme droughts in India. Our standard method, by definition, must give us a uniform value everywhere; it doesn't give you a chance to say this area is more prone to droughts than that other area. Those comparisons are not available because this method was not designed for those comparisons. However, in some of these more advanced models we can show that some parts, like the northwest part of India, are more prone to droughts. Another example in the
monsoon-affected regions, is to study monsoon breaks and active spells, active periods and breaks in the monsoon. And we have online and offline methods. The online method is what we propose, which says after we have sort
of figured out the model, as new data becomes
available it keeps updating. So it is basically useful to
do a continuous prediction. Whereas the standard methods that we had, the offline method, you would have to give it all the data at once
and let the model decide. So you do not know how well
it performs on unseen data. Coming back to Indiana, Shih-Chieh Kao was another
very bright student. We were working on droughts in Indiana, and this is an example of where we again used very advanced statistical techniques, copulas for joint behavior and so on. And many of you will perhaps
were able to obtain for use, so the state of Indiana
in such a deep drought, how much rainfall would be needed to get to normal conditions? So most of the state would have required seven inches of rain, it’s
very difficult to get. What we were also able to then say is, what is the probability of
getting seven inches of rain? So basically between 0.1 and 0.3. So very little chance of
getting out of this drought, because we need a lot of rain, our probability of recovery is very small. And we were able to do
forecasts for one month, six months and so on. Very useful for water planners. Meenu Ramadas is another PhD student, she was also working on drought, she’s talking about drought precursors. How can we use our existing knowledge, what we know right now, in
terms of various variables like soil moisture,
precipitation, run-off and so on, to say what kind of a drought
we will get in the next month. And what these graphs are showing is, for different variables, let’s say if I take the
month of March, for these three variables at least
there is some gradation, this is going to be a very severe drought, where it says a mild drought. Other variables like
evaporation, wind speed, sea level temperatures, they do not have enough
resolution to tell you what kind of a drought you will get. Because they don’t contain that much information about droughts. So this was part of Meenu’s work. We also, one of the things that we look at is when we have these variables and we are trying to do forecasting. So this is where let’s
say the calibration data, let’s look at the validation data for each of these variables. What this scatter shows is
that our predictive ability is actually very weak. Maybe 10% with each variable. So it’s very difficult
to make predictions. However if you have multiple variables, and you’re confident in each
of them, and say there are ten variables, and all these variables
are pointing towards the same direction, then you can combine their effect to get a much higher confidence level. And usually that is
what we have to rely on, because the processes are so complex, that working with a single variable, unless it’s an extremely strong predictor, you really don’t have much to work with. In which case you have to start pooling a lot of other knowledge to
make reliable predictions. So I'm going to do a small diversion and take some time to
acknowledge the students. I talked about the work
of some of my students, these are essentially many of the students that have worked with me over time, and some of my current
students, graduate students, very important to my work, they have contributed
a lot to my learning, and some post docs and
visiting scholars also, as you can see over here. Several of the students, they
have all been doing well, several of them are in academic positions, some of them are professors, you’ll notice that one
person is a professor and a head elsewhere. So students are doing
well, and that’s great. I also want to essentially
acknowledge the students by listing some of the awards
that we have got with students. Three of my students have
got best dissertation awards, I’ll point to some of the more
let’s say prestigious awards with my students. Shivam Tripathi, let’s start with him. KDD is Knowledge, Data and Discovery, it’s one of the machine learning
prestigious conferences, computer science people go to this, and they had a challenge problem. And Shivam, we talked about
some of his algorithms, he was essentially awarded
a best challenge paper award with that problem. Shivam also was recipient
of the Alfred Noble Prize, which is a joint society award with ASCE, AIME, IEEE and WSE. All these societies get
together and pick one, and this is one of the ASCE awards. Shih-Chieh Kao got a best paper award which was decided by the
European Geophysical Union, which looks at all hydrology
papers in all journals, and picks one, typically based
on how well it has been cited and so on. So very fortunate to have
worked with many students who have done very well. Current topic. So some of the things that we
are doing, for instance, is, this is essentially the upper
Mississippi river basin, Ohio river basin and so on. When we do have all these
stations where we have flow and water quality data, water quality is very sparsely sampled. So what you see in this graph are, these are the symbols for water quality. Using these advanced methods
that we talked about, we can reconstruct that series, and we also have the error
information about it. Then, from that we are essentially
able to do scatter plots of how well we predict water observations. We’re also able to use this
water quality data to figure out what the resilience is of this watershed. In other words, how soon does it recover when a violation occurs? And, this is essentially a histogram, because we have uncertainty
associated with it, we get a histogram of resilience values.
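A minimal sketch of one way such a resilience histogram could be computed (the recovery-probability definition and the synthetic series below are illustrative assumptions, not the exact formulation used in the study):

```python
# Sketch: resilience taken here as the chance that a violation of the standard
# is followed by recovery at the next time step; resampling the uncertain
# series gives a histogram of resilience values.
import numpy as np

rng = np.random.default_rng(5)
standard = 20.0                                # mg/L limit, as in the example

def resilience(concentration, limit):
    violated = concentration > limit
    recoveries = np.sum(violated[:-1] & ~violated[1:])
    return recoveries / max(np.sum(violated[:-1]), 1)

base = 15.0 + 8.0 * np.abs(np.sin(np.linspace(0, 20, 500)))   # stand-in series
samples = [resilience(base + rng.normal(0, 2.0, base.size), standard)
           for _ in range(1000)]
print("mean resilience:", np.mean(samples))    # np.histogram(samples) for the plot
```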
This was also something that Yamen Hoque was essentially working on. So if you have a watershed
we measure water quality at different stations, we measure different
water quality parameters, they all have different standards. So you may have alachlor,
ammonia, atrazine, total suspended solids, different
measurements, very sparse. So you can reconstruct the series, you can come up with a
composite water quality index and error around it to
essentially describe the watershed health. So a lot of work, I
think, that we are doing, is in reliability, resilience and vulnerability of watersheds
based on these concepts. Current topics, so you have two students, Abhishek and Anubhav, we are
now looking at how we operate in ungaged basins, where
we have no measurements. So we use some machine learning techniques from measured locations, see how well we can do to
predict what is happening at unmeasured locations in
terms of watershed health. And so when we test these
methods we have essentially areas where we do have
measurements, but we don’t use them, we only use them to see how well we do, and then we do a scatter
plot to get an estimate of how close we are in making
these kinds of predictions at ungaged locations. Some of the current problems
I’m interested in is when we make measurements of infiltration and soil properties
from point measurements, the instruments give
conflicting estimates, they are different from each other. So I’m trying to understand why, because we use these instruments a lot, but we are not fully able to explain that, and so this is one of the
topics that I am interested in. Some other topic that
I have been working on, I want to get back to is
when we do have droughts, and I was working with a large team, how does that affect urban growth? How can we design our
cities to be more resilient to water shortages? Now that we have the capability
to predict water shortages, how do we essentially work with that? So these are topics that
are of interest to me. Some of the open questions that I would like to
address going forward is: should the analysis of uncertainty depend on objectives of the study? How do we actually deal with
prediction and explanation? So the strategies for
both should be different. I would like to be thinking about this. Something that we do a lot in hydrology, worry about reducing uncertainty
and improving predictions, and I want to get away from our standard method of
doing things and see how we can use predictions of
test data during training, which is a slightly different concept, but we’ll have to change
our way of thinking for how we do these things. We use latent variables in
all our statistical models, we need to be able to assign physical interpretation to them. One of the things that
I’m really interested in, because this is a very
standard problem for us, how do we design models and
parameter estimation methods, when hydrologic data tend to be very multi-dimensional and scarce? So when we use deep
learning algorithms and machine learning algorithms,
these are designed for when you have extensive amounts of data. We have the reverse problem,
we don’t have enough data. We have to therefore re-think,
or adapt, usually modify, these algorithms to work for us. So I find that a very interesting topic. Let me briefly touch upon
some teaching interests. So before I came to Purdue
I was at Kansas State, so these were the courses that I taught, undergraduate or graduate level. At Purdue over time I have taught, apart from special topics courses, courses all the way from 100 to 600 level. The 200 level courses are not taught but I don’t need that to
change, it’s fine the way it is. So teaching is something
that I really enjoy, I thought I would do a
very good diversion and share with you some of my
teaching evaluations to show you. I should also point out
that these evaluations are not meant to be flattering, just that they are interesting. So one of the first earlier
evaluations I got was, Dr. Rao seems to care
about students, I hope this doesn’t affect his tenure. So there was a feeling that
if you’re a good teacher you’re not spending
enough time on research. That is not true, I should let you know. Okay, this was interesting. (audience laughs) I'm not sure this was true: Dr. Rao's shirts and pants have the sharpest crease. He seems to put some
effort into his clothes, but what is with the tie selection? I think this is from my earlier days, when I used to be going to work sometimes my daughter would come and say, Nana please wear this tie. It would have nothing
to do with what I had on but I would still wear it. Okay this is actually from here. I have figured out Dr. G’s
limitation as a teacher – he cannot have a class
go by without at least one mathematical equation on the board. Well just to make a point, I didn’t have any equations in this talk. Let’s see, this says: I started to get differential equations in the groundwater class. Our students generally don’t
like differential equations. I still can’t fathom why, but anyhow. He likes mathematics, nice handwriting and blackboard technique.
So this was good. I paid good money for this course and thus deserve a commensurate grade. I don’t think it quite works that way – [Audience Member] He
doesn’t implicate you at all, I paid good money for this
course, what does that mean? – I don’t know. What he’s saying is I paid good money, I
should get a good grade. (laughs) So this is a student who
wrote several things, I have several points to make
about Professor Govindaraju: he is knowledgeable, appears
confident and relaxed, cares about student learning,
he’s a handsome guy. But note, there are five
points, this is four. The fifth point: all the above except ‘d’. (they all laugh) So, like I said, interesting
still, student comments are like really… No, they’re just fine, I thought it was an interesting comment. But I just wanted to share
some of these with you. Engagement, right now I am
actually the president of American Institute of Hydrology, this is our big licensing organization. So I’m in my two-year term. Editorial board of several journals, I’m the editor-in-chief
of the Journal of Hydrologic Engineering; Kumaresh did it for, I don't know, 20, 30 odd years? It was his journal. I have been active in
many technical committees, had lead authorship positions, have chaired many other committees, and I continue to do so. I have other consulting
work and industry engagement that I have been involved with as well. Looking forward, discovery
in terms of research, I’m happy to collaborate on problems where my skills would be useful, and if it makes sense would
be really happy to do that. In terms of learning, I learn
a lot from graduate students. Usually my estimate of how
good a graduate student is is based on how much I learned
working with that student. And, as I said, I’ve been
very fortunate with students. I would like to essentially
teach some advanced courses on infiltration and the runon process, engaging uncertainty in hydrology, because essentially I think I can write books on these topics. We have done enough work
which now prompts me to think I should start putting it together. Moment analysis is
something that I have been interested in and may continue to do that. Engagement: would like to
continue seeking Leadership roles in influential national
committees as I go forward. And always looking for
right graduate students. Always. So this was essentially
some of my thoughts, what I have done, I showed
you some examples of some of my students’ work. It gives you a flavor for the
kinds of things that I do, a little about what I think
I’ll be doing in the future, so with this, 50 minutes, this is how I’m reading my
tea leaves going forward. So thank you very much and I’ll see if you have any questions that I can answer. (applause) – [Audience Member] I have
a question, quick question. – Yes. – [Audience Member] When you
have these models that you have I understand the data that you have, they’re from the past, right? – Yes. – [Audience Member] But,
we’re seeing more and more extreme weather events, 100 year flood taking place every year, so what did you do to prediction? How would you correct them? – So my simple answer is,
it’s not a simple question, it’s actually a very complex question. All strategies that we have had
so far for hydrologic design have assumed that the past is going to be a good representation of what
is happening in the future. So we used to look at
past reports to figure out what would be a 100 year event, based on probability of accedents. In future, we are not able
to make that determination, if we assume that we have climate change and things are going to change. So that is an extremely complex problem. There is really no good
answer, nobody has that answer. All these GCMs can, they do
model predictions for 100 years, but they are doing scenario
analysis, they are saying if carbon dioxide gets
doubled, if land use changes, if this happens, then this model thinks this is what’s going to be in the future. This model is not doing great to basically reconstruct the past to begin with. And then we are saying this is what it’s going to do in the future. So when you see these
IPCC reports and so on, that’s why they use many many models, and try to average them to say, well let’s hope errors are canceling out. And that is their strategy. Actually that’s a very deep question, we don’t have the answer to that yet. We do not know how to
do hydrologic design, we have some ideas, basically
our design is risk-based, fundamentally. When you say 100 year event,
we say society is willing to accept this kind of risk and we’ll design for 100 year event, in principle saying that
a flood of magnitude greater than 100 year flood comes, the structure will fail, but we knew that. That was a risk we were willing to take. How we address this question
of risk in a changing climate, that is a very difficult
question, not simple. Don’t have a good answer. Yes, Mark. – [Mark] So, just following up on that. You mentioned resilience in the watershed. – Yes. – [Mark] And I wanted to know, that seems to be a complex problem. – Yes. – [Mark] So do you consider
plastic state of the watershed? – No. I think our definitions
of resilience, I should say, so being a hydrologist, so
really the definition of resilience and so on, we would
want ecologists to give us. What they tell us, is say
this is the water quality standard you must meet. And then we’ll figure it out. So what we do is, use
that to work backwards. The way we defined
resilience is, in this case, if there is a violation,
what is the probability that the watershed will recover? Or how fast can it recover? That’s how resilient it is. And we use how the water
quality is changing in time, to be able to assess that. So we usually have only
either the watershed is in a failed state, because the standards are not being met, or it’s in a non-failed state because the standards are met. So there is no in between either states. – [Mark] So based on the standards? – Yes. So this is where we essentially
handshake with ecologists. They would have to then tell us how to work with that information. So I’m more in the hydrology
part, not in the ecology part. Any other questions? – [Mark] I have a few other questions, but we’re probably
running out of time, so… – We have five more
minutes, so it’s up to you. – [Mark] Okay, so can we have a question. So earlier you showed your samples, the work you were doing
in Italy, for example. You started of by saying that
you had a very defined badge, very well-controlled sand. – Well, sand, silt and
clay, three kinds of soils. – [Mark] Okay, I heard sand,
so my question was that when you scale it up, took it outside, that was not sand, of course, that was a different kind of soil. – Yes. – [Mark] How well were
you able to translate that experiment out into the field? – So the lab-scale experiments
were essentially done with the idea of trying to
understand rainfall, run-off and infiltration on sloping surfaces. When we go to the field,
we are actually not able to translate information very easily. We have to make field measurements to figure out what is
happening in the field. Because those don’t translate. So the scaling problem
I was talking about is when we go in the field, my measurements are still only at a point scale, the volume that I am
sampling is very small. And I perhaps sample that
in multiple locations. How do we use these
small-scale measurements to talk about field-scale behavior, so that is the scaling
problem that we look at. Otherwise, I think to say
that I can take a soil sample, bring it to the lab, do a
standard permeameter test, and say this is the conductivity, I’m simply not able to apply
that conductivity to the field. So that scaling, I’m
not sure it can be done. You have to be in the field to
make field-scale predictions. – [Mark] Just one quick follow-up on that, that same part of your talk. Later on you showed a map, you had different colors
presenting different types of soils, what
is that, the watershed? – I don’t know which one
you are referring to, so… – [Mark] You had a map for
characterizing the watershed. – This one? – [Mark] Yeah, this one. So I was wondering that,
you had differentiated those watersheds, but you were talking
about characterizing them. – Right. – [Mark] Now, I see similar colors. Does that mean you found
similar properties in them? – No, no. What the gradation scale is showing is, what is the erosion potential of that particular shaded area. So they basically go from zero to 100, so watersheds from zero
to five have low potential of generating sediment. 20 to 25 or higher values
have higher potential for generating sediment and therefore are sources for sediment
that comes down the streams. – [Mark] I ask particularly
because you said they were random samples. – No, I said what we are not able to do, is estimate that or measure that. We are only able to
back-calculate that by looking at data at the outlet of the watershed and trying to do the
inverse problem to say which region could have
generated how much sediment? And when we do that, when I go to a different rainfall event, I get different estimates of which region could have generated that sediment. And hence I said one way to deal with that is to treat it as if
it’s a random variable, and what we are getting
with each rainfall event is one realization of
that random variable. And if I have 50, 60 realizations, then I have some way of characterizing the behavior of that random variable. (applause) Thank you all for coming,
thank you very much.