# Faculty Colloquium: Dr. Rao S. Govindaraju

– Good morning, everybody. Welcome to our continuing colloquia recognizing our senior faculty. This is a program that was developed a few years ago so that our senior faculty, namely full professors who have been in rank for more than seven years, have a chance to present their work, talk about their experiences, and talk about the way they got to where they are. And once they present this colloquium, they get a chance to talk to the department head, of course in this case, you know, but the dean for sure, and talk about the next seven years. And so, today, we have the great pleasure of having Professor

Govindaraju, who is our department head in Civil Engineering. He got his PhD in 1989 from the University of California, Davis. Before he came to Purdue, he worked as an assistant and associate professor at Kansas State. He joined Purdue in 1997, and currently he is the Bowen Engineering Head of Civil Engineering and the Christopher B. and Susan S. Burke Professor of Civil Engineering. His primary areas of research include surface and subsurface hydrology, contaminant transport, watershed hydrology, and statistical hydrology. And I think I will stop right there before I go any further. We're really excited to hear what he has to share with us today. G.S. – All right. Thank you very much, Claude, and thank you all for being here. I think we have been to

several of these symposia. I have attended all the ones that civil faculty were presenting, including some from

outside departments. So it’s been a good learning experience. One other thing that Claude points out is typically after this presentation, the person has a conversation

with the department head. So Claude, I promise

I’ll have a very stern conversation with myself. (laughs) This actually gives me a chance to sort of marshal my

thoughts, as Claude mentioned, see what I have been doing, take stock of where I am. And also perhaps try

and do some forecasting as to what I think I’ll

be doing in the future. I would like to actually also

mention that in this talk, even though as faculty we do a lot of things which fall in the categories of discovery, learning, and engagement, or research, teaching, and service, I have focused more on the research part, primarily because that is where almost all of my scholarship is. That is where I have done research, published papers, and so on. I also perhaps should put the

word experiments under quotes to indicate that this is not

just physical experiments, but also we’re talking

about numerical experiments, theoretical experiments, and so on. So, as Claude mentioned, my broad areas of research interest are surface and subsurface hydrology, contaminant transport, and related topics, but I'll be focusing mostly on surface and subsurface hydrology in this talk. My research drivers, what I think of, are typically stochastic processes. I look at scaling behavior, variability and spatial heterogeneity, uncertainties and risk. So some of the topics

that I will talk about borrow from these areas. And I'll present a mix of experimental, theoretical, and numerical

work in some of these areas. And as I’m doing this, I’ll perhaps take some examples with some of the graduate

students who have worked with me, and I’ll try and recognize

them along the way. Okay. I also want to first

start by acknowledging some of the funding agencies, not all. I have been fortunate to have my research supported by

a diverse set of agencies, including some international

funding agencies. Even though I was very instrumental in preparing proposals and so on, the funding that finally came to me was for travel and for conducting experiments there. It did not per se support graduate students or my summer support, but it did provide me

with lots of opportunity to do very interesting work. So I wanted to recognize them as well. So let me start with something

on pore scale mechanics. Those of you who work in porous media will perhaps recognize this. This is Jack Chan, one of my master’s and PhD students,

way back, very bright chap. And some of the things that he was doing were essentially using

mathematical morphological operations and image analysis techniques. For those of you in the geomatics area, you’ll be familiar with

this, when we look at images we can do operations

like erosion and dilation to essentially extract features from images. And he was essentially doing this to study pore-scale properties in porous media.
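As a toy illustration of those operations (not Jack's actual pipeline; the image and structuring element here are invented), a binary "pore image" can be eroded and dilated with scipy:

```python
import numpy as np
from scipy import ndimage

# Hypothetical binary "pore image": True = pore space, False = solid grains.
# This is a toy 2D stand-in for one slice of a 3D voxel image of a pore.
img = np.zeros((9, 9), dtype=bool)
img[2:7, 2:7] = True  # a single square pore body

ball = ndimage.generate_binary_structure(2, 1)  # 4-connected structuring element

# Erosion peels one layer off the pore boundary; dilation grows it back out.
# Combining them (openings/closings) is how features such as pore bodies and
# throats are extracted from an image.
eroded = ndimage.binary_erosion(img, structure=ball)
dilated = ndimage.binary_dilation(img, structure=ball)

print(eroded.sum(), img.sum(), dilated.sum())
```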

And this is an example of essentially one cube, just a 3D image with voxels, with one pore. And he's looking at what

happens at different pressures, how water enters the pore and how water comes out of the pore. And the red portion that you

see is what is the air phase. So if you have air and water in a pore, at different water

pressures air will actually go and invade the pore space. So at first it's all saturated; then it's draining, which means air is slowly entering at different pressures. As you can see, the value of d indicates pressure, and you see how air enters that particular single pore. And the reverse process, wetting: as the water pressure increases, water will

come and invade the pore and eventually dry out,

but even with a single pore you are able to demonstrate the hysteresis effect, sometimes called the ink-bottle effect in hydrology. So this is for a single

pore, at a pore scale. What Jack then did was

essentially constructed a porous medium in the

computer, it’s an image, it’s a 3D image, where

essentially all the pore space is conceptualized as intersecting spheres. So any pore space that you get, I can approximate it as closely

as I want using spheres. And this is just one image of that, so the blue portion that you see is essentially the pore space, okay? And then, using again

image analysis techniques, what we were able to do

is show how a soil that is completely saturated will

slowly be invaded by air and become unsaturated. And therefore this allows

us to take care of not only ink bottle effects, but

how well the pores are connected and so on, which is something that we

have not been able to do in previous methods, okay? So what you are seeing is

at different pressures, how basically air goes and

invades the pore space. And the reverse, when

wetting is happening, how water will enter back and drive the air out of the pore space. So these are fairly

fundamental things that we need for understanding sub-surface hydrology. So this is all in images first, right? So then, what we did was, how do we go from these

images to actually talking about soil properties? So as an example, this

is an image taken of a loam soil. This is an SEM image; you can also take MRI images. Using the actual image, and then using the theory we developed, we were able to produce what is called a soil water retention curve, which describes what the

water content is in the soil at different water pressures, it’s a very fundamental property. These symbols that you see, these are actually

experimental measurements. What you see are different, competing theories; for these theories the prediction is based on fitting the data that you have, whereas the theory we developed with Jack actually uses the image and

predicts what this should be. So these experiments are

fairly long and painstaking; doing one soil water retention curve with a pressure plate apparatus can take as much as six months to a year, for one single graph like this.
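For contrast with the image-based prediction, here is a sketch of the conventional curve-fitting route, using the van Genuchten retention form as a stand-in for the competing theories; the measurements and parameter bounds below are hypothetical:

```python
import numpy as np
from scipy.optimize import curve_fit

def van_genuchten(h, theta_r, theta_s, alpha, n):
    """Water content as a function of suction head h (van Genuchten form)."""
    m = 1.0 - 1.0 / n
    return theta_r + (theta_s - theta_r) / (1.0 + (alpha * h) ** n) ** m

# Hypothetical pressure-plate measurements: suction head (cm) vs. water content.
h_obs = np.array([1.0, 10.0, 30.0, 100.0, 300.0, 1000.0, 15000.0])
theta_obs = np.array([0.43, 0.41, 0.35, 0.25, 0.18, 0.12, 0.06])

# Fit the four parameters to the data -- the data-fitting approach that the
# image-based theory is contrasted with.
params, _ = curve_fit(
    van_genuchten, h_obs, theta_obs,
    p0=[0.05, 0.43, 0.04, 1.5],
    bounds=([0.0, 0.2, 1e-4, 1.01], [0.2, 0.6, 1.0, 5.0]),
)
theta_fit = van_genuchten(h_obs, *params)
print(np.round(params, 3))
```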

And from there we can actually also predict what the hydraulic properties

of the soil will be at different levels of saturation. And what you see, of course: these are the symbols, this is our theory, and the other curves tell you what the existing theories would show you. So you can argue, "Well, G.S., you chose one example where your theory worked very well," so I want to say that

Jack actually did this for 120 different soils. So for 120 different

soils, basically using stochastic theories of

either impenetrable spheres or fully penetrable spheres

we can actually predict, based on when bulk properties

like the soil porosity, and the interfacial surface area, which are easily measurable, what soil properties should be. So this is… – [Audience Member] Professor,

what’s the difference? It seems like the same. – Yes. – [Audience Member]

Impenetrable and penetrable. – That’s right. So not much different. They are different models

but they’re trying to address the same problem. They’re a different way, or a

different conceptualization, of what the pore scale should be like. So the should be giving similar results. Fully penetrable spheres,

however, gives you much more realistic

representation of the pore space. So the model of the soil that I had shown you, that was a fully penetrable spheres model. So basically, from the pore

scale to how do we get to bulk properties of soil. So from the pore scale, let me move to another sort of experiment, which I have been doing

with my colleagues in Italy. And this is perhaps one of my most longstanding collaborations. These are essentially what you would call lab scale experiments, lab bench scale. So this is about one and a

half meters by 75cm by 75cm. So we create sandboxes,

very well-controlled, as homogeneous as possible,

to essentially understand when rain occurs on soil what

is happening to the water, and how we can quantify it. So we have various places where we can collect subsurface water while it's running down, and, going to different depths, there are probes that measure soil water content. And we also did experiments

where we grew grass on that surface to

examine different effects. This is Emily Anderson,

she was a masters student, in fact she was here sometime last week to re-visit campus and talk

about some of her experiences. So in her master's work, and with some other students, we did a lot of work over here. These are my colleagues in Italy, faculty members, staff members and so on. But I have spent a lot of

time over there, you know, doing these experiments. So that's the lab scale. Some of the results that we look at are essentially: when the rainfall event occurs, how much surface water are we getting, and when rain stops, how the surface water essentially vanishes. What is happening in deep flow? That means as water enters

the soil we are able to collect it from below. And this is what the behavior

of that water looks like. How does the water content

change within the sandbox, to sort of understand rainfall

and run-off properties, and if we grow grass on the surface, so we will have a very

strong deep flow component, a smaller surface flow component and again water content’s being

measured in the soil. So we did over 100 experiments like this. We analyzed them and one

of the things that we found was our existing theories of infiltration, which we think we understand very well, they’re not adequate to explain some simple behavior over slopes. Okay, so some fundamental

questions are the mechanism of unexpectedly long recession. So for clay soils, essentially

once rainfall stopped we still collected quite

a lot of surface water, but as all theories would

say once the rain stops within a couple of seconds you should not be collecting any surface water. We also found that our existing theories do not explain very well what happens when water is moving

over a sloping surface. So all our celebrated

theories that we have, none of them were actually

explaining the data very well. So this is something that

we are still working on, and still trying to explain this. Some other experiments that we conducted, or we continue to conduct in Italy, through my collaborations

is, from the lab scale we now move to what we would call a small plot or field scale. So this is a nine-meter by nine-meter area, where we can do simulated rainfall experiments. We do natural rainfall experiments. We essentially collect surface flows; we also have soil moisture

probes which tell us how the water content is beneath the soil. We are also able to

catch the deep drainage and work with that. So a much larger scale. And Richa was a PhD student. So one of the things that

Richa was interested in, where she used some of this information, is the problem of scaling. Scaling, I must tell you,

means different things to different people. So in this context, what

we are looking at is say this is the plan view of

the soil that we just saw. Nine meters by nine meters. We know that soil hydraulic

properties tend to be highly variable in space. So this is just a conceptualization,

these different colors show the amount of variability in this supposedly homogeneous soil. The saturated hydraulic

conductivity typically varies a lot. And trying to essentially

determine water movement over such a heterogeneous

surface for a rainfall event is fairly complex. So if we want to run a numerical model, it would take a lot of

effort, a couple of days on a very powerful computer

to do one rainfall experiment, because you have to model

the heterogeneous soil. So in the scaling behavior

what she is looking at is a problem we are frequently facing: how is the water content, or moisture content, of the soil surface changing with time? So suppose we knew how the saturated hydraulic conductivity varies, and let's say M1, M2, M3 are three locations where you have soil probes,

soil moisture probes, where you are measuring the water content. And for a rainfall event,

we essentially have data which show how the water

content change with time at these three locations. And because they are spatially variable, the way that they change in

time will be very different. So the idea of scaling is, if we have this kind of information, are we able to use the

physics that we know to collapse all this

into one reference curve, and, having the reference

curve, are we able to then determine, at some unmeasured location, if I know just the saturated conductivity at that location, how the water content will change with time, just from this reference? So, not having to solve the

full surface flow equations, which are extremely complex. So that is scaling.
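A minimal sketch of the scaling idea, under a made-up assumption that local time simply rescales with Ks (the actual field model is more involved):

```python
import numpy as np

# Toy illustration (not the actual field model): suppose the soil-moisture
# dynamics at any location collapse onto one reference curve when time is
# rescaled by that location's saturated hydraulic conductivity Ks.
def theta(t):
    """Hypothetical reference soil-moisture curve (dimensionless time)."""
    return 0.10 + 0.30 * (1.0 - np.exp(-t))

ks = {"M1": 2.0, "M2": 1.0, "M3": 0.5}   # cm/h at three probe locations
ks_ref = 1.0
t = np.linspace(0.0, 5.0, 200)

# "Measured" series at M1 and M2: local time runs faster or slower with Ks.
obs = {m: theta(t * ks[m] / ks_ref) for m in ("M1", "M2")}

# Collapse the two measured series onto the reference curve ...
t_scaled = np.concatenate([t * ks[m] / ks_ref for m in ("M1", "M2")])
theta_all = np.concatenate([obs[m] for m in ("M1", "M2")])
order = np.argsort(t_scaled)
ref_t, ref_theta = t_scaled[order], theta_all[order]

# ... then predict the unmeasured location M3 knowing only its Ks.
pred_m3 = np.interp(t * ks["M3"] / ks_ref, ref_t, ref_theta)
true_m3 = theta(t * ks["M3"] / ks_ref)
print(np.max(np.abs(pred_m3 - true_m3)))  # interpolation error only
```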

And I made the problem a little simpler than it sounds; she did all that, she went back to that field. And this is what essentially

her scaling results look like. So at three locations in that field we had measurements of surface waters, soil moisture content. And then what she does is

she uses the measurements at let’s say two locations

to essentially predict what it would be at the third location. So the symbols are

essentially the measurements, and the green line is the scaling model. So, similarly, if she’s

trying to predict over here, she uses the measurements obtained at these two locations to predict what it is at a third location. If you are able to do

that, it’s a huge savings, because, as I said, trying

to do this numerically is a huge challenge. And when she’s doing these predictions, what you see is a scatter

plot which shows, for several different events, how well we were able to predict this quantity just

through scaling relationships. So this is more like a plot

scale kind of analysis. Then, she further worked on the problem of aggregation and disaggregation, which is also fairly important for us. The problem of aggregation

essentially says, well if I have these three measurements, and I have a scaling theory, can I predict what the field-scale average

soil moisture would be? Because at that scale

that is of interest to us. And what this shows is, for different kinds of cases, if you have numerical results, we can use those and see how well the average is predicted by our scaling theory. The reverse problem is, if I'm

given the field-scale average, can I then use that to

predict what is happening to soil moisture at any given location. Because this is the problem we face, when we do remote sensing, we

are sensing a very large area and getting one average value for that, for ground truth, however,

we go and measure at a point. So how do we reconcile

between these two scales. So when she does this

disaggregation with her theory, you can see that we struggle,

we do fine but once rain stops it’s very difficult to get those to agree. So disaggregation; that

means given the average to predict individual behavior, that is a much harder problem. So this was, you know, more like lab-scale results and so on. Let me move to watershed scales. So for watershed scales,

I’m going to start with talking about Latif Kalin’s work. Latif was also a PhD student here. He was looking at a very

interesting problem. So this is essentially a

map view of a watershed. A watershed is an area where, essentially, when rain falls, the stream network funnels all the water and moves it downstream. So when we do management strategies we look at watershed scales. And watersheds are divided

into sub-watersheds, and within each sub-watershed we assume that properties are homogeneous. And then we try to model the behavior. What we typically have

to contend with is that our measurements are usually made just at the watershed outlet: we are measuring flow, we are measuring how much sediment is coming out. A very important problem for us is, with this measurement, let's say of sediment, where is it originating in the watershed? So can we do that back-calculation? This inverse problem tends to be very difficult to deal with. So if I use one rainfall event,

so this is a rainfall event where we had rainfall and then we had sediment essentially come out. What you see are the model

results, which is a solid line, the circles are the

experimental observations. And then this is another event, and this is another event, and so on. So if I use each event

and try to use this data to figure out where the

sediment would have originated, with each different event

we get a different answer as to which area was

contributing to sediment. So that is the nature of

the experimental data, we have the fact that our models are not perfectly accurate, and of course that the inverse problem tends

to be an ill-posed problem. So what essentially we

conceptualized with him is we will treat the

sediment-generating potential of each of these sub-areas

as a random variable, because that’s the only way we could get our minds around this problem. And so with each experiment,

the values of let’s say the erosion potential that we

get in each of these sub-areas is one value this random

variable is taking, one realization. Once you are able to do that, then if we have many of these events, we have many samples of

these random variables, and then we can use

statistical methods to compare how significantly different

one area is from the other in terms of its

sediment-generating potential. So we basically had to re-think the problem, think outside the box, to be able to address it. But we do need many rainfall events, a lot of data, to be able to do this well, and that's usually a challenge for us.
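That per-event comparison can be sketched as follows; the data below are synthetic, and the Kruskal-Wallis test is one plausible choice, not necessarily the exact test Latif used:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical calibrated erosion potentials for three sub-watersheds, one
# value per rainfall event: each event's inverse solution is one realization
# of that sub-area's random variable.
n_events = 15
subarea_a = rng.normal(0.30, 0.05, n_events)
subarea_b = rng.normal(0.30, 0.05, n_events)   # drawn like A
subarea_c = rng.normal(0.55, 0.05, n_events)   # a strong sediment source

# Kruskal-Wallis: are the sub-areas' sediment-generating potentials
# significantly different across the sampled events?
h_all, p_all = stats.kruskal(subarea_a, subarea_b, subarea_c)
h_ab, p_ab = stats.kruskal(subarea_a, subarea_b)
print(f"A vs B vs C: p = {p_all:.4f}")   # C stands out, so p is tiny
print(f"A vs B only: p = {p_ab:.4f}")
```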

Mazdak Arabi, another PhD student, also working on watershed-scale problems, was doing optimization studies. So when we do water quality in stream networks, we are very much

concerned about, you know, what is the status of the watershed, what is the health of the watershed, are water quality standards being met? So for sediment, 20

milligrams per liter let’s say is the concentration that

EPA or some other body says is what should be acceptable. If you go beyond that we

are violating a standard. One way to address these kinds of problems is we essentially have

best management practices that we place in the watershed, either in the upland

areas or in the stream. And these essentially help reduce the load that is coming out of the watershed. We have various options for these: grassed waterways, wetlands, parallel terraces, and so on. So one of the things Mazdak

did was essentially use optimization methods to say

how they should be placed in the watershed to obtain best results. Best results either in

terms for a given cost how to distribute them to obtain

the lowest concentrations, or if we want to meet a

concentration standard, how to essentially place the BMPs

to achieve those standards. And again, these are very large optimization problems, also fairly challenging.
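A toy version of the placement problem, solved by brute force; the costs, reduction fractions, and additivity assumption are all invented (the real study used formal optimization over a watershed model):

```python
from itertools import product

# Hypothetical data: each of four sub-watersheds can receive one BMP type;
# each choice has a cost and a fractional reduction of that sub-watershed's
# sediment load, and loads are assumed to add linearly at the outlet.
loads = [40.0, 25.0, 20.0, 15.0]   # t/yr reaching the outlet (made up)
bmps = {
    "none": (0.0, 0.00),               # (cost, load reduction fraction)
    "grassed_waterway": (10.0, 0.40),
    "wetland": (25.0, 0.70),
}
budget = 50.0

# Exhaustive search: for a given budget, which placement gives the
# lowest outlet load?
best = None
for choice in product(bmps, repeat=len(loads)):
    cost = sum(bmps[b][0] for b in choice)
    if cost > budget:
        continue
    outlet = sum(L * (1.0 - bmps[b][1]) for L, b in zip(loads, choice))
    if best is None or outlet < best[0]:
        best = (outlet, cost, choice)

print(best)
```

Real problems are far too large for exhaustive search, which is why heuristics such as genetic algorithms are used instead; this only illustrates what the objective and constraint look like.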

So some of the questions that we have been interested in are listed over here: what role do best management practices play, how do we use water quality data to assess the overall health of

a watershed and so on. So this is another

basically a sub-watershed; this shows you what the land use is. And he used fairly advanced techniques, like generalized likelihood uncertainty estimation, regionalized sensitivity analysis, and tree-structured density estimation; these were fairly advanced concepts for the time that he was working on these problems. But what they would

allow you to do is take this very large optimization problem, but give managers an indication

of what kind of practice should be placed where in the watershed to achieve best results. But still, a fairly complicated problem. Then I want to talk a little bit about a larger scale than

watersheds, regional scale, state level, country level. And here, this is Shivam Tripathi, another of our PhD students,

his main focus was essentially how to handle the uncertainty that we have with measurements. And this is a very

important problem for us. In this case he basically was working with the latent variable approach

in a Bayesian framework, using graphical models such

as Hidden Markov models, we will talk about this. And the idea is encapsulated

over here: that measurement is always an approximation or

estimate of the measurand. We measure something, if we

know our instrument well enough we also have a measurement

error that comes along with it. A lot of the time we leave the measurement error alone, and this is particularly a problem in hydrology. So I'll give a couple of examples. So, sea surface temperatures:

so those of you who are into global circulation models

and how we do forecasting and so on, one of the primary inputs to all these large models is sea surface temperature. El Niño, La Niña, they're all based on sea surface temperature values. So sea surface temperature

is a very important boundary condition, it influences atmospheric

variability and so on. It is used for long range

climatic forecasting and general circulation models,

in climate change studies. And what this picture is

trying to show you is not the sea surface temperature

but the uncertainty associated with the sea

surface temperatures. So over time, sea surface

temperature data have been measured or estimated using

remote sensing platforms, through ships passing through,

they measure temperature, buoys that are placed in water, they gather temperature data. And this is showing you

four snapshots in time, May 1850, 1900, 1950, 2000. What you can see is how

the density of measurements has changed with time, but what this is also showing is what is the variability that we have, what standard deviation, how much error we associated with each of these sea surface temperature estimates. So we have this information, but currently none of the GCMs use that. None of the models use that,

it’s too complex a problem. Similarly if I look at other data sets, we did quite a bit of work over India. So this is essentially rainfall data, so that data when it’s generated,

when it’s made available, the error or signal-to-noise

ratio is also provided to us, but nobody uses that information, people feel it’s too complicated to deal with the uncertainty information. So this was essentially, or

has been, and continues to be, one of the focuses of this line of work. What we did was develop models which would explicitly account for this uncertainty. And these are graphical pictures. So in very simple terms, in graphical models we have essentially an x variable, we have the model, we have the model error. We use Bayesian non-linear principal component analysis, or noisy principal component analysis,

to reduce dimensionality. Or this is RVM and VNRVM: relevance vector machines and, again, variational noisy relevance vector machines. These are essentially for regression. And BNC, Bayesian noisy correlation, to essentially do correlation studies. I guess the important thing

with each of these is, with the variables that we are measuring we associate an error, but we

assume that we know the error. And then how do we implement it? So these are actually fairly

standard things that we do in statistical hydrology,

in fact all of us do it. When we do correlation, you fit something for y in terms of x; how many of us actually use the error information in x, if it's available, or the error information in y, if that is available? If you did have that information, your strategy for correlation

would change a lot.
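A minimal illustration of that point, assuming per-point error standard deviations are known (all numbers invented): weighted least squares down-weights the noisy measurements, where ordinary least squares treats them all alike.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: y = 2x + 1, but each measurement of y comes with its own
# known standard error sigma -- the information that usually gets left behind.
x = np.linspace(0.0, 10.0, 30)
sigma = np.where(x > 5.0, 3.0, 0.1)        # later measurements are much noisier
y = 2.0 * x + 1.0 + rng.normal(0.0, sigma)

A = np.vstack([x, np.ones_like(x)]).T

# Ordinary least squares ignores sigma entirely.
ols_slope, ols_icept = np.linalg.lstsq(A, y, rcond=None)[0]

# Weighted least squares uses weights 1/sigma^2, so the precise points
# dominate the fit.
w = np.sqrt(1.0 / sigma**2)
wls_slope, wls_icept = np.linalg.lstsq(A * w[:, None], y * w, rcond=None)[0]

print(f"OLS slope: {ols_slope:.3f}, WLS slope: {wls_slope:.3f} (true 2.0)")
```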

So I'll show you some examples of how this works. In machine learning, what they have is, if you want to do something, they give you benchmark data sets. You have to show, for your algorithm,

how well it performs on these benchmark data sets. So, for example: this is the sinc function; the sinc function looks like this solid blue line. That is what we are trying to reconstruct, if you will. What is provided to us are these symbols. So these are the measurements, and they come with a lot of error. But this is the original

function that they were supposed to represent. What we do know is the measurement value and the error associated with it. If I use relevance vector machine, which was state-of-the-art,

it’s a very good technique, this black line is what

I would reconstruct from these error measurements

as the true signal. But, if I can use the variational noisy relevance vector machine, which now incorporates the

error in this data explicitly, then this green line is essentially what I would reconstruct. So the fact that I’m

given error information helps me greatly in

reconstructing the series. So if I have missing values

and so on I can do very well. So these are benchmark sets. This is another benchmark set

that we have to worry about when we deal with data. This is the actual data,

this is the image that I would be trying to reconstruct. What I am given is noisy

and incomplete data, not only are there errors,

there are gaps in the data. I need to fill this to be

able to do my analysis. So probabilistic principal

components, DINEOF, regularized EM, these were

the state-of-the-art models. And this shows you, if

I apply these methods, how well I can reconstruct this image. But if I’m able to incorporate

the error information which is provided, then the

method that we came up with reconstructed much, much better. Another example of a benchmark data set is when we use dimensionality reduction. So this is essentially a

data set that was created with 100 examples. The data is 20-dimensional, so there are 20 points in this direction, but it has only five independent vectors. It also has noise, and so

when we do data reduction we want to be able to extract this. If I use standardized

principal components, or probabilistic principal components, this is what I extract. If I use Bayesian noisy

principal components, we get those exact five,

only those five vectors back. That's because none of the other methods would ingest the uncertainty information. So we basically leave it behind, and I think we can do so much better. So let's look at some actual data sets. This is essentially over India, this is the All India

Summer Monsoon Region, GCMs will provide you

data on all these grids, and GCMs also do all sorts

of ensemble averaging, which means many GCMs are run and somehow their average is taken. Computationally very intensive. And this is our state

of the art, right now. So if we do that, this is

essentially trying to show you how well it works. This is time in years, this

is the rainfall anomaly, and the box plot and the spread is essentially from the ensemble, the observed values are

essentially the crosses. So even with the GCM ensembles

we don’t do all that well. If we use relevance vector machines, we don’t do great but

we are better than GCMs. And that is reasonably well-known. GCMs are still very complicated, difficult to do prediction with those. Some other examples, if

I’m trying to forecast what is happening let’s say for All India Summer Monsoon

for the month of May, our existing methods would

give this as the forecast, so where this red line is observation, the blue is the mean of our prediction and this gives you an idea

of what the spread is. With more advanced methods you get perhaps a slight improvement. The table below shows you

what the error statistics are. I should also point out that

when we go into testing phase our performance is actually not great. It’s pretty weak. But really that is our prediction skill, with the best methods possible. Ganesh is sitting right here. He works in Hidden Markov models. So you know when we talk to our

phone when you talk to Siri, the speech recognition

software is actually a Hidden Markov model, or it used to be a Hidden Markov model. Now they have more advanced

deep learning techniques like long short term

memory units and so on. But HMMs were used. The way we use them is, we actually observe, let's say, rainfall as a time series. We want to predict droughts, or we want to be able to

characterize droughts. So we treat droughts as

hidden states, not observed. What we are observing is rainfall, the hidden states are droughts, and then we use the Hidden Markov model to essentially characterize these drought states and do a probabilistic classification.
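A minimal two-state sketch of that posterior computation (all probabilities here are invented; real applications fit them to data):

```python
import numpy as np

# Toy HMM: hidden states are {drought, normal}; the observation is a coarse
# rainfall category {low, high}. Forward-backward gives the posterior
# probability of each hidden state at each time -- a probabilistic
# classification rather than a single hard label.
A = np.array([[0.8, 0.2],      # P(state_t | state_{t-1}); droughts persist
              [0.3, 0.7]])
B = np.array([[0.9, 0.1],      # P(obs | state): droughts mostly emit low rain
              [0.2, 0.8]])
pi = np.array([0.5, 0.5])

obs = [0, 0, 1, 1, 0, 0, 0, 1]  # observed rainfall categories (0=low, 1=high)

# Forward pass (scaled at each step to avoid underflow).
T, N = len(obs), 2
alpha = np.zeros((T, N))
alpha[0] = pi * B[:, obs[0]]
alpha[0] /= alpha[0].sum()
for t in range(1, T):
    alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    alpha[t] /= alpha[t].sum()

# Backward pass.
beta = np.ones((T, N))
for t in range(T - 2, -1, -1):
    beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    beta[t] /= beta[t].sum()

# Posterior P(state_t | all observations): the "height of the bar".
gamma = alpha * beta
gamma /= gamma.sum(axis=1, keepdims=True)
print(np.round(gamma[:, 0], 2))  # P(drought) at each time step
```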

What probabilistic classification says is: if I look at my phone, it says 20% chance of rain tomorrow, and I make a decision, should

I get an umbrella or not, should I wear a coat or not? For droughts, that is not the case. You go to the US drought monitor, it says you’ll have a D2 drought. D2 drought is a drought of

a certain level of severity. D4 is a very severe drought. But it doesn’t tell you anything about what percentage chance, it just says D2. They could be off by a wide margin, but you have no way of knowing that. A 20% chance or rain at

least gives you an idea of what to do with it, if

you are just going to say it may rain tomorrow,

what are you going to do with that information? So probabilistic classification

helps us with that. And this essentially shows

you a little about the model. And this is just an

example of how it differs from the standard techniques. So what you are seeing

over here is essentially that’s the rainfall series,

in both cases, the blue line. This graph essentially is a probability scale, and it shows, for each different year, that the standard method would give you one drought classification. So basically over here, for this year, it's moderate, and that's

with entire probability one. Whereas if we use a

probabilistic classification at each time, the height

of the bar tells you what probability will

belong to each class. So your prediction may be, well we are in moderate

drought with some percentage, it could be a severe drought

with this percentage, or it could be a mild

drought with this percentage. That is much more graded information, which watershed managers then can use to divert resources more confidently. It also shows you the differences

that you are going to get between the two methods, because they come from different ideas. If the precipitation is very low, we should actually be thinking of a very extreme drought, which the standard method may

not be able to capture. So there are some nuances

that we have to deal with. Similarly, let’s say we

are trying to predict extreme droughts in India. Our standard method, by definition, must give us uniform value everywhere, it doesn’t give you a chance to say this area is more prone

to droughts than another area. Those comparisons are

not available because this method was not designed

for those comparisons. However, in some of these

more advanced models we can show that some parts,

such as the northwest part of India, are more prone to droughts. Another example, in the

monsoon-affected regions, is to study monsoon

behavior: active spells and so on,

or breaks in the monsoon. And we have online and offline methods. The online method is what we propose, which says that after we have sort

of figured out the model, as new data becomes

available it keeps updating. So it is basically useful to

do a continuous prediction. Whereas the standard methods that we had, the offline method, you would have to give it all the data at once

and let the model decide. So you do not know how well

it performs on unseen data. Coming back to Indiana, Shih-Chieh Kao was another

very bright student. We were working on droughts in Indiana, and this is an example

of where we used, again, very advanced statistical techniques, copulas for joint behavior and so on. And many of you will perhaps

remember the 1988 drought, that was a very severe drought. So some of the results that we

were able to obtain for use: with the state of Indiana

in such a deep drought, how much rainfall would be needed to get back to normal conditions? So most of the state would have required seven inches of rain, and it’s

very difficult to get. What we were also able to then say is, what is the probability of

getting seven inches of rain? So basically between 0.1 and 0.3. So very little chance of

getting out of this drought, because we need a lot of rain, our probability of recovery is very small. And we were able to do
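As an illustration of a recovery-probability calculation of this kind, here is a much-simplified sketch. The actual study used copulas for joint behavior; this just fits a univariate lognormal to hypothetical monthly rainfall totals and asks how likely a seven-inch month is:

```python
import math

# Illustrative only: hypothetical monthly rainfall totals in inches,
# not data from the Indiana study.
rainfall = [2.1, 3.4, 1.8, 4.0, 2.9, 3.2, 5.1, 2.5, 3.8, 2.2, 4.4, 3.0]

# Fit a lognormal by taking mean and standard deviation of the logs.
logs = [math.log(x) for x in rainfall]
mu = sum(logs) / len(logs)
sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / (len(logs) - 1))

def prob_at_least(threshold):
    """P(monthly rainfall >= threshold) under the fitted lognormal."""
    z = (math.log(threshold) - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 - math.erf(z))   # one minus the lognormal CDF

print(f"P(at least 7 inches) = {prob_at_least(7.0):.3f}")
```

With these made-up numbers the exceedance probability comes out very small, which is the same qualitative message: when the required recovery rainfall is far out in the tail, the chance of getting out of the drought quickly is low.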

forecasts for one month, six months and so on. Very useful for water planners. Meenu Ramadas is another PhD student, she was also working on drought, she’s talking about drought precursors. How can we use our existing knowledge, what we know right now, in

terms of various variables like soil moisture,

precipitation, run-off and so on, to say what kind of a drought

we will get in the next month. And what these graphs are showing is, for different variables, let’s say if I take the

month of March, for these three variables at least

there is some gradation between saying this is going to be a very severe drought and saying it is a mild drought. Other variables, like

evaporation, wind speed, sea level temperatures, they do not have enough

resolution to tell you what kind of a drought you will get, because they don’t contain that much information about droughts. So this was part of Meenu’s work. One of the things we also look at, when we have these variables, is forecasting. So this is, let’s

say, the calibration data, and let’s look at the validation data for each of these variables. What this scatter shows is

that our predictive ability is actually very weak. Maybe 10% with each variable. So it’s very difficult

to make predictions. However if you have multiple variables, and you’re confident in each

of them, say ten variables, and all these variables

are pointing towards the same direction, then you can combine their effect to get a much higher confidence level. And usually that is
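The pooling idea can be sketched with a naive-Bayes combination of log-odds; the independence assumption and the 0.6 probabilities below are illustrative, not values from the study:

```python
import math

# Illustrative naive-Bayes pooling: several weak predictors that agree
# yield a confident combined forecast, assuming they are independent.

def pooled_probability(individual_probs, prior=0.5):
    """Combine independent P(drought | variable i) estimates via log-odds."""
    prior_lo = math.log(prior / (1.0 - prior))
    log_odds = prior_lo
    for p in individual_probs:
        # each variable contributes its evidence relative to the prior
        log_odds += math.log(p / (1.0 - p)) - prior_lo
    return 1.0 / (1.0 + math.exp(-log_odds))

print(round(pooled_probability([0.6]), 2))                      # one weak signal
print(round(pooled_probability([0.6, 0.6, 0.6, 0.6, 0.6]), 2))  # five that agree
```

A single 60% predictor stays at 60%, but five independent 60% predictors that all point the same way pool to well above 85% confidence, which is the effect described here.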

what we have to rely on, because the processes are so complex, that working with a single variable, unless it’s an extremely strong predictor, you really don’t have much to work with. In which case you have to start pooling a lot of other knowledge to

make reliable predictions. So I’m going to take a small diversion and some time to

acknowledge the students. I talked about the work

of some of my students, these are essentially many of the students that have worked with me over time, and some of my current

students, graduate students, very important to my work, they have contributed

a lot to my learning, and some post docs and

visiting scholars also, as you can see over here. The students

have all been doing well, several of them are in academic positions, some of them are professors, and you’ll notice that one

person is a professor and a head elsewhere. So students are doing

well, and that’s great. I also want to essentially

acknowledge the students by listing some of the awards

that we have got with students. Three of my students have

got best dissertation awards, I’ll point to some of the more

let’s say prestigious awards with my students. Shivam Tripathi, let’s start with him. KDD is Knowledge, Data and Discovery, it’s one of the machine learning

prestigious conferences, computer science people go to this, and they had a challenge problem. And Shivam, we talked about

some of his algorithms, he was essentially awarded

a best challenge paper award with that problem. Shivam also was recipient

of the Alfred Noble Prize, which is a joint society award with ASCE, AIME, IEEE and WSE. All these societies get

together and pick one, and this is one of the ASCE awards. Shih-Chieh Kao got a best paper award which was decided by the

European Geosciences Union, which looks at all hydrology

papers in all journals, and picks one, typically based

on how well it has been cited and so on. So very fortunate to have

worked with many students who have done very well. Current topic. So some of the things that we

are doing, for instance, is, this is essentially the upper

Mississippi river basin, Ohio river basin and so on. While we do have all these

stations where we have flow and water quality data, water quality is very sparsely sampled. So what you see in this graph are, these are the symbols for water quality. Using these advanced methods

that we talked about, we can reconstruct that series, and we also have the error

information about it. Then, from that we are essentially

able to do scatter plots of how well we predict water quality observations. We’re also able to use this

water quality data to figure out what the resilience is of this watershed. In other words, how soon does it recover when a violation occurs? And, this is essentially a histogram, because we have uncertainty

associated with it, we get a histogram of resilience values. So this was also
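A toy version of such a resilience calculation, with a hypothetical standard and series: the real analysis propagates uncertainty to get a histogram of resilience values, whereas this sketch just computes a single point estimate from violation run lengths.

```python
# Toy resilience estimate from a water quality series: how long does
# the watershed stay in a failed state once a standard is violated?
# The standard and the series below are hypothetical.

standard = 3.0  # assumed concentration limit (e.g. mg/L)
series = [2.1, 2.5, 3.4, 3.8, 2.9, 2.2, 3.1, 2.7, 2.6, 3.5, 3.6, 3.2, 2.4]

def recovery_times(values, limit):
    """Lengths of consecutive runs where the standard is violated."""
    runs, current = [], 0
    for v in values:
        if v > limit:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:                      # series may end mid-violation
        runs.append(current)
    return runs

runs = recovery_times(series, standard)
resilience = 1.0 / (sum(runs) / len(runs))   # inverse mean failure duration
print(runs)                 # durations of each violation episode
print(round(resilience, 2)) # higher means faster recovery
```

Repeating this over many reconstructed series (each one a draw from the reconstruction uncertainty) is what would turn the single number into the histogram of resilience values mentioned here.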

something that Yamen Hoque was essentially working on. So if you have a watershed

we measure water quality at different stations, we measure different

water quality parameters, they all have different standards. So you may have alachlor,

ammonia, atrazine, total suspended solids, different

measurements, very sparse. So you can reconstruct the series, you can come up with a

composite water quality index and error around it to

essentially describe the watershed health. So a lot of work, I

think, that we are doing, is in reliability, resilience and vulnerability of watersheds

based on these concepts. Current topics: with two students, Abhishek and Anubhav, we are

now looking at how we operate in ungaged basins, where

we have no measurements. So we use some machine learning techniques from measured locations, see how well we can do to

predict what is happening at unmeasured locations in

terms of watershed health. And so when we test these

methods we have essentially areas where we do have

measurements, but we don’t use them, we only use them to see how well we do, and then we do a scatter

plot to get an estimate of how close we are in making

these kinds of predictions at ungaged locations. Some of the current problems

I’m interested in is when we make measurements of infiltration and soil properties

from point measurements, the instruments give

conflicting estimates, they are different from each other. So I’m trying to understand why, because we use these instruments a lot, but we are not fully able to explain that, and so this is one of the

topics that I am interested in. Another topic that

I have been working on, and want to get back to, is

when we do have droughts, and I was working with a large team, how does that affect urban growth? How can we design our

cities to be more resilient to water shortages? Now that we have the capability

to predict water shortages, how do we essentially work with that? So these are topics that

are of interest to me. Some of the open questions that I would like to

address going forward is: should the analysis of uncertainty depend on objectives of the study? How do we actually deal with

prediction and explanation? So the strategies for

both should be different. I would like to be thinking about this. Something that we do a lot in hydrology, worry about reducing uncertainty

and improving predictions, and I want to get away from our standard method of

doing things and see how we can use predictions of

test data during training, which is a slightly different concept, but we’ll have to change

our way of thinking for how we do these things. We use latent variables in

all our statistical models, we need to be able to assign physical interpretation to them. One of the things that

I’m really interested in, because this is a very

standard problem for us, how do we design models and

parameter estimation methods, when hydrologic data tend to be very multi-dimensional and scarce? So when we use deep

learning algorithms and machine learning algorithms,

these are designed for when you have extensive amounts of data. We have the reverse problem,

we don’t have enough data. We have to therefore re-think,

or adapt, usually modify, these algorithms to work for us. So I find that a very interesting topic. Let me briefly touch upon

some teaching interests. So before I came to Purdue

I was at Kansas State, so these were the courses that I taught, at undergraduate or graduate level. At Purdue over time I have taught, apart from special topics courses, courses all the way from the 100 to the 600 level. The 200-level courses I have not taught, but I don’t need that to

change, it’s fine the way it is. So teaching is something

that I really enjoy, so I thought I would take a

small diversion and share with you some of my

teaching evaluations. I should also point out

that these evaluations are not meant to be flattering, just that they are interesting. So one of the first earlier

evaluations I got was, Dr. Rao seems to care

about students, I hope this doesn’t affect his tenure. So there was a feeling that

if you’re a good teacher you’re not spending

enough time on research. That is not true, I should let you know. Okay, this was interesting. (audience laughs) I’m not sure this was true: Dr. Rao’s shirts and pants

have the sharpest crease. He seems to put some

effort into his clothes, but what is with the tie selection? I think this is from my earlier days, when I used to be going to work sometimes my daughter would come and say, Nana please wear this tie. It would have nothing

to do with what I had on but I would still wear it. Okay this is actually from here. I have figured out Dr. G’s

limitation as a teacher – he cannot have a class

go by without at least one mathematical equation on the board. Well just to make a point, I didn’t have any equations in this talk. Let’s see, this says: I started to get differential equations in the groundwater class. Our students generally don’t

like differential equations. I still can’t fathom why, but anyhow. He likes mathematics, nice handwriting and blackboard technique.

So this was good. I paid good money for this course and thus deserve a commensurate grade. I don’t think it quite works that way – [Audience Member] He

doesn’t implicate you at all, I paid good money for this

course, what does that mean? – I don’t know. What he’s saying is I paid good money, I

should get a good grade. (laughs) So this is a student who

wrote several things, I have several points to make

about Professor Govindaraju: he is knowledgeable, appears

confident and relaxed, cares about student learning,

he’s a handsome guy. But note, there are five

points, this is four. The fifth point: all the above except ‘d’. (they all laugh) So, like I said, interesting

still, student comments are like really… No, they’re just fine, I thought it was an interesting comment. But I just wanted to share

some of these with you. Engagement, right now I am

actually the president of American Institute of Hydrology, this is our big licensing organization. So I’m in my two-year term. Editorial board of several journals, I’m the editor-in-chief

of the Journal of Hydrologic Engineering, Kumaresh did it for, I don’t

know, 20, 30 odd years? It was his journal. I have been active in

many technical committees, had lead authorship positions, have chaired many other committees, and I continue to do so. I have other consulting

work and industry engagement that I have been involved with as well. Looking forward, discovery

in terms of research, I’m happy to collaborate on problems where my skills would be useful, and if it makes sense would

be really happy to do that. In terms of learning, I learn

a lot from graduate students. Usually my estimate of how

good a graduate student is is based on how much I learned

working with that student. And, as I said, I’ve been

very fortunate with students. I would like to essentially

teach some advanced courses on infiltration and the runon process, engaging uncertainty in hydrology, because essentially I think I can write books on these topics. We have done enough work

which now prompts me to think I should start putting it together. Moment analysis is

something that I have been interested in and may continue to do that. Engagement: would like to

continue seeking leadership roles in influential national

committees as I go forward. And always looking for

right graduate students. Always. So this was essentially

some of my thoughts, what I have done, I showed

you some examples of some of my students’ work. It gives you a flavor for the

kinds of things that I do, a little about what I think

I’ll be doing in the future, so with this, 50 minutes, this is how I’m reading my

tea leaves going forward. So thank you very much and I’ll see if you have any questions that I can answer. (applause) – [Audience Member] I have

a question, quick question. – Yes. – [Audience Member] When you

have these models that you have I understand the data that you have, they’re from the past, right? – Yes. – [Audience Member] But,

we’re seeing more and more extreme weather events, 100 year floods taking place every year, so what does that do to your predictions? How would you correct them? – So my simple answer is,

it’s not a simple question, it’s actually a very complex question. All strategies that we have had

so far for hydrologic design have assumed that the past is going to be a good representation of what

is happening in the future. So we used to look at

past records to figure out what would be a 100 year event, based on probability of exceedance. In future, we are not able

to make that determination, if we assume that we have climate change and things are going to change. So that is an extremely complex problem. There is really no good

answer, nobody has that answer. All these GCMs can, they do

model predictions for 100 years, but they are doing scenario

analysis, they are saying if carbon dioxide gets

doubled, if land use changes, if this happens, then this model thinks this is what’s going to be in the future. These models are not doing great at reconstructing the past to begin with, and then we are saying this is what they are going to do in the future. So when you see these

IPCC reports and so on, that’s why they use many many models, and try to average them to say, well let’s hope errors are canceling out. And that is their strategy. Actually that’s a very deep question, we don’t have the answer to that yet. We do not know how to

do hydrologic design, we have some ideas, basically

our design is risk-based, fundamentally. When you say 100 year event,

we say society is willing to accept this kind of risk and we’ll design for 100 year event, in principle saying that

if a flood of magnitude greater than the 100 year flood comes, the structure will fail, but we knew that. That was a risk we were willing to take. How we address this question

of risk in a changing climate, that is a very difficult

question, not simple. Don’t have a good answer. Yes, Mark. – [Mark] So, just following up on that. You mentioned resilience in the watershed. – Yes. – [Mark] And I wanted to know, that seems to be a complex problem. – Yes. – [Mark] So do you consider

plastic state of the watershed? – No. I think our definitions

of resilience, I should say, being a hydrologist,

really the definition of resilience and so on, we would

want ecologists to give us. What they tell us is, say

this is the water quality standard you must meet. And then we’ll figure it out. So what we do is, use

that to work backwards. The way we defined

resilience is, in this case, if there is a violation,

what is the probability that the watershed will recover? Or how fast can it recover? That’s how resilient it is. And we use how the water

quality is changing in time, to be able to assess that. So we usually have only

either the watershed is in a failed state, because the standards are not being met, or it’s in a non-failed state because the standards are met. So there is nothing in between the two states. – [Mark] So based on the standards? – Yes. So this is where we essentially

handshake with ecologists. They would have to then tell us how to work with that information. So I’m more in the hydrology

part, not in the ecology part. Any other questions? – [Mark] I have a few other questions, but we’re probably

running out of time, so… – We have five more

minutes, so it’s up to you. – [Mark] Okay, so can I ask a question. So earlier you showed your samples, the work you were doing

in Italy, for example. You started off by saying that

you had a very well-defined batch, very well-controlled sand. – Well, sand, silt and

clay, three kinds of soils. – [Mark] Okay, I heard sand,

so my question was that when you scale it up, took it outside, that was not sand, of course, that was a different kind of soil. – Yes. – [Mark] How well were

you able to translate that experiment out into the field? – So the lab-scale experiments

were essentially done with the idea of trying to

understand rainfall, run-off and infiltration on sloping surfaces. When we go to the field,

we are actually not able to translate information very easily. We have to make field measurements to figure out what is

happening in the field. Because those don’t translate. So the scaling problem

I was talking about is when we go in the field, my measurements are still only at a point scale, the volume that I am

sampling is very small. And I perhaps sample that

in multiple locations. How do we use these

small-scale measurements to talk about field-scale behavior, so that is the scaling

problem that we look at. Otherwise, I think to say

that I can take a soil sample, bring it to the lab, do a

standard permeameter test, and say this is the conductivity, I’m simply not able to apply

that conductivity to the field. So that scaling, I’m

not sure it can be done. You have to be in the field to

make field-scale predictions. – [Mark] Just one quick follow-up on that, that same part of your talk. Later on you showed a map, you had different colors

representing different types of soils, what

is that, the watershed? – I don’t know which one

you are referring to, so… – [Mark] You had a map for

characterizing the watershed. – This one? – [Mark] Yeah, this one. So I was wondering that,

you had differentiated those watersheds, but you were talking

about characterizing them. – Right. – [Mark] Now, I see similar colors. Does that mean you found

similar properties in them? – No, no. What the gradation scale is showing is, what is the erosion potential of that particular shaded area. So they basically go from zero to 100, so watersheds from zero

to five have low potential of generating sediment. 20 to 25 or higher values

have higher potential for generating sediment and therefore are sources for sediment

that comes down the streams. – [Mark] I ask particularly

because you said they were random samples. – No, I said what we are not able to do, is estimate that or measure that. We are only able to

back-calculate that by looking at data at the outlet of the watershed and trying to do the

inverse problem to say which region could have

generated how much sediment? And when we do that, when I go to a different rainfall event, I get different estimates of which region could have generated that sediment. And hence I said one way to deal with that is to treat it as if

it’s a random variable, and what we are getting

with each rainfall event is one realization of

that random variable. And if I have 50, 60 realizations, then I have some way of characterizing the behavior of that random variable. (applause) Thank you all for coming,

thank you very much.