UCCSS (University of California Computational Social Science): Hilbert Intro1 Background

October 15, 2019. By Stanley Isaacs


Hi, and welcome to this University of California-wide introduction to computational social science. I’m very excited about this course for two reasons.

First, computational social science is a really new field, so I myself, and all of us, are just learning about it. It wasn’t even around when I went to school or to college, so we are all learning together as we go.

Second, knowing that we are all learning together, what we did with this course, with the help of the University of California Office of the President, is that we went to all ten UC campuses and got important lectures from the leading experts that the University of California has across these ten campuses. So I have learned a lot doing this, we’re all learning from each other, and I’m very happy to have you on board as well, because in this very new and exciting field we are all learning together.

What I can promise you in this course is that
it is an introduction to the scientific method. We will study ways in which we can systematically create new knowledge, so the scientific method, basically. I can promise you that it’s going to be a lot of fun doing that, and it’s going to be very cool, because we have very new cutting-edge technologies. And third, it’s going to be extremely relevant.

Right now, of the five most valuable companies, all five are basically doing computational social science. When I went to school, the most valuable companies on planet Earth were General Electric, General Motors, McDonald’s, or some petroleum companies, right? Nowadays the most valuable companies on planet Earth are creating new knowledge about society. They’re doing computational social science. So it’s not only going to be fun and cool and exciting, it’s also going to be extremely relevant, for all of us and for the world, as you can see in that fun figure.

Now, in general, the idea of doing computational
social science is driven by one fundamental fact, and that is the digital revolution. The digital revolution happened extremely quickly and profoundly. For example, if you look here at the late eighties, the white part that you see is the amount of analog information in the world, technologically mediated, stored analog information, and the little green part at the bottom is the amount of digital information in the world. In the late eighties, ninety-nine percent of all the information that humankind had stored externally was in analog format, for example on paper.

Then you can see how that progressed. The digital part grew, but after the year two thousand it just exploded. We estimated that in the year two thousand and two, for the first time, the world was able to store more digital than analog information. We will still have paper around, but by now digital information accounts for more than ninety-nine percent of all the information we have.

When information is in digital format, we can also analyze it in that same format. It’s digitally stored, so we can compute it. There you already see the idea of computational social science: we just dig into that and see what we can learn about society.

Now there’s a whole lot of information we
can dig into. The last time I did this estimate was in two thousand fourteen. That’s when I updated it; I had done it before for two thousand seven. I don’t update it very often because it’s quite boring to count bits. If anybody wants to help me, I’m happy to update it again.

So in the year two thousand fourteen I updated it for the last time, and I found that the world was able to store five zettabytes. Wow, that’s a number with twenty-one zeros. How much is five zettabytes? Suppose you took all this information, everything you have on your hard disks, your cell phones, and the microchips on the back of your credit cards, put it into books, real books, and made a pile. How high do you think the pile would reach? Would it reach to the moon? Or to the sun? It would reach four thousand five hundred times to the sun. So there would be four thousand five hundred piles of books from the earth to the sun.

And think about what it takes to get to the sun. If you took your car and drove one hundred thirty kilometers an hour, that’s the speed on the German autobahn, which is famously fast, and you drove day and night without a break, twenty-four hours, no potty breaks, nothing, not even getting a coffee, you would have to drive for one hundred and thirty years to get to the sun. You would pass a lot of books that you could look into in these hundred thirty years of driving day and night at high speed. So that’s one pile, and there are four thousand five hundred of them.

And this amount of information is doubling
every two to three years, so as we are talking right now, it’s more like ten thousand or more of those piles that we have available to dig into. Doubling also means that each time we double, we add as much information as we had accumulated from prehistory until then. So imagine we are back in two thousand fourteen and we have four thousand five hundred piles of books. That’s all we were able to accumulate during human history, and now we are doubling it: we are adding another four thousand five hundred. So we store as much new information as all the information we were ever able to store before.

There is a lot of information to dig into, we produce a lot more all the time, and it’s in digital format. That means we can compute it with artificial intelligence and machine learning, and we can see new kinds of information that we never saw before, for example social networks. We will talk a lot about that. And we can calibrate our theoretical models. We will get to all of that, but that’s the first thing I want to leave you with: there is a lot of information we can dig into, about society as well, and society is producing this information all the time.

Even the most ancient things in society are
being transformed. Take, for example, the papacy. The pope, the Catholic pope, is a very ancient, thousand-year-old tradition. This one up here was the inauguration of the pope in two thousand and five, Pope Benedict. And this down here, only eight years later, in two thousand thirteen, is the inauguration of Pope Francis. Something changed! What changed? We are all social scientists now. We take pictures all the time; we record social reality all the time. We didn’t have such a complete picture of the inauguration of the pope in the year, I don’t know, eight hundred. Now all of us are recording social activities all the time with our phones, and maybe even doing some computations. We are sharing them and getting feedback. We are not only recording, we are trying to analyze what this is about and to figure things out. We have all become social scientists. We all do computational social science.

And this transformation, if you look at it from a bird’s-eye view, is also really important for the evolution of human history. For example, I told you the world is able to store five zettabytes, or was back in two thousand fourteen. Let’s put that into this ball; let’s say this is the five zettabytes.

Now, life itself also runs from a program
called DNA, and we have it stored in our trillions of cells. All of the trillions of cells that make up my body may have different jobs, skin cells, bone cells, blood cells, brain cells called neurons, but inside they all have a little program they run on, and that’s called DNA. Now, let’s take the DNA from each one of the trillions of cells of my body (they all have a copy of this program). It’s a very cool program, to produce such flexibility and create all of this, right? So from all of the trillions of cells I take each copy, and I take yours as well, and I take the DNA of all humankind, in each cell that humankind has. Well, that’s a lot of information. Do you think it’s more or less than these five zettabytes? It’s less! We have already passed it. It’s about one zettabyte. So we store five times more information digitally than we store in DNA.

And in DNA we don’t store much more over time, because we do grow as humans, but we grow at, I don’t know, one percent or one to two percent per year, whereas digital information grows twenty-five to thirty percent per year, so it grows much, much faster. And that then leads to the question: does it
actually have an effect on evolution? Because life basically runs on information, and now there is also this digital information, artificial intelligence, and so forth. Evolutionary theory actually tells us that every time something significant happened in life’s history, it was when there was a major innovation in information processing. For example, if you look at this graph here, when we went from RNA to DNA: when DNA was invented, that was an innovation in how information was processed. Then the DNA went into the cell and became its computational program, and then came the step from the single cell to the multicellular. These are called the major transitions of evolution. Every time something significant happened in the evolution of life, it was when life discovered a new way to process information. DNA was a big innovation that enabled a whole new way of doing life, and now we have digital information. It is actually accepted that the digital is the next stage in the evolution of life, of human evolution, right? It goes both ways: we’re creating artificial life with artificial intelligence, and we also use the digital to sequence our own life, and so forth, so the two merge.

This graphic that you can see here,
we published with two colleagues in the highest-ranked journal of evolution and ecology. I say that to show you that even the most hardcore evolutionary theorists now take this as a given: yes, the next stage of evolution is the digital. And the digital then also enables us to better understand society. And we had better, because we are kind of merging into it.

We have already merged into it in our daily life. For example, on the stock market, up to eighty percent of transactions are carried out by artificial intelligence: algorithmic trading. Then the power grid, our energy resource. Energy you can’t store, so you have to be very quick. Almost all of it, ninety-nine percent, is actually managed by artificial intelligence. This information processor, the human brain, cannot handle so many decisions about which household needs energy when and where. So artificial intelligence manages our energy resources. And dating as well, the procreation of our species. Between a third and half of the marriages produced nowadays are the result of artificial intelligence matching our species on online dating platforms.

So now, suppose you come to me and say: look, we discovered this new extraterrestrial species. Eighty percent of the distribution of its resources is managed by this other thing, ninety-nine percent of the distribution of its energy, and a third of the decisions about procreation, well, on average. There might be procreation in other ways, but a lot of the matching is done by artificial intelligence. So let’s say a third, on average, of the decisions about procreation are also mediated by this thing, artificial, digital, whatever you might call it. I would say: well, these two systems are already one. Biological intelligence and artificial intelligence, the digital and the biological, are already merged. We are already one with the system. And that’s what this is actually about:
the evolution. And that’s why computational social science is so important: we use the same digital tools to better understand how society works, things like the stock market, online dating, and so forth. We can study them much better and understand where our species is now and where it is going in the future.

Now, I have lots to say about these things, and I’m not going to bore you now with how the digital revolution changed society, because this guy here talks a lot about that in another course. There’s another online course of mine, Digital Technology and Social Change. There, for ten sections, ten weeks, we talk all about how digital technology changes health, education, entertainment, political revolutions, and so forth. So I will stop there for now. But it also changes the way knowledge is created. That’s also my job as a researcher, and it’s what we do at the University of California: we create knowledge, right? “Let there be light.” That’s the motto; we shed light on unknown things. Digital technology surely helps us do that as well, and that is what we talk about in this course. The digital revolution revolutionized the scientific method in all its aspects. For example, we already talked about empirical
the scientific method in all its aspects there. For example, we already talked about on empirical
work so the digital footprint that we leave behind. For example, when we take pictures and so
forth, or when we record our digital footprint with our online activities. And we can dig into that and analyze it. But also with regard to theory we can make
powerful computational models about society now; our computer can hold that and compute
it. We kind of like create a digital twin reality;
where we can simulate society, it’s as fun as playing a video game as you can see here
on its as confusing as playing a video game sometimes can be. So we do theoretical worlds, we do theoretical
models of society and in between of course the analytical. So, the bridges of those two, the empirical
and the theoretical. And we want to do a lot of work with different
analysis tools. We use machine learning, artificial intelligence,
as well to make sense of this data. Social network analysis, some of the analytical
tools that we will cover in this course. And all of that will be brought to you by,
as I already said, researchers from all California. So, if you start historically with the University
of California, as a little history on the side: Berkeley was our first campus, in 1868. Then UCSF joined. Well, to be correct, UCSF actually existed before Berkeley, but it joined the University of California public system five years later, in 1873. Then came Davis, which is where I am, the University of California, Davis. It was originally the agricultural school of the University of California and very quickly turned into one of the biggest campuses. Then UCSB in 1909, so the system expanded to Southern California, and after the First World War, in 1919, UCLA, and after the Second World War, in 1954, Riverside. Then San Diego, Santa Cruz, and Irvine joined, and then the most recent member of the UC family, the campus in Merced.

So I had a lot of fun traveling all around and bringing together all these lectures for you, from the leading experts in many of the fields that concern computational social science. And I’m also very grateful to the Office of the President of UC, which helped us by financing and sponsoring this, so that we can bring you this powerful collection of knowledge. Well, I am looking forward to exploring everything together with you.

Before going deeper into exploring all of these cool and fancy computational tools, let’s also look at the second part of what
computational social science is about, and that is social science. So why is it so important, urgent, and relevant to do social science, and why now? Let’s look at a little historical context for that. Warren Weaver was a big advocate of science in the last century, around 1948 and into the 1950s. He wrote an influential article about science and complexity, and he basically said that there are three kinds of problems that we have been working on over the entire history of science.

At the beginning we started out by tackling what he called “problems of simplicity.” Science before 1900, he said, was largely concerned with two-variable problems. So, for example, in physics, pressure and temperature. In social science, population and time, that is, population change over time, or production and trade. These are the formulas that we look at in high school, with just one variable and another variable. It’s pretty simple. These are two-variable problems, maybe sometimes even three-variable problems, but they are pretty simple, keeping to a small number of involved variables
that we look at and how they relate.

Then, he said, we took on problems of averages, basically. Subsequent to 1900, scientists developed powerful techniques of probability theory to deal with a problem in which the number of variables is very large, and one in which each of the many variables has a behavior which is perhaps totally unknown. So, for example, billiard balls, or air molecules in thermodynamics, are all treated through averages. The behavior of the individual is unknown, but we can work with averages, and we do that a lot in social science, through the law of large numbers. Once we have large numbers, we can disregard the details of each individual member of this large number, and we get these nice distributions, for example around an average, and we work with these distributions. For example, we know that most people have roughly average height. There are a few people who are extremely tall and very few people who are extremely short, but on average, you know, it’s normal. That’s why it’s called a normal distribution, the bell curve. Most people fall under this distribution of heights, and we can just work with the distribution. We don’t have to consider the individuals. All kinds of statistical analysis, correlation, regression, and so forth, are based on this, which we could call a science of averages. It requires some assumptions about how the data fit these distributions. So that came in the 20th century.

And then Weaver said there are problems of
complexity, of organized complexity: problems that involve dealing simultaneously with a sizable number of factors which are interrelated into an organic whole … problems in the … economic, and political sciences … cannot be handled with the statistical techniques so effective in describing average behavior … These new problems, and the future of the world depends on many of them, requires science to make a third great advance, an advance that must be even greater than the 19th century conquest of problems of simplicity or the 20th century victory over problems of disorganized complexity. So he said science must, over the next fifty years, learn to deal with these problems, because the future of the world depends on many of them. Well, he said that in 1950, right, so about 70 years ago. But hold on, we are getting to it, with computational science. We might be a little behind the schedule he gave us, but now we are starting to get to these complex problems.

And he actually also predicted that. He said the wartime development of new types of electronic computing devices … will have a tremendous impact on science. They will make it possible to deal with problems which previously were too complicated, and, more importantly, they will justify and inspire the development of new methods of analysis applicable to these new problems. And that’s what we’re talking about. This computational science approach comes up because of this wartime development of electronic devices, which nowadays we call computers, telecommunication, databases, and so forth. They allow us to develop new methods, and so we get computational science, or in our case computational social science. And Weaver said that it’s especially applicable
to problems in the social sciences, because many of these problems don’t fit the earlier approaches. They are not simple: we don’t have only one or two or three variables. But it is also often not easy to build averages, because there’s a lot of diversity and interdependency. Averages often assume that people are independent, when in fact they are interdependent. People are not all the same, but they are also not completely different. They’re somehow different and somehow connected, yet it’s not like we’re all connected to everybody. It’s a really complex thing. Actually, the social sciences, if you go to
the pyramid of sciences, are the most complex of all the sciences. So let’s start with what was the original science, back in the day, hundreds of years ago: philosophy, or physics. That was basically what science was. Physics, the fundamental one, talks about particles, the universe, matter, and how matter is organized and structured and how its dynamics happen. Now, if you take a bunch of particles and put them together, you can create a higher level, for example atoms and molecules. So then we go to chemistry. Chemistry is about connections; it is already a network made of these atoms and molecules. Now these molecules can form macromolecules, and macromolecules such as DNA are the basis of higher forms of organization, for example cells. A cell, made of a bunch of macromolecules, is an entire little city with a lot of things going on. There is a kind of power production center in there, a communication center, a transport system, a waste disposal system. The cells then, on a higher level, create entire organisms, multicellular organisms. These are connections of cells, and that’s why biology has different levels within itself as well. And one kind of multicellular organism is the human. So now we have the human, made out of trillions of cells, like my body, and then the social, which is actually the network among different humans. So it’s one level higher. We don’t look at one individual human here. We look at a whole set of humans from a bird’s-eye view, and that’s what we call the social. And it actually gets even more complex, because nowadays we have another layer in it: we have technology in there. So it’s actually a social-technological
system that we have here. For a big part of the day you’re not communicating with another biological information processor. You communicate with an artificial intelligence; you communicate with a database. If you go on the internet, a big part of the communication you do there, maybe not most of it for everybody, but a big part, is that you Google, for example. You look for information; you communicate with a platform that intermediates between different pieces of information. So technology has become an integral part of this socio-technological whole.

Now, going all the way down again: society, of course, is based on humans; humans are multicellular organisms; those are based on cells; cells are based on molecules; and molecules are based on particles. And on each one of these levels, a different kind of emergence happens, so there are really different rules on each level. I mean, chemistry is not applied physics, biology is not applied chemistry, and social science is not applied biology. Yes, the lower level certainly has some influence, but, you know, each of these levels of emergence… That’s why we structure science like this
and this course is also an introduction to science. We structure science like this, and on each of these levels new laws emerge. We can study them, and you can specialize on just one level of emergence and study its kind of laws. The interesting thing is that you don’t really have to know all of the details of the lower levels. Sometimes it might help, but sometimes they’re not really related. There might be some principles that apply to all of them, and so forth. But you can see that the socio-technological system is based on all of them, and that makes it extremely complex. There are a lot of moving parts that we have to consider, and that also makes it so difficult to make predictions in the social sciences.

That is one of the reasons why some researchers from the natural sciences used to say that social science is not really a science; it’s more like an art. Right, the university of the arts and sciences. Because in the social sciences we weren’t able to make a lot of predictions. If you could predict something like 20% of the variance of something, you would publish in the highest journals in the social sciences. And physicists and biologists would say: well, that’s nothing, we can make predictions with much higher accuracy. You guys are not a science; you’re basically an art. But now, as you will also see, with computational social science we’re really converting social studies into a science. We are able to make predictions with 85% or 95% accuracy about what happens if you take this bird’s-eye view and look at society as a whole. So that’s why it’s going computational: computational social science.

Let’s look a little bit deeper into this
highest level of emergence in this pyramid, social emergence. How can we think of that?

So I said the social sciences are about more than studying one human, right? It’s not individual psychology, for example; it’s the social, so we look at an aggregate. How can we think about this aggregate? Well, this aggregate kind of has its own behaviors; it has its own laws that it functions by. As social scientists we take this bird’s-eye view and look at this superorganism that we call society, or societies, and different social scientists look at different aspects of it. Some study how this superorganism meets its needs, its demands, and how it supplies for its demands; these are called economists. Others look, for example, at how this superorganism governs itself, kind of like the government structure; these are the political scientists. Others look at its quirks: anthropologists and sociologists look at how it behaves, what it actually does, how it actually works inside, its behavioral patterns. We study all this from a bird’s-eye view, and this is the idea of social emergence. To give you an idea, because social science is a little bit difficult to think about, because we’re part of it:
think about an ant colony. An ant colony is basically a social structure. This is an example taken from one of my favorite books, called Gödel, Escher, Bach. I recommend it if you’re interested in reading something on the side. Gödel is basically the greatest logician that we had, Escher is an artist, a painter, and Bach is the musician, Johann Sebastian Bach, and it’s a very beautiful book that brings them all together.

What Hofstadter describes in this book is an example of an ant colony and an anteater. The anteater, the animal, basically communicates with the ant colony. The anteater calls the ant colony Aunt Hillary, pun intended: it’s an ant colony that lives on a hill. So Aunt Hillary, for him, is like one being. Now, the ant colony actually consists of individual ants, but for the anteater it is a dialogue not with the individual ants but with the ant colony. And the ant colony sometimes gives the anteater some parts she wants to get rid of. So Aunt Hillary says: well, can you lick here? I want to get rid of these ants. They have a symbiosis; they live together and actually help each other out, right?

At one point the anteater says: well, all the ants in Aunt Hillary are as dumb as can be, but there are teams on higher levels whose members are not ants, but teams on lower levels. The thoughts in Aunt Hillary emerge from the manipulation of symbols composed of signals composed of teams composed of lower-level teams, all the way down to ants. As a result of these different levels of teams within teams within structures made from ants, there is, on a high level, this emergent phenomenon called Aunt Hillary. If you ask what Aunt Hillary thinks, and how she thinks, well, it is in the structures of the different ants, and that’s what the anteater discusses.

So that’s how you can then think about society. A society emerges from individuals and different kinds of organizations with different motivations, which we study in the different social science disciplines. And that’s what we do: we take a bird’s-eye view, right? And the total is actually often much more powerful than even the sum of the parts.
So, for example, here I give you another example from biology, and again it’s easy to imagine because it’s biology. Look at that: that’s a flock of starlings basically trying to fight off a falcon. The falcon, a much more powerful bird, is attacking the starlings, and the starlings go into a social formation in order to confuse and intimidate the falcon. So have a look at this little video.

In the social sciences it is therefore quite intuitive that the idea of social emergence is at the heart of all the social sciences. That is what we are trying to figure out, what we are trying to understand, especially since we are kind of the ants, being part of this higher formation: we are trying to figure out what this higher formation is all about, trying to take a bird’s-eye view of something that we ourselves are part of. It’s an interesting and very complex problem. But if you think about it, all of the big scientists and influential thinkers have actually pointed to this as the major problem.
For example, let’s start with philosophy; originally all scientists were philosophers, right? Immanuel Kant, the great Immanuel Kant, was fascinated by this problem as well. Kant is the philosopher of the free will, and he said that marriages, births, and deaths seem not to be subject to any rule, since the free will has a large influence on them. So that was his thing, right? The free will: it’s your free will whom you want to marry, for example. Still, the annual tables of the large countries show continuous preservation. That baffled him: how can it be that we all have free will, and yet if you look from the bird’s-eye view you can actually make predictions, on a higher level, of how many people will get married here or there? And on this higher level we can then look for laws, rules, and patterns, and do social science.
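
Kant’s puzzle can be made concrete with a small simulation. This is only an illustrative sketch and not something taken from the lecture: the population size, the yearly probability `p_marry`, and the function name `yearly_totals` are all assumptions chosen for the example, and it assumes that people decide independently of one another. Each simulated person makes a private yes-or-no decision, and we then look at the yearly totals from the bird’s-eye view.

```python
import random
import statistics

def yearly_totals(population=100_000, p_marry=0.01, years=10, seed=42):
    """Simulate independent individual decisions and return one total per year.

    All parameter values are hypothetical and only serve the illustration.
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(years):
        # Each person decides on their own; nobody coordinates with anyone.
        count = sum(1 for _ in range(population) if rng.random() < p_marry)
        totals.append(count)
    return totals

totals = yearly_totals()
mean = statistics.mean(totals)
spread = statistics.stdev(totals)
print(totals)
print(f"mean = {mean:.0f}, relative spread = {spread / mean:.1%}")
```

No individual decision can be predicted here, yet the yearly totals stay within a few percent of each other. That is the kind of aggregate regularity that makes prediction on the higher level possible, at least under the independence assumption of this sketch.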
The same in sociology. Durkheim, one of the founding fathers of sociology, was fascinated with this. One of his most influential books was on suicide, and suicide is kind of the last free-will decision you can take: you basically take the free-will decision to take yourself out. Now, he was fascinated with the idea that you could actually predict how many people would commit suicide in a given city next month. It is not as if an individual would say: oh wait, we’re not going to comply with the statistics, so I had better commit suicide on Tuesday and not on Thursday. No, right? It is an expression of free will. But still, if you take a bird’s-eye view of society, there are very stable patterns, which he called a collective current, and each of us contains only a spark of this collective current.
The same in economics, of course: Adam Smith and his invisible hand. Literally, he said that the individual intends only his own gain, so we are really egoistic, we only intend our own gain, and he is in this, as in many other cases, led by an invisible hand to promote an end which was no part of his intention. We don’t try to be good to society. By pursuing his own interest, his own egoistic interest, he frequently promotes that of society, the global interest, more effectively than when he really intends to promote it. That idea was really perplexing, and this observation made Adam Smith quite famous, right? We are all egoistic, pursuing our own interests, but all together, on a higher level, society does not look like that: it promotes the good for everybody, through market mechanisms. So again, that shows you that on a higher level there are different kinds of laws that we can study, and economics actually studies them; it studies these social structures on a higher level. What else do we have? Oh, political science.
political science, another social science discipline: Rousseau. So Rousseau famously made the point
that the individual will, like my individual will, my volonté particulière, and your
individual will, your volonté particulière, if we sum them up, we get the total will, so
he called it volonté de tous. So I want to drive fast on the freeway, you
want to drive fast on the freeway, so if you sum that up: let’s have no speed limit on
freeways. But then Rousseau said, well, actually that’s
not how it works, because if you really think about it, then I think about what happens with
both of us together on the freeway, and with more people. Oh no, that’s very dangerous, let’s have a speed
limit. So what we want, what he called volonté générale, the general will, is different from
what I want plus what you want. So the volonté générale is different from the volonté de
tous; the general will is different from the total
will. If you just sum up the individual wills, we
might get to one conclusion. But if you really think about it,
you know you want something different. What we want is different from what I want
plus what you want, and this is because of the interactions. On a higher level there are
interactions, and we have to consider these interactions, and we will model them as well
in this course. These kinds of interactions we will look at with
networks, with computer simulations, with digital footprints that help us to see these interactions
and then see what actually emerges on a higher level. And that’s what we’re interested in
in the social sciences, also making this distinction: what emerges on a higher level.
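The difference between the total will and the general will can be made concrete with a toy model. The following Python sketch is only an illustration and not part of the lecture: the payoff function, the DANGER_SCALE constant, and the grid of candidate speeds are all assumptions, chosen so that the interaction becomes visible. Each driver gains from their own speed, but that gain shrinks as the average speed on the freeway rises.

```python
# Toy model (all numbers are assumptions for illustration only).
SPEEDS = range(60, 201, 10)   # candidate speeds in km/h
DANGER_SCALE = 240.0          # assumed average speed at which driving has no value left


def payoff(own_speed, average_speed):
    """Value of driving at own_speed when traffic moves at average_speed."""
    return own_speed * (1.0 - average_speed / DANGER_SCALE)


def individual_will(average_speed):
    """Speed one driver prefers when the behavior of the others is taken as given."""
    return max(SPEEDS, key=lambda s: payoff(s, average_speed))


def general_will():
    """Common speed that is best when everybody drives it together."""
    return max(SPEEDS, key=lambda s: payoff(s, s))


if __name__ == "__main__":
    # Whatever the others do, each individual prefers the top speed.
    print("individual will:", individual_will(100))
    # Once the interaction is taken into account, a lower common speed wins.
    print("general will:", general_will())
    print("payoff if all drive 200:", round(payoff(200, 200), 1))
    print("payoff if all drive 120:", round(payoff(120, 120), 1))
```

In this sketch every individual asks for the top speed, so summing the individual wills gives "no speed limit", while the best common speed is much lower. The gap comes entirely from the interaction term, which is the point Rousseau's distinction makes.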
All right, what else do we have here? We have philosophy, sociology, economics, political science.
Oh, this gentleman here, everybody says he is theirs:
sociologists, economists, political scientists. He was in all of the social science disciplines a
very deep thinker. Independent of his political ambitions, as
a social scientist he was a profoundly deep thinker. And what Karl Marx observed was that
at a certain point merely quantitative differences, you keep on pushing, keep on pushing, keep on
pushing with something, you make it bigger, bigger, bigger, beyond a certain point pass
into qualitative changes. So that’s the definition of emergence:
something that is qualitatively different after a certain point.
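One way to see how a tiny quantitative difference can turn into a qualitative one is a threshold model of collective behavior in the spirit of Granovetter's classic model. This is a swapped-in illustration, not something presented in the lecture, and the two example crowds are made up. Each person joins an action once the number of people already participating reaches their personal threshold.

```python
def cascade_size(thresholds):
    """Return how many people end up participating.

    A person participates once the number of current participants
    is at least their threshold. We iterate until nothing changes.
    """
    active = 0
    while True:
        new_active = sum(1 for t in thresholds if t <= active)
        if new_active == active:
            return active
        active = new_active


# Crowd A: thresholds 0, 1, 2, ..., 99 (one instigator, then a perfect chain).
crowd_a = list(range(100))
# Crowd B: identical, except the person with threshold 1 now has threshold 2.
crowd_b = [0, 2] + list(range(2, 100))

if __name__ == "__main__":
    for name, crowd in (("A", crowd_a), ("B", crowd_b)):
        mean_threshold = sum(crowd) / len(crowd)
        print(name, "mean threshold:", mean_threshold,
              "participants:", cascade_size(crowd))
```

The two crowds have almost the same average threshold (49.5 versus 49.51), yet in crowd A everybody ends up participating and in crowd B only the instigator does. Looking at averages alone would miss this, because the outcome depends on how the individuals depend on each other.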
More of the same is not going to continue to just be more of the same; it’s really
different. So that’s called the metaphysical principle of dialectics, but basically it expresses
this idea of social emergence. So that’s what we are interested in studying
in society, and therefore we need some really powerful techniques in order to do so.
Because society is not, and that’s the usual case of how we would say society works,
i.i.d., independent and identically distributed individuals. That is not society.
Society looks like this: it’s a very complex, interdependent medium with diversity.
We are not all completely different, but we are also not all the same. Some things
are different, some things are the same. We are also not all completely connected. Some are connected, but not to others. It’s
like this interesting in-between where we are. We cannot go with the averages, just
acting like it is a bunch of individuals and we make averages, because of the interdependencies
among us. We also can’t go with only a few variables, because there
are many variables that actually influence all of that that you see here. And therefore,
yes, we need to make this third big advance, as Weaver would say, and computational tools
help us a lot to better understand this complexity and social emergence. An ant is pretty stupid.
It doesn’t have much of a brain, no will, no plan, and yet many ants together are smart.
An ant colony can construct complex structures. Some colonies keep farms of fungi, others
take care of cattle. They can wage war or defend themselves.
How is this possible? How can a bunch of stupid things do smart
things together? This phenomenon is called emergence, and it’s
one of the most fascinating and mysterious features of our universe.
In a nutshell, it describes small things forming bigger things that have different properties
than the sum of their parts. Emergence is complexity arising from simplicity,
and emergence is everywhere. What does Computational Social Science cover? How does it fit in within the broader
picture of the scientific method, especially the scientific method applied to the social
sciences? Well, the scientific method basically is based
on three legs, that’s how you can broadly think about it. The first is empirical, we observe reality,
we make observations, we collect data, or we just look out. An ethnographer just looks and tries to make
observations. The other one, kind of the other extreme is
the world of thoughts, is theoretical. There, we can make up worlds that don’t
even exist in empirical reality. But in theory we want the world to be a better
place so we make theories about societies that don’t even exist, that we can’t even
observe. And the one in between, kind of like the overlap,
is what we call analytical. So let’s look at some of the most famous
influential scientists for example, empirical. Darwin, Charles Darwin with evolution he started
with empirical evidence. So he went on this ship, called the Beagle,
and for five years he basically was on the ship making observations. A lot in South America, he famously went to
the Galapagos Islands and saw that finches on different islands have beaks of different
lengths. He saw a lot of fossil evidence from the ocean
in the mountains and he thought about how long that must have taken and with this empirical
evidence then he got it all together and wrote these big books that basically try to convince
people based on a lot of observations that he made. Now, the mathematical theory of evolution
was developed later, significantly later. Some thirty or forty years later other people
got together and really distilled all these big books into some very neat equations that
actually show you how concisely you can think about the survival of the fittest and how
actually evolution works. So me as a social scientist, when I work with
evolutionary theory, I just use these equations and then I see how social systems evolve over
time. I don’t go back to the finches and the empirical
evidence. We kind of distilled a theory out of all this
empirical evidence that Darwin so painstakingly collected over his lifetime. So, the other way around, another very influential
scientist would be Einstein. So Einstein went at it from the other way
around, so he just started out with theory; he didn’t really look at reality so much.
He just said, well, there were a few observations that were known at the time, so for example
Einstein said “So guys, if you really say that the speed of light is always constant,
that light travels at a constant speed, if you say that, because speed is, I don’t
know, miles per hour, right, that’s a speed, so there’s this mile, that’s space, and there is
the per hour, that’s time, right, there’s space and time, then something has to happen to space and time.” And so he took the train, he rode it through,
he derived from that that actually it must be a space time continuum, that it must curve,
and so forth. That time is just another dimension of space,
is just the fourth dimension. And he just rode this train all the way through. He said, well if that first thing is correct
then all this other stuff has to be correct too. And that’s the theory of relativity. Now when Einstein published that, it’s not
like people said “whoa, what a genius” instead it was like “well that’s just
one idea, we don’t know if that’s correct or not it’s just this idea of this young
guy” who actually worked in a patent office, so he wasn’t even a professor or something. And then it took some years until they
empirically proved that he was correct. So experimental physicists, Einstein being
a theoretical physicist, then came and years later, again some fifteen years later, the more concrete
proofs then actually showed that Einstein was correct. And only that made him instantly famous. He wasn’t famous before that. In 1919 there was one of the first proofs,
and people had to wait for a solar eclipse, and went down to Brazil and Africa to look at
the solar eclipse and with that could then prove that Einstein’s theory is actually
correct, and only that then made him famous. Interestingly enough, one of the assistants
of Einstein once then asked him what his reaction would have been if these empirical tests from
Brazil and from Africa, if they would not have confirmed his theory. And Einstein famously said “Well, then I
would have felt very sorry for the dear Lord, because the theory is correct.” So, wow, he was so convinced of his theory
but we needed the empirical proof to really see. Like, okay, he believed in it but all of us
were only convinced once we saw this empirical proof so it went from theory to empirical
where Darwin went from empirical to theory so these are the two ways we can think about
that. In general, if you now translate it to methods,
what Einstein did was kind of like work with a problem with a few variables. So E = mc² has very few variables,
and you can think about this as math: what the relation is between E, the energy, and the mass
and the speed of light, and E is mc squared. Okay, three-variable problem, problem of simplicity. And the other one is kind of like statistics,
stats, so we have a lot of variables, Darwin collected a lot of evidence, but then what
he did mentally was kind of like curve fitting. Nowadays we do that with machine learning,
we have a lot of data points and we try to see well what are some patterns, what are
some things. And Darwin did the same thing mentally like
what do the finches on Galapagos have in common with this fossil evidence in the mountains,
and how does it actually all fit together? And then from this evidence he derived some
general tendency, which was his ideas about evolution. So we can think about it as math and stats. Now when we do computational approaches, especially
computational social science, well, we basically cover all three of these fields. So the empirical part we often get from what’s
called big data, from the digital footprint. The digital footprint that we leave behind
with every digital step we take. So when you are on the internet, when you
walk around with your mobile phone you are leaving a digital trace behind and then we
often use that in order to do social science. And that tells us what happened. You record in real time what happened in your
social interaction, in your mobility, and so forth. Then the theoretical part of it is, we can
simulate it. So independent from the empirical observations,
we can just create new. New social systems, here for example I created
a city and here we explore why it happened. So we see something happened and we are not
sure why so we create many theoretical worlds and then can test, why did it happen like
it happened? So that is an interesting theoretical question,
it explains to us the why, not the what. But then we try in theory what’s the why. And then the one in between, the analytical,
is how it happened. And computational social science also covers
all of these aspects of the social sciences. So let us go a little bit more formal about
these ideas that I just presented, digging a little bit deeper. And so okay, we have Darwin, Darwin started
with his observation of a phenomenon, of a system, and from that he collected data. He collected it just by observation, but you
can also ask questions, or build the data, so you can make a survey, or you can even
simulate the data and get the data that way through an experiment. You just collect data in some shape or form. And from that then you abstract a model, for
example in modern days it would be machine learning, curve fitting, you make a regression
model to try and fit a line or curve to your data. So you look at, does it go up, does it go
down, you’re going to describe it a little bit more succinctly. Right, I don’t want to extract all of these
data points, I cannot make sense of them, so I want to break it down and have a model
and have an abstraction of reality that makes it easier to think and makes it easier to
extrapolate maybe make predictions with it. And you have your hypothesis, kind of like
your stylized facts and then you test your hypothesis at the end and you create your
theory. So you can reject your hypothesis, or you
can not reject your hypothesis and what you do with hypothesis testing is you try to see
if the model that you have, the data that you have, is just random. Often that’s what you try to do, or if there’s
a little bit something more to it. If it’s not just a random thing, if there
is a mechanism or something that really created this data. That’s the mechanism you then theorize about. So this goes from the phenomenon, all the way
over to theory. Formally this is called induction, you induce. You induce so you go from data to theory. So then in cases where data is abundantly
available and good theory is hard to come by, then you usually do induction. Which you often do in computational social
science too because with the digital footprint we have big, big data so we often have a lot
of data. These companies, these big companies Google,
Facebook, Apple, and so forth they then throw machine learning at it, Amazon, they throw
machine learning at it. And with that, they come up with different
models which then help them to predict for example consumer behavior. Now Einstein went the other way around. So Einstein started with a theory, and from
that he formulated some hypotheses. He actually already delivered the hypotheses;
he said “well if my theory is correct, this and this and this and this must hold true
if you guys test it. I don’t have time for the testing, I’m
already convinced, I don’t even need to test it.” But if any of these things fail, the entire
theory is out of the window, that’s what’s also so strong about it is he says “well
all of that will hold. I bet, I bet you all of it will hold.” So when you hear the word hypothesis it’s
basically, when you hear the word hypothesis you can replace it with the word bet. It’s a bet. I bet you, and you actually also formulate
the direction of it, so I bet you that it will be like this. And then you can lose the bets. That means you can reject the hypothesis. So that’s actually what a hypothesis is,
you can replace it with the word bets. Interestingly enough we can never prove a
hypothesis we can only disprove it so you can only lose bets so that’s the idea of
hypothesis. So Einstein formulated hypotheses and said
“if any of these things are rejected, then the entire thing must fail, but I’m sure
all of it will hold. I’m convinced so I don’t even have to
check it out. You guys, go and check it out, I have other
things to do now. I want to continue looking for the world formula”
that he famously looked for until he passed on. So he formulated these hypotheses, then other
people came up with some models, they made some models, some abstractions, they came up
with the idea: well, the solar eclipse and so forth, so that’s
how we can abstract it, that’s how we can test it. Then they went down to Brazil and to Africa
and collected this data during the solar eclipse. And with this then, well, they confirmed the
theory of relativity. So Einstein went this way around. This is called deduction, so you deduce from
theory to data. Well so if thought is abundantly available,
maybe you have a lot of Einsteins, right, and good data is hard to come by, you have
to wait for a solar eclipse, then you go this way. And also in computational social science nowadays
we have a lot of theories because we can simulate them, we have a computer simulation with a
lot of competing theories which we can then test. So often we also go this other way around. We often can go from theory then to data,
we can do both of it together. So now if we reorganize that a little bit,
and in this class we will reorganize it to see a more modern approach to it. We not only do induction and deduction, that’s
how it was traditionally told, but the truth is you can go through that in many many different
ways. So it’s more like, when you think about
it, it’s a circle of different components. And it’s you, it’s the choice of the researcher,
how you walk through this maze from one end to the other. Some are more recommended than others, but
actually the sky is the limit here. We surely have to do a much better job with
doing social science. Right, that’s kind of like the challenge
that we face. So summing up everything I said here, that
was a big introduction as I said. Well, social science can handle simple or
large problems but not realistic ones. Society is not simple, and society is not just large-number averages. Realistically we are somewhere in between,
and we need this third advance of science, as Weaver said. And I also said social science is the most
complex of all sciences. It is kind of like based on the pyramid
of all the other sciences below, which are part of it. Right, which kind of like influence it as
well. But on top, there are many degrees of freedom,
so many degrees of freedom, always changing; it is not stationary. It is extremely complex to do social sciences. All right. So if you do physics, for example, just to
give an example, it’s great. The sun was rising yesterday and it has been
rising for a long time and we can predict that tomorrow it will rise again. Now in social science, it’s more like you
know, in physics you have the four fundamental forces, for example. And they also stay the same. That’s why the sun is rising again, right. So for example, gravity is a force. In social science, it’s more like, we take
one of these forces out like gravity and we replace it with something completely new. Call it cellphones, something that wasn’t
around, just not so many years back. And you kind of like have to start all over
again, right. So if the sun is still rising, well I don’t
know, we have this new force. We got rid of the old one. Well, you have to start and start and start
and it’s really complex. It’s quickly changing. Many interrelated parts. It’s the most complex of all sciences and
that’s why traditionally we have not been able to make good predictions. But we’re starting to and getting to that. And we are pretty bad at reproducing reliable
results. So actually, we cannot handle the realistic
problem. It’s the most complex of sciences and we’re
pretty bad at it. So yes, computational social science to the
rescue, that’s the best bet that we have right now. We put all of that in computers and we say
well, computers, digital tools, help us. Maybe we can get a better grip of doing social
science. And that’s why we go through the circle
that I will introduce in this segment. We will go through that in this course. So first of all, we have our empirical evidence. We have big data, our digital footprint – I
already talked about – and that’s how you can imagine how it fits into our framework
of the class. We have our analytical tools. We will talk about social network analysis,
machine learning, natural language processing, Artificial Intelligence in general. And we have theories. We do computer simulations with that. So basically that’s how we fill it out. Well, we’re not gonna do red wine. But, guys, we intuitively always do a glass
of red wine theorizing. We always do it; I do it; everybody does it;
and what I’m trying to tell you here – this is such a big challenge that everything
is welcome. Just sitting there, and this is an amazing
computational machine that we have here. Just sitting here, reflecting, we often find
things intuitively that we couldn’t find in any other way. So it’s good for you to reflect. You don’t need the red wine. It’s just, just to give it a name. But it’s good for you to reflect, and also
follow your gut sometimes. I often do that; all of us often do that. So we do enough of that. We won’t specifically talk about it in this
course because there’s no really systematic way you can foster it. But of course, that would also be included,
and basically, close up the circle. Now, how you go about going through the circle,
as I said, there might be some better ways and some worse ways to go through the circle. One thing I can assure you for sure: we don’t
know the best way of doing it. We really don’t know. And that’s not, we can actually prove that
we don’t know. And that goes back to a very famous discussion
that we have been having in science, maybe one of the most profound ones. And that was between a mathematician, the
most influential mathematician of the 20th century – David Hilbert – the real Hilbert,
you know, not like me. He is my great-great distant, I’ve got the same name,
say, relative. And David Hilbert, he once asked a question that we just call the Entscheidungsproblem,
in German. So he once asked the question: are we able to
actually automate the process of knowledge creation? Can we kind of like find an algorithm, a
procedure, a recipe that we can just rattle through and then create knowledge? Can we do that? Can we prove mathematical theorems and all
of that just by automating it? We call it the Entscheidungsproblem because
it’s about a decision; it’s called the decision problem, German for decision problem. We have to decide when we have found the best way
of doing this or that thing, right. We have the knowledge of how to do it the
best way. And he posed this question in 1928, very influential
question. Then came along this guy here, Kurt Gödel,
only a few years later. And he destroyed Hilbert’s dream. He destroyed Hilbert’s dream and said no,
it’s not possible. We will never, that’s called the incompleteness
theorem, we will never ever be able to show that this is the best way of doing things. We don’t know if there might not be a better
way. That means there’s no way we can kind of
like optimize finding the absolute best way of doing things. And that is proven mathematically. So basically because there will always be
paradoxes, inconsistencies, whatever frame of reference you choose. That was Gödel, the most
influential logician that we have. And for our purposes, what it shows is I cannot
teach you the best way of producing knowledge and nobody can. So you have this framework. Throw at it what you can. Go at it, go through it. Some ways might be better than others. I talked about it. Some ways are kind of like nah, I don’t.. We have shown that that doesn’t lead to
very good results. But any other way, creative way, how you can
go through this circle is very welcomed. And so what computational social science is
about, the methods that we have, and that we will develop. It just gives you more variety also for methods,
many different ways you can walk through the circle, which is the scientific method that
we’re revisiting here. And you need many different tools. So say you
have one tool. It’s kind of like a tool like a hammer. So you have one scientific method that you
pursue and you have this hammer, but once you have a hammer, everything looks like a nail. And you go there hammering for 150 years, going
back to the replication crisis. For 150 years, we used this hammer and everything looks like
a nail, right. But a hammer, one hammer might not always
be useful, right. Sometimes reality, she has funny ways of hiding,
right. You might need a bomb, maybe even an atom
bomb because even after 150 years of hammering, you will
not be able to break through to reveal the truth hidden there, you know, underneath the
rock. And sometimes with a hammer you might destroy
way too much. You might need a feather, you know, might
be the feather being very careful so you can still see what you need. So what this course is about is to show you
some modern hammers, and feathers, and bombs you can throw at it: scientific methods that help us to reveal
knowledge, always keeping in mind that we don’t know what’s the best way of producing knowledge.
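As a small example of one such tool, the inductive route described earlier (collect data, abstract a model by fitting a line, then ask whether the pattern is more than just random) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the course's own material: the data below are made up, the function names are hypothetical, and the permutation test is just one of several ways to check a pattern against randomness.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def permutation_p_value(xs, ys, n_shuffles=2000, seed=0):
    """Share of random re-pairings whose slope is at least as steep as the observed one.

    Shuffling ys breaks any real link between x and y, so the shuffled
    slopes show what "just random" looks like for this data set.
    """
    rng = random.Random(seed)
    observed = abs(fit_line(xs, ys)[0])
    shuffled = list(ys)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        if abs(fit_line(xs, shuffled)[0]) >= observed:
            hits += 1
    # Add one to both counts so the estimate is never exactly zero.
    return (hits + 1) / (n_shuffles + 1)


# Made-up example data: hours online per day and messages sent per day.
hours_online = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
messages_sent = [3, 5, 4, 8, 9, 11, 10, 14, 15, 17]

if __name__ == "__main__":
    slope, intercept = fit_line(hours_online, messages_sent)
    p_value = permutation_p_value(hours_online, messages_sent)
    print("slope:", round(slope, 2), "intercept:", round(intercept, 2))
    print("p-value:", round(p_value, 4))
```

A small p-value here means the bet "this is just random" has been lost; it does not prove any particular mechanism, which is where the theorizing step of the circle comes in.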