Trey Ideker, Ph.D. (University of California at San Diego): The Cancer Cell Map Initiative

October 24, 2019 By Stanley Isaacs


>>At cbiit.nci.nih.gov, you can find information about future speakers, and you can follow us on Twitter at NCI_NCIP. Today, we are very happy to welcome Dr. Trey Ideker, who is a professor of genetics in the Department of Medicine at the University of California at San Diego and head of the program in Genomes and Networks at the UCSD Moores Cancer Center. The title of his presentation today is "The Cancer Cell Map Initiative." And with that, I'll go ahead and turn the floor over to Dr. Ideker.

>>OK. Thanks very much, Julie, for that kind introduction. It's a pleasure giving this presentation today. What I want to use the floor for today is to talk about an initiative that our cancer center announced about six months ago together with the cancer center up at UC San Francisco. As I'll talk about today, this is sort of a north-south California collaboration, but it's something we're definitely hoping to expand, so please tell me what you think. The initiative is called the Cancer Cell Map
Initiative, and it's based on the following premise. As we all know, there are a lot of cancer genomes out there at the moment, depending on how you count and how you define a cancer genome, whether it's an exome, the whole genome, or some part thereof. It's safe to say there are about 10 to the fourth tumor genomes online that one can access. And if they come from a program like The Cancer Genome Atlas at NCI, they're relatively easy to access; at least some levels of data are just a few clicks away.

The real challenge, however, which almost every paper by TCGA or its sister organization, the International Cancer Genome Consortium (ICGC), laments or at least observes, is heterogeneity. I'll talk here about heterogeneity across the patient cohort. Of course, there's also a lot of interest these days in heterogeneity within a single patient.

To illustrate this: many of you online or in the room have seen these kinds of long tails of cancer mutations before, but this is my version. I have simply taken an arbitrarily chosen patient out of the TCGA breast cancer cohort. Here we're looking at the somatically mutated genes in her tumor; there happen to be 25 of them, which is actually on the low side for this cohort. Be that as it may, we've sorted those mutated genes by the frequency with which they are found mutated in the cohort at large, about a thousand women. The most frequently mutated gene is GATA3, but you see very quickly what happens with this long tail: by the time you're at the third, fourth, or fifth most frequently mutated gene, you're looking at a relatively rare event. And by the time you're down to gene 20 or so, you're essentially looking at genes that have never been found mutated before in any patient, or if they have, in only a very small number of patients.
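The long-tail plot amounts to a simple ranking: for each gene mutated in one patient, count the fraction of the cohort that carries a mutation in the same gene, then sort. Below is a minimal sketch, assuming each patient's somatic calls have already been reduced to a set of gene symbols; the gene names and the five-patient cohort are hypothetical toy data, not TCGA values.

```python
from collections import Counter

def rank_by_cohort_frequency(patient_genes, cohort):
    """Rank one patient's mutated genes by how often each is mutated in the cohort.

    patient_genes: set of gene symbols mutated in the patient of interest.
    cohort: list of sets, one set of mutated gene symbols per patient.
    Returns (gene, fraction_of_cohort) pairs, most frequent first.
    """
    # Count each gene at most once per patient.
    counts = Counter(gene for genes in cohort for gene in set(genes))
    n_patients = len(cohort)
    ranked = [(gene, counts[gene] / n_patients) for gene in patient_genes]
    # Descending frequency; ties broken alphabetically so the output is stable.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked

# Hypothetical toy cohort of five patients (placeholder gene names).
toy_cohort = [
    {"GENE_A", "GENE_B"},
    {"GENE_A", "GENE_E"},
    {"GENE_A", "GENE_C"},
    {"GENE_B"},
    {"GENE_E", "GENE_F"},
]
patient = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}

for gene, freq in rank_by_cohort_frequency(patient, toy_cohort):
    print(f"{gene}\t{freq:.2f}")
```

In this toy case GENE_D, which no other patient carries, lands at the end of the tail with a cohort frequency of zero, the analogue of the never-before-seen genes at the right of the slide.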
Now, interestingly, I have marked here four genes that were mutated somatically in this patient and about which much, or at least something, is known in terms of their involvement in cancer. Anecdotally, at least, notice there's really no relation between the genes we know to be involved in cancer and their mutation frequency in the population at large, although GATA3 certainly is a known cancer gene and it's the most frequently mutated one. So the question really is: what about these other genes? Are they functional, are they drivers, or are they passengers in this woman's tumor?

Another way to look at this is in the following matrix. Here is another cohort: I've switched to the TCGA ovarian tumor cohort, which a few years ago, when this slide was made, had about 351 ovarian tumor exomes in it. What I've done here is zoom in on chromosome 17 at gene resolution. So we're not looking at nucleotide resolution; it's gene by gene, ordered along chromosome 17, and each dot in this dot plot represents the observation that the gene in that column was mutated somatically in the ovarian tumor patient in that row. The vertical blue line in the center of this dot plot, or matrix, is not there because I went in and drew a blue line in PowerPoint. It's there because gene 170 or so along chromosome 17 happens to be TP53, the most frequently altered gene in many cancers, and ovarian tumors are certainly no exception: about 84% of these tumors have a TP53 event. Other than that, though, it's very difficult, I think you'll agree, to see any structure in this field of rare events.

So that really is the problem: no two patient tumors look alike when it comes to somatic mutations. This matrix, by the way, would look very different if I were showing you another layer of these data besides mutations. For instance, with mRNA expression data we'd see a very full matrix, with lots of similarities among patients that could be clustered out. Here, with somatic mutations, it's much more difficult.

Now, what, if anything, is the hypothetical solution to this? Well, the potential solution is certainly not new in
cancer biology. It's a very old idea: these heterogeneous mutations across a patient cohort, or rare mutations in a single patient, can be at least partially explained by the observation that cancer is not a disease of single mutated nucleotides or even single genes. Like other complex diseases, it's a disease of pathways and pathway-level alterations.

So here is just one example where TCGA or another cancer genome paper has powerfully used knowledge of cancer pathways to organize the mutations found in a patient cohort. This happens to be a glioblastoma cohort published just a few years ago; the citation is at the bottom of the slide. The point it's making is that a lot of glioblastoma patients, as was already known before the study, have EGFR alterations. That's shown here at the top center of the slide: 57% of this cohort. However, if one looks in the neighborhood of EGFR in pathway space, that is to say, here just downstream of EGFR, though for another mutation it could be upstream, then we pick up many other events that might otherwise have been considered rare but become common events in the context of this pathway map, in that all these patients are hitting the same region of pathway space. So here you can see that a gene like RAS or BRAF is mutated in only 1 or 2% of patients, but it can be related through pathways to more dominant mutations like EGFR.
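The pathway-level argument is easy to make concrete: a gene set counts as altered in a patient if any of its members is mutated. Here is a minimal sketch, assuming mutation calls stored as per-patient gene sets; the "pathway" and the ten-patient cohort are hypothetical, chosen only to show individually rare genes adding up to a more common pathway-level event.

```python
def alteration_frequency(cohort, genes):
    """Fraction of patients with at least one mutated gene in `genes`."""
    genes = set(genes)
    hit = sum(1 for patient in cohort if patient & genes)
    return hit / len(cohort)

# Hypothetical pathway neighborhood and cohort (placeholder gene names).
pathway = {"GENE_R1", "GENE_R2", "GENE_R3"}
toy_cohort = [
    {"GENE_R1"}, {"GENE_R1", "GENE_X"}, {"GENE_R2"}, {"GENE_R3"},
    {"GENE_X"}, {"GENE_Y"}, {"GENE_Z"}, {"GENE_X", "GENE_Y"}, set(), {"GENE_Z"},
]

# Each member gene on its own versus any member of the pathway.
per_gene = {g: alteration_frequency(toy_cohort, {g}) for g in sorted(pathway)}
pathway_level = alteration_frequency(toy_cohort, pathway)
print(per_gene)
print(pathway_level)
```

No single member gene here is hit in more than 20% of the toy cohort, but the pathway as a whole is hit in 40%.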
And so the final slide I'd like to show by way of introduction to the Cancer Cell Map Initiative is this one, which reminds us that, perhaps because of heterogeneity as well as other reasons we can talk about, most of the genes that we and other experts in the cancer field consider to be cancer genes were not originally found by sequencing. Some of them were confirmed by sequencing, and a few were found by sequencing for the first time, but most were not. There are different expert lists of cancer genes, and the community, as you may know, doesn't yet have a single agreed-upon list. But one can look, say, first at the Vogelstein list of cancer genes published in the review "Cancer Genome Landscapes." One can look at cancer gene repositories like the Sanger Center's Cancer Gene Census, as I'm doing here on this slide, and there are other lists one can look at. Qualitatively, what I'm showing here applies to all of these lists.

What you can see is that tumor sequence analysis has found, at least in terms of the Cancer Gene Census, 67 of these genes for the first time. This is an analysis where you simply go back historically through PubMed and look at the first reports of a gene being associated with cancer. The real winner here, in terms of methods for identifying cancer genes for the first time, is gene functional studies: knockout, knockdown, and knock-in type studies. So the largest share, I should say the largest minority, of cancer genes in this particular database were first associated with cancer by those gene manipulation technologies.

So, in light of all of this, last year in the middle of
the year or so, my colleague Nevan Krogan and I, together with our cancer center directors at UCSD and UCSF, wrote a review article in which we announced something we call the Cancer Cell Map Initiative. Its goal is centered on the idea that full knowledge of networks and pathways is super critical. The statement, and our belief, is that it's going to be critical to the ultimate interpretation of cancer genomics.

Like good geneticists, we are, as everyone is, trying to go from patient genotype to phenotype, the horizontal axis, if you like, on this schematic. In cancer genomics, genotype increasingly means the full cancer genome; it might also mean the exome, or a panel like FoundationOne. We're talking about the somatic, and eventually germline, alterations found in that patient. Phenotype relates to disease diagnosis and outcome. Lots of people would like to associate those two. But what we propose is to add a middle layer: a knowledge base, as comprehensive and unbiased as one can assemble, of the hallmark cancer networks and pathways.

That engenders two sub-challenges, shown here at the bottom of the slide. One is: given some systematically encoded knowledge of cancer pathways and networks, how does one use that knowledge to interpret patient genomes in terms of outcomes and, ultimately, therapies? The second, equally interesting, question is: how does one map and accumulate this network data in the first place? For the remainder of my talk, I want to focus on both ideas. First, let's assume we have some knowledge, and in fact, as I'll show you, we already have some systematically encoded knowledge about cancer networks. How do we begin to use that? Then, about halfway through my talk, I'll shift gears and talk about how we think one can systematically map these pathways, or further our systematic knowledge of them.

So in terms of existing network data that one can
use to get started, I'm sure many of you are already familiar with the fact that there are public databases of molecular interactions of various kinds. Some of those are literature curated, meaning that armies of individuals, or at least a few individuals, sometimes assisted by natural language processing algorithms, go back to the literature and curate interactions. More frequently these days, however, we're seeing less reliance on literature curation and more reliance on systematic, unbiased studies. The kinds of studies one might do are to systematically work out biochemical or protein-protein interaction networks, or to work out protein-DNA networks. Metabolic networks are still being actively interrogated systematically these days, and there are large repositories of those online as well. Lacking any kind of physical, genetic, or transcriptional information about networks, one can always derive networks from co-expression: extracting pairs of genes that are over- and underexpressed together over many, many patients, samples, or time points is another reasonable way of assembling networks.

If one goes to databases like BioGRID, a Canadian database of interactions; IntAct, the European EBI version; HPRD, the Human Protein Reference Database, from Akhilesh Pandey's lab at Johns Hopkins; DIP, out of UCLA; and so on and so forth (and this is not an exhaustive list), then for human cells of some kind one can find about a million interactions between pairs of genes or proteins.
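The co-expression route to a network, mentioned above, can be sketched in a few lines: compute a correlation for every gene pair across samples and keep the pairs above a cutoff. This is a minimal illustration assuming Pearson correlation and an arbitrary cutoff of 0.8; real co-expression networks use many more samples and more careful statistics, and the expression values here are made up.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return sxy / math.sqrt(sxx * syy)

def coexpression_network(expression, threshold=0.8):
    """Link gene pairs whose profiles rise and fall together (r >= threshold)."""
    edges = []
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        if r >= threshold:
            edges.append((g1, g2, r))
    return edges

# Hypothetical expression values: one row per gene, one column per sample.
expression = {
    "GENE_A": [1, 2, 3, 4, 5],
    "GENE_B": [2, 4, 6, 8, 10],  # moves with GENE_A
    "GENE_C": [5, 4, 3, 2, 1],   # anti-correlated with GENE_A
    "GENE_D": [1, 3, 2, 5, 1],   # weakly related to the others
}
for g1, g2, r in coexpression_network(expression):
    print(g1, g2, round(r, 2))
```

Keeping only positive correlations matches the "over- and underexpressed together" description; some co-expression methods also keep strong negative correlations.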
Now, as of the present time, these interaction databases have very little in terms of tissue-specific, cell-line-specific, context-specific, or genotype-specific information about interactions. We really have to think of these interactions as possibilities: interactions between pairs of human proteins that someone has reported in some condition or context. It's not unlike the way GenBank was for many, many years before we had complete human and other species genomes. We just searched, and we still do, the non-redundant part of GenBank, which really was just an amalgamation of every sequence we knew of. That really is the state of these protein interaction databases today. But the main point here is that you do get 10 to the sixth or so interactions that someone has measured at some time and in some place. And those, as I hope to convince you shortly, are useful.

The way we've attempted to make use of them is to recognize similarities between somatic tumor genomes that would not otherwise be recognized at the gene level. To illustrate this, I want to take the ovarian tumor cohort of 351 patients again and show you how we can cluster that cohort into patient subtypes that were not clusterable or recognizable without the use of networks. This first work was published about three years ago in Nature Methods; it's the Hofree et al. publication. It built on a number of other works out of my lab that integrated networks and cancer datasets or established the basic methodology to do that; those are shown below. But the seminal reference here really is the Nature Methods paper, so if you haven't seen it, look at that first. That's Matan Hofree in the lower left-hand corner, and the other pictures are of people who contributed seminal works before or after.

So, the idea here is if we simply try to cluster those
somatic tumor genomes in the ovarian cancer cohort out of the box, what you get is the cluster matrix shown here at left. It's hard to interpret, simply because all patients are getting placed in the same cluster. Why? Because the algorithm finds no feature other than TP53 that any two patients share, and so, I suppose on the strength of that, it puts them all in the same cluster. However, if one transforms a patient's somatic genome from its list of mutated genes through molecular network knowledge, you get the clustergram shown on the right, where you can see that patients are in fact placed into robust clusters, here four clusters, which will define the subtypes I'll talk about in a minute.

Now, how does this work? First of all, we access the public databases of interactions I just talked about. Here we tried three different ones, and qualitatively the results don't change a whole lot when you switch the database source from Pathway Commons to HumanNet to STRING. They change a little bit, of course. Part of the reason the result may be stable may be the method, but part of the reason may simply be that these databases have an uncharacterized amount of overlap. Using any one of these databases, we get, as I said before, about a half million to a million links between human proteins.

Now, how do we use these to transform the patient genome? Very simply put, we take the list of genes that are mutated in a patient and instead return a score for every gene in the human genome, based on how close in the network that gene is to altered genes. If a gene was altered originally, it scores high, as it did before the transformation. But if a gene is one step upstream or downstream in a signaling pathway or a transcriptional network, or if, say, it encodes a neighboring subunit in a protein complex, then even though it wasn't mutated in this patient, it feels the heat, and its score reflects that. So the idea now is that every gene has a nonzero score, typically based on its network or pathway distance to mutations. This is the key idea behind this whole first part of
my talk. To illustrate it, look at the following slide, which is just a mockup, a toy example of a network; the full network, as I said, has a half million to a million links. This is a hand-drawn, concocted example where a part of the network is impacted by two patient genotypes: a yellow genotype with the somatically mutated genes of the first patient, and a blue genotype for the second patient. As I showed you in the ovarian cohort, most of the time these genes do not overlap between any given pair of patients. Maybe there is one gene, like TP53, that is hit in both, and that gene is colored for both patients, shown here as a green node at the bottom center of the network.

Now suppose we perform the network transformation operation I just mentioned, which in the literature has been called network diffusion or network propagation. I should pause for a minute and point out that we did not invent this technique by any means. In fact, it wasn't even invented in biology, but it was first borrowed for biology, as far as we're aware, by [inaudible] group at Tel Aviv University, in the Vanunu et al. publication in 2010. It's also been used quite successfully by Ben Raphael's group, as first suggested in the Vandin et al. publication in JCB. Here we're simply using these techniques to cluster patients and find subtypes in a way that hadn't been done before.

The idea of network propagation, smoothing, or diffusion is that each mutated gene in a patient is a source of heat for a heat diffusion algorithm borrowed from thermodynamics, and over many cycles of simulation that heat spreads to its neighbors. That's how upstream or downstream genes in pathway space start to feel the heat of the original mutation. After propagation, once this process comes to steady state, one gets in this particular example a large region of the network encompassing genes that, although they were not mutated directly in either patient, are near mutations in both. That, again, is the key concept of how all of this works, and it allows you to raise yourself above the gene level and look systematically at the network level.
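Here is a minimal sketch of network propagation written as a random walk with restart and iterated to steady state, on a toy network in the spirit of the hand-drawn example: two patients whose mutated genes do not overlap at all. The update f <- alpha * W f + (1 - alpha) * f0, with W the degree-normalized adjacency, is one common formulation (published methods differ in how W is normalized). The restart setting alpha = 0.7, the graph, and the gene names are assumptions for illustration, and the clustering of smoothed profiles described in the talk is not shown.

```python
import math

def propagate(adjacency, seeds, alpha=0.7, tol=1e-12, max_iter=10_000):
    """Random walk with restart: iterate f = alpha * W f + (1 - alpha) * f0.

    adjacency: dict node -> set of neighbors (undirected; every node needs >= 1 neighbor).
    seeds: one patient's mutated genes; they share the initial heat equally.
    W[n][m] = 1 / degree(m) for each edge m-n, so total heat is conserved.
    """
    nodes = sorted(adjacency)
    f0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    f = dict(f0)
    for _ in range(max_iter):
        new = {}
        for n in nodes:
            # Heat arriving at n: each neighbor m passes on its heat split over its degree.
            inflow = sum(f[m] / len(adjacency[m]) for m in adjacency[n])
            new[n] = alpha * inflow + (1 - alpha) * f0[n]
        change = sum(abs(new[n] - f[n]) for n in nodes)
        f = new
        if change < tol:  # steady state reached
            break
    return f

def cosine(u, v):
    """Cosine similarity between two sparse profiles stored as dicts."""
    dot = sum(u[k] * v.get(k, 0.0) for k in u)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

# Hypothetical toy network: undirected edges between placeholder genes.
edges = [("A", "B"), ("C", "B"), ("B", "H"), ("H", "D"), ("E", "D"), ("G", "D")]
adjacency = {}
for x, y in edges:
    adjacency.setdefault(x, set()).add(y)
    adjacency.setdefault(y, set()).add(x)

yellow = {"A", "C"}  # patient 1's mutated genes
blue = {"E", "G"}    # patient 2's mutated genes: no overlap with patient 1

raw_similarity = cosine(dict.fromkeys(yellow, 1.0), dict.fromkeys(blue, 1.0))
smoothed_similarity = cosine(propagate(adjacency, yellow), propagate(adjacency, blue))
print(raw_similarity, smoothed_similarity)
```

The raw mutation profiles share no genes, so their similarity is zero; after smoothing, both patients put heat on the shared region (B, H, D), and their similarity becomes positive, which is what lets a clustering algorithm group them.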
Now, I showed you how that allowed us, or allowed the algorithm, to find similarities between patients that put them into well-defined clusters. Here is one such cluster of patients. The network responsible for clustering these patients is shown in the upper right of the slide. The idea here is that there are very few genes that are hit in many patients. In fact, there's only one gene here that is, and that's [inaudible]; we can talk about [inaudible] later. Some people don't think it's a cancer gene at all. We actually do, because of its network context. But other than that gene, there are very few genes here that are hit in more than one or two patients. What you're really looking at is a set of patients, all of whom are mutated somewhere in this network. For instance, down here in the lower left of the network we have the fibroblast growth factor and fibroblast growth factor receptor family, the FGF and FGFR genes. Any one of those genes is hit rarely in this cohort. However, if you consider the gene family, or really the pathway of FGFR signaling, which picks up all of these genes and interconnects them as you see here, now you have a common event, and that's exactly the principle we're using to cluster these patients.

The other reason these subtypes, defined using somatic alterations plus networks, are interesting is that we begin to see associations between the subtypes and prognostic and predictive signatures. What you're seeing at the left of the slide are the survival curves for the four clusters of ovarian tumors I just showed you. In fact, the network I just showed you is the network that aggregates somatic mutations for this aggressive subtype 1. Notice that if you're hit in that region of the network, or pathway space, then, at least for the 64 patients in the TCGA cohort, no one was alive past 70 months. Compare that to the other subtypes you can define this way: for the best-surviving subtype, subtype 4, with 21 patients, 68 or so percent of those patients were still alive.

We can also look at drug resistance. In general it's hard to look at drug or therapy outcomes in TCGA, but for ovarian cancer it's a little easier, because essentially every patient gets the same standard of care, which is a platinum-based agent. So here we can look at resistance to that regimen across these four subtypes. For one of them we didn't have that information for more than five patients, so it dropped out, but where you still have significant numbers of patients to analyze, you see the same trend. In particular, for the low-survival subtype 1, you can begin to hypothesize, at least, that the low survival occurred simply because these patients are resistant to their therapy almost out of the box. In unpublished work, my lab is now looking into this network as a potential network that defines platinum resistance, but at the moment that is still hypothetical or speculative.
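Survival curves like the ones compared across subtypes here are typically Kaplan-Meier estimates, which handle patients whose follow-up ends while they are still alive (censoring). Below is a minimal sketch of the estimator, assuming per-patient follow-up times and a death indicator; the five patients are made-up toy data, and a real comparison between subtypes would add a test such as the log-rank test.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times: follow-up time for each patient (e.g. months).
    events: 1 if death was observed at that time, 0 if the patient was censored
            (still alive when follow-up ended).
    Returns [(time, survival_probability)] at each time where a death occurred.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all patients sharing this time; censored ones still count as at risk here.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve

# Hypothetical follow-up for five patients: (months, death observed?).
toy_curve = kaplan_meier([5, 10, 10, 20, 30], [1, 0, 1, 1, 0])
for t, s in toy_curve:
    print(f"month {t}: S = {s:.2f}")
```

The censored patient at month 10 still counts toward the number at risk at that time but does not count as a death, which is what keeps the estimate from being biased toward early deaths.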
Now, in unpublished data, what we've been able to show is that the same trends are quite nicely validated in a more recently sequenced cohort of women with ovarian tumors. These are Australian women, part of an ICGC study, and here we're not learning anything new: we're not doing any more bioinformatics to define these subtypes. We're simply taking the same networks and corresponding subtypes I already told you about, as learned from The Cancer Genome Atlas cohort, and applying them to the newer ICGC cohort, and we see the same survival trends. Again, you drop out one subtype just because of low sample numbers, but for the subtypes that are supported by significant numbers of patients and can be analyzed, you do get that prognostic power. So we think there's something going on here. The next step is going to be to go back to the network that defines both the poor and perhaps the good survival subtypes, and really understand why.
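One simple way to carry subtypes from a discovery cohort to a new cohort without re-learning them is nearest-centroid assignment: average the network-smoothed profiles of each subtype in the first cohort, then give each new patient the subtype whose centroid is most similar. This is a hedged sketch of that idea, not necessarily the procedure used in the unpublished ICGC analysis; the profiles, subtype labels, and gene names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between sparse profiles (dicts of gene -> score)."""
    dot = sum(u[k] * v.get(k, 0.0) for k in u)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def centroid(profiles):
    """Average several sparse profiles gene by gene (missing genes count as 0)."""
    genes = set().union(*profiles)
    return {g: sum(p.get(g, 0.0) for p in profiles) / len(profiles) for g in genes}

def assign_subtype(profile, centroids):
    """Return the subtype label whose centroid is most similar to `profile`."""
    return max(centroids, key=lambda label: cosine(profile, centroids[label]))

# Hypothetical smoothed profiles from a discovery cohort, grouped by subtype.
discovery = {
    "subtype_1": [{"G1": 0.6, "G2": 0.3, "G3": 0.1}, {"G1": 0.4, "G2": 0.5, "G3": 0.1}],
    "subtype_2": [{"G4": 0.7, "G5": 0.3}, {"G4": 0.5, "G5": 0.4, "G3": 0.1}],
}
centroids = {label: centroid(ps) for label, ps in discovery.items()}

# Smoothed profiles for two patients from a hypothetical validation cohort.
new_patients = {"P1": {"G2": 0.6, "G3": 0.2}, "P2": {"G5": 0.9}}
for pid, profile in new_patients.items():
    print(pid, assign_subtype(profile, centroids))
```

Because nothing is refit on the new cohort, any survival differences among the assigned groups are an independent check on the original subtypes, which is the spirit of the validation described above.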
Now, just before I leave this example, I want to further illustrate how this network smoothing, or propagation, business lets you recognize the same subtype in both cohorts. This is the same network I showed a few slides ago. It's not as pretty in terms of layout, because I've sloppily laid it out on my own screen, but I assure you it's the same network. Here, however, I have superimposed the genes that are mutated in the TCGA ovarian tumor cohort on top of the genes independently found to be mutated in the ICGC cohort. If you didn't have the network analysis, those gene lists are all you would have for this network region, and you can see they overlap in only four genes, shown here in green, that were mutated in tumors in both cohorts. By and large, other than those four, you're looking at non-overlapping sets of gene mutations. Note, however, that all of these fall in the same pathway regions, and that's why, again, you can recognize this subtype.

In passing, I'd like to point out that these ideas are of course not restricted to cancer. I think everyone on this call probably cares most
about cancer, but there are of course lots of people productively using network analysis techniques in a variety of complex diseases. Here is a nice collaboration we had, driven primarily by my collaborator Joe Gleeson, here at UCSD and now half time at Rockefeller. Joe studies neurodegeneration, primarily in childhood disorders. This is a rare childhood disorder called hereditary spastic paraplegia, or HSP. It's what Joe tells me is a complex Mendelian disorder, which means that although in any one child or pedigree it's likely caused by a single gene, from pedigree to pedigree, or from child to child, it's a different gene that's driving the disease. That's generally what's found. So in some sense it has similarities to the rare mutations I was just analyzing in cancer.

What Joe was able to do here is write down all of the genes that have been found by individual gene association and linkage studies; those are the seed proteins here in blue. Then, by focusing further association and linkage studies on just the network neighborhood of these genes, we were first able to show that these genes do impact the same network. Second, using that information, we were able to focus the search for further disease genes on their network neighborhoods. That gives you a huge boost in power over genome-wide association-type studies, where you have to pay the multiple hypothesis testing penalty for every gene, or every test. This is more of a network-driven candidate gene approach for finding disease genes. All of the red highlighted nodes here are genes that were found to be linked to disease, about half of them only with the increased power that the network provides.
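The power argument can be made concrete with a Bonferroni correction, one standard way of paying the multiple hypothesis testing penalty: the per-test threshold is the overall significance level divided by the number of tests. The numbers below are assumptions for illustration: roughly 20,000 protein-coding genes for a genome-wide, gene-level scan versus a hypothetical neighborhood of 200 candidate genes, and a made-up p-value.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cutoff that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests

ALPHA = 0.05
GENOME_WIDE_TESTS = 20_000  # approximate number of protein-coding genes (assumption)
NEIGHBORHOOD_TESTS = 200    # hypothetical size of a network neighborhood

genome_wide_cutoff = bonferroni_threshold(ALPHA, GENOME_WIDE_TESTS)    # about 2.5e-06
neighborhood_cutoff = bonferroni_threshold(ALPHA, NEIGHBORHOOD_TESTS)  # about 2.5e-04

p_value = 1e-4  # hypothetical association p-value for one candidate gene
print("genome-wide significant:", p_value <= genome_wide_cutoff)
print("significant within neighborhood:", p_value <= neighborhood_cutoff)
```

The same p-value clears the bar only when the search is confined to the neighborhood, a 100-fold looser threshold here; the trade-off is that genes outside the neighborhood are never tested.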
OK. So I'd also like to briefly touch on the fact that there is hardware and software infrastructure underlying all of this. First of all, all of the network diagrams we've been showing, and the computations and analyses behind them, are assisted by the Cytoscape framework, an open-source tool that my lab and, over the years, many other labs have contributed to and continue to build. But I also want to acknowledge our latest foray, which is a database of network diagrams, and ultimately of cancer network subtypes, funded as part of this NCI U24 program. The database is called NDEx, the Network Data Exchange. The URL is in the upper left-hand corner here: ndexbio.org. We put out our second formal release a few days ago. It's functional, and it's definitely a work in progress with rapidly increasing functionality, but I think it's already useful, and you can begin to see the vision of where all of this could go.

The idea is this: there are already lots of interaction databases out there, as I described. The big hole in the field was databases of the networks people derive from these raw interactions. So often, when someone performs a pathway or network analysis and finds a particular network that's significant in their dataset or study, that network ends up in, you know, Supplemental Figure 7A, never again to see the light of day. We'd like to make those networks much more central to the primary publication model of that work. The first step in doing that is to put these networks in a place where they don't die in Supplemental Figure 7A but instead live on, are continuously improved, and are shared with the community of investigators. That's the point of NDEx: it's as much a social networking site as it is a molecular networking site. You can log in if you like, but you don't have to; you can search it anonymously. It's more powerful, though, if you get a user account, begin to own networks, and share them in circles. You are encouraged to share your networks with the public, but certainly pre-publication, for instance, there's no requirement to do any of that, and post-publication there's actually no requirement either. So you can define your circles of friends and workgroups and push these networks forward.

We have had very productive collaborations with some of the journals that publish a lot of these networks. Elsevier now has a Cytoscape/NDEx upload tool, which they are piloting in eight different journals; it just rolled out a few months ago. After the first year, the idea is to assess where that tool is and, if it's judged successful, or depending on how it develops, roll it out to a larger suite of Elsevier journals. We're also talking, of course, with other journal publishers, like [inaudible] and Faculty of 1000, and so on. My group just had a Cell Systems paper, which of course is an Elsevier journal, where we were able to use this system, and I think it has really helped get the word out about the networks we're publishing.

OK. So now, I'm two-thirds, say, through my talk and
for the last third, or a bit more than that, what I want to talk about is the second issue. I have spent the first part of my talk on ways of using networks to translate genotype to phenotype in cancer. But now I want to talk about how one could get at this network information systematically in the first place.

The first way, developed over the past 10 or 15 years for populating network databases, is tandem mass spectrometry. My colleague up at UCSF in the Cancer Cell Map Initiative, Nevan Krogan, is a world leader in using mass spectrometry to define biochemical interactions, the interaction neighborhood of a protein. That's his experimental pipeline shown here, and a few of the instruments in his laboratory are shown next to Nevan's mug shot. The idea is straightforward. You take a frequently mutated cancer gene; the goal will be to ultimately map the interactomes of all frequently mutated cancer genes in this way, and here we're showing BRCA2. Given that frequently mutated cancer gene coming out of the genome projects, you epitope-tag it and express the tagged protein in a panel of cell lines. You harvest the protein along with the interactors of your tagged protein, and to identify those interactors you use the tandem mass spectrometry setup, resulting in a network neighborhood defined around that tagged protein of interest. And of course, if you've done this in multiple cell lines or genetic contexts, you'll get one network per context. In particular, we're very interested in cases where there is a known point mutation. As you might know, this happens especially for oncogenes, where the frequent mutations pile up at a particular site in the protein, and so one can contemplate isogenic experiments in which one mutates that site or does not, and defines the networks in either case.
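Turning a pull-down into a network neighborhood requires separating true interactors from proteins that stick to everything. Dedicated scoring tools (for example SAINT or CompPASS) do this with proper statistics; the sketch below is a deliberately simplified stand-in, not the pipeline used in Nevan Krogan's lab. It assumes spectral counts from replicate pull-downs of the tagged bait and from control pull-downs, and keeps preys that are seen in enough replicates and enriched over controls. The thresholds, protein names, and counts are hypothetical.

```python
def score_interactors(bait_runs, control_runs, min_fold=5.0, min_reps=2, pseudocount=1.0):
    """Flag likely interactors of a tagged bait from spectral counts.

    bait_runs: list of dicts (prey -> spectral count), one per replicate pull-down of the bait.
    control_runs: list of dicts (prey -> spectral count) from control pull-downs.
    A prey is kept if it is detected in at least `min_reps` bait replicates and its mean
    count exceeds the control mean by `min_fold` (the pseudocount avoids dividing by zero).
    Returns {prey: fold_enrichment}.
    """
    hits = {}
    for prey in set().union(*bait_runs):
        bait_counts = [run.get(prey, 0) for run in bait_runs]
        control_counts = [run.get(prey, 0) for run in control_runs]
        replicates_seen = sum(1 for c in bait_counts if c > 0)
        bait_mean = sum(bait_counts) / len(bait_counts)
        control_mean = sum(control_counts) / len(control_counts)
        fold = (bait_mean + pseudocount) / (control_mean + pseudocount)
        if replicates_seen >= min_reps and fold >= min_fold:
            hits[prey] = fold
    return hits

# Hypothetical spectral counts for two bait replicates and two controls.
bait_runs = [{"PREY_A": 12, "PREY_B": 3, "PREY_C": 8}, {"PREY_A": 10, "PREY_B": 4}]
control_runs = [{"PREY_B": 3}, {"PREY_B": 2, "PREY_C": 1}]
print(score_interactors(bait_runs, control_runs))
```

PREY_B is discarded because it shows up nearly as often in the controls, and PREY_C because it appears in only one replicate. Running the same scoring per cell line or per isogenic context gives one neighborhood per context.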
or proof of concept for cancer, we started actually with the viral
interactomes, that of HPV, with that of the human host. So as you might know, HPV is linked to cancer in
a few tumor types these days. Certainly for cervical cancer, HPV plays a role in
almost all of those. And for an increasing number of head and neck
cancers as well, HPV has been implicated. So what we set out to do was to tag each of the
HPV subunits and express those and pull those down to define their interactomes in
three different cell lines. An ovarian cancer cell line, C33A, shown here as
the red interactions, as a control the HEK293 cell line, and then the data that are just coming off the mass
spectrometers now are for the head and neck cell line new profiles. Those aren’t yet shown here. Each of the subunits of HPV is the hub or is one of
the hubs that you’re looking at here. And splayed out around those hubs are the human
interactors: their protein-protein interactions with those HPV
proteins. Analysis of this network is still ongoing
between Nevan’s lab and my lab. But just anecdotally, I can tell you in passing that one of the exciting findings in these networks is
that the proteins interacting with these subunits are in fact enriched for frequent
mutations in either ovarian or head and neck cancer. You recover as positive controls the expected
interactors, gene products like p53 and RB, and then of
course you find several others that are novel findings and are going to be really
interesting to follow up on. But in terms of defining networks, what I
wanted to do in the second part of my talk is provide more of a flyby of a couple of
different technologies. This is not going to be an exhaustive set by
any means. It’s really just a set that we started with first. The second network mapping approach that is
quite relevant for these cancer networks is the so-called genetic
or chemical-genetic interaction. It’s a superclass of what are called synthetic lethal or epistatic
interactions. To show you what I mean by a genetic interaction,
here a synthetic sick or lethal interaction, what you see in the upper left-hand
corner is an interaction between irinotecan, which is an inhibitor of
topoisomerase, and a knockdown of RAD17, a DNA repair gene. What you can see here is that if you apply irinotecan
without the second insult to RAD17, the cells are quite viable up to high
concentrations of irinotecan. However, if you do the double exposure with
RAD17 knockdown, you get this synthetic killing effect. That’s what I mean by a chemical-genetic
interaction. But given that this is a targeted drug against
topoisomerase, this is really a phenocopy of a gene-gene, or
synthetic lethal, interaction. Now, we and many others are interested in
systematically mapping these networks. And so what JP Shen and Rohith Srivas have
done in my lab is screen a large number of
mostly FDA-approved targeted therapies, which
have one or just a few well-defined protein targets, against
knockdowns of a large panel of tumor suppressor genes. The drugs are shown here in the rows, and the tumor suppressor gene knockdowns are in the
columns. A synthetic killing effect would be the blue side of
the heat map in this particular diagram. So you can see lots of blue and lots of
variability in the growth effects that you measure in this basic initial assay. Now, as I said, there are lots of groups doing this, and
one challenge, we think, for analyzing these data going forward is not how
quickly you can screen. In the near future, it’s clear that CRISPR-type
technologies may prevail in allowing us to rapidly screen through
many such synthetic lethal relationships across a whole
panel of cell lines; that’s certainly where this is logically going to go. We think, though, that the challenge is going to be not whether you measure a synthetic lethal or
genetic interaction in one cell line or another, but how likely it is to pan out in patient
populations and ultimately lead to therapies of interest. So as one way of prejudging or predicting or
prioritizing these interactions, we turn to a cross-species approach whereby
every time we measure a space of synthetic lethal interactions in humans, we mirror that in a very evolutionarily distant model
species, here Saccharomyces cerevisiae, or budding yeast, which is evolutionarily separated from humans by a little more than a billion years. The result is quite nice in that you do find
significant overlap in these synthetic lethal genetic interaction
relationships even across the span of eukaryotes, such that you can draw
relatively large wiring diagrams with potential, one day, for the clinic. We’re not there yet, of course, but you can see at least where we might be going
in the field. So the idea here is that every
interaction I’m showing you in this diagram has been observed in at least one human cell line and also
in yeast, where you get a synthetic killing effect if you knock
down both genes that you don’t see if you knock down only one. Because of the design of our screen that I already
described, it focused on interactions between drug targets
and tumor suppressor genes. That orientation is shown in the upper left-hand
corner here: a drug target and a mutated tumor
suppressor gene, or TS gene. That arrow is... it’s not an arrow, it’s a ball. But you can then read off of this map those
conserved relationships. The ultimate idea, and again we’re not there yet, would be that if, let’s say, your
patient is found to be mutated at BRD4, then you would look
upstream of that in this wiring diagram for the conserved interaction with a therapeutic target, here ADA as an
example. So we think that these conserved maps are going to be quite a nice way of prioritizing which interactions are
likely to be more stable and more likely to be observed in a new context, whether it’s a patient context or a new human cell
line. As we can quantitatively assess, the presence of an interaction in yeast
makes the same interaction in human cancer cells about
four times more likely than it would otherwise be. And if we’ve observed that interaction not just
in a single environment but across multiple compounds and genetic
backgrounds, then the rough increase in likelihood is 20x. So that gives you some quantitative feeling for
how you can begin to prioritize these interactions. The third kind of interaction that is of high interest is really the same kind of synthetic
effect, but now pulled out statistically from cancer
genomics datasets, as opposed to measured in a directed fashion with RNAi, drugs,
and CRISPR. This has been called a statistical genetic
interaction. Once again, there are a number of groups that have done nice work in
this area. This is my talk, so I’m talking about my work, but it’s
representative of what I think is a sort of cottage industry in pulling out these statistical genetic interactions. Here, Andy Gross, my former student, in
collaboration with Quyen Nguyen, a head and neck cancer surgeon here at UCSD, as well as a consortium of others, made the
following observation by looking at the head and neck somatic genome cohort. What you can see in this Venn diagram here is a
striking overlap, or genetic interaction, between two events: p53 alteration and loss of chromosome 3p. The outer purple region contains the patients
who have neither event. The middle orange region contains patients with both
events, mutation of p53 and loss of chromosome 3p. And then the slivers in purple or in yellow on either
side are patients with just one event or the other. For instance, the yellow patients have a p53
mutation but not 3p loss. This overlap is essentially a co-frequent event. Just as one can find frequently mutated cancer
genes as evidence that those genes are under selection, here we’re finding a co-frequent event in cancer,
which by the same logic argues that the joint event is under selection. So that is by itself a genetic interaction. Now, one can look at those interactions for whether they predict useful outcome-type data,
and here again, in TCGA, you really have survival as the main
outcome you can look at, and you do see here that there is a strong
association with survival for this joint event in particular. It was well appreciated before our study that a p53 mutation is associated with poor prognosis. But it was not well appreciated that that event
requires, it appears, a concomitant loss of large parts of 3p, because you can see here that if you don’t lose 3p,
that’s the sliver in yellow, those patients actually do quite well. So that really was a striking finding: an
interaction between two events that has prognostic implications, especially for patients that you might otherwise have
treated more aggressively; you can now perhaps adopt more of a watch-and-wait stance on that group of patients. OK. So I am basically out of time. I had a few remarks at the end, but I think what I’m
going to do to save some time for questions is to summarize
more quickly. I want to just say a couple of words here in
conclusion about where I think all this network analysis is
going. I’ve shown you one major analysis approach,
where I spoke in depth about a single paper in the first half of my talk on how we think we can group cancer mutations
using networks, and how networks probably aren’t just nice. They’re probably going to be necessary
knowledge for ultimately defining subtypes of patients. In the second part of my talk, I gave more of a survey of three different network methods that can be used to begin to flesh out these
networks in an unbiased fashion. I have, of course, missed a number of other network
mapping methods that we could talk about if we had more time, but that should probably be included in
any kind of cell mapping initiative. So now, where do we think we’re going to go in the future? Probably the biggest concern I’ve had, and the one where I’ve devoted the most sort of
futuristic research in my lab, is this bothersome
idea that cells don’t look like this. Cells don’t look like hairballs of interactions. In contrast, where network models are used in
other disciplines, sometimes they are a more physical
representation of what’s going on. For instance, in circuit design, if you open up a
Pentium chip, there actually is a network inside that Pentium
chip. If you open up a cell, there is not a wiring diagram that looks like that inside. That doesn’t mean it’s not a useful representation, but it is certainly an abstraction. So how can one get closer to,
ultimately, a complete structure-function model of a cancer cell, as perhaps
the best means of all of summarizing knowledge that could then be
applied to interpreting cancer genomics? Here we’ve shown how the hairballs are useful, but
really we think we need to push these network data out of the
domain of pairwise interactions and closer to structure and function models of the cell. As an example, you’re looking at the
same structure in panels A and B. This is the proteasome. Panel B was assembled from a stack of about a million x-ray diffraction images. Panel A is assembled from molecular interaction maps, but the color coding is the same, so that the core of the proteasome is shown
in red and yellow, and the regulatory subunits of the proteasome,
shown in blue and purple, are all [inaudible] by the red, yellow,
blue, and purple coloring here in the wiring diagram. Just briefly, what we have been able to show in the
past couple of years is that maybe you can’t get all the way to structure from these raw interaction diagrams, but you can
get a heck of a lot closer. In particular, we’ve been able to show that you
can at least get this hierarchy of modular subunits, which you can learn directly from interaction data. For instance, here you can infer that the core
actually contains two modular subunits, and at the right, the regulatory particle
also contains two modular subunits. It’s hard to see at left, but it’s possible. The method that does this we call NeXO,
for Network-extracted Ontology. But in the interest of time, I think I want to just flash
more quickly through here; you can read about this method in our Nature Biotech paper from 2013. It can be used to organize interaction data into large hierarchies of cellular subsystems: not yet the entire structure of the cell, but at least a
hierarchy of modular subunits, and that’s shown here. You might be thinking that another hierarchy of
modular subunits familiar from the literature is a database called the Gene
Ontology, or GO. That was literature curated. The one I’m showing you here is derived from
experiments like protein-protein interactions, co-expression, and other kinds of interaction
measurements. But having built this hierarchy, one can then compare it to the literature-curated
GO as a reference and begin to recognize parts and label them, and
that’s what’s been done here. And so, if one were to zoom in to the part of the
ontology that covers the proteasome, that is shown
here at left. And so we do find that the proteasome comes out
as one of the modular subunits of the entire interaction database,
and that it can be split naturally into a core and a
regulatory particle, and split again into the alpha and beta subunits of the
core and the base and lid of the regulatory particle, as well as some new
modular structures that aren’t recognized in GO or prior
knowledge but may indicate other complexes, for instance,
associating with the proteasome. Certainly some of these are recognizable; others
are less recognizable. But we think that these ontologies, or hierarchies,
that are derived directly from data in an unbiased or more
systematic way may be a useful step forward in comparison to
looking at these networks as flat hairballs. So maybe, then, these
hairballs of interactions, shown here at bottom left, really are just too close to the data. Just as I talked about the proteasome structure
being assembled from x-ray diffraction images, which at this point we
prefer not to look at directly (we run the algorithms, and the field of structural proteomics gives us these structures), we believe there is going
to be a similar set of algorithms developed that pushes these pairwise
interactomes, which really are just too close to the data, upward
toward whole-cell models. Here we’ve gotten as far as hierarchies like the
Gene Ontology. Other labs, like Ruedi Aebersold’s and other
proteomics labs, are in fact trying to go directly from pairwise interactomes to
structures through approaches like chemical cross-linking. But [inaudible] is the message. We expect that if one were to roll forward 5 or 10
years, we’re not going to be looking at these
pairwise interactomes as much anymore so much as building models from them. So with that, let me summarize. What I have argued today is that genome
sequencing, as we all know, is in full bloom, especially for cancer. It has revealed hundreds if not thousands of genes that are somatically altered in patients. The question of course is how to deal with that
rampant heterogeneity we see; every patient is a snowflake when it comes to their
tumor genomes. However, common patterns do begin to emerge at
the level above genes in this cellular hierarchy: the protein complexes formed by the gene
products, the pathways those complexes play roles in, the organelles, even, that those pathways and processes are components of, and so on and
so forth. So understanding those hierarchies of subsystems, pathways, et cetera, above the gene level is going to be
critical to organizing genomic data. For this reason, we recently launched this open
campaign we call the Cancer Cell Map Initiative. For this to be successful, we’re going to very quickly need to grow it and gain mindshare. So please, I know I’m not in the room today, but
either ask me a question or send me an email; we’re going to try to have
one of the first planning meetings for this shortly, probably over this summer. We’d love to reach out and work together in
much the same fashion as TCGA, which set a model for how to work
together in the generation and analysis of cancer genomics data. We think this effort could seed a similar type of
collaboration and network around the generation and analysis of
network data. In this last little bit, I made some futuristic remarks
that really are to the effect that we think pairwise interaction
networks are too close to the data. We need to be thinking about whole-cell models or
cancer pathway models that are informed by these data rather than looking
directly at the data. But no matter how you represent it, whether as a
pairwise interaction network or as something else, we think this type of
pathway knowledge isn’t just nice. It’s really going to be necessary. So with that, I’ve already mentioned most of the
players along the way, and I’d like to just take any questions. Thank you very much.>>Great. Thank you, Dr. Ideker, for a very
important and thought-provoking presentation. We might have time for a question or two, so at this time I’m going to go ahead and open the
floor to questions. For those of you in the room, please use the
microphones that are on the desks in front of you. If you’re on the WebEx, please indicate with a
raised hand on the WebEx dashboard and we’ll unmute your
line. So I believe [inaudible] has a question.>>Looks like he’s not on the line anymore.>>OK. As we wait for others to indicate their questions, I’ll go ahead and ask one, Trey. I was particularly interested in the
network smoothing approach. I was wondering whether you use any information about the gene mutations to inform the heat
propagation and how that heat propagation is expressed, or is it
just done the same way across different mutation
types?>>That’s a great question, Julie. The answer is it’s the simplest possible thing. It is simply tagging a gene as a one if it’s been
altered and a zero otherwise. It’s not for lack of trying. We have tried a number of ways of scoring genes, which is probably what you have in mind, and they don’t improve the signal in any clear way that made us confident in using them. And my bias is always to adopt this, you know, Occam’s razor approach to all of these. So what’s nice about our current approach is that it is
really quite simple. You have a network of links. We don’t weight those either at the moment, even
though you could think about doing that. We have a set of mutated genes defining the
patient’s phenotype, and that’s it. So there’s a sort of elegance and beauty in the
simplicity of the thing. That said, again, it’s a longer conversation we can
have offline later. We’ve tried to score genes. We’ve also tried to score interactions. In some cases, our efforts actually made the
results worse. And again, I need to define what I mean by worse,
but it’s interesting. So I don’t think I can draw any strong
conclusions that these scoring approaches aren’t useful to us. I just think more research is needed.>>Thank you. Thanks. Looking at the time, I think we probably
need to close for today. So we hope you can join us for our next
presentation which will be Wednesday, March 16th, when Dr. Richard Gershon from
Northwestern University will be here to present “Item Response Theory, Computerized
Adaptive Testing, the Patient as Participant, Precision Medicine.” So I want to just thank everyone who has joined
today and special thanks to Dr. Ideker for sharing your time and expertise. Thanks very much, everyone.>>Thanks, Julie. [ Inaudible Remark ]
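
In the Q&A, Julie asks about the heat propagation step, and Dr. Ideker answers that each gene is simply tagged as a one if it has been altered in a patient and a zero otherwise, and that the network's links are not weighted. The sketch below illustrates that binary input being smoothed over an unweighted network. It is a minimal sketch under stated assumptions, not the lab's published code: it assumes a random-walk-with-restart style of propagation, and the function name `propagate`, the fraction `alpha = 0.7`, the convergence tolerance, and the four-gene toy network are all hypothetical choices made for illustration.

```python
import numpy as np


def propagate(adjacency: np.ndarray, mutations: np.ndarray,
              alpha: float = 0.7, tol: float = 1e-10,
              max_iter: int = 10_000) -> np.ndarray:
    """Smooth binary mutation profiles over an unweighted gene network.

    Hypothetical helper (assumed random-walk-with-restart propagation).
    adjacency: (n_genes x n_genes) symmetric 0/1 matrix of network links.
    mutations: (n_patients x n_genes) 0/1 matrix; 1 = gene altered.
    alpha: fraction of heat passed to neighbors per step (assumed value).
    """
    degree = adjacency.sum(axis=1).astype(float)
    degree[degree == 0] = 1.0  # isolated genes: avoid division by zero
    # Row-normalize: each gene splits its heat evenly among its neighbors.
    transition = adjacency / degree[:, None]

    f0 = mutations.astype(float)  # the simple 1/0 tagging of altered genes
    f = f0.copy()
    for _ in range(max_iter):
        # Spread heat one step along the links, then restart from f0.
        f_next = alpha * (f @ transition) + (1.0 - alpha) * f0
        if np.max(np.abs(f_next - f)) < tol:
            return f_next
        f = f_next
    return f


# Hypothetical toy network: a path of four genes, 0-1-2-3.
adjacency = np.array([[0, 1, 0, 0],
                      [1, 0, 1, 0],
                      [0, 1, 0, 1],
                      [0, 0, 1, 0]])
# Patient A has gene 0 altered; patient B has no alterations.
mutations = np.array([[1, 0, 0, 0],
                      [0, 0, 0, 0]])
print(propagate(adjacency, mutations).round(3))
```

In this toy case, gene 0 keeps the most heat and genes 1, 2, and 3 receive progressively less, while the unmutated patient stays at zero. Because each row of the transition matrix sums to one when no gene is isolated, the total heat per patient is conserved: smoothing redistributes a patient's altered genes onto their network neighbors rather than inflating them.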