Developing the Scholarly Communication Ecosystem: A CMU Perspective

August 17, 2019 By Stanley Isaacs


David Scherer, Ole Villadsen, Keith Webster

Keith Webster: So we’re doing this presentation in three sections. I’m Keith Webster, Dean of Libraries at Carnegie Mellon, and I’m going to kick off with some pie-in-the-sky, pseudo fiduciary stuff to try and frame the more substantive contributions that will be made by my colleagues David Scherer and Ole Villadsen. David is on my immediate left, Ole on the far left.

First, I bring greetings from Carnegie Mellon. Just by way of introduction, the Times Higher Education rankings place us in the top twenty-five universities in the world; therefore they are the default university rankings that we should all believe in. (Laughter) We have particular strengths, as many of you may know, in computer science, engineering and technology, and in the fine arts. The university has its roots in the Carnegie Institute of Technology, founded by Andrew Carnegie in 1900. Fifty years ago this year the institute merged with the nearby Mellon Institute of Industrial Research to form the university we know today. It has strong Scottish roots; the photograph taken outside my office represents some of the Scottish traditions that you will see on campus most days. There is a provision in the faculty handbook that anybody from CMU speaking in public does so with a Scottish accent. It’s taken me about four years to shake off my Wisconsin roots, but I’m kind of getting there. (Laughter)

So a couple of years ago (you’re going to be a good audience), I was asked to contribute a section to
the university’s strategic plan on libraries. And as we talked through what that might look like, we came up with the notion of the 21st Century Library. I’d been to enough conferences labeled 21st Century Library to know it was a thing, even if I couldn’t quite describe what it might look like, which is what everybody asks me: what does the 21st Century Library look like? I truly don’t have a clue. But what I do believe is that it marks a shift from our role as the campus community’s primary information provider to something different. The reality, particularly in a university focused on the disciplines I mentioned earlier, is that much of the information content required by our students, our faculty and our researchers exists on the network. They don’t come to the library to acquire it. Frankly, I suspect they find cyberspace a faster way of getting to things than coming through our channels. But nevertheless we have really become a rather boutique side shoot of the university’s procurement office. So if you take our role as information provider as something that we have on life support, rather than our critical mission, what is the role of the library in the 21st century? That’s what we’re trying to tackle through a number of initiatives that are underway, and one of those is what we are going to address in this presentation.

Glad to see Lorcan Dempsey sitting there. He has written and spoken on a number of occasions about the shift in workflow in the research enterprise: the recognition that in the 1990s and before, the researcher built their information workflow around the library. They came to our buildings and to our librarians to access content and to keep up to date. Today, for most researchers, the information workflow exists entirely outside the library, and one of the challenges for us is to ensure that we integrate with the researcher workflow. We need to make sure that our services, our tools and our technologies fit into the way that today’s researchers work in the online, networked information ecosystem. As we think about how we might move forward, we recognize that the skills that made us successful as information providers have immense relevance in today’s world. We need to keep on top of the changes in the environment and understand where our skills might align to allow us to do things for our researchers more effectively than anybody else in the marketplace. Two of the trends that I observe out there
are open science and the evolving scholarly record. Open science has become significant over the past few years, partly because of the growing expectation that those who pay for research will have access to the product of the work they fund. We see a greater expectation of the ability to reproduce the findings of research. We want to make researchers accountable for the results that they present to the scientific community. We know that the internet has democratized many aspects of our lives as citizens, and there’s no reason why science should be any different. And we know, too, that open science can increase the visibility and the impact of a university. It’s a bit of a stretch to say that it will drive us up the world rankings, but it’s certainly an important marker of a university’s presence in the scientific world.

Another product of the open science environment is the way in which the scholarly record has morphed from being focused primarily on outcomes. Think about the journal article as the outcome of a research project. Instead, we see many of the other artifacts of the research process as being amenable to dissemination, curation and repurposing, whether it’s the research process and the protocols, the avenues, the community conversations that are all taking place in a digital world, or the community reuse and repurposing of the outputs of research. All of that forms a list of things that it’s important for us as research stewards to manage.

So as I think about how our libraries move forward, there’s a bucket of work around our role as supporters of the student experience, whether it’s about repurposing libraries to come more in line with 21st century learning activities and needs, or about helping students navigate the increasingly complex information landscape. We know there’s a lot to be done there. But we know, too, that there’s much to be done to support the world of open science and that increasingly distributed, evolving scholarly record. When I look at the commercial world I see
players like Elsevier becoming very adept at populating the research lifecycle. Many of the tools that they have built or bought are trying to put together a one-stop shop for the researcher, for the research institution and, arguably, for the research funder. Libraries, in my sense, have generally been less adventurous. Many libraries could probably look at the lifecycle, point to the publication space, and talk about the collections that we make available and perhaps the open access institutional repositories that we provide. But we haven’t necessarily been adept at covering a broader sweep of the lifecycle. Remember my opening point: if we are to be successful, we need to integrate with the researcher workflow. This is the sort of stuff that the average researcher navigates every day. Increasingly they are turning to a variety of tools and services that they access individually because they are convenient, and we’re not going to try and stop them. But we need to understand how we fit in if we are to be accountable to our institutions to curate and showcase the scholarly work that our institution produces, and if we are to be accountable to the funders and others who place mandates around the disposition of data and publications emerging from the research they have funded.

We spent quite a long time looking at a variety of solutions that might take us forward. We thought about building things in house. We road tested a number of commercial products, and earlier this year we announced a partnership with Digital Science, where we agreed to implement four of their main products: Symplectic Elements to be the campus research information system; Figshare to be our comprehensive data, publications and anything-else-digital repository; Altmetric; and Dimensions. And we’ve begun to map out what our take on the research workflow will be, in a way that helps our campus community understand fairly simply what we are offering them. You can see from the slide that we are not just in partnership with Digital Science. I don’t want this to be a putting-all-our-eggs-in-one-basket situation. But what we recognize is that by following this approach we can perhaps drive what is painted (I hope you can see it) on the world’s most painted object, the Fence at Carnegie Mellon, which students decorate almost every night: impact. That is what we are trying to help our faculty achieve, and we believe that the most appropriate way we can help them do that today is by helping
them showcase their work. And to take us on the journey, David.

David Scherer: Thank you. So I’ll be talking now about how CMU has moved toward what we call our Comprehensive Institutional Repository, which is very timely. As Keith has pointed out, these are topics that CMU has been dealing with for a number of years. But a publication came out through this organization this past May that highlighted the idea of looking at strategies for institutional repositories and rethinking how institutions view the repository, both as a standalone entity and in terms of how it integrates with the broader needs and capabilities of an institution. That report covered a number of institutional perspectives, and three of them are ones we would like to focus on today. The first is that the repository needs to be thought of not only as a standalone entity, but in terms of how it fits into the broader strategic initiatives of an institution, and, in doing so, how the institution uses the repository to showcase its work. Second, something that has been discussed at length through this organization is that there is a path for institutional repositories to be seen not just as a platform, but as a bundle of services, or eventually as a bundle of related services. One of the things that we’re looking at, though, is trying to think of the repository not just as a bundle of services, but as a bundle of interconnected services. And the last point was that members of the Executive Roundtable highlighted that one institution was using Figshare as its repository, so we are here today to talk about that process.

But before we do, we should note some of the historical context of repositories at CMU. So prior to the adoption of our new repository,
repositories were seen in three different categories: an archives repository built on ArchivalWare, or what’s now known as Knowvation; our traditional IR for publications, theses, dissertations, grey literature and tech reports, which was powered by Digital Commons from bepress; and a data repository, which we lacked. As some may know, Digital Commons can be used both as a repository platform and as a publishing platform. And while CMU was very highly focused on the IR side of Digital Commons, there wasn’t a lot of work being done on using it as a publishing platform. At the time it really didn’t fit the needs that we were hearing from our campus constituents, and what they really needed was a data repository solution. So this was really what was driving us to look at our current repository, but also at what else was in the space, be it open source or another vendor solution, that could help us fulfill that need. As Keith mentioned, we did an environmental scan of the space in parallel with what was going on in the broader institution, looking at what was needed for research information management, and we went with a partnership with Digital Science through Figshare.

One thing to note is that this entire process was not something done solely by the libraries; it involved many different units across the institution, all the way from the administration, with the President, Provost and Head of Research, down to individual faculty members. And this was something we wanted to make sure we carried through to the development of the repository, even to the simple thing of giving it a name. So we ran a contest on campus to name the repository, with a small prize, and you can see from some of the numbers we have that we got pretty extensive involvement. We thought this was an early way to develop ownership of the repository by the campus. I know many of you who work with or around repositories may have heard from faculty who are very unsure of what the repository is. They are not sure what is going on; you get contacted because they got an email saying their materials are in the repository. This was a way to deal with those early issues and have the campus take ownership of what the repository was going to be. This then led us…we’ll continue with
the Scottish theme…to KiltHub. So this is our central comprehensive repository. It’s powered by Figshare, so you’ll see that it looks and feels very much like Figshare, and there’s a point to that, which I’ll come back to in a minute. The card on your right is one of the promotional materials that we’ve created to engage with the campus. It carries the idea that the repository is there to weave the fabric of your research, and we’ll talk about why that phrase is important and how it has actually empowered faculty to think about what they may put into the repository. We’ve also done quite a lot of engagement on how to use the repository, in the form of printed guides, tutorials and informational pages, making sure that we’re engaging with faculty, students and researchers to help them better use the repository for their needs. This speaks to the question of how we close the gap between ease of use and what we would like to see as a picture-perfect deposit. How do we get between those two concepts?

So when we talk about the repository and why our faculty should decide to use KiltHub over other resources, we have a few different reasons that we can point them to, from Make It Open to Simplifying the Research Workflow. But we also want to ensure that if they do make things open, they are getting the highest level of impact in return. So we provide a DOI for everything that we publish within the repository, so they can track their citations and metrics. This ties into Getting Credit for Your Work, and being able to Comply with Funders. We still see today that many funders are grappling with the idea of compliance and with using repositories to say where things should go. We see a lot of cases right now where policies for publications are getting really fleshed out in deciding where things should go. But data is another story. Some funders are suggesting the use of Figshare, which my colleague Ole will talk more about in a minute. But I think the last point we make on the question of why researchers should use an institutional repository over, you know, the public repositories that are out there in cyberspace, is that we’re here to help. The libraries can serve as a mediation point, and thereby as proxy assistants for the repository, helping with some of the overhead of making deposits and making items available. And this brings us to what comprises the
repository team itself, which is actually a very dynamic team. You’ll notice that there are individuals who are responsible for scholarly communications, research data management and other related topics, but the team is also very much based on the liaison model, where a librarian serves as the conduit between the libraries, the specialists within the libraries and the disciplinary faculty.

So, to reiterate the idea of a comprehensive repository: it combines what we would call a traditional institutional repository with a data repository, so that it can accommodate both research data and what we refer to as scholarly outputs. These two categories form the warp and weft of our repository. So again, this is how we’re weaving the fabric of research. It’s about being able to accommodate everything one may produce during the research lifecycle.

Another point made in the CNI Executive Roundtable was the notion of the Enterprise Repository. Most of the time, when content ends up in a repository, it’s at a terminal point: once it’s been published, once it’s ready for dissemination. That’s when it goes to the repository. But there’s this other idea of thinking of the repository as the collaboration point where the research can be developed and maintained. Now, we know there are other solutions that institutions or researchers may use, but how could the repository be used for this additional activity? This could include having a collaboration space, or having a project housed in the repository. And there are all sorts of concerns that we have to think about here, such as security, storage allocations and things like that. So this is something that we’re still trying to grapple with. But it’s getting towards the idea that if I can narrow the gap between where things are created and where they are disseminated, I can try to ensure that more content is made available through the repository.

This also raises the question of what types of integrations may be possible outside of the repository, and one such integration that I’d like to highlight is the integration Figshare has with GitHub, which we know many researchers today are using for software and code. The benefit of this connection is that it is a true integration between GitHub and Figshare, where the researcher can authenticate their two different accounts to move content from GitHub over into Figshare, allowing for version control and using Figshare as a place to publish, as well as to preserve, materials that a researcher may be producing within their Git repository. The next of these types of integrations is
how a repository integrates with a Research Information Management System, or CRIS. There are a lot of different repositories that talk to CRISs in different ways, and there are many different types of CRISs. This slide gives you an idea of many of the different repositories you may see, and the most common CRISs that are out there today. What I’m going to talk to you about specifically, though, is the connection between our Figshare for Institutions repository and Symplectic Elements. Now, I should point out that while both of these are within the Digital Science portfolio, Figshare is actually the fourth repository integration for Symplectic Elements. Preceding it were connectors for DSpace and EPrints, and the data source harvesting connection from Digital Commons to Elements. So there’s a wide variety of connections that can happen between repositories and CRISs.

And this connection is not a single flow, but actually a cyclical flow of information from one to the other, for various reasons and for different activities. The first is from the repository to the RIM: finding what content has already been made openly available and harvesting that information to match with publication records that one may find in the CRIS. That way, from the perspective of the CRIS, you can see a publication record and then verify: has that thing already been made openly available, and if so, in which repository? Then, from the RIM to the IR, we have the ability to make actual deposits. For those deposits we are able to use additional APIs, say from SHERPA/RoMEO, to verify which version of a publication we can add to the repository and inform the user of that, while also providing local institutional context on top of it, so we don’t necessarily pass along the straight SHERPA/RoMEO information, but rather the library’s interpretation of that information. Any additional information that we may require for deposit can then go over to the repository and feed into our curation profile for the submission process. With these two directions connected, we are able to monitor open access, to see what has been made open access and what has not. One point I do want to stress is that the repository is not a compliance component. We’re not using it to police what has been made open access or to exert any kind of power over that. It is just making faculty aware of what they have in the repository and how it is reflected in the overall research and publication view. So, I’m now going to turn it over to my
colleague, Ole, to talk more about interacting with faculty.

Interacting with the Faculty: Challenges and Lessons Learned

Ole Villadsen: Good afternoon, everybody. My name is Ole Villadsen and I’m a research liaison for cyber security and information systems at Carnegie Mellon University Libraries. I’m also a member of our steering committee for Digital Science, helping to implement Elements and KiltHub throughout the campus. I’m going to share with you some of the challenges and lessons learned that my colleagues and I have come across when it comes to working with faculty and informing them about Elements and KiltHub.

There are a lot of great reasons why faculty and researchers should use KiltHub for depositing their data sets and their publications, and of those, I’ll draw your attention first to the two at the bottom: Making It Open and Simplifying Research Workflow. We can all agree in this room, based on our positions and where we currently work, that these are very valid and very important reasons why someone might want to use KiltHub. But we’ve found that the two at the top, Compliance and Discoverability, tend to resonate a little bit more with faculty, especially depending on how they’re viewing things at that particular point in their research lifecycle and their careers. Compliance: the word is getting out, and a lot of researchers are now well aware of the mandates from the federal government and from publishers about where they need to deposit their data. Discoverability: Figshare, and by extension KiltHub, has a very large footprint on the internet, and Figshare itself acknowledges that 60% of the traffic that comes to Figshare comes from Google. I want to talk about why that’s really important in just a moment.

So for funder and publisher compliance, here are just a couple of examples. Figshare is mentioned by a government agency, NOAA, in its requirements for data deposits, along with some other repositories, as an acceptable place to deposit data. And PLOS, a publisher, points out that Figshare is a general-purpose place where data can be deposited for studies that don’t fall neatly into one of the main repositories that they recommend. So why is discoverability through a general
purpose search engine important? These are a couple of studies done in the past few years that look at how researchers find data sets, and they agree on one point: using a web search engine, a general purpose web search engine, which essentially means Google, is one of the top three ways that researchers look for data, the other two being the main repositories and relevant journals. That drives home the importance of making sure that data sets are discoverable through Google, since it is one of the top ways researchers are gonna look for data if they can’t find it in their particular repository or through the journals they read for their particular domain.

So I’ve done a test here with one of the deposits made into KiltHub over the summer. It’s a study that included a data set and a number of data artifacts, looking at Software Ecosystems Culture and Breaking Change: A Survey of Values and Practices in Open Source Software Ecosystems. I went to Google and put in four words that I think anybody looking for anything on that particular topic might use to conduct their search, and the study that’s in our KiltHub repository came up as number 1. I’ve done this with a number of other deposits made to Figshare, and they are very discoverable through Google. I’m never sure what the secret sauce going into Google’s search engine is, so I controlled for location by using a VPN, so that it thought I was in Kentucky even though I was on CMU’s campus, and I used a private browser to control for search history. It all still comes out the same: the deposit is very high in the search results.

So I’m going to switch gears a little bit now and talk about some of the implementation challenges we have had with our research information management system, RIM or CRIS, that may resonate with those of you implementing one on your own campus. The first is curating profiles. In Elements, a faculty member’s publications are drawn or harvested from a number of data sources, like Scopus and Web of Science and
a number of others. However, they’re not always accurate out of the box, or when the initial harvests are first pulled from those sources. That’s for a number of what I would think are understandable reasons. There are name variations at play, where researchers may be confused with other researchers out there in the ecosystem and in academia. There are also incorrect linkages to identifiers: by that I mean that the system may not link a particular researcher correctly with their Scopus ID, or may associate a researcher with the wrong Scopus ID, or, as we’ve come to find, a researcher may have more than one Scopus ID and the profile may not include them all, or may not include the best one. So we’ve decided that we need to go through and check all the faculty publication profiles to make sure that they’re reasonably accurate before we turn them over to the faculty members. That has required a lot of time and effort. Rough back-of-the-envelope math says it’s probably between four and five hundred man-hours to take at least a quick look at every faculty member, and for some faculty to dive much deeper to figure out why their results are not close to being accurate and what the problem might be.

At CMU we have a very decentralized campus. I think of the United States around the time of the Articles of Confederation, between the Revolutionary War and the adoption of the Constitution, as a not-bad example of the way things are at CMU. The colleges hold a lot of autonomy and a lot of control over their decision making, and that has created challenges for us when implementing a common RIM or CRIS across the campus: for example, different expectations and needs for a RIM among the different colleges. There are different annual review processes and forms that each of the colleges uses. They also have different current approaches to tracking their research output, and they might be using different systems altogether from one college to the next, which makes it difficult. Instead of going from one current system to another new system, we’re going from multiple different systems to bring everybody on board with one common system.

And finally, there’s technology integration. Taking full advantage of Elements and all of its capabilities would mean being able to harvest information from, for example, our current systems that hold grant information, and systems that hold our teaching information and evaluations and so forth. All of that can be harvested directly into Elements automatically, but there may be integration challenges depending on the type of technology currently in use. For example, if a system has been built internally, it might need some modification, or a fair amount of work from developers, to figure out how we can harvest from it and bring that information directly into Elements.

Now for some more lessons learned that we’ve had with both KiltHub and Elements in integrating them on campus. The first is Google Scholar. I spoke about the great discoverability that we have with KiltHub through Google, but we have not found that to be the case
with Google Scholar. We were expecting better visibility for our research products in KiltHub on Google Scholar, but it’s just not there yet; they’re not findable through that particular platform. That’s under investigation. We’re working with our partners at Digital Science and Figshare to determine why, and it’s something that we hope to improve upon.

The second (and these are good problems to have): when we rolled out KiltHub earlier this year in a soft launch, some faculty were interested in depositing their data into KiltHub, which was done successfully, but it also drove home the need for a Data Submission and Deposit Requirements policy. That way it would be very clear how we were going to handle data deposits in the future and how, for example, we would negotiate a deposit based on the data being submitted and the kind of documentation we’d like to see, such as readme files, data dictionaries and so forth.

Another good problem we found we had: a researcher who was submitting a grant application needed to prepare a DMP, a Data Management Plan. The faculty member had heard about Figshare and KiltHub, knew that they would help with the DMP, and wanted some language from us about how to include them in the Data Management Plan. We were able to meet that need successfully, but it also drove home the need for boilerplate language about Figshare and KiltHub that faculty members can use in their DMPs, so that we can provide it to them very quickly. It’s not uncommon that we might only have a twenty-four hour turnaround to look at a DMP before it needs to be submitted to the funder, so having that language ready, on hand and sent out to the liaisons is another lesson learned.

And finally, there is the need to balance carrots and sticks when marketing the research information management system and the repository across campus. This may come as a surprise, but not all faculty members are enthusiastic about the prospect of sitting down in front of a new system and populating it with all of their information, especially information that may be incorrect or that has not been harvested from an automatic source. In those cases, we found that it’s very
important to partner with the administration at all levels to make sure there’s a clear understanding that we are there to implement these systems and to help faculty members with them, but we’re not there to direct faculty members to use them. That is within the domain of the administration. To use the analogy one of my colleagues at CMU uses in this kind of situation, we are there to be “H&R Block; we are not there to be the IRS.” We really are there to help. And with that I’m going to turn it back over to David for concluding remarks.

David Scherer: So now that we’ve had a chance to show you what CMU has been working on for the past year or two, we’ll talk about next steps and the future expansion of the ecosystem at CMU. Our immediate next steps are to continue our rollout and engagement for KiltHub. We’re doing this department by department, and in many cases trying to roll out to entire colleges when appropriate. And part of this rollout is also how we
engage with our faculty and students who are using Figshare.com. So one of the things that we’ve noted is
that we do have a number of faculty and graduate students who have used Figshare.com to house
things that we would very much like to have in the institutional repository version of Figshare. So one thing we’ve been able to do, by working with the vendor and identifying with the user what content meets our collection development policy and is applicable for the repository, is to have that content migrated from Figshare.com into KiltHub, without requiring the user to make the deposit themselves and without having duplicates in both the .com and the institutional version.

We’re also now developing our use cases for our deployment of Elements. These use cases include Elements as a supporting mechanism for faculty profiles and various documents: CVs, biosketches for grant applications and other types of documentation; documents that may support the annual review and reporting processes; and also any kind of support that the system can provide for the promotion, review and tenure process (PRT, not PTRC; they’re different).

Some of the things we’re focusing on for future expansion of the ecosystem involve completing the research lifecycle support loop. Keith mentioned the areas of the lifecycle we’re supporting right now, where we’ve already made ventures, but those are by no means the only areas where we need to provide assistance and support throughout the lifecycle. Some of the areas we’re now looking to expand into are how we can support activities such as electronic lab notebooks, protocols and collaborative writing platforms, which would allow researchers to continue what they are already doing in many of these systems, but with institutional support to do so.

So with that, thank you so much for sitting here and listening to our presentation. Here are some resources that give you more information about what we’ve been doing, as well as access to our repository, KiltHub. So thank you very much. (Applause)