The blog of the John S. and James L. Knight Foundation
The following is part of a series that looks at The Digital Public Library of America - the first national effort to aggregate existing records in state and regional digital libraries so that they are searchable from a single portal. It is written by Annie Schutte, a librarian, teacher and consultant for Knight Foundation.
The Minnesota Digital Library currently serves as a hub for more than 150 libraries and cultural heritage organizations around the state, and aspires to "expand that dramatically" by working with the Digital Public Library of America (DPLA). Associate University Librarian at the University of Minnesota and service hub Director John Butler describes the Minnesota Digital Library's current partners as spanning "from academia to Main Street." He is particularly interested in partnering with urban community groups to reach into new content areas, such as bringing oral histories from Minnesota's immigrant and refugee populations, including the Hmong and Somali communities, into the archive.
The Minnesota Digital Library's first online exhibit for the DPLA will showcase its impressive collections of images and documents from its Native American cultures and populations. But the collaboration will, as a whole, bring a wealth of diverse materials to DPLA: more than 130,000 items spanning topics as far ranging as Vaudeville, ice palaces and the Twin Cities' historic streetcars.
In this interview, Butler talks about the DPLA's data-related challenges in this massive undertaking, such as record duplication and record disparities. But more importantly, he speaks of the immense possibilities this data aggregation presents for understanding our cultural history, and of the way that DPLA could change how research is done.
Could you tell me about your organization and how you became involved with the Digital Public Library of America?
J.B: My affiliation with the DPLA primarily comes through the Minnesota Digital Library, which is a statewide collaboration consisting of Minitex, a library resource-sharing network in our region (Minnesota and the Dakotas), the University of Minnesota, Minnesota Historical Society, and other key institutions large and small throughout the state of Minnesota, such as academic and public libraries, art and historical museums, clubs, and others. We have numerous religious and non-profit organizations, genealogists, history hobbyists—spanning academia down to Main Street. The participating organizations are represented in the management and advisory functions at the Minnesota Digital Library, as well as in the collections that we have built over the past eight or so years.
I think it was the Minnesota Digital Library’s tremendous diversity and sheer number of contributors that attracted DPLA’s interest in our prospects as an initial participant. We have over 150 content contributors to Minnesota Digital Library that on Day One of DPLA launch will be represented at the national level, and by means of the project, we hope to expand the number of contributors dramatically.
Could you tell me about the types of contributors you're looking to reach out to?
J.B: We’re interested in a new dimension of contributors: organizations in our contemporary urban communities. For example, Minnesota holds one of the largest populations of refugees in the country, including the Hmong refugees who came out of Southeast Asia in the 1970s and, more recently, very large Somali populations. So, we're interested in tapping into these communities for content generation, to capture oral histories of their immigrant experience, and to engage the community, if we can, in documenting their lives within our contemporary Midwestern culture.
What's unique about the collections you currently have through the Minnesota Digital Library that you'll be bringing to DPLA?
J.B: We have documents of all types. We started off very heavily with images: photographs, maps. And more recently, we've done quite a bit with historical documents, text, where we are indexing the full text of these, whether they are historical or legislative documents, geological surveys, or diaries of explorers … textual documents of all kinds.
More recently, we have been engaging in the conversion of audio recordings and Minnesota newspapers, working closely with our state historical society for the latter. The topics run the gamut. Each year, we put out a call for proposals, usually along the lines of a certain theme, and we get proposals back from the communities, again, ranging from libraries and museums to historical societies, clubs, religious organizations and non-profits (for example, the Hazelden Foundation, the chemical-substance treatment center, has a collection in Minnesota Digital Library), so it's really quite a range.
We have a lot of historical photographs. We have a marvelous collection of U.S. Army Corps of Engineers cyanotypes of the Upper Mississippi River Valley going back to the 1880s. Things like theater programs from early 20th century Vaudeville and theatrical houses, photographs of the tradition of building ice palaces in Minnesota. A number of things from art collections from local monasteries and religious organizations are included as well.
And then some of the special items we have coming out of the Native American community, which I did want to talk about, because that's the focus of our online exhibit for DPLA. We have historical and contemporary newspapers from our Dakota and Ojibwe populations here. We have the Dakota Territory newspaper going back into the 1870s, and then more contemporary news publications out of the Ojibwe community throughout the 1990s. Where needed, we’ve been considering translating these works to make them eminently more searchable and discoverable. There's quite a bit of work to be done and opportunity to seize here.
The library's content is very far ranging, from transportation to architecture to, again, the written word, literature, to geology. One of our most popular collections has to do with the Minnesota Streetcar Museum, because we decommissioned all the streetcars that used to run through the Twin Cities back in the mid-1950s. But we have an extensive photographic collection of these streetcars, which, because they were routed everywhere, have resulted in a one-of-a-kind collection of photographic documentary evidence of neighborhoods throughout the Twin Cities area at points in time. Through that transportation network, we have wonderful photographic portraits of many, many neighborhoods throughout Minneapolis and St. Paul.
What local benefits will your position as a DPLA service hub provide?
J.B: I like to use the word “amplification” because Minnesota Digital Library and our database, which is called Minnesota Reflections, have a loyal, but primarily local, audience. And this notion of mixing our metadata with the metadata and subsequently the digital content of these other state and regional libraries, I think, has a very powerful effect, not only for people interested in history of families and genealogies, but also for research. That would span to scholarship, as well as to the research that is conducted in K-12 and in postsecondary areas. So I think the mixing of the data across all of the collections that DPLA aspires to represent is something that intrigues me quite a bit. I think this “national mix” will have an exponentially powerful effect.
We will probably discover things about Minnesota in DPLA that we were heretofore unaware of because of this aggregation of data from around the country. And I would expect, for example, with what we've been talking about, this emphasis on native cultures, the same kind of discovery to happen for people down in Georgia or out in Utah or Arizona, and for the user populations of the Mountain West Digital Library. So that is very exciting.
We also want to use DPLA to expand the portrait of Minnesota (and this is one of our funded project areas) by creating a much larger aggregation of metadata from collections within our state that we will process through the hub and pass along to DPLA, even though it won't go in our database. So, the point here is that there are, of course, many other digital libraries throughout Minnesota, some of significant scale and utilization. The University of Minnesota and the Minnesota Historical Society are such examples; both have very large databases of digital images and other documents that are not in the Minnesota Digital Library's database, and may never be. But we have begun engaging in discussions about creating a larger Minnesota aggregation that could appear together in the DPLA index and create a far more extensive representation of Minnesota history and culture through the DPLA. So that's a big aspiration through our involvement in the service hub project.
What effect do you think the DPLA launch in April will have nationally, for libraries, archives, patrons, other information providers, etc.?
J.B: I think it could be powerful in two ways: one is the sheer aggregation of collections that we have wanted to search as one before, but not to rely solely on Google to do that, just because of the richness of the metadata or the services that might be attached to the metadata and the viewing tools or the ability to “play” with the objects and so forth.
The other powerful effect (and I think this is one of the real, added-value potentials of DPLA) is the intention to release the metadata for download, remix, reburn. And I think there could be, just like the app store, an engagement with the community at the data level that could produce some very exciting interfaces, views of the larger aggregation, and even products. Who knows, there could be some commercial products that come out of this. But there could be different stories told because we have access to this large pool of data, and we can do whatever we want with it. We can put it in our own interface. We can create a widget that is planted on another portal page. I think that's really where we're going to see DPLA take off from where other large, digital library aggregation projects have stopped, or have not found a way to go.
As we see an increasing amount of Googlization and more expensive databases, do you think that DPLA has the potential to change or disrupt the way that people do research?
J.B: I do. This is not to underestimate the power of Google. We all use it every day, and it's part and parcel of our library infrastructure now, so it's an important part. But Google does not open the back end of its index to us, does it? And DPLA is taking a stand to do that. It is saying: these data are open and can be exploited, can be reused, can be used to create value-added projects, or whatever. And that's going to be a great stimulus, I think, toward digital-library-derivative creation. We're going to see a lot of interesting projects and maybe even publications come out of that.
What challenges are you anticipating going forward as the DPLA grows and expands after the April launch?
J.B: I can see a number of technical challenges, as well as the challenges that we would naturally expect to find in terms of finding effective governing and decision-making models as a large, collaborative, multi-institutional entity. I have been working very closely with HathiTrust for the past several years, serving on its Strategic Advisory Board, and we've worked very hard on creating an effective governance model so that we have a trusted and well-understood way to make decisions that affect the entire community. That's going to be extremely important, and we will find our way in DPLA, but it's something that will take time and will take many iterations to get right.
I also think there are some significant technical challenges here (again, not insurmountable), things like duplication of records in the database. We may even see this on Day One. We're going to have duplicate entries, especially around published works. So, how do we address that kind of “noise” in the digital library environment? How do we avoid letting that become a detriment to a good user experience? When we attempt to de-duplicate, how do we choose the most authentic and best copy to save? Who has that? And then there's executing the technical process of deduplication itself.
I think we're going to face challenges like we do with any large aggregation in terms of unevenness of metadata provided. We will have some very minimal records, and then we will be dealing with very robust, full records. And how are those records treated by a search-engine algorithm so that discoverability across these objects is as good as it can be? Similarly, how do we deal with search and discovery of documents represented with full text in an index alongside the metadata for, say, an image, which could be very minimal? These are some of the technical challenges that not just DPLA is facing, but that many other large projects in the digital library realm are facing as well.
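To make the deduplication question above concrete, here is a minimal sketch in Python. It is not DPLA's actual process: the record fields (`title`, `creator`, `date`, `hub`), the matching rule, and the idea that the "best copy" is simply the record with the most populated fields are all assumptions chosen for illustration, and the sample records are invented.

```python
import re
from collections import defaultdict


def normalize(text):
    """Lowercase, drop punctuation, and collapse whitespace for matching."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def match_key(record):
    """Hypothetical duplicate key: normalized title, creator, and date."""
    return tuple(normalize(record.get(f, "")) for f in ("title", "creator", "date"))


def completeness(record):
    """Count non-empty fields; a crude stand-in for choosing the 'best copy'."""
    return sum(1 for value in record.values() if value not in ("", None, [], ()))


def deduplicate(records):
    """Group records by match key and keep the fullest record in each group."""
    groups = defaultdict(list)
    for rec in records:
        groups[match_key(rec)].append(rec)
    return [max(group, key=completeness) for group in groups.values()]


# Invented sample records: two hubs describe the same photograph unevenly.
records = [
    {"title": "Ice Palace, St. Paul", "creator": "Unknown", "date": "1886", "hub": "A"},
    {"title": "ice palace st paul", "creator": "unknown", "date": "1886",
     "hub": "B", "description": "Winter Carnival ice palace.", "subject": "Ice palaces"},
    {"title": "Streetcar on Lake Street", "creator": "", "date": "1950", "hub": "A"},
]

unique = deduplicate(records)
```

In this sketch the sparse record from hub A is dropped in favor of the fuller record from hub B. That shows why the unevenness Butler describes matters: a completeness rule favors whoever catalogs most thoroughly, which is not necessarily whoever holds the most authentic copy.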
What's your big-vision hope for the future of the DPLA?
J.B: I pin a lot of my hope on the vision that DPLA has cast for the openness and availability of the data. That presupposes we will succeed in creating a high-quality, massive index that has strong representation of local, state, regional, national communities. Another hope that many of us have for the DPLA is in its potential to strongly represent all levels of America.
Part of this prospect here in the aggregation is, for the first time, seeing some of the large, national library efforts (the National Archives, the Smithsonian, the Library of Congress, Harvard, the CIC institutions) come together with smaller collections represented by the state and regional digital libraries. That has never been accomplished in a thoughtful and robust way. So, bringing that together with strong representation of all levels and then opening up the back door of the data to anyone who wants to take it and be creative. If we can pull off those things, I think it will be unprecedented, certainly in this country, and maybe even internationally. And it holds the potential for delivering tremendous value to end users and to creators of content. In the end, we want to provide a rich and high-quality user experience, so that it will keep bringing people back again and again.
By Annie Schutte, a librarian, teacher and consultant for Knight Foundation.