Kim Dulin and David Weinberger: Hacking the Library

David Weinberger and Kim Dulin co-direct the Harvard Library Innovation Lab at Harvard Law School, a group that explores the future of libraries, “because, well, we think their future needs exploring.” The Lab came to pass after John Palfrey left the Berkman Center and became Librarian for the Law School – he saw a need to re-envision and hack libraries to deliver the good stuff libraries have. The Lab is looking at libraries, broadly, not just the HLS law library.

Libraries tend to be very knowledgeable about what they hold in their collections. But they’re much less good at helping people discover that information. There are few systems like Amazon’s or Netflix’s recommendations that help scholars and researchers discover the good stuff within libraries. Dulin argues that librarians have been pretty passive in the face of new technology – they’ve purchased fairly primitive systems and then had to buy back their own content from the companies that build those systems.

Researchers tend to start with Google, Dulin tells us. They might move to Google Books or Amazon to find out more about a specific book. And perhaps a library will come into play if the book can’t be downloaded or purchased inexpensively. Libraries would like to move to the front of that process, rather than sitting passively at the end. And lots of libraries are trying to take on this challenge – new librarians often come out of school with skills in web design and application development.

The Lab hopes to bring fellows into the process, much as Berkman does. It builds software, often proof-of-concept software. And it works on open systems and standards, so that libraries and other partners can adopt the technology it develops.

Two major projects have occupied much of the Lab’s time – Library Cloud and ShelfLife, both of which Weinberger will demo today. There are smaller applications under development as well. Stackview allows the visualization of library stacks. Check Out the Checkouts lets us see what groups of users are borrowing – what are graduate divinity students reading, for instance. And a number of projects are exploring Twitter to share acquisitions, checkouts and returns.

Weinberger explains that ShelfLife is built atop Library Cloud, a server that handles the metadata of multiple libraries and other educational institutions and makes that metadata available via API requests and “data dumps”. Making this data available, Weinberger hopes, will inspire new applications, including ones we can’t even imagine. ShelfLife is one possible application that could live atop Library Cloud. Others could include recommendation systems, perhaps customized for different populations (experts versus average users, for instance).

There are open questions about how to build applications that respect privacy, and about what the most appropriate development environment is.

Paul Deschner, the lead developer of ShelfLife, shows off the alpha system, making it clear that it would be in a pre-alpha state if only there were a Greek letter that preceded alpha. He and the team are beginning to open the tool to pre-screened audiences to get feedback and ideas. The metaphor for the system is the “neighborhood”, a cluster that books can sit within. The audience for the tool is a researcher, scholar or professor exploring the space of a library collection.

We see a search for “a pattern language”, referring to Christopher Alexander’s influential book on architecture and urban design. The results page includes a new factor – a score that indicates how appropriate a title is for the search. We can choose any result and be brought into “stack view”, where we see virtual books on a shelf in the order they actually sit on the physical shelf. Paul explains that it’s much more powerful than that – many books at Harvard are in a depository and never see the light of a shelf. And many collections have their own special indices – the virtual shelf allows a mix of the Library of Congress categories with other catalogs.

The system uses a metric called “shelfrank” to determine how the community has interacted with a specific book. The score is an aggregate of circulation information for undergraduates, graduates and faculty, information on whether the book has been assigned for a class, placed on reserve, put on recall, etc. That information exists in Library Cloud as a dump from Harvard’s HOLLIS catalog system – in the future, the system might operate using a weekly refresh of circulation data. The algorithm is pretty arbitrary at this point – it’s more a provocation for discussion than a settled algorithm.
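To make the idea of an aggregate score concrete, here is a minimal Python sketch of a ShelfRank-style weighted sum over the signals listed above (checkouts by undergraduates, graduates and faculty, course assignments, reserves, recalls). The field names, the weights, and the rescaling to a 0–100 range are all hypothetical assumptions; the talk describes the real algorithm as deliberately provisional, and this is not its implementation.

```python
from dataclasses import dataclass


@dataclass
class CirculationRecord:
    """Hypothetical per-book usage counts; these are not HOLLIS field names."""
    undergrad_checkouts: int = 0
    grad_checkouts: int = 0
    faculty_checkouts: int = 0
    course_assignments: int = 0
    reserves: int = 0
    recalls: int = 0


# Hypothetical placeholder weights, chosen only to illustrate the shape of
# an aggregate score. They are not the Lab's values.
DEFAULT_WEIGHTS = {
    "undergrad_checkouts": 1.0,
    "grad_checkouts": 2.0,
    "faculty_checkouts": 3.0,
    "course_assignments": 5.0,
    "reserves": 4.0,
    "recalls": 2.0,
}


def shelf_rank(record, weights=DEFAULT_WEIGHTS):
    """Weighted sum of one book's usage signals."""
    return sum(getattr(record, field) * w for field, w in weights.items())


def normalize(scores, top=100):
    """Rescale raw scores so the most-used book in the set gets `top` (assumed 0-100 scale)."""
    highest = max(scores.values(), default=0)
    if highest == 0:
        return {title: 0 for title in scores}
    return {title: round(top * s / highest) for title, s in scores.items()}
```

With two made-up books, one heavily used by students and assigned once for a class and one borrowed a single time by faculty, the first scores 10×1 + 4×2 + 1×5 = 23 and the second 3, which normalize to 100 and 13. Changing the weights is exactly the kind of arbitrary choice the Lab says it wants to put up for discussion.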

An audience member asks why a fresh new approach to library data uses the old metaphor of books on shelves – was this the product of research? One advantage of the visualization, Paul tells us, is that there are many parameters you can use to convey information – size, color, text.

Weinberger explains that the model, right now, sticks pretty close to objective reality – tall books are taller in the visualization. This, Paul mentions, often indicates a book with illustrations. David notes that we could use other factors – we could make books taller based on how many libraries hold them in their collections, or based on how often they are checked out.
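The swap David describes, from physical height to some other measure, amounts to choosing which value drives a visual scale. The Python sketch below maps any numeric field linearly onto a pixel height. The field names (`height_cm`, `holding_libraries`), the pixel range and the linear scale are assumptions for illustration, not how ShelfLife actually draws its shelves.

```python
def height_px(value, lo, hi, min_px=100, max_px=300):
    """Linearly map value in [lo, hi] onto a pixel height in [min_px, max_px]."""
    if hi <= lo:
        # Every book has the same value, so draw them all at the minimum height.
        return min_px
    clamped = min(max(value, lo), hi)
    frac = (clamped - lo) / (hi - lo)
    return round(min_px + frac * (max_px - min_px))


def shelf_heights(books, metric):
    """Return one pixel height per book, driven by the chosen metric.

    books:  list of dicts (hypothetical shape).
    metric: e.g. "height_cm" for physical size, or "holding_libraries"
            to make widely held books taller instead.
    """
    if not books:
        return []
    values = [book[metric] for book in books]
    lo, hi = min(values), max(values)
    return [height_px(book[metric], lo, hi) for book in books]
```

Calling `shelf_heights(books, "height_cm")` reproduces the current "objective reality" view, while `shelf_heights(books, "holding_libraries")` re-draws the same shelf by how many libraries hold each title. A linear scale lets a few very popular books dwarf everything else; a log scale would be an equally reasonable choice.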

One possible feature of ShelfLife is the idea that books could “friend” other books: a book could refer to another book that isn’t included in its home neighborhood, so you can influence a neighborhood by bringing a book into the virtual space. A question from the crowd suggests we might use co-checkout data to enhance shelves – it’s an interesting idea, but it raises privacy concerns. And since the system is designed to work across libraries, simultaneity doesn’t always show that books are related.
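The co-checkout suggestion can be sketched as counting how often pairs of books leave the library together. In this hypothetical Python version, each transaction is just a set of book IDs with no borrower attached, and pairs seen fewer than `min_count` times are dropped. That threshold is an assumed, crude measure; as the privacy discussion later in the talk notes, it does not by itself guarantee anonymity. It also inherits the caveat above: books checked out together are not necessarily related.

```python
from collections import Counter
from itertools import combinations


def co_checkout_counts(transactions, min_count=2):
    """Count how often each pair of books appears in the same transaction.

    transactions: iterable of sets of book IDs, e.g. one set per checkout
                  session (a hypothetical grouping; no borrower identity kept).
    Pairs seen fewer than min_count times are discarded.
    """
    pairs = Counter()
    for books in transactions:
        # Sort so that (a, b) and (b, a) are counted as the same pair.
        for a, b in combinations(sorted(books), 2):
            pairs[(a, b)] += 1
    return {pair: n for pair, n in pairs.items() if n >= min_count}


def neighbors(book, counts):
    """Books co-checked-out with `book`, most frequent first."""
    related = Counter()
    for (a, b), n in counts.items():
        if a == book:
            related[b] += n
        elif b == book:
            related[a] += n
    return [other for other, _ in related.most_common()]
```

Given the sessions `{"x", "y"}`, `{"x", "y", "z"}`, `{"y", "z"}` and `{"x", "w"}`, only the pairs x–y and y–z clear a threshold of two, so book y's co-checkout neighborhood is x and z, and book w has none.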

Other neighborhoods could cluster books by subject, by author, by “also viewed” data, and by user-provided tag data. Author neighborhoods use Library of Congress sequence information to cluster groups of authors. Data from the web (Wikipedia entries, book reviews, lectures by authors, interviews on NPR) is linked to enrich an author’s profile page. (Paul acknowledges that curation is a fairly massive challenge.) The goal of these systems is to increase serendipity, to maximize the chance that viewers will stumble onto a book they didn’t know about and are excited to discover.

In discussing the system, Maura Marx points out that librarians are fierce defenders of privacy who often scrub their data nightly. Unfortunately, that means we’re throwing away fascinating data. Is the strong concern for privacy universal across librarians? Paul mentions that the data is thoroughly scrubbed when it comes to Library Cloud… though Harry Lewis notes that allegedly anonymized data like Netflix logs often turn out to be personally identifiable.

An audience member asks what happens when projects like ShelfLife go out into the world – “to work on another project, you’re going to have to leave this one behind.” David explains that keeping projects open source is one way to address this concern. And releasing the tool with an API may mean that the most interesting uses of the project turn up in other people’s applications. ShelfLife might then be in dialog with GoodReads, Google Books and other projects that help people discover books. Library Cloud is different in that it provides information to other systems, allowing others to build on the foundation of that work.

Maura Marx wonders whether the system can help readers discover “canonical” texts, the five texts one must read to understand a subject. An interesting possibility is comparing the “canonical” sets of one institution or geography with another’s.

David Abrahms notes that “ShelfRank” could be controversial, but is less so if users can adjust it. Then the aggregate of ShelfRanks could, in turn, provide interesting information. Grad students as a group might weight texts very differently than undergrads or professors, and provide information on what texts are most helpful for their peers.
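One way to picture a user-adjustable ShelfRank is to let each user choose how much the checkouts of each borrower group count, and then average those choices across a group of users. The Python sketch below does exactly that. The data shapes, the normalized weighted average and the simple averaging across users are all assumptions for illustration, not features the Lab has announced.

```python
from statistics import mean

# Borrower groups named in the talk; the dict-based data shape is hypothetical.
GROUPS = ("undergrad", "grad", "faculty")


def personal_rank(checkouts, weights):
    """Score a book with one user's own weights for each borrower group.

    checkouts: dict mapping group -> checkout count for the book.
    weights:   dict mapping group -> non-negative weight chosen by the user.
    Returns a weighted average, so scales of different users stay comparable.
    """
    total = sum(weights.get(g, 0) for g in GROUPS)
    if total == 0:
        return 0.0
    return sum(checkouts.get(g, 0) * weights.get(g, 0) for g in GROUPS) / total


def aggregate_weights(user_weights):
    """Average the weight settings of many users (e.g. all grad students).

    Missing groups count as weight 0; raises StatisticsError if the list is empty.
    """
    return {g: mean(w.get(g, 0) for w in user_weights) for g in GROUPS}
```

A user who ignores undergraduate checkouts and weights graduate and faculty checkouts equally would score a book with 10, 2 and 1 checkouts from those groups at (2 + 1) / 2 = 1.5. Comparing `aggregate_weights` for grad students against the same figure for undergrads or professors is one hypothetical way to surface the group differences described above.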

Harry Lewis points out that the current system is vulnerable to borrowers who take out lots of books – how do we deal with people who check out books indiscriminately? I wonder whether the system might allow people to publish their borrowing records and share them with other users. David notes that balancing data and privacy is extremely tricky, and that much of what we’d like to do is very difficult to do while taking anonymity seriously. It’s somewhat worrisome, he notes, that concerns about possible deanonymization are acting as a kill switch.

I asked a question about Kim’s stated goal – getting libraries closer to the start of a search chain. Does ShelfLife help us get there? Sasha, who works with Kim and David, mentions that the goal is to turn ShelfLife into a community site that’s capable of enabling user-created content. This content, in turn, will be available to Google and other search engines, which should also push new users into the ShelfLife system.

ault November 14, 2010 at 3:28 pm

The reference to canonical texts causes me to recall a professor who would happily lead any student who disputed a grade to the library. He would then proceed to name a few texts (from the optional reading list, and some which were more obscure); if the student could demonstrate that they knew where to find them without assistance, the prof would amend their mark.