January 26, 2012

David Weinberger: Too Big To Know

David Weinberger’s new book “Too Big To Know” (#2B2K – be sure to pick book titles that make good hash tags…) launched last night at Harvard Law School with a talk entitled “Unsettling Knowledge”. If you know David’s work, it’s obvious that the title is a pun. And David’s new book is a wonderfully unsettling piece – it challenges our notion of what knowledge is, and introduces the uncomfortable question of how we navigate this new space.

Knowledge as we know it is coming apart, David tells us. The bastions of knowledge, the physical emblems of knowledge, like encyclopedias, newspapers and libraries are undergoing radical transformation. We know we’re heading into a future that’s deeply different, though we don’t know quite how. The manifestations of knowledge are at risk, and all it took was the touch of a hyperlink.

How did these institutions fall apart so quickly? It’s an impossible question to answer, but he offers one path through the thicket. He starts with a famous quote from Daniel Patrick Moynihan, who tells us “Everyone is entitled to his own opinion, not his own facts.” This is the promise of knowledge: that if we all got together and had an honest conversation, we could eventually come to an agreement. There is knowledge and it can bring us together.

We tend to assume that knowledge gives us an accurate picture of the world, built up bit by bit, fact by fact. In acquiring knowledge, we nail down each piece with certainty. And we see knowledge as a product of filtering and winnowing – we move from perception to true perception, from a mob of opinion to true belief. Knowledge is about finding gold within the flux.

We’ve always had to filter, based on the fact that the world is way bigger than what fits in our skulls. There’s too much to know (quoting Ann Blair’s book “Too Much to Know”) and the world is too big to know.

Traditionally, we’ve handled this by breaking off a brain-sized chunk of the world and getting an expert to understand it. Once we’ve got that expert, we can stop asking questions: we simply ask the expert. Experts, and the credentials that create them, are stopping points. They’re points beyond which we don’t need to look any further.

But that’s how knowledge works on paper. Books, for all their magnificence, are a disconnected medium. They are contained within covers, they are shelved apart, they don’t naturally connect to one another. The author’s job is to put everything she knows on a topic between two covers. The arguments move in sequence, from the beginning to the conclusion. And because the book is an essentially limited medium, good writers ruthlessly cast things aside, deciding what is put in the book and what is excluded. Books are born of long-form arguments, moving us forward step by step, brick by brick.

Links are a new form of punctuation. They give you a means of continuing. In the print world, to follow a footnote in a book, you need to get on a bus and go to the library. That’s why we don’t generally follow footnotes. But now we can jump from one book to the next. It’s a magic map – touch a place on the map and you go there.

The internet is an environment that’s all about connection and our knowledge is picking up properties of the medium. Knowledge in this space is characterized by the fact that it’s “too much, messy, unsettled, and unstructured”.

Clay Shirky suggests that there’s no such thing as information overload, only filter failure. This is a very modern response to an older question. Futurist Alvin Toffler warned us about information overload, popularizing the phrase. It’s an extension of the idea of sensory overload, the idea that too much input could overwhelm and paralyze you. This is based on the faulty assumption that brains are information processing machines, and that we can overwhelm and crash them.

This line of thinking led marketers to conclude that choosing between 16 brands would be overwhelming to an American housewife and that fewer choices needed to be offered. But we’re now headed to a point where there’s an exabyte of genomic information available, and that number doesn’t lead us to paralysis, but to fascination. We’ve redefined the term “information overload” through how we use it.

We’re less overwhelmed because we’re learning different ways to filter. When we filtered in the print world, we did so in a way that prevented us from seeing the dregs. We saw only the books that our local library chose to buy, and only the books the publisher chose to print. The manuscripts filtered out of that process were invisible, impossible to retrieve through ordinary means.

Now, in a digital age, we filter forward, not filter out. All that information – some of it very low quality – is out there somewhere on the internet. We could curate and try to delete the stuff that’s wrong, hurtful, harmful or hateful. But it’s expensive to exclude information and cheaper to include everything. When you curate, you’re making decisions about what is interesting to your users, and no one can accurately predict what might be useful to a researcher in the future. Filter out all the gossip and crap from new media and you harm the scholar who wants to study celebrity behavior. You couldn’t have predicted the high level of interest in notes from a committee meeting in Wasilla, Alaska in 2008 until Sarah Palin became a public figure.

The web has worked by developing tools that include all content and filter when we retrieve it. As recently as a decade ago, information retrieval experts told us that ordinary users would never use tools this complicated. But now we use them every day, because we have to. And we’re seeing much better tools, like Shelflife, the tool Harvard’s Library Lab has created to allow users to browse the vast set of information in Harvard’s library systems.

We don’t just have a lot of information – the information is very messy. We like order – David shows a slide of zoological specimens, beetles mounted on pins – and we’re very good at establishing it. We understand where everything fits in a tree of species, based on similarities and differences. To know where a species fit into this tree was to know how the world works – to not know it was to be adrift.

In the physical world, there’s only one way to sort manifestations of information. You might want to sort your CDs by artist, while your partner might want them sorted by genre. There’s only one possible way they can be stacked on the shelf, because no two things can be in the same place at the same time. In a digital age, we simply make playlists. We end up with a mess of information, but it’s a rich and fertile mess.

Figuring out where things fit in the natural order of things was an essential piece of being human. Human beings saw ourselves as “the knowers.” But there are multiple orders and multiple ways of categorizing, through tags, playlists and other ways to sort information. Messiness is an essential feature of how we scale meaning. But, David warns, we still tend to think of knowledge in the ways we did when books had to sit on a single place on the shelf, when knowledge had a single, possible, right form, rather than multiple forms.

Knowledge is too big, messy and wildly unsettled, just like the internet. “For every fact on the internet, there is an equal and opposite fact.” David warns that there is nothing we all agree on – you can find someone willing to argue that 2+2 is not 4 (and, indeed, a quick Google search shows this to be true.) We don’t agree about anything, and David warns, we never will. “This doesn’t mean there are no facts – but it does mean that people are going to insist on being wrong.”

What this persistence of disagreement means is that the promise of knowledge Moynihan offers – that we can agree on a set of facts and then argue our opinions – is not going to be fulfilled. As it turns out, we don’t even know whether Moynihan said “everyone is entitled to his own opinion, not his own facts” or whether that’s exactly what he said.

The good news is that we’re rapidly developing ways of dealing with difference and disagreement. YouTube has a crummy commenting system, as is well documented and well established. David shows us a thread of comments on a recent Batman movie trailer. Somewhere deep in this comment thread is an impassioned argument about circumcision. It would have been great if YouTube supported forking of conversations. Forking is a powerful way to deal with disagreement. It’s very hard to do in the real world without social consequences – if we decide to move away from the dinner party to our own table where we talk about circumcision, it makes people uncomfortable – but it’s very easy to do this on the web.

In the 19th century, it was very challenging to classify the platypus. There was one space in a taxonomy for warm-blooded animals, and another for animals that produce eggs. Scientists thought the platypus must be a hoax, because it didn’t fit within existing categories. Even when presented with a specimen from Tasmania with eggs intact, they fought the platypus “hoax” as something that didn’t work within existing categories.

Now we can solve problems of overly rigid taxonomies by using linked namespaces. We can create a database of names, and a database of taxonomies. We can deal with the platypus and the water mole, and map scientific and colloquial names onto different possible structures. “Pick your name, pick your taxonomy and get on with your life. So what if we disagree? Yay for difference!”
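
As a rough illustration of what “a database of names, and a database of taxonomies” might look like, here is a minimal Python sketch. The identifier `animal:0001`, the taxonomy labels and the `classify` helper are all invented for this example; nothing here comes from David’s talk or from a real namespace.

```python
# Hypothetical sketch: names and taxonomies live in separate tables,
# and both point at a shared (made-up) identifier for the animal itself.

# Database of names: many names, one identifier.
NAMES = {
    "platypus": "animal:0001",
    "water mole": "animal:0001",
    "Ornithorhynchus anatinus": "animal:0001",
}

# Database of taxonomies: each scheme files the same identifier its own way.
TAXONOMIES = {
    "by_reproduction": {"animal:0001": "egg-laying"},
    "by_body_heat": {"animal:0001": "warm-blooded"},
}


def classify(name, taxonomy):
    """Resolve a name to its identifier, then find its place in one taxonomy."""
    identifier = NAMES[name]
    return TAXONOMIES[taxonomy][identifier]
```

Because names and taxonomies are looked up independently, `classify("water mole", "by_body_heat")` and `classify("platypus", "by_reproduction")` can both succeed without anyone having to agree on a single name or a single tree.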

David is actually quite concerned about difference, and just how much difference we can tolerate and still interact and function. He acknowledges that there’s a human tendency towards homophily, flocking together in groups united by race, gender, belief, socioeconomic status, etc. This can lead to a serious challenge to public discourse – echo chambers that can solidify beliefs, making them more extreme and polarized. But David worries that posing issues this way relies on an unquestioned assumption: that conversations are between people who disagree deeply and are looking for solutions and common ground by trying to get to the facts. This analysis misses the social role of conversation. We need so much context and so much agreement to even have a conversation. “To have a good conversation, you need to have 99% similarity and 1% difference.” He suggests that some of the work Yochai Benkler and I have been doing may help us find productive paths towards including difference, but reminds us that the high level of disagreement and the difficulty of finding common ground is likely a core feature of the internet and knowledge in an internet age.

Finally, knowledge in this new paradigm is unstructured. We’re used to the idea that knowledge has a basic structure. We have grown used to long form arguments that take us from A to Z, and we’re particularly fond of arguments that take us from A to Z in an orderly path, where Z is an unexpected place to end up. “This is a magnificent form of thought, but the long form argument is losing its preeminence.”

We might think of Darwin as a leading proponent of the long form argument. And his argument certainly led somewhere unfamiliar. But he wouldn’t have analyzed data for years and released a massive book if he were working today. He would publish online. And even if he didn’t, the conversation about his work would be based online. Whether or not we imagine Darwin tweeting from The Beagle, the web is where the thinking about and reacting to Darwin’s work would take place, and collectively, it will have more value than Darwin’s long form work taken alone. Moving forward, we will not just see these long form works, but the webs that precede and follow them.

Michael Nielsen has recently written about scholarly community reaction to results at CERN that offer evidence for faster than light neutrinos. As these results came in, they were posted to arXiv.org, a journal preprint site. They stirred up a firestorm of interest and reactions. Some of those reactions are brilliant, some are stupid and wrong. But that welter of discussion is where knowledge is – it’s taking place outside of printed peer review journals.

Darwin spent seven years studying and dissecting barnacles before working on The Origin of Species. His two volume work on barnacles includes countless facts, and his hard work to discover and pin them down was an act of nobility. But science doesn’t work quite like that anymore. We work with clouds of data about genetics, astronomy, and other topics. These data clouds are fundamentally different than facts. When data.gov released sets of government information, they didn’t clean or normalize it ahead of time – they released raw data. They concluded that it was better to put the data out there than to constrain themselves to information that was consistent and known, for the simple reason that this constraint would have slowed them down badly. Darwin would not have agreed – he spent seven years on one fact.

There’s value in getting the data out quickly, David argues. It may be the one approach that’s scalable – releasing raw data and letting individuals and groups clean, analyze and share what they find. Peer review scientific journals don’t scale, but perhaps peer to peer peer review might. We’re seeing growth in the Open Access journal field, particularly in spaces of repository where data is released, not peer reviewed.

One way we can start making sense of these new data sets is through the magic of linked data, a format suggested by Tim Berners-Lee, father of the web. We organize information in triples:

the platypus | lives in | Tasmania
Watermoles | lay | eggs

When we link triples to a central reference, we can resolve our platipae to water moles and link our triples together. Facts, which used to look like bricks, now look like links.
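
For readers who want to see the mechanics, here is a minimal Python sketch of the two triples above being linked through a central reference. The `ref:platypus` identifier, the `SAME_AS` table and the `facts_about` helper are invented for illustration; real linked data typically uses URIs for these references rather than short strings.

```python
# Hypothetical sketch of linked-data-style triples. The "ref:" identifier
# is made up for this example and is not a real URI.

# Central reference: different names resolve to one shared identifier.
SAME_AS = {
    "the platypus": "ref:platypus",
    "Watermoles": "ref:platypus",
}

# Triples as (subject, predicate, object), using whatever name
# each author happened to write.
TRIPLES = [
    ("the platypus", "lives in", "Tasmania"),
    ("Watermoles", "lay", "eggs"),
]


def facts_about(name):
    """Return every (predicate, object) whose subject resolves to the same reference as name."""
    target = SAME_AS.get(name, name)
    return [
        (predicate, obj)
        for subject, predicate, obj in TRIPLES
        if SAME_AS.get(subject, subject) == target
    ]
```

Asking for `facts_about("Watermoles")` returns both the Tasmania triple and the eggs triple, even though only one of them was written using that name. The link to the shared reference does the joining.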

David closes by returning to his original question: why were old knowledge systems so fragile? These systems assumed knowledge was bounded, settled, orderly and proceeded step by step. But that’s not what knowledge feels like in the age of the internet. It feels unbounded, overwhelming, unsettled, messy, linked and governed by our interests. And those properties are the properties of what it means to be human in the world.

“Networked knowledge may or may not be truer about the world, but it is truer about knowing… This crazy approach to knowledge feels familiar to us, because it’s how we tend to know.” He closes with an observation that’s both hopeful and unsettling: “What we have in common is a shared world about which we disagree, not a common knowledge we share and can collectively come to.”

I’ve followed David’s work for a long time, and had the pleasure of watching him work through the ideas behind this book – David and I are both part of a group at Berkman that helps colleagues explore book-length projects. While I’m familiar with this line of David’s thought, it was exciting and unsettling to hear him work through these ideas covering the whole arc of the book. I think this may be the most unsettling and radical book David’s put forth. On the one hand, it’s not a surprise that people will disagree on any conceivable fact. But David’s suggestion that we give up on achieving an impossible consensus and proceed with the hard work of getting on with our lives strikes me as challenging and liberating, a very different path than I hear from most activists and advocates. I’m enjoying wrestling with the ideas David puts forth both in this talk and in the paper and hope lots of readers will take up the challenge as well.


  1. Is David arguing that knowledge is unstable because we can no longer come to consensus about facts, a la Moynihan? I actually feel comfortable living in a world without knowledge (consensus about facts) but where facts themselves still remain. Is it possible to make this semantic split? In a previous era most experts “knew” the world was flat, but in *fact* the world was not flat. If we now live in a world without knowledge, but where facts still remain, I am okay with that.

    Comment by Mary — January 26, 2012 @ 7:17 pm

  2. First, thank you Ethan for your liveblogging that, as usual, puts the ideas better than the person you’re liveblogging did. Phenomenal.

    Mary, I believe we’re in agreement, and thanks for the clear statement and clarifying question. I do believe that the world is one way and not another, that is, that some statements are true and others aren’t, and that there are facts. Because I’m old fashioned, I even believe that various domains have methodologies for discovering facts and for coming to general agreement about them, with some imperfect degree of confidence. Science works! History works! Neither can (or does) claim to have a perfect grasp of facts, but that’s what happens when you choose to be born as a human.

    And, yes, I am only pointing out that the empirical evidence seems to suggest that even though there are facts and there are disciplines that have highly developed methods for agreeing on facts, Earthlings are never going to come to complete agreement about those facts. The arguments will not be resolved, not even by referring to facts (or, more exactly, by referring to our grasp of the facts).

    I, too, am okay with that. Although I do find it unsettling.

    Comment by David Weinberger — January 26, 2012 @ 8:11 pm

  3. Looking forward to getting the book. The comment and the blog post make me think of a conversation I had with one of my friends. As my friend points out, it is becoming difficult, even impossible, to come up with a reading list of books like Mortimer Adler’s nowadays.

    Comment by Henok — January 27, 2012 @ 12:21 pm

  6. Ethan:

    Do you have a reference for Tim Berners-Lee and linking in triples? I would like to learn more about that.


    Comment by Don Bailey — February 16, 2012 @ 3:20 am

  7. Of course, the platypus is a bit of a red herring (to mix taxonomic classes); the critter was taxonomized long before the Internet came along to let everything be miscellaneous. And theories underpinning that taxonomy shifted and changed many times through the medium of print.

    Would Darwin have tweeted from the Beagle? I’m not so sure. To the consternation of his colleagues, Darwin kept natural selection safely sealed off from the very-robust mechanisms of communication that existed in his time. The media didn’t prevent him from parceling out his work in pieces; cultural and personal concerns did the trick. Of course these things are interrelated—but in complex feedback loops. Then as well as now.

    It’s an interesting thought experiment, though, to wonder what tweeting from the Beagle would have done for natural selection. “Another crank who thinks two plus two doesn’t equal four,” many would surely have thought.

    Comment by Matthew Battles — February 19, 2012 @ 10:05 pm

  9. This looks like a triumph of style over substance. A collection of non-sequiturs and half-arguments. It’s possible this reviewer has simply conflated knowledge with cognition, which would explain this preposterous review, but either way, this book looks like a prime example of its tenet: there’s never been a better time to be dumb – especially if you can write a book as dumb as this, safe in the knowledge that there’ll be a market of numpties ready to lap it up.

    Comment by Barbara Otter — March 29, 2012 @ 9:34 am

