How can you not love a funny, funky lexicographer who titles her talk “ALL YOUR TEXT ARE BELONG TO US”? Or who disclaims her talk by explaining that she’s a female GenXer from the South, and that we shouldn’t take any of her pronunciations as exemplary?
Erin McKean explains that “lexicographer” is a word you’ll find in every dictionary, even a vest pocket English/Bulgarian dictionary. That’s because lexicographers want you to ask, so they can say, “Look it up.” People assume lexicographers are a cross between stereotypical librarians and stern nuns: grammar thugs who will tell you you’re mispronouncing “nausea” and then tell you to wash your hands.
“I’m not the judge of the Westminster word show, I’m not the bouncer at the word night club… Words don’t go in the dictionary because I love them and think they’re beautiful.” Dictionaries, instead, are full of words people use.
(The word Erin wants you to use, because it’s beautiful, is “erinaceous”, which means of, like or pertaining to hedgehogs.)
People tend to treat dictionaries as unchanging, written in stone, as if they “came down from a high place where things are on fire.” But language is constantly changing – yesterday, during Bruce Sterling’s session, she took three pages of notes on new words.
Dictionaries are tools. (She recommends using them to pick up girls, saying it might work with her.) You can use them for spelling, meaning, pronunciation. They’re like power tools. But people don’t feel bad about not using their drill more often – they come to her and apologize for not using their dictionaries more often. She absolves us – “go forth, sin no more.”
Making dictionaries is hard – it requires figuring out how people actually use language. You can’t interview the world about how they use language, or surreptitiously record people. (The NSA hasn’t given her access to Echelon yet. She’s never really asked, just mentions it whenever she’s on the phone.) You need to take large amounts of text and distill it – dictionaries, she tells us, are “the vodka of literature – odorless, colorless, tasteless but really powerful. And it goes great with Red Bull.”
Building a dictionary requires analyzing over a billion words of text. This process is threatened by copyright, which fences off parts of the English language. People who think the language is unchanging may not realize the need to analyze it. But the changes are vast, and hard to follow. When Samuel Johnson was building his dictionary, there were 250 novels published per year in London – all he had to do was buy and read those novels. But the printing industry is now vast – $25 billion in sales a year – and buying all this data is unrealistic. Instead, we need to scan it.
This scanning shouldn’t be threatening to publishers. “I don’t care about your plot, or your ideas – I just want to analyze your use of the language.” It should be considered fair use… “but this is America – anyone can sue anyone for anything.” And just the threat of a lawsuit is enough to prevent lexicographers from analyzing some texts.
She begs us to make changes to the copyright pages of our books so that lexicographers have the explicit right to analyze them. (I’ll be putting the idea in front of Larry Lessig, to see if this can be yet another selling point for Creative Commons.)
Finally, she gives us the intriguing idea of dictionary APIs – what does a dictionary mashup look like? What if you could mouse over every word on a webpage and drill down into the dictionary information for each one?
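For a rough sense of what such a mashup might look like, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `TOY_DICTIONARY` table stands in for a real dictionary API, and `lookup` and `annotate` are hypothetical names, not anything Erin described. It wraps each word it knows in an `<abbr>` tag, so a browser shows the definition on mouse-over.

```python
import html
import re

# Hypothetical stand-in for a dictionary API. A real service would be
# queried over the network; this table exists only for illustration.
TOY_DICTIONARY = {
    "erinaceous": "of, like, or pertaining to hedgehogs",
    "lexicographer": "a person who compiles dictionaries",
}


def lookup(word):
    """Return a definition for word, or None if the toy dictionary lacks it."""
    return TOY_DICTIONARY.get(word.lower())


def annotate(text):
    """Return HTML in which every known word carries its definition as a tooltip."""
    pieces = []
    # The capturing group makes re.split keep the words alongside the
    # punctuation and whitespace between them.
    for token in re.split(r"([A-Za-z]+)", text):
        definition = lookup(token) if token.isalpha() else None
        if definition is None:
            pieces.append(html.escape(token))
        else:
            pieces.append(
                '<abbr title="{}">{}</abbr>'.format(
                    html.escape(definition), html.escape(token)
                )
            )
    return "".join(pieces)


if __name__ == "__main__":
    print(annotate("An erinaceous pet for every lexicographer."))
```

Swapping the table for a call to an actual dictionary service would only change `lookup`; the page-annotation step would stay the same.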
Did I mention that she maintains a Dress a Day blog? I may be in love.
(It gets even better – asked about her feelings about Wiktionary, she says, “Their ontology does not need to recapitulate our phylogeny.”)