Saturday, 2006-10-28, 17:00
Greg Crane, “Intellectual Life in the Age of Google”

This one may be of interest to you non-TEI folk, too. V. belated final entry of a series.

Crane teaches at Tufts and is the founding head of the Perseus Project. His talk was apparently cross-listed as a UVic Lansdowne Lecture, so the audience doubled suddenly….

This write-up is a loose paraphrase; the talk jumped around a bit, so I haven’t tried adding connective pieces.

The digital world has no indigenous people–separate space. Do we want to see (something like) Canada or the problems south of it in this New World? (Bar chart of annual budgets for NIH, NSF, IMLS, and tiny tiny NEH.) Democratic valuations: what do budgets say? Choose the world we want: the old world is changing. Separate your tendentious anecdotes from the values they advance. If thousands read and no one corrects, how important are those errors? [At this point some people started frowning.] Contextualize, and consider the larger picture–cost versus accuracy, and types of cost.

Who has the wagging rights–does print dog wag digital tail? Disciplines that people invest in have already gone digital (sci/med).

How many projects assume their users have access to a huge research library? There’s an issue of building add-ons for specialized research versus building new space to foster the intellectual life of a society.

We’re in the Age of Google, loosely–global audience, cross-culture, cross-language; massive collections are subsuming traditional media and supporting new intellectual production. Don’t ask whether the net allows the same resources as some other medium, but rather what one needs access to for a rich intellectual life.

Power struggles [encapsulated by D. O’Donnell’s interaction with Wikipedia–generally, the later keynotes were quite good about referencing material presented earlier in the conference.]

Changing measures: collections acquire structure as they get bigger (more stuff) and smarter (better markup and metadata). Things that are good and flexible beat things that are better but static. If you find errors, you fix them and recirculate the improvement; let the community evaluate the materials. In some contexts, it may be that no one cares enough to notice and fix something.

The net isn’t junk anymore when compared with libraries, due to the massive digitization initiatives. See also a related CFP.

Four core technologies underlie million-book library collections: page image –> text, one language to another, text to data (database, e.g.), and customization / personalization. Million-book libraries online have almost no descriptive markup and many errors, but they’re available practically everywhere: persistent while funding lasts.

What’s the role of “artisanal” collections? –relatively small sets prepared better with a special front end. The wide-open stuff can use a common interface (familiarity) because few specific details need to be represented.

Is it the case that TEI : Google Books :: Xerox NoteCards : World Wide Web? [On NoteCards (pdf)] In other words, beware, TEI, lest you become so specialized and intricate that you lose relevance.

One collection’s response is to avoid competing where it’ll lose; identify strengths, redefine its institutional role, and open up existing materials. Use markup to enhance value, and expand the extant stuff. Train OCR instead of dumping more and more without improvement. How to deal with an open-access world? –release content under CC.

If new work cannot attract funding, how valuable is it? We do need instruments of capital-formation, though, such as UMich’s Text Creation Partnership….

TEI in two arcs:
–as output format: automatic tagging of milestones, personal and place-names tied to ID lookups; that way, one can pull all place-names mentioned, for visualization as well as help with correcting errors
–as training data: compare TEI output with OCR data. Find all the words used only once in a text, use them as bounds, and then find the words that are unique in both texts and so define a sequence. That way, one can compare output from different editions as well as use the uniques as keys to align two texts later (parallel-text presentation).
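
The alignment idea in the second arc can be sketched in a few lines. This is only my guess at the general technique, not anything Crane showed and not Perseus code: it assumes both texts are already split into word lists, treats words that occur exactly once in each text as anchors, and keeps the longest run of anchors that appear in the same order in both (a longest-increasing-subsequence pass). The function names are made up for illustration.

```python
from bisect import bisect_left
from collections import Counter


def unique_positions(tokens):
    """Map each word that occurs exactly once in `tokens` to its position."""
    counts = Counter(tokens)
    return {w: i for i, w in enumerate(tokens) if counts[w] == 1}


def anchor_pairs(tokens_a, tokens_b):
    """Return (pos_a, pos_b, word) anchors that keep the same order in both texts.

    Hypothetical helper: a sketch of unique-word alignment, assuming
    pre-tokenized input and exact string matches between the two texts.
    """
    ua = unique_positions(tokens_a)
    ub = unique_positions(tokens_b)
    # Words unique in BOTH texts, ordered by their position in text A.
    shared = sorted((ua[w], ub[w], w) for w in ua.keys() & ub.keys())

    # Longest increasing subsequence over the positions in text B.
    tails = []     # tails[k]: smallest B position ending a run of length k + 1
    tail_idx = []  # tail_idx[k]: index into `shared` of that run's last anchor
    prev = [None] * len(shared)
    for i, (_, pos_b, _) in enumerate(shared):
        k = bisect_left(tails, pos_b)
        if k == len(tails):
            tails.append(pos_b)
            tail_idx.append(i)
        else:
            tails[k] = pos_b
            tail_idx[k] = i
        prev[i] = tail_idx[k - 1] if k > 0 else None

    # Walk back from the end of the longest run to recover it.
    run = []
    i = tail_idx[-1] if tail_idx else None
    while i is not None:
        run.append(shared[i])
        i = prev[i]
    return run[::-1]


clean = "the quick brown fox jumps over the lazy dog".split()
ocr = "the qu1ck brown fox leaps over the lazy dog".split()
print([word for _, _, word in anchor_pairs(clean, ocr)])
# prints ['brown', 'fox', 'over', 'lazy', 'dog']
```

The gaps between anchors (here "quick"/"qu1ck" and "jumps"/"leaps") are the places to look for OCR errors or edition differences; repeated words such as "the" are skipped because they can't serve as unambiguous keys.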


Poppeau: Google has many, many errors (uselessly so) versus BNF’s Gallica.

O’Donnell: traditional training is better defined, currently, and people are more interested in it, partly as a result of these large-scale projects everyone hears about. How do people learn that they need to correct, after all? These projects won’t teach them that, nor how to correct things. Perfectionism may not be essential everywhere, but it’s necessary in some contexts; otherwise we’ve no baseline.