how to make a digital scholarly edition–and why

[These are my speaking notes, cues and all, from a talk delivered at the C19 Americanists conference, Berkeley, CA, 14 April 2012.]

If you’d prefer to read this talk yourself, it should be visible now at

The abstract I submitted last fall declared that I’d draw upon Autobiography of Mark Twain, Volume 1 to show that scholars hold the means of production of editions more immediately now than in any prior decade since the 1470s. I’m trained as a medievalist and a manuscript scholar, and this is not hyperbole. I’ll touch upon some reasons why one may want to create a scholarly digital edition, then walk through some necessarily abstract components of creating one, with the goal of demystifying a starting workflow. To the extent that my talk has a thesis, it is this: you, too, can make a digital edition if you have even the slightest interest in doing so. This talk thus “draws upon” my Autobiography experience and related work to enable you to do something entirely different from the Autobiography, for reasons that’ll be clear a bit later.

For context, here’s a nutshell of Mark Twain Project Online, otherwise MTPO, and the project that underlies it. In 2002, the National Endowment for the Humanities required that the Mark Twain Project, as one of its flagship grantees, begin publishing critical editions electronically as soon as possible. This requirement was fulfilled with MTPO’s launch in November 2007, encompassing some 2300 edited letter texts—about 700 of them new, the rest converted from our printed editions—with the facility to display edited letters in “plain text” alongside annotation and textual apparatus. At launch time MTPO was a three-way partnership amongst the Mark Twain Project at UC Berkeley, the Publishing Group at the California Digital Library, and the University of California Press. In 2009, CDL’s Publishing Group stepped back in order to focus upon UC-wide undertakings, and MTPO gained two converted print volumes of literary work, one of them the massive 2003 scholarly edition of Adventures of Huckleberry Finn. We few at Berkeley have done nearly everything ourselves since then—more slowly—with capable server support from colleagues in the Library Systems Office.

The bestselling, PROSE Award-winning Autobiography of Mark Twain, Volume 1, published simultaneously on paper and the Web, is thus only the most recent critical edition our office has published in a forty-five year period. It is also the first edition we’ve produced for which the crucial textual apparatus has not been printed. In this way there’s a scholarly edition, the 736-page volume published in 2010 by UC Press; UC Press also has a DRM-locked PDF and epub, as well as a 2011 “reader’s edition” that strips the annotation and condenses the introduction. Amazon sells a Kindle version of both. But the only version that qualifies as a critical edition is our Web publication, which includes the equivalent of 200 printed pages of textual apparatus keyed to the text, as well as a bonus color scan of the Harper’s special issue celebrating Mark Twain’s seventieth birthday in 1905.

I should clarify here that when I say “edition,” I mean the culmination of research-backed inquiry into a text’s transmission history. If a text exists in one fair-copy manuscript or typescript with no hard to read words, nothing crossed out or inserted, and no evidence that it was published in any way—including by another scholar on their defunct GeoCities website, for example—then your task is relatively straightforward, whether you choose to render the text exactly as presented in that single attestation or whether you intervene editorially to emend punctuation and so forth. Otherwise, if your text exists in multiple iterations, you’ll want to collate them to see where their readings differ and, ideally, to report in some way what the differences are. Here it’s helpful to refer you to Greetham’s introduction to textual scholarship or—better—to Bowers, Greg, Hirst, Bryant, Zumthor, McGann, Cerquiglini, and others. It’s likely that many of you are familiar with such concepts and approaches already, but I can’t count the number of times that fellow scholars have asked me, “Why would you need to edit Mark Twain?” as though my office-mates and I spent our days proofreading his grammar and syntax or trimming prolixity. If you don’t usually work with textual apparatus or scrutinize the transmission of editions you read or teach, take a look at Wesley Raabe’s essay in The American Literature Scholar in the Digital Age, which is only one of the more recent challenges to scholars to attend to how the transmission of print and digital texts may affect the readings of key passages.
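
For anyone who has not collated before, here is a minimal sketch of what "seeing where readings differ" can look like in practice. It uses only Python's standard `difflib` module, compares two witnesses word by word, and lists the places where they disagree. The function name `collate` and the two sample sentences are invented for illustration; a real collation would also have to decide how to treat punctuation, spelling, and line breaks, and dedicated collation tools handle more than two witnesses.

```python
import difflib


def collate(witness_a, witness_b):
    """Return (reading_a, reading_b) pairs wherever two witnesses differ.

    Comparison is word by word, splitting on whitespace only; this is a
    simplifying assumption, not a recommendation for real editorial work.
    """
    words_a = witness_a.split()
    words_b = witness_b.split()
    matcher = difflib.SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    variants = []
    for tag, a1, a2, b1, b2 in matcher.get_opcodes():
        if tag != "equal":
            # An empty string on one side means that witness lacks the words.
            variants.append((" ".join(words_a[a1:a2]), " ".join(words_b[b1:b2])))
    return variants


# Invented sample texts standing in for a manuscript and a printing.
manuscript = "the quick brown fox jumped over the lazy dog"
printing = "the quick brown fox jumps over the lazy dogs"

for reading_a, reading_b in collate(manuscript, printing):
    print(f"manuscript: {reading_a!r}  |  printing: {reading_b!r}")
```

The list of pairs this produces is, in embryo, the raw material of a textual apparatus: each pair records a point where the witnesses part ways and leaves the editorial judgment to you.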

Whether you mean ultimately to stock an arsenal, as Kenneth Price has suggested, or whether you conceive of a scholarly digital edition in a more traditionally constrained way, there are many ways to extend and enrich a process that begins with the deceptive simplicity of scrutinizing a text’s history. But without fussing too much about the terminology—because such concerns will reveal themselves to you later, unasked—what would be ideal is for everyone here who hasn’t done it before to go and choose a short story, newspaper printing of a speech, short poem, or similarly brief piece that you’ve worked on, or that you enjoy sharing with students despite its underappreciated nature, and make it into a miniature edition. Go through the process with something both familiar and finite, not a novel or a major treatise, because if you start with something novel-length, you probably won’t finish. Get as far as you can stand to do; ideally, write notes to yourself, or blog, tweet, or podcast the experience so that you’ll remember later how you approached the early stages and what your unexpected cool insights were. It’s important to engage responsibly in scholarly work, but scholarly editing and publishing is still a “fail better” process, where—let’s be honest—you’re unlikely to achieve all of your goals on the first try, simply because the process involves so many variables. As you proceed, try to avoid the false dichotomy between “scholar” and “technical expert”—not only in how we theorize divisions of labor or discuss workloads for specific projects, but how we think about ourselves. Digital editions don’t work without a willingness to play and experiment, which is at odds with how academic roles currently tend to cast us and shape our time. 

One reason why so many editions of previously ignored work were published during the nineteenth century in the US, England, and Scotland is that textual editing was seen as an attainable activity, something anyone with a reasonable education could attempt. Some of these editions are terrible, granted, but the cynicism of Sturgeon’s Law doesn’t have an exemption for scholarly work.

Try it with the bar lowered deliberately at first, therefore, and figure out as you go how you and others can improve upon those baby steps. Then you will have some amount of project under your belt even if you don’t finish, which is much better than no project in this context. You can use what you’ve learned to apply for funding, share praxis and theory with your students, write up hard-earned wisdom in your next promotion cover letter, talk about bizarre moments in interviews, or even create a bridge into a related project that doesn’t focus upon an edition. Besides this, you’ll have what I call “recognition vocabulary” to aid in lengthening your reach for future work, since you’ll have made excellent progress towards determining the contours of your ignorance, which—as you know—is key for beginning to master any field. I’d suggest that you’ll also be a more confident colleague when reading and critiquing others’ digitally based work. Sadly, despite the importance as well as the appeal of the fluctuating confluence of theory and praxis which we call the digital humanities, there are no magic bullets anywhere in it except the one with which we occasionally shoot ourselves in the foot.

How does one stumble upon productive thoughts about implementing a project and find potential collaborators? Two key loci are DH Answers, which runs both a forum and a Twitter account, and the DiRT wiki—Digital Research Tools, now hosted by Project Bamboo. If you’re interested in a particular piece of software, there’s a good chance that fellow scholars have blogged their uses of it, and reading about their uses or disappointments may save you valuable time in deciding which tools you need most to become familiar with. Get your students involved, not to do the work for you (which some people seem to think is ideal) but to furnish ongoing accountability for your project, so that the project keeps moving. Publish to the web and make your work free to access, to grant it a wider distribution. You can link to a website until the proverbial cows come home; you cannot actually give away free books to encourage readership—much though I like codices—because people won’t take them. For that matter, websites have the ineluctable advantage of not requiring either a print run or a supply chain. MTPO isn’t the most frequented site, partly because it focuses upon a single writer and isn’t a major hub; nonetheless, MTPO pulls down between 1500 and 2000 visitors per month on basically zero advertising.

One thing not to worry about: whether editions are hot enough, and in particular whether you, in pursuing a digital project, are undertaking the right kind of work in terms of anticipated external rewards. Only in retrospect, as Stephen Ramsay observes in a recently posted talk titled “The Hot Thing,” is it possible to discern that one chose “correctly” and got lucky in aligning zeitgeist with personal strengths and interests. It is always the right time to edit a text, precisely because it is never the right time.… And though Natalia Cecire rightly critiques the rhetoric of doing things “hands on” in the digital humanities and finds flaws in its implied recapitulation of contingent labor, it seems to me that doing work in this fair field full of folk is a solid way to participate in what is happening, to shape it, and thus to effect change.

Here are some things to consider if you aren’t only working on a contained exercise for yourself; ideally, as I said, you’ve chosen a text that you like to teach so that the pedagogical aspects of creating and using a digital edition may be shared with your students as well. (This is the part where I attempt to demystify one possible planning workflow.)

First, for whom is this edition most useful? How much help do you want to offer to readers? Are you shaping its mise en page and annotation specifically for undergraduates, or is this something for fellow researchers and thus equipped with more textual notes than explanatory ones? Do you expect a reader to be able to cite or link to specific moments in the text, or will they have to settle for a link to the text’s landing page? It’s common to say that an edition is intended for all readers—access open to everyone!—but it can be difficult to provide editorial support for all potential levels of familiarity with a text’s context without overwhelming and obscuring the text itself. In some ways this is less about anticipated readership and more an issue of interface. How will you convey information clearly? And in a related issue, if you’ve never dealt much with interface design, spend a bit of time looking at existing sites and portals, to figure out what works and doesn’t. Ideally, buy someone lunch (or better) in exchange for their honest feedback on your design. There are best practices for design and “user experience” (UX) as well; even I don’t think that one needs to become an expert in everything, but becoming acquainted can only help you.

Second, think about the intellectual-property framing for the edition and how the text will be shared with readers. Will you put it on your own website, submit it to a journal that already has infrastructure in place, or arrange something with a library, a center, an extant archive project, or an academic press? Do you have copyright issues to negotiate, where a press’s formal backing would assist with gaining permission from an estate? I’ve flagged this as IP framing, but there are basic practical consequences as well. Partnering with projects already underway usually means inheriting their stylistic choices, which is a mixed blessing. Otherwise, there are free-to-use publishing applications that aren’t too hard to learn to use.

Third, how much metadata infrastructure is needed? Metadata is basically information about information: a bibliographic citation contains metadata pointing towards an article you’ve cited, for example. Metadata in a digital publishing context can be elaborate, such as to enable advanced search, now ubiquitous at larger sites; or as simple as a bit of structured text that describes when and by whom a digital object—your edition, in this case—was created. I often hear that only librarians care about metadata, but you should, too, because when it’s wrong, your web search result or your research into a digital or physical archive is incomplete, silently, and you may miss out on important information. Without good metadata, no one will find your work.
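
To make the "simple" end of that range concrete, here is a small sketch of a structured record describing when and by whom an edition was created. The field names are loosely modeled on Dublin Core elements (title, creator, date, source, type, format); the function `make_record` and every value passed to it are hypothetical placeholders, not a description of how MTPO stores its metadata.

```python
import json
from datetime import date


def make_record(title, editor, created, source):
    """Build a minimal descriptive record for a digital edition."""
    return {
        "title": title,
        "creator": editor,
        "date": created.isoformat(),  # ISO 8601, e.g. "2012-04-14"
        "source": source,             # the witness the edition is based on
        "type": "Text",
        "format": "text/html",
    }


# All values below are placeholders for illustration only.
record = make_record(
    title="A Sample Miniature Edition",
    editor="Your Name Here",
    created=date(2012, 4, 14),
    source="Newspaper printing of a speech (hypothetical)",
)

print(json.dumps(record, indent=2))
```

Even a record this small gives a search engine or a library catalogue something reliable to index, which is the practical meaning of "good metadata" for a one-person project.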

Fourth, especially if you make your work public without an institutional partner, how will you do so, and how will you account for preservation? Sustainability is overlooked all too often, except when libraries take part in a project, and there are pragmatic considerations as well as ideals. It’s helpful for a digital repository to store a snapshot of your files in case your webhost loses the files without warning, for example. The web has existed in something like its currently recognizable form for twenty years now. Twenty years hence, when your attention may have moved on to a completely different set of concerns, how will an interested party be able to read your work?
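
One modest, do-it-yourself preservation habit is sketched below: record a checksum for every file in the edition so that a later copy, whether from a repository snapshot or a restored backup, can be checked against the original. The function names `build_manifest` and `changed_files` are my own; this is a sketch using Python's standard library, not a substitute for depositing your files with a library or repository.

```python
import hashlib
from pathlib import Path


def build_manifest(root):
    """Map each file under root (by relative path) to its SHA-256 checksum."""
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def changed_files(old, new):
    """List paths that were added, removed, or altered between two manifests."""
    return sorted(p for p in old.keys() | new.keys() if old.get(p) != new.get(p))
```

In use, you would call `build_manifest` on your edition's folder when you publish, keep the result alongside the snapshot, and call `changed_files` on the old and new manifests whenever you want to confirm that nothing has silently gone missing or been altered.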

Finally, how will you arrange for peer review after the project reaches a stage where review makes sense? As far as I know, the only publishing modes for a digital edition which come with peer review built in are submission to a journal and, sometimes, submission for inclusion in an extant archive. If you’re preparing a text primarily for your own teaching, perhaps it doesn’t need peer review, but if someone’s willing to check your work in this way, great; you’re likely to receive some useful and unexpected feedback. But—by all means—if your project is small and you finish, submit it to Scholarly Editing (that’s the journal’s actual title); if it’s larger, consider especially the presses at Virginia or Michigan.

Once you’ve considered these questions, it’s time for a frank self-assessment of which parts you know how to accomplish and which need substantial research or a colleague’s input. For things not addressed by your extant contacts or an online forum, you can write to the maintainers of sites that implement something specific you’d like to know more about; I receive such queries once in a while.

Instead of attempting to introduce specific software tools, which even with slides would be difficult to follow, I’d like to point you towards a preliminary link-list I’ve compiled. There are sections for

  • Transcription
  • Text Encoding
  • Software Aids
  • Conversion to Web, Search, and Other Niceties
  • (Parts of the) Community

The largest want or need at present, it seems to me—and I ask you to forgive an outsider for what may not be a sufficiently informed view—is not the little chasm between desire for and implementation of a digital edition. As I hope I’ve shown, there are good ways to bridge that gap. Instead, the problem is too much distance between where most of us stand and a more traditional toolkit: textual criticism, with related training in palaeography, codicology, and bibliography. It’s possible to reinvent the wheel without formal training in those areas, which is convenient given that formal palaeographical training for nineteenth-century Americanists is rare to nonexistent.

I’m in the early stages of a project to help reduce that distance with a resource for “late” Anglophone palaeography. Some helpful web resources exist for hands prior to 1800, but they end there as though modern hands were uniformly legible to a modern scholar. As writing by hand becomes defamiliarized in contrast to time spent typing, pushing, sliding, and poking, it seems to me that we can take basic acquaintance with a range of possible letter forms even less for granted. Elaine Treharne tweeted a comment yesterday from a talk Jerome McGann gave at Florida State: “Editing, bibliography, book history are the most pressing programmatic needs, not Python or xslt.” We need both kinds, I think, with neither eclipsing the other, and I hope that I’ve been able to offer some pathfinding aid.



  1. Posted 2012-04-17 at 13:53

    Great post–nice to see the challenges and rewards of scholarly editing outlined clearly and honestly. One point about having a web edition peer reviewed: sites like NINES and 18thConnect perform this valuable service.

    • Sharon
      Posted 2012-04-17 at 17:56

      Thanks, Justin! Good to know about additional review possibilities. The MLA’s CSE has also reviewed some websites, MTPO and the Blake Archive amongst them.
