12: Interoperability of Metadata for . . .

Tue 5 Jun, 9:00-10:30
Session 12: Interoperability of Metadata for Thematic Research Collections: A Model Based on The Walt Whitman Archive
Katherine Walter, Brett Barney, Julia Flanders, Terence Catapano, Daniel Pitti

[Greetings from the NCSA building at UIUC. Yes, I haven’t finished the Kalamazoo posts. Best put this set up as I do them, though. . . .]


K. Walter opened by talking about digital collections generally and gave an overview of the Whitman Archive. Using EAD, the team created repository-specific finding aids, then a unified guide. The finding aids weren’t uniform at first, of course, which poses a challenge to interoperability. The current undertaking explores gaps and redundancy in tagged data and looks towards METS.


B. Barney began working for the Whitman Archive as a grad student in 2000, encoding texts in TEI. Since then his work has branched out. The site began in 1995 in HTML and moved to (maybe towards) SGML/XML in 2000. The archive includes seven editions of Leaves of Grass as TEI as well as page images; manuscripts (ditto); finding aids for the MSS (either created by the Whitman Archive or edited to suit); ~150 reviews (currently in transition: served as XHTML, exist as TEI-based XML but aren’t 100% proofed yet; no page images); 130 portraits in a db as well as an XML flat file, with images; a number of monographs and essays about Whitman (out of Iowa, mostly; TEI + illustrations); a bibliography of scholarship (MySQL; content goes back to 1940); info on first printings in periodicals; and an archive of the archive as it existed in 1999. Barney has become interested in METS for its utility as a management tool. Whereas many projects using METS are focused on a single repository, the Whitman Archive is looking at dispersed objects.

Redundancy issue: TEI overlaps EAD for some types of content and metadata [henceforth “md”]; one idea is to drop the overlapping bits into METS to reduce error, inconsistency, etc. Much of what appears redundant isn’t, however; one needs a <title> in TEI to help human readers and workers identify an object. . . . Some relationships that the Whitman Archive wants to describe aren’t things that METS cares about, and some relationships are contingent. Before METS can be used well for thematic research collections, some social and technical difficulties need to be overcome: steep learning curve, problem of how to adapt METS for this use.
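[A rough illustration of the redundancy problem, mine rather than the panel's: the sketch below pulls the title out of a toy TEI header and a toy EAD description and checks whether the two agree. The sample fragments, the function names, and the decision to compare only titles are all assumptions made for the example; the fragments are un-namespaced and far simpler than real files.]

```python
import xml.etree.ElementTree as ET

# Invented, un-namespaced toy fragments (assumption: real TEI and EAD
# files are namespaced or DTD-based and carry far more metadata).
TEI_SAMPLE = """<TEI><teiHeader><fileDesc><titleStmt>
<title>Leaves of  Grass</title>
</titleStmt></fileDesc></teiHeader></TEI>"""

EAD_SAMPLE = """<ead><archdesc><did>
<unittitle>Leaves of Grass (1855)</unittitle>
</did></archdesc></ead>"""


def normalized_text(root, path):
    """Return the whitespace-normalized text at `path`, or None if absent."""
    node = root.find(path)
    if node is None or node.text is None:
        return None
    return " ".join(node.text.split())


def compare_titles(tei_xml, ead_xml):
    """Hypothetical helper: report the title each encoding gives the object."""
    tei_title = normalized_text(
        ET.fromstring(tei_xml), "teiHeader/fileDesc/titleStmt/title"
    )
    ead_title = normalized_text(ET.fromstring(ead_xml), "archdesc/did/unittitle")
    return {"tei": tei_title, "ead": ead_title, "match": tei_title == ead_title}


report = compare_titles(TEI_SAMPLE, EAD_SAMPLE)
print(report)
```

A mismatch here isn't necessarily an error: as noted above, the two titles may serve different readers, so a report like this only flags places where someone should look.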


J. Flanders: her role was to wear the TEI hat and help to work out arcane issues. How do thematic research collections fit into (or stretch) our prevailing sense of digital projects? One shared assumption is that digital undertakings should be repurposeable, that they create something interestingly synthetic out of interesting smaller components. (Cf. MONK.) The more modularity that can be built in, the better a job of repurposing we’ll do when it’s needed.

One has beliefs re: what the transcribing scholar sees on the page, e.g., in terms of what to encode and what’s been decoded or deciphered; there’s an authorial reference point (normalization, etc.); there are beliefs about textual genres, expressing disciplinary perspectives, and how a text is related to other texts; and one wants to express certain intentions via interface design decisions (how one may sort data, e.g.). The front end is, in a way, the focal point for how a project’s editorial perspective (or perspectives) constrains the user’s experience.

Does a library intend to ingest a collection as a whole, or only the collection’s objects? The answer depends on how the library positions itself: as publisher (who can repurpose materials later) or as purveyor / curator (who transmits materials in the form in which they reached the library). What sort of responsibility does a library have, and to what extent are the intentions themselves recorded in a way that can be studied?

Interesting effects from this undertaking from a TEI perspective: the information design that enables such distinctions is useful. (One does want to distinguish products of scholarship from raw materials, as it were, as part of a collection’s structure.)

Several possible models: the data-only approach, in which the library ignores the work scholars have done to coordinate files and merely sweeps everything into a digital storage-and-service area; the data-and-connections approach, in which data are of primary importance but connections amongst digital objects are vital; and the data, connections, and interface approach, in which the interface is primary, so one ingests the entire thing as it’s purveyed as well as the back-end connections.

The area that most needs work is how scholarly decisions and shaping of a collection are documented. Things are “trapped” in XSLT or CSS rather than recorded outright.


T. Catapano’s focus: handing off Whitman Archive [or setting things up for handoff?] to be included in external repositories—could screen scrape 😛 (as Internet Archive does) or address digital objects directly. For the latter one wants to use METS. A digital object is (inter alia) a surrogate of a physical document or conceptual entity; it has sufficient integrity and independent significance to be manipulated, referred to, or recontextualized usefully outside of the Archive’s configuration. The Whitman Archive contains multiple file types: TEI / TEI-like transcriptions, TIFF images, XHTML, “digital mortar” (CSS, XSLT, thumbnails, branding / presentation aspects, etc.), bibliography, database, and so on; these have various relationships, explicit as well as implied.
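[For readers who haven't seen METS: a minimal sketch of how one digital object's files might be gathered into a METS wrapper, with a file section that inventories the files and a structural map that points at them. This is my own illustration, not the Archive's profile. The file IDs, labels, and URLs are invented, the `build_mets` helper is hypothetical, and the output is not validated against the METS schema.]

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def q(ns, tag):
    """Build a namespace-qualified name in ElementTree's {uri}tag form."""
    return f"{{{ns}}}{tag}"


def build_mets(object_label, files):
    """Hypothetical helper: wrap (id, use, mimetype, href) tuples in METS."""
    root = ET.Element(q(METS_NS, "mets"), {"LABEL": object_label})
    file_sec = ET.SubElement(root, q(METS_NS, "fileSec"))
    struct_map = ET.SubElement(root, q(METS_NS, "structMap"))
    top_div = ET.SubElement(struct_map, q(METS_NS, "div"), {"LABEL": object_label})
    groups = {}
    for file_id, use, mimetype, href in files:
        # One fileGrp per kind of use (transcription, page image, mortar...).
        group = groups.get(use)
        if group is None:
            group = ET.SubElement(file_sec, q(METS_NS, "fileGrp"), {"USE": use})
            groups[use] = group
        file_el = ET.SubElement(
            group, q(METS_NS, "file"), {"ID": file_id, "MIMETYPE": mimetype}
        )
        ET.SubElement(
            file_el,
            q(METS_NS, "FLocat"),
            {"LOCTYPE": "URL", q(XLINK_NS, "href"): href},
        )
        # The structural map refers back to each file by its ID.
        ET.SubElement(top_div, q(METS_NS, "fptr"), {"FILEID": file_id})
    return root


# Invented example data; the URLs are placeholders, not real locations.
doc = build_mets(
    "Example manuscript",
    [
        ("tei1", "transcription", "text/xml", "http://example.org/ms1.xml"),
        ("img1", "page image", "image/tiff", "http://example.org/ms1-p1.tif"),
        ("xsl1", "digital mortar", "text/xml", "http://example.org/display.xsl"),
    ],
)
print(ET.tostring(doc, encoding="unicode"))
```

Note what the sketch leaves out: the implied relationships. The structural map says which files belong to the object, but not why the stylesheet matters to the transcription, which is the sort of thing a profile has to spell out.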

[Catapano talked through which parts of METS deal with which types of md (object itself, its source, technical and rights md, etc.), which I haven’t recorded.] The Archive itself is being treated as an object…. Then one records decisions, vocabularies, an example instance, and so on in a METS profile. Issues: ID/IDREF mechanism is ambiguous; XML hierarchies are ambiguous and limited (how are two items in a list related to one another, if at all? should be able to express it formally); relationships are bidirectional but hard to express that way (one can say that something is a child, but it’s harder to point back up at a parent); how does one determine the adequacy of one’s md?; need to settle protocols for updates.
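[The pointing-back-up problem shows up even in small scripts. A sketch, assuming an invented toy document: Python's standard ElementTree gives each element its children but no link to its parent, so the reverse direction has to be built by hand, and the same goes for finding everything that points at a given ID. The `corresp` attribute and the element names are placeholders I chose for the example, and the untyped pointer illustrates the ambiguity: nothing in it says what kind of relationship is meant.]

```python
import xml.etree.ElementTree as ET

# Invented toy document; "corresp" is just a placeholder pointer attribute.
SAMPLE = """<collection>
  <work id="w1">
    <item id="ms1" corresp="ed1"/>
    <item id="ed1"/>
  </work>
</collection>"""

root = ET.fromstring(SAMPLE)

# Child-to-parent map: the hierarchy only records the downward direction.
parent_of = {child: parent for parent in root.iter() for child in parent}

# ID lookup table, so pointers can be resolved to elements.
by_id = {el.get("id"): el for el in root.iter() if el.get("id")}


def inverse_links(tree_root, attr="corresp"):
    """Map each target ID to the IDs of the elements that point at it."""
    inbound = {}
    for el in tree_root.iter():
        target = el.get(attr)
        if target:
            inbound.setdefault(target, []).append(el.get("id"))
    return inbound


print(parent_of[by_id["ms1"]].get("id"))
print(inverse_links(root))
```

Both reverse lookups are derived after the fact, which is why they go stale whenever the files change; that ties back to the question of update protocols.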


D. Pitti talked about UVa’s experience of taking the Whitman Archive METS files and trying to map them to a submission information package (ingest into Fedora). Technical feasibility is not the same thing as economic feasibility; even where we can do things, the costs sometimes prohibit implementation, esp. re: “digital mortar,” which is difficult to collect [in the library sense of “collect”]. In mapping, UVa has made primarily economic decisions for what’s sustainable, so they’ve limited the number of profiles and objects that may come in. Over time Pitti hopes we can tame the complexity of these undertakings to increase economic viability. Final question: how do we continue to provide access without reprogramming continually?


Q&A (partial): Mandell commented on NINES and the Poetess project; Catapano replied that the Whitman mapping attempts a richer sort of preservation; Barney noted a tension in giving NINES the kinds of things they want versus not being able to preserve the things they aren’t prepared to take.