tech corner

Notes on three short talks—includes some poor phone-camera photos.

1 Solving Some Problems…
Joel Fredell, Terri Ilgen, Charles Borchers
Margery Kempe (BL Add. 61823)

Quick critique of the Auchinleck site and the Newton Project at Sussex re: presentation. Says the Milton project shows Unicode's limitations, MUFI requires installing a font, etc. They’ve devised a font/presentation solution (see bad photo of slide). TextCrawler for a first pass of ?encoding; Voyeur (Sinclair) to help find which text clusters needed special attention (essentially, normalization of markup). Speedy automation let an encoder get through one recto/verso in four hours, on average. They began work last spring.
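To make the batch search-and-replace idea concrete, here is a minimal Python sketch of a rule-driven normalization pass over several files at once. It is not TextCrawler and not the team's actual rule set: the rules, the file name, and the sample markup are all invented for illustration.

```python
import re

# Hypothetical normalization rules as (pattern, replacement) pairs.
# Illustrative only; the project's real rules were not shown in the talk.
RULES = [
    (re.compile(r"<abbr>\s+"), "<abbr>"),    # drop whitespace after the opening tag
    (re.compile(r"\s+</abbr>"), "</abbr>"),  # drop whitespace before the closing tag
    (re.compile(r"&thorn;"), "\u00fe"),      # replace the entity with a literal thorn
]


def normalize(text: str) -> str:
    """Apply every rule in order and return the normalized text."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text


def normalize_all(files: dict[str, str]) -> dict[str, str]:
    """Run the same rule set over many files (file name -> contents)."""
    return {name: normalize(body) for name, body in files.items()}
```

Running one rule set across every file is presumably what makes this kind of first pass quick compared with fixing each transcription by hand.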

Using font embedding with CSS to solve the presentation issue. PROTOtypeR (built by their team) works after some tweaking (cross-browser problem). Only andron_scriptor converted successfully from TTF to EOT (for IE4-8); they also use Charis SIL. Fonts are kept server-side. The presentation viewer is called Showcase.
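For readers unfamiliar with CSS font embedding, a small Python sketch that builds an @font-face rule offering an EOT source for older IE and a TTF source for other browsers. The family name and file paths are assumptions made up for the example; the talk did not show the project's actual stylesheet.

```python
def font_face(family: str, eot_url: str, ttf_url: str) -> str:
    """Return an @font-face rule with EOT (older IE) and TTF sources."""
    return (
        "@font-face {\n"
        f'  font-family: "{family}";\n'
        # A plain EOT source first, for IE versions that ignore format() hints.
        f'  src: url("{eot_url}");\n'
        f'  src: url("{eot_url}?#iefix") format("embedded-opentype"),\n'
        f'       url("{ttf_url}") format("truetype");\n'
        "}\n"
    )


# Hypothetical family name and server-side paths.
css = font_face(
    "Andron Scriptor Web",
    "fonts/andron_scriptor.eot",
    "fonts/andron_scriptor.ttf",
)
print(css)
```

A page would then name the family in an ordinary `font-family` declaration, and the browser fetches the font from the server rather than requiring a local install.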

http://english.selu.edu/humanitiesonline/kempe/

Q: Why use TextCrawler instead of XML entities?

A: Multiple files, and thus faster to run; dumbed-down encoding. Still building the XSLT.

Q: What kind of search?

A: Under development.

________

2 Bertrand Gaiffe, Metadata customization with ODD

CLARIN project: structured metadata (not Dublin Core). Metadata is organized in trees: nodes are called components, leaves are called elements. The component registry becomes an ODD file, with byproducts: a schema for editing metadata, and a new component that may be registered.

Could simplify how to refer to model classes. Instead of alternation, sequence, sequenceOptional, sequenceOptionalRepeatable, and sequenceRepeatable, one could have only sequence and alternation.

There was discussion of how to deal with choice in this syntax.
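One way to read the simplification proposal, sketched in Python. The first function reflects my understanding of the five expansion modes (each class member gets the same occurrence indicator); the second is a hypothetical simplified form in which only sequence and alternation remain and occurrence is stated separately. Function names and the output notation are mine, not the speaker's.

```python
# Occurrence indicator applied to each class member, per expansion mode.
SUFFIX = {
    "sequence": "",
    "sequenceOptional": "?",
    "sequenceRepeatable": "+",
    "sequenceOptionalRepeatable": "*",
}


def expand(members: list[str], mode: str) -> str:
    """Expand a model class under one of the five existing modes."""
    if mode == "alternation":
        return "(" + " | ".join(members) + ")"
    suffix = SUFFIX[mode]
    return "(" + ", ".join(m + suffix for m in members) + ")"


def expand_simplified(members: list[str], kind: str, occurs: str = "") -> str:
    """Hypothetical simplified form: kind is 'sequence' or 'alternation'."""
    sep = " | " if kind == "alternation" else ", "
    return "(" + sep.join(m + occurs for m in members) + ")"


print(expand(["a", "b"], "sequenceOptionalRepeatable"))
print(expand_simplified(["a", "b"], "sequence", "*"))
```

Under this reading the two calls produce the same content model, so the three extra mode names carry no information that a separate occurrence indicator could not.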

_____

3 Sebastian Rahtz, Realistic Targets in TEI to RDF



“Our TEI texts are a good archival form, not an interchange format. (Discuss.)” Context: CLAROS project (clarosnet.org), with a lexicon of Greek personal names. Discovery; extraction (comb the text looking for things that can be represented in RDF); mapping (map everything in the text into RDF); container (decorate the text with attributes that map to RDF). The talk is mostly about extraction methods. It presents an approach, not a particular ontology.
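A minimal Python sketch of what extraction could look like: walk a TEI fragment and emit a triple for each element that has a known RDF class and a `ref` pointer. The sample text, the URIs, and the element-to-class table are all hypothetical; this is not the CLAROS pipeline.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical table: TEI element name -> RDF class URI.
CLASS_FOR = {"persName": "http://example.org/ontology#Person"}

# Invented sample fragment.
SAMPLE = """<p xmlns="http://www.tei-c.org/ns/1.0">Dedicated by
<persName ref="http://example.org/name/1">Kleon</persName>.</p>"""


def extract(tei_xml: str) -> list[tuple[str, str, str]]:
    """Return (subject, predicate, object) triples found in the fragment."""
    triples = []
    for el in ET.fromstring(tei_xml).iter():
        local = el.tag.replace(TEI, "")  # strip the namespace prefix
        subject = el.get("ref")
        if local in CLASS_FOR and subject:
            triples.append((subject, RDF_TYPE, CLASS_FOR[local]))
    return triples


for triple in extract(SAMPLE):
    print(triple)
```

Note that the sketch only works where the markup carries an explicit pointer; a bare name with no `ref` yields nothing.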

Where do we store the mapping? In an equiv element with attributes name, uri, filter, mimeType. How to use it? See OxGarage, inter alia. Gruff is an RDF visualizer.
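A hedged Python sketch of reading such mappings back out of an ODD. The attribute names on equiv come from the talk; the ODD fragment itself, including the ident values, URIs, and the filter file name, is invented for illustration.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical ODD fragment; all attribute values are made up.
ODD = f"""
<schemaSpec xmlns="{TEI_NS}" ident="demo">
  <elementSpec ident="persName" mode="change">
    <equiv name="Person" uri="http://example.org/ontology#Person"
           filter="to-rdf.xsl" mimeType="text/xsl"/>
  </elementSpec>
  <elementSpec ident="placeName" mode="change">
    <equiv name="Place" uri="http://example.org/ontology#Place"/>
  </elementSpec>
</schemaSpec>
"""


def read_mappings(odd_xml: str) -> dict[str, dict[str, str]]:
    """Map each elementSpec ident to the attributes of its equiv child."""
    root = ET.fromstring(odd_xml.strip())
    mappings = {}
    for spec in root.iter(f"{{{TEI_NS}}}elementSpec"):
        equiv = spec.find(f"{{{TEI_NS}}}equiv")
        if equiv is not None:
            mappings[spec.get("ident")] = dict(equiv.attrib)
    return mappings


for ident, attrs in read_mappings(ODD).items():
    print(ident, "->", attrs["uri"])
```

Keeping the mapping in the ODD means the schema customization and the RDF correspondence travel in one file.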

Another way to go: NLP, e.g. the way people analyze Google Books data.

TEI coverage is not great but is relatively easily extended. The biggest problem is determining context, due to the descriptive nature of TEI markup.