        Date: 2021-06-14 07:11:53+00:00
        From: Manfred Thaller <>
        Subject: Re: [Humanist] 35.85: obsolescence of markup

Dear Herbert,

Sorry for a delayed answer.

I am afraid that my original mail to Humanist was a bit too
elliptical. So I have to ask for your patience for a slightly longer
explanation before responding to your remark below.

Willard's original question was:

> Currently (correct me if I am wrong) markup intervenes to embed human
> intelligence about an object where artificial processes of detection and
> analysis fall short. Does this not suggest that some kinds of markup will
> become obsolete at some point? (I do not have in mind scholarly
> commentary!) Has anyone speculated intelligently along these
> lines?

In response to some reactions, he expanded the question to:

> Jonah Lynch responded to my speculation about the obsolescence of
> markup, asking what I had in mind by the distinction I made between the
> kind I thought would not ever prove obsolescent and the kind that would.
> My overall intention was to draw attention to the impermanence of work
> in computing, and so to raise the question of invasive curation. Of
> course every thing is impermanent, in constant flux &c, but some
> artefacts of scholarship do survive because we care about them. Adding
> to them with highly interpretative metatext would be regarded as a
> different sort of contribution than denoting layout, would it not?
> Thus an example: metatext that says "this is a paragraph" versus
> metatext that comments on the author's likely intention in breaking the
> flow of prose in the particular version in question. I think we can say
> that completely reliable automatic recognition of paragraphs is only a
> matter of time -- except in relatively rare circumstances. No
> hard-and-fast rules, only a doubtlessly annoying observation.
> Is there yet another argument here for standoff markup? For working even
> harder on statistical methods of analysis? Something else?

For me, this can be "operationalized" from two points of view:

A conceptual / epistemic one. An operational / algorithmic one.

The epistemic one is, in my opinion, the one that lies at the heart of my
own scepticism of the TEI, or, as I wrote in the opening of the "post"
you quote:

> The markup embedded into a document shall: (a) represent characters,
> which do not exist in the fonts available or which are non-alphabetic
> like interpunctuation. (b) Allow the representation of abstract texts
> resulting from the evaluation of various witnesses in a critical
> edition. (c) Annotate a text with interpretations.

As long as these three - at least for me - completely different
epistemic layers are inseparably mixed in <emph>one</emph> markup
system, conceptual chaos ensues. But be that as it may conceptually,
there is also a technical problem that lies behind Willard's question for

How do you confront situations where (1) a heavy investment has been
made in the proofreading of a raw text, creating a perfect base for
further work, but (2) the conceptual comments added into the
Sachapparat are obsolete, as they reflect a theoretical paradigm that
has fallen into disgrace?

Or: where (1) valuable interpretations have been embedded into a very
large text, which (2) is so large that, for the sake of manageability,
these comments have been embedded into a text produced by slightly
dirty OCR, when (3) ten years later, after the author of these comments
has died, an OCR breakthrough makes it possible to create a
significantly less dirty conversion of the original files?

Or: when you have (1) an admirably prepared text, on which however (2)
only very little has been done on the extraction of entities, which
then become available with the help of an entity extraction algorithm
(3) which shall supersede the few entities in the original markup,
(4) leaving the rest of the markup unchanged?

Or, to take up Willard's argument: when you want to keep something
which is admittedly obsolete side-by-side with "some artefacts of
scholarship [which] do survive because we care about them".

While I was not at this Wuppertal conference, I'm somewhat surprised
that these questions of how to replace an obsolete part of a marked-up
text, leaving the remainder unchanged, have been solved by the
Hofmeisters (and yes, I'm also a quite regular reader of the Balisage
proceedings). And I must somehow also have overlooked the discussion of
how two data objects might negotiate the consequences of updates in one
of them for the links between them, a bit after the line you quote from
my blog.

Briefly, on capta:

> Given, as she [Drucker] says,
> that all data is actually capta, current approaches to data visualization are
> misleading in that they suggest more certainty and stability than is actually
> the case.

Yes, to paraphrase (not quote) Darrell Huff, How to Lie with Statistics
(1954): "How to lie with statistics? Visualize them."

Whether all data are capta is of course only the starting point of a
discussion, which has to clarify what "all" means: whether all data are
capta to exactly the same degree, and whether there is only one captor
or possibly a whole set of interacting captors. Forgive me: or whether
it is really epistemically a good idea to consider (1) the reading of an
individual character, (2) the relative weight of two textual witnesses
and (3) the interpretation of the intent of an author as operating on
exactly the same conceptual level ...

Kind regards,

> Dear Manfred,
> I'm not known as a defender of XML/TEI strategies but what you say in your
> about "strings, which are mixtures of standard characters, graphics of strange
> wiggles of the pen and names of glyphs, which are recognizably standardized"
> cannot be taken as an argument against markup as a means of scientific
> communication (which is IMHO best suited to differentiate between different
> sources of information resp. knowledge). I would recommend contacting the
> organizers of the 2020 IDE workshop, "Die (hyper-)diplomatische Transkription
> und ihre Erkenntnispotentiale" or to talk with the keynote speakers there,
> and Wernfried Hofmeister(-Winter) about the possibilities to combine
> hyper-diplomatic transcriptions with standard markup technology.
> To come seriously to the point of 'interpretation' in and about humanities'
> textually captured sources I would propose to take into account arguments
> discussed in other contexts:
> "In her seminal essay, Humanities Approaches to Graphical Display, Johanna
> Drucker points out that visualization design for the humanities has still not
> properly accommodated the nature of humanities scholarship. Given, as she says,
> that all data is actually capta, current approaches to data visualization are
> misleading in that they suggest more certainty and stability than is actually
> the case." [1]
> Eventually we can compare the framework of textual markup with the criticized
> framework of standard visualizations, and the genuinely suggestive effects too.
> Kind regards, Herbert
> [1] Radzikowska, M., & Ruecker, S. (2020, June 1). “Capta by Juxtaposition: A
> Rich-Prospect Approach to the Visualization of Information.” [Presentation].
> The Canadian Society for Digital Humanities/ Société canadienne des humanités
> numériques (CSDH/SCHN) annual conference at the 2020 Congress of the Social
> Sciences and Humanities (moved online due to COVID19).

