
Humanist Discussion Group



Humanist Archives: March 5, 2019, 6:24 a.m. Humanist 32.510 - editions, markup, illusions and cultural politics

                  Humanist Discussion Group, Vol. 32, No. 510.
            Department of Digital Humanities, King's College London
                   Hosted by King's Digital Lab
                       www.dhhumanist.org
                Submit to: humanist@dhhumanist.org


    [1]    From: Dr. Herbert Wender 
           Subject: Re: [Humanist] 32.499: standoff markup & the illusion of 'plain text' (66)

    [2]    From: Hugh Cayless 
           Subject: Re: [Humanist] 32.496: editions, in print or in bytes (120)

    [3]    From: Domenico Fiormonte 
           Subject: RE: [Humanist] 32.496: editions, in print or in bytes (112)


--[1]------------------------------------------------------------------------
        Date: 2019-03-04 22:29:35+00:00
        From: Dr. Herbert Wender 
        Subject: Re: [Humanist] 32.499: standoff markup & the illusion of 'plain text'

Patrick,

I very much like your straightforward thinking! Go one step further and we'll
come back to the beginnings of AI and its overthrowing of the program/data
divide. Anyone who has ever experienced a LISP bracket storm may fear the
difficulties to be greater than they actually are. But surely you are not the
man to fear difficulties. I would like to point the other participants in this
discussion to a talk I heard in 2003 (TEI Members' Meeting in Nancy, France).

Best regards,
Herbert

Patrick Durusau: What is a tree really?

 Every user of markup languages, from GML forward, has been schooled in the cant
of "descriptive" versus "procedural" markup. Procedural markup is panned in ISO
8879 as "inflexible" and requiring a user to change procedural markup in order
to change the presentation of their document. Changing procedural markup to
affect presentation? Sounds a lot like changing descriptive markup in order to
have a different tree!

 Does that mean that a tree is a particular presentation of descriptive markup?
Or perhaps more precisely, a procedural representation of descriptive markup?

 Techniques for transforming structured but non-SGML/XML files into SGML/XML
have long been known in the markup community. Such techniques should also
produce multiple valid instances of SGML/XML from a single file. One possible
file format for TEI P5 is proposed that allows unlimited (including overlapping)
descriptive markup, while retaining, with preprocessing, compatibility with XML
parsers.

 Source:
https://tei-c.org/Vault/MembersMeetings/2003-info/abstracts.html#TEI.1_text.1_body.1_div.2

See also the slides:
https://tei-c.org/Vault/MembersMeetings/2003-info/durusau.pdf
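
To make the abstract's idea of "multiple valid instances of SGML/XML from a
single file" a little more concrete, here is a hedged Java sketch. It is not
Durusau's proposed format; it assumes a hypothetical layout in which a base
text is kept apart from standoff spans (character offsets), each tagged with
a layer. Spans in different layers may overlap freely, and each layer is
projected on its own into a well-formed XML instance. The names
`StandoffViews`, `Span` and `view` are invented for illustration, and the
record syntax needs Java 16 or later.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: one base text plus standoff spans grouped into layers.
// Spans in different layers may overlap; each layer becomes its own XML view.
public class StandoffViews {
    // A named span [start, end) over the base text, belonging to one layer.
    record Span(String layer, String name, int start, int end) {}

    // Escape the characters that cannot appear literally in XML text content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Render a single layer as XML. Assumes spans *within* a layer do not overlap.
    static String view(String text, List<Span> spans, String layer) {
        List<Span> chosen = new ArrayList<>();
        for (Span s : spans) {
            if (s.layer().equals(layer)) chosen.add(s);
        }
        chosen.sort(Comparator.comparingInt(Span::start));
        StringBuilder out = new StringBuilder("<view layer=\"" + layer + "\">");
        int pos = 0;
        for (Span s : chosen) {
            out.append(escape(text.substring(pos, s.start()))); // unmarked text before the span
            out.append('<').append(s.name()).append('>')
               .append(escape(text.substring(s.start(), s.end())))
               .append("</").append(s.name()).append('>');
            pos = s.end();
        }
        out.append(escape(text.substring(pos))).append("</view>");
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "one two three four";
        List<Span> spans = List.of(
            new Span("verse", "l", 0, 7),     // line 1: "one two"
            new Span("verse", "l", 8, 18),    // line 2: "three four"
            new Span("syntax", "s", 0, 13),   // sentence that crosses the line break
            new Span("syntax", "s", 14, 18)); // "four"
        System.out.println(view(text, spans, "verse"));
        System.out.println(view(text, spans, "syntax"));
    }
}
```

Run as-is, `main` should print
`<view layer="verse"><l>one two</l> <l>three four</l></view>` and
`<view layer="syntax"><s>one two three</s> <s>four</s></view>`: the line and
sentence hierarchies overlap in the base text, yet each view is a proper
tree. Nesting within a layer, attribute escaping and validation against a
schema are deliberately left out of this sketch.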

-----Original Message-----

>  Date: 2019-03-02 15:41:25+00:00
>  From: Patrick Durusau 
>  Subject: "Plain text" as illusion, was Re: [Humanist] 32.498: standoff markup


But "plain text" in an electronic system is an illusion. Why not abandon
the distinction between text, markup and annotations, capturing all of
them in a database, upon which queries then search and/or render a
particular "view" of a "text" for your viewing?



--

Patrick Durusau
patrick@durusau.net
Technical Advisory Board, OASIS (TAB)
Editor, OpenDocument Format TC (OASIS), Project Editor ISO/IEC 26300
Co-Editor, ISO/IEC 13250-1, 13250-5 (Topic Maps)

Another Word For It (blog): http://tm.durusau.net
Homepage: http://www.durusau.net
Twitter: patrickDurusau



--[2]------------------------------------------------------------------------
        Date: 2019-03-04 15:55:41+00:00
        From: Hugh Cayless 
        Subject: Re: [Humanist] 32.496: editions, in print or in bytes

>
> > [...] Type and fonts were designed for print. Ought we to stop using
> > them as well? It would put an end to this thread...
>
> I don't accept this analogy. Since all computing languages break down
> to the same operations in the end, of course their effects are
> identical. You need to give examples of data formats that have been
> reused for different purposes. Not so many of those, are there?
>

Are you kidding? This happens All. The. Time. You conflate formats and data
structures below, which aren't the same thing (though certainly related),
so I'll toss in a few examples of both, gathered in about 20 minutes' work.

* PDF as presentation format (e.g.
https://www.planetquark.com/2010/05/14/use-adobe-reader-as-presentation-tool/).
I have done this myself, now that I think about it. PDF is something
unambiguously designed for printing and/or replicating the print experience
on a screen. Yet I can use it like PowerPoint!
* BibTeX (https://en.wikipedia.org/wiki/BibTeX): Takes a typesetting
language and uses it as a reference database.
* And that reminds me of the wondrous variety of uses I have seen people
turn spreadsheets to...
* (Ok, this one is clearly meant just for fun, and I just happened to see
it this morning) Animated GIFs in Excel (
https://github.com/pugwonk/gif2xlsx/blob/master/README.md)

Turning to data structures, repurposing is probably even more common, and
expected. I'll use Java again, both because I'm familiar with it and
because they have the habit of naming classes that implement data
structures according to the data structures they use:
* ArrayList (
https://docs.oracle.com/javase/8/docs/api/java/util/ArrayList.html). Some
unpacking for those who are still bothering to read and who aren't
programmers: An array is a fixed-size structure with a predefined number of
"slots" for holding data. A (linked) list is a node that has a pointer to
the next node, which points to the third, and so on ad infinitum, like a
chain. Linked lists have some very useful properties: you can iterate over
them, trivially add a new node as often as you want, and remove a node by
simply changing the previous node's pointer to point at the following node.
They take only as much memory as they need. Why would you want an "array"
list? As it turns out, arrays have some useful affordances, like
constant-time access to members. If I want the hundredth item in a linked
list I have to start at the head and take 99 steps to get there. If I want
the hundredth item of an array, I just grab the thing in box 100. Of course,
there are tradeoffs. Arrays are fixed-size, so if I want to add a new item
and I'm out of slots, the ArrayList has to swap my old array for a bigger
one and copy its contents over. They will pretty much always take up more
space than you're actually using.
* JSON-LD. It's a tree! It's a graph! *Head explodes*
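
For readers who want to see the ArrayList tradeoff described above in code,
here is a minimal Java sketch of the same idea. It is a simplification, not
the actual `java.util.ArrayList` source: it assumes a starting capacity of
two slots and doubles the capacity whenever it runs out, whereas the real
class uses its own default capacity and growth policy. `GrowableArray` is a
hypothetical name.

```java
import java.util.Arrays;

// Hypothetical, simplified growable array: a fixed-size backing array that is
// swapped for a bigger one (and copied) whenever it runs out of slots.
public class GrowableArray {
    private Object[] slots = new Object[2]; // deliberately tiny starting capacity
    private int size = 0;                   // slots actually in use
    private int copies = 0;                 // how many times we reallocated

    public void add(Object item) {
        if (size == slots.length) {
            // Out of slots: allocate an array twice as big and copy the old contents over.
            slots = Arrays.copyOf(slots, slots.length * 2);
            copies++;
        }
        slots[size++] = item;
    }

    // Constant-time access: go straight to the slot, no walking from a head node.
    public Object get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return slots[index];
    }

    public int size() { return size; }
    public int capacity() { return slots.length; }
    public int copies() { return copies; }

    public static void main(String[] args) {
        GrowableArray a = new GrowableArray();
        for (int i = 0; i < 100; i++) a.add(i);
        System.out.println(a.size() + " items in " + a.capacity() + " slots after "
                + a.copies() + " copies");
        System.out.println(a.get(99));
    }
}
```

Adding 100 items to this sketch should report 100 items in 128 slots after
6 copies, which shows both sides of the tradeoff: `get(99)` is a direct jump
to a slot, but the backing array is usually bigger than what is in use, and
every so often it has to be copied wholesale.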

There are many, many similar examples available.


> I keep harping on this point because it is the nub of the problem.
> Every decision we take in designing a data structure, like a marked-up
> file, has powerful consequences on what we can do with that data
> afterwards. Markup languages are one TINY part of the full breadth of
> data structures available in computer science and you should
> appreciate this. Robert Sedgewick, in his popular book Algorithms, said:
>
> "data structures ... are central objects of study in computer science.
> Thus algorithms and data structures go hand in hand; in this book we
> take the view that data structures exist as the byproducts or
> endproducts of algorithms"
>

Markup languages are not data structures in and of themselves. They may be
interpreted by a program (an algorithm) as a particular data structure.
They may even be interpretable as *more than one* data structure.
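
As a small illustration of that point, here is a Java sketch that reads one
and the same marked-up string in two ways with the JDK's built-in XML
parsers: as a tree (DOM, via `DocumentBuilderFactory`) and as a flat
sequence of start/end events with no tree ever built (SAX, via
`SAXParserFactory` and `DefaultHandler`). The class and method names
(`TwoReadings`, `rootChildCount`, `events`) and the sample string are
hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class TwoReadings {
    static final String MARKUP = "<p><s>One.</s><s>Two.</s></p>";

    static InputStream bytes(String xml) {
        return new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
    }

    // Reading 1: build a tree (DOM) and count the element children of the root.
    static int rootChildCount(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(bytes(xml)).getDocumentElement();
            NodeList kids = root.getChildNodes();
            int count = 0;
            for (int i = 0; i < kids.getLength(); i++) {
                if (kids.item(i).getNodeType() == Node.ELEMENT_NODE) count++;
            }
            return count;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse markup", e);
        }
    }

    // Reading 2: no tree at all, just a flat list of start/end events (SAX).
    static List<String> events(String xml) {
        List<String> out = new ArrayList<>();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(bytes(xml), new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName, String qName, Attributes atts) {
                    out.add("start:" + qName);
                }

                @Override
                public void endElement(String uri, String localName, String qName) {
                    out.add("end:" + qName);
                }
            });
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse markup", e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(rootChildCount(MARKUP));
        System.out.println(events(MARKUP));
    }
}
```

For the sample string, the tree reading should report 2 element children
under `<p>`, while the event reading yields
`[start:p, start:s, end:s, start:s, end:s, end:p]`. Which data structure the
markup "is" depends on the program doing the reading.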

>
> What we can do with a text encoded in XML is thus already decided by
> the designers of XML. I'm not saying that you can't do a variety of
> things, but it is only within a restricted field of operation.


How is its field of operation restricted? And more to the point, if my own
"field of operation" happens to match what it does, why shouldn't I use it?

> It's
> pretty close to the train and scooter analogy I used earlier. On a
> train we can go faster or slower, we can stop or go on but we can only
> follow in the tracks laid down by its builders.


I'm afraid I don't accept this analogy. I kind of hate scooters too. But
let's examine it a bit anyway. Both have their own affordances: trains can
carry vastly more people and freight faster than scooters can, but they are
constrained as to where they go. Scooters are lightweight and flexible, but
they basically only carry you. And they, like trains, rely on pre-built
infrastructure. Why can't we have both? Why do all trains have to be
destroyed so that scooters may rule the world? They don't. That would be
silly.

> In one of our
> presentations someone in the audience, after seeing what we could do
> with the texts of Harpur, admitted, reluctantly, that abandoning XML
> freed us up to do more. That's how I feel about the straitjacket
> that XML has become.
>

But absolutely no-one is demanding that you use XML. You were able to do
what you needed by going another way, which is cool. That doesn't mean your
way will work for everyone, nor that users of TEI are fools for using TEI.
You can hate trains and I can hate scooters and the planet will continue to
spin. I have serious concerns about your own use of layers, though I can
see how they'd work in certain circumstances. They may dangerously
oversimplify the variance in your sources, which in the worst case is
combinatorial; they rely on duplication of text, complicating the editing
process if you need to make changes; plain text / light formatting doesn't
deal well with things like ambiguity, dislocation, transposition, weird
glyphs, damage, gaps, etc. I do understand that you're proud of the thing
you built, and that's great. But it doesn't follow that other people have
to make the same decisions you did. My tradeoffs are not your tradeoffs.

All the best,
Hugh


--[3]------------------------------------------------------------------------
        Date: 2019-03-04 11:00:21+00:00
        From: Domenico Fiormonte 
        Subject: RE: [Humanist] 32.496: editions, in print or in bytes

I've been following this thread and the previous McGann et al. one, and
it feels like a gigantic *déjà vu*...

People I have known for a long time, and for whom I have the
greatest respect and affection, were rehearsing the same ideas and
arguments I have been listening to for... maybe thirty years?

But at this stage (those of us over fifty, like myself) we should all
try to be intellectually honest and admit that all scholarly discourses
(including this debate), the tools, the methodologies, the models and
the so-called 'standards' we're talking about belong to a relatively
small circle of institutions and people, mostly Anglophone or located
in the Northern hemisphere, that had the political, economic and
cultural power to persuade the rest of us that both their ideas and
instruments were the best "available" solutions for "representing" our
textual heritage. And therefore they were necessary if we were to be
considered "acceptable" and admitted to the club of Rigorous
Scholarship.

We know how scholarly evaluation (and publication, etc.)
works: we (the Global North) set out the rules, you (the rest of the
world) follow them [http://knowledgegap.org/] -- and pay for the
service, of course. From a certain point on, this has been exactly the
pattern followed by the most influential, rich and powerful DH
projects. Just try to build a digital archive without using certain
'standards' (XML? TEI?) and see whether you get the money from
the NEH or the EU or some big Global North quango.

Based on my experience as an XML encoder and DH undergraduate instructor
(and I still teach XML and TEI... great didactic tool(s)!), I'm
inclined to sympathize with Desmond's arguments. However, I don't think
the problem is using XML-TEI or any other digital representation
language. The problem is not even *how* you want or need to represent
something. The problem is rather *who* you are, and *from where* you
are speaking (and also in what language you speak). What and where
are, so to speak, the material conditions of your knowing?

The Anglophone community is today still hegemonic and has the power to
impose specific models (through its language and institutions, first
of all) and so make it difficult, if not impossible, for others to do
or propose something different. The funding sources today are framed
within a certain epistemological discourse, while it should be exactly
the other way round: a competition of ideas.

I'm talking here about my personal experience. In 1996, with other
people, I started Digital Variants, one of the first (and lesser-known)
projects on literary textual variation. The original sin, of course, was
that the majority of texts were in Italian.

But at that time DH was a small world and there was still some space
for diversity and pluralism. Later it became like any other academic
field: a battle for the hegemony where the first victim is innovation,
and the second is cultural and linguistic diversity.

That's also why I have always questioned the geographical
expansion of ADHO (as if I weren't making enough enemies with this
post...): in the present conditions of inequality and epistemological
subordination the foreseeable result will be another cultural
colonization. Because when and where there is epistemic injustice,
there's only one possibility, one culture, one vision of the world --
one "digital representation" tool and/or methodology. Epistemic
injustice is not something you work out by inviting subalterns to join
the winner's party.

The Anglophone and Northern European bias of this discussion has been
so strong that so far nobody has mentioned the cultural and
geopolitical bias of (any of those) tools. Is it in fact culturally
neutral where a tool, a technological standard or a methodology is
designed? Is there any lesson we've learned from the history of
technology?

What does ASCII mean?

Who's on the board of UNICODE?

Etc.

So are you really telling us that TEI is free of geopolitical,
cultural and linguistic bias? All of these aspects are completely
absent from your discourses. Your epistemology seems culturally blind.
This is the bug in the system -- not overlapping hierarchies.

Did we ever ask ourselves how many expensive digital editions were
produced outside Northern Europe, USA, etc.? I'm in India right now --
it's my sixth visit to this country -- and I can't think of anybody
here (or in Africa, or in most of Latin America, but also in Southern
Europe!) who can afford to spend a massive amount of time and money
to build a scholarly digital edition (let alone to invest in
"specialised technical training" for keeping it going!) based on some
epistemologically questionable and economically infeasible model(s)
designed by a group of mostly monolingual Anglo-Saxons who seem to
think their solutions/models/standards are eternal and universally
good for all.

>>"We need better" [DS]

Indeed.

All the best / Greetings to all

Domenico

---
Domenico Fiormonte
Dipartimento di Scienze Politiche
Università Roma Tre
http://www.digitalvariants.org
http://infolet.it
http://www.newhumanities.org





_______________________________________________
Unsubscribe at: http://dhhumanist.org/Restricted
List posts to: humanist@dhhumanist.org
List info and archives at: http://dhhumanist.org
Listmember interface at: http://dhhumanist.org/Restricted/
Subscribe at: http://dhhumanist.org/membership_form.php


Editor: Willard McCarty (King's College London, U.K.; Western Sydney University, Australia)
Software designer: Malgosia Askanas (Mind-Crafts)

This site is maintained under a service level agreement by King's Digital Lab.