
Humanist Discussion Group



Humanist Archives: June 15, 2020, 8:20 a.m. Humanist 34.110 - annotating notation

                  Humanist Discussion Group, Vol. 34, No. 110.
            Department of Digital Humanities, King's College London
                   Hosted by King's Digital Lab
                       www.dhhumanist.org
                Submit to: humanist@dhhumanist.org


    [1]    From: Hugh Cayless 
           Subject: Re: [Humanist] 34.108: notation, software and mathematics (30)

    [2]    From: William Pascoe 
           Subject: Re: [Humanist] 34.106: how to represent act, scene, line, speech, and page structure? (18)

    [3]    From: Desmond  Schmidt 
           Subject: Re: [Humanist] 34.106: how to represent act, scene, line, speech, and page structure? (42)


--[1]------------------------------------------------------------------------
        Date: 2020-06-14 12:37:37+00:00
        From: Hugh Cayless 
        Subject: Re: [Humanist] 34.108: notation, software and mathematics

I fear it may be necessary to posit a corollary to Godwin’s Law for Humanist: as
the length of a discussion thread on Humanist increases, the probability of
arguments about OHCO/the evils of XML/the splendor of older or newer
alternatives (unusable alas, at present, but one day...) approaches 1.

The difficulties involved in disentangling the purity of theory from the
messiness of implementation are real, and perhaps the latter is best avoided on
a list such as this. Unfortunately for me, that is precisely where I prefer to
spend my time, as it is where things actually get done. Indeed, if I need fast
search of a (let’s say for the sake of argument) TEI XML corpus, I will build an
index for it and use that. Does that make the structure of the source data
somehow bad? I think there are some on the list who would answer yes, but I
don’t believe in a single, perfect data structure, so I regard that as an
implementation detail.
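
For the sake of illustration, here is a minimal Python sketch of the kind of
index I have in mind. The sample TEI fragment, the function name build_index,
and the choice to key verse lines by xml:id are all assumptions made up for this
example, not a description of any real project.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical sample; a real corpus would be read from files.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <sp who="#hamlet"><l xml:id="l1">To be, or not to be, that is the question</l></sp>
    <sp who="#ophelia"><l xml:id="l2">Good my lord, how does your honour</l></sp>
  </body></text>
</TEI>"""


def build_index(xml_text):
    """Map each lower-cased word to the set of xml:ids of the lines containing it."""
    root = ET.fromstring(xml_text)
    index = defaultdict(set)
    for line in root.iter(f"{{{TEI_NS}}}l"):
        line_id = line.get(XML_ID)
        text = "".join(line.itertext())
        for word in re.findall(r"[a-z']+", text.lower()):
            index[word].add(line_id)
    return index


index = build_index(SAMPLE)
print(sorted(index["be"]))    # ids of lines containing "be"
print(sorted(index["lord"]))  # ids of lines containing "lord"
```

The source XML is untouched; the index is a derived structure that can be
thrown away and rebuilt whenever the corpus changes.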

Relational databases did indeed come to dominate the market, because they met a
lot of needs quite well (and still do). More recently, perhaps readers noticed
the whole “NoSQL” movement away from them. They seem to have survived, however,
probably because they never stopped being useful. It remains to be seen whether
XML and related technologies will make it through the JSON era. I haven’t seen a
suitable replacement yet.

All the best,
Hugh

P.S. XQuery may be regarded as a superset of XPath. It is a full programming
language in its own right, and incidentally, does a beautiful job of processing
JSON!



--[2]------------------------------------------------------------------------
        Date: 2020-06-14 09:50:38+00:00
        From: William Pascoe 
        Subject: Re: [Humanist] 34.106: how to represent act, scene, line, speech, and page structure?

Before a JSON vs XML war erupts that may eclipse the hierarchical vs
non-hierarchical debate, please bear in mind that some tools are better for some
tasks than others, and in different circumstances. For example, I think the
suggestion that JSON wins against XML was probably in relation to marking up
metadata, or transport of data.

The task of describing and annotating plays is more about markup of text.
Metadata and markup are two different problems.

Personally, I'd favor JSON for handling metadata on the web, and wouldn't even
attempt using JSON to mark up a play.

I propose a better question than, "Does JSON beat XML?" might be, "What sort of
things would JSON or XML be better for?"

Kind regards,

Bill Pascoe

--[3]------------------------------------------------------------------------
        Date: 2020-06-14 07:50:52+00:00
        From: Desmond  Schmidt 
        Subject: Re: [Humanist] 34.106: how to represent act, scene, line, speech, and page structure?

Hi Michael,

I don't think I said that. I'm sorry if I did imply it.

No. JSON is an awful format for transcriptions of plays or any other
literary documents. What I meant to say was that JSON is a simple
format that is growing in popularity for web applications. I suspect
Peter probably meant something similar.

What intrigued me about what he said was that COCOA, the format used
originally by OCP, which only uses milestone tags for everything,
might actually be a better format than XML for transcriptions. An
example would be something like [speaker Hamlet]. When reading a
document linearly from start to finish the value of "speaker" would be
valid until it was overridden by a tag of the same type, such as
[speaker Ophelia]. Since the structure of the digital document would
be completely flat, the overlap problem would not exist, and queries
like "what are all the speeches on page X" would become possible.
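
A rough Python sketch of that reading model follows. It assumes the
simplified bracketed notation used above rather than real COCOA syntax,
and the sample text and the names read_milestones and text_on_page are
invented for illustration.

```python
import re

# Hypothetical sample in the bracketed notation used above.
SAMPLE = """[page 1][speaker Hamlet]To be, or not to be,
[page 2]that is the question.
[speaker Ophelia]Good my lord."""

TAG = re.compile(r"\[(\w+) ([^\]]+)\]")


def read_milestones(text):
    """Yield (state, chunk) pairs; state holds the latest value of each tag type."""
    state = {}
    pos = 0
    for match in TAG.finditer(text):
        chunk = text[pos:match.start()].strip()
        if chunk:
            yield dict(state), chunk
        # A tag overrides any earlier tag of the same type.
        state[match.group(1)] = match.group(2)
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        yield dict(state), tail


def text_on_page(text, page):
    """Return (speaker, chunk) pairs for every chunk read while on the given page."""
    return [(s.get("speaker"), c)
            for s, c in read_milestones(text)
            if s.get("page") == page]


print(text_on_page(SAMPLE, "2"))
```

Here Hamlet's speech runs across the page break, yet nothing overlaps:
"page" and "speaker" are simply two independent values carried along
while reading.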

I make this suggestion for two reasons. First, the eventual
obsolescence of textual markup technologies that were, like SGML/XML,
brought into digital humanities from the outside, might be overcome by
devising our own simple format (like COCOA). If we could provide a
translator into the current formatting technology, say COCOA to HTML,
then our transcriptions of original documents like plays would be
independent of changes in technology.
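
To give an idea of how small such a translator could be, here is a
hedged Python sketch. It again assumes the simplified bracketed notation
rather than real COCOA, and the HTML it produces (one paragraph per
speech, an empty span for every other milestone) is only one possible
rendering, chosen for illustration; to_html is a made-up name.

```python
import html
import re

TAG = re.compile(r"\[(\w+) ([^\]]+)\]")


def to_html(text):
    """Translate bracketed milestones into a simple, assumed HTML rendering."""
    out = []
    open_speech = False
    pos = 0
    for match in TAG.finditer(text):
        # Emit the plain text that precedes this tag, escaped for HTML.
        out.append(html.escape(text[pos:match.start()]))
        name = match.group(1)
        value = html.escape(match.group(2), quote=True)
        if name == "speaker":
            # A new speaker closes the previous speech and opens another.
            if open_speech:
                out.append("</p>")
            out.append(f'<p class="speech"><b>{value}</b> ')
            open_speech = True
        else:
            # Any other milestone becomes an empty marker element.
            out.append(f'<span class="{name}" data-n="{value}"></span>')
        pos = match.end()
    out.append(html.escape(text[pos:]))
    if open_speech:
        out.append("</p>")
    return "".join(out)


print(to_html("[speaker Hamlet]To be [page 2]or not.[speaker Ophelia]My lord."))
```

Because HTML is itself hierarchical, the translator has to pick one
tag type (here the speaker) to become the container, and the rest stay
as empty markers. That choice lives in the translator, not in the
transcription.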

Second, I don't see how any features of transcribed documents could
not be represented in this way. Maybe you can. I'd be happy to explain
how I would represent them in COCOA, and you could explain how you
would represent them in XML, and then perhaps we could see which
textual model delivers more bangs per buck for a given level of
complexity.

Desmond


--
Dr Desmond Schmidt
Mobile: 0480147690




_______________________________________________
Unsubscribe at: http://dhhumanist.org/Restricted
List posts to: humanist@dhhumanist.org
List info and archives at: http://dhhumanist.org
Listmember interface at: http://dhhumanist.org/Restricted/
Subscribe at: http://dhhumanist.org/membership_form.php


Editor: Willard McCarty (King's College London, U.K.; Western Sydney University, Australia)
Software designer: Malgosia Askanas (Mind-Crafts)

This site is maintained under a service level agreement by King's Digital Lab.