Humanist Discussion Group, Vol. 32, No. 510.
Department of Digital Humanities, King's College London
Hosted by King's Digital Lab
www.dhhumanist.org
Submit to: firstname.lastname@example.org

From: Dr. Herbert Wender
Subject: Re: [Humanist] 32.499: standoff markup & the illusion of 'plain text' (66)

From: Hugh Cayless
Subject: Re: [Humanist] 32.496: editions, in print or in bytes (120)

From: Domenico Fiormonte
Subject: RE: [Humanist] 32.496: editions, in print or in bytes (112)

--------------------------------------------------------------------------
Date: 2019-03-04 22:29:35+00:00
From: Dr. Herbert Wender
Subject: Re: [Humanist] 32.499: standoff markup & the illusion of 'plain text'

Patrick,

I very much like your straightforward thinking! Go one step further and we come back to the beginnings of AI, overthrowing the program/data divide. But anyone who has ever experienced a LISP bracket storm may fear difficulties probably greater than the actual ones. Surely, though, you are not a man to fear difficulties. I would like to point the other participants in this discussion to a talk I heard in 2003, at the TEI Members' Meeting in Nancy, France.

Best regards,
Herbert

Patrick Durusau: What is a tree really?

Every user of markup languages, from GML forward, has been schooled in the cant of "descriptive" versus "procedural" markup. Procedural markup is panned in ISO 8879 as "inflexible" and requiring a user to change procedural markup in order to change the presentation of their document. Changing procedural markup to affect presentation? Sounds a lot like changing descriptive markup in order to have a different tree! Does that mean that a tree is a particular presentation of descriptive markup? Or perhaps more precisely, a procedural representation of descriptive markup? Techniques for transforming structured but non-SGML/XML files into SGML/XML have long been known in the markup community. Such techniques should also produce multiple valid instances of SGML/XML from a single file. One possible file format for TEI P5 is proposed that allows unlimited (including overlapping) descriptive markup, while retaining, with preprocessing, compatibility with XML parsers.
Source: https://tei-c.org/Vault/MembersMeetings/2003-info/abstracts.html#TEI.1_text.1_body.1_div.2
See also the slides: https://tei-c.org/Vault/MembersMeetings/2003-info/durusau.pdf

-----Original Message-----
> Date: 2019-03-02 15:41:25+00:00
> From: Patrick Durusau
> Subject: "Plain text" as illusion, was Re: [Humanist] 32.498: standoff markup

But "plain text" in an electronic system is an illusion. Why not abandon the distinction between text, markup and annotations, capturing all of them in a database, upon which queries then search and/or render a particular "view" of a "text" for your viewing?

--
Patrick Durusau
email@example.com
Technical Advisory Board, OASIS (TAB)
Editor, OpenDocument Format TC (OASIS), Project Editor ISO/IEC 26300
Co-Editor, ISO/IEC 13250-1, 13250-5 (Topic Maps)
Another Word For It (blog): http://tm.durusau.net
Homepage: http://www.durusau.net
Twitter: patrickDurusau

--------------------------------------------------------------------------
Date: 2019-03-04 15:55:41+00:00
From: Hugh Cayless
Subject: Re: [Humanist] 32.496: editions, in print or in bytes

>> [...] Type and fonts were designed for print. Ought we to stop using them
>> as well? It would put an end to this thread...
>
> I don't accept this analogy. Since all computing languages break down
> to the same operations in the end, of course their effects are
> identical. You need to give examples of data formats that have been
> reused for different purposes. Not so many of those, are there?

Are you kidding? This happens All. The. Time. You conflate formats and data structures below, which aren't the same thing (though certainly related), so I'll toss in a few examples of both, gathered in about 20 minutes' work.

* PDF as a presentation format (e.g. https://www.planetquark.com/2010/05/14/use-adobe-reader-as-presentation-tool/). I have done this myself, now that I think about it.
PDF is something unambiguously designed for printing and/or replicating the print experience on a screen. Yet I can use it like PowerPoint!

* BibTeX (https://en.wikipedia.org/wiki/BibTeX): takes a typesetting language and uses it as a reference database.

* And that reminds me of the wondrous variety of uses I have seen people turn spreadsheets to...

* (OK, this one is clearly meant just for fun, and I just happened to see it this morning.) Animated GIFs in Excel (https://github.com/pugwonk/gif2xlsx/blob/master/README.md).

Turning to data structures, repurposing is probably even more common, and expected. I'll use Java again, both because I'm familiar with it and because its designers have the habit of naming classes that implement data structures after the data structures they use:

* ArrayList (https://docs.oracle.com/javase/8/docs/api/java/util/ArrayList.html). Some unpacking for those who are still bothering to read and who aren't programmers: an array is a fixed-size structure with a predefined number of "slots" for holding data. A (linked) list is a node that has a pointer to the next node, which points to the third, and so on ad infinitum, like a chain. Linked lists have some very useful properties: you can iterate over them, trivially add a new node as often as you want, and remove a node by simply changing the previous node's pointer to point at the following node. They take only as much memory as they need. Why would you want an "array" list? As it turns out, arrays have some useful affordances, like constant-time access to members. If I want the hundredth item in a linked list, I have to start at the head and take 99 steps to get there. If I want the hundredth item of an array, I just grab the thing in box 100. Of course, there are tradeoffs. Arrays are fixed-size, so if I want to add a new item and I'm out of slots, the ArrayList has to swap my old array for a bigger one and copy its contents over.
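The tradeoff described above can be sketched in a few lines of Java; the class and variable names here are mine, chosen for illustration, not from any project under discussion. Both collections implement the same `List` interface, but `get()` indexes straight into the backing array on one and walks the chain of nodes on the other.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

public class ListTradeoffs {
    public static void main(String[] args) {
        List<Integer> arrayBacked = new ArrayList<>();
        List<Integer> nodeBacked = new LinkedList<>();
        for (int i = 0; i < 1_000; i++) {
            arrayBacked.add(i); // may occasionally grow and copy the backing array
            nodeBacked.add(i);  // always just appends a new node to the chain
        }
        // get(99) on the ArrayList is a direct index into the backing array (O(1));
        // on the LinkedList it must follow 99 pointers from the head (O(n)).
        System.out.println(arrayBacked.get(99)); // prints 99
        System.out.println(nodeBacked.get(99));  // prints 99
    }
}
```

Same answer either way; the difference is only in how much work each structure does to produce it, which is exactly the "affordances" point.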
And the backing array will pretty much always take up more space than you're actually using.

* JSON-LD. It's a tree! It's a graph! *Head explodes*

There are many, many similar examples available.

> I keep harping on this point because it is the nub of the problem.
> Every decision we take in designing a data structure, like a marked-up
> file, has powerful consequences on what we can do with that data
> afterwards. Markup languages are one TINY part of the full breadth of
> data structures available in computer science and you should
> appreciate this. Robert Sedgewick in his popular book Algorithms said:
>
> "data structures ... are central objects of study in computer science.
> Thus algorithms and data structures go hand in hand; in this book we
> take the view that data structures exist as the byproducts or
> endproducts of algorithms"

Markup languages are not data structures in and of themselves. They may be interpreted by a program (an algorithm) as a particular data structure. They may even be interpretable as *more than one* data structure.

> What we can do with a text encoded in XML is thus already decided by
> the designers of XML. I'm not saying that you can't do a variety of
> things, but it is only within a restricted field of operation.

How is its field of operation restricted? And more to the point, if my own "field of operation" happens to match what it does, why shouldn't I use it?

> It's pretty close to the train and scooter analogy I used earlier. On
> a train we can go faster or slower, we can stop or go on, but we can
> only follow in the tracks laid down by its builders.

I'm afraid I don't accept this analogy. I kind of hate scooters too. But let's examine it a bit anyway. Both have their own affordances: trains can carry vastly more people and freight faster than scooters can, but they are constrained as to where they go. Scooters are lightweight and flexible, but they basically only carry you. And they, like trains, rely on pre-built infrastructure.
Why can't we have both? Why do all trains have to be destroyed so that scooters may rule the world? They don't. That would be silly.

> In one of our presentations someone in the audience, after seeing what
> we could do with the texts of Harpur, admitted, reluctantly, that
> abandoning XML freed us up to do more. That's how I feel about the
> straitjacket that XML has become.

But absolutely no one is demanding that you use XML. You were able to do what you needed by going another way, which is cool. That doesn't mean your way will work for everyone, nor that users of TEI are fools for using TEI. You can hate trains and I can hate scooters and the planet will continue to spin.

I have serious concerns about your own use of layers, though I can see how they'd work in certain circumstances. They may dangerously oversimplify the variance in your sources, which in the worst case is combinatorial; they rely on duplication of text, complicating the editing process if you need to make changes; and plain text / light formatting doesn't deal well with things like ambiguity, dislocation, transposition, weird glyphs, damage, gaps, etc.

I do understand that you're proud of the thing you built, and that's great. But it doesn't follow that other people have to make the same decisions you did. My tradeoffs are not your tradeoffs.

All the best,
Hugh

--------------------------------------------------------------------------
Date: 2019-03-04 11:00:21+00:00
From: Domenico Fiormonte
Subject: RE: [Humanist] 32.496: editions, in print or in bytes

I've been following this and the previous McGann et al. thread, and I feel like this was a gigantic *déjà vu*... People I've known for a long time, and for whom I have the greatest respect and affection, were rehearsing the same ideas and arguments I've been listening to for... maybe thirty years?
But at this stage (over fifty, like myself) we should all try to be intellectually honest and admit that all the scholarly discourses (including this debate), the tools, the methodologies, the models and the so-called 'standards' we're talking about belong to a relatively small circle of institutions and people, mostly Anglophone or located in the Northern hemisphere, who had the political, economic and cultural power to persuade the rest of us that both their ideas and their instruments were the best "available" solutions for "representing" our textual heritage -- and that they were therefore necessary if we were to be considered "acceptable" and admitted to the club of Rigorous Scholarship.

We know how scholarly evaluation (and publication, etc.) works: we (the Global North) set out the rules, you (the rest of the world) follow them [http://knowledgegap.org/] -- and pay for the service, of course. From a certain point on, this has been exactly the pattern followed by the most influential, rich and powerful DH projects. Just try to build a digital archive without using certain 'standards' (XML? TEI?) and see whether you get the money from the NEH or the EU or some big Global North quango.

Following my experience as an XML encoder and DH undergraduate instructor (and I still teach XML and TEI... great didactic tools!), I'm inclined to sympathize with Desmond's arguments. However, I don't think the problem is using XML-TEI or any other digital representation language. The problem is not even *how* you want or need to represent something. The problem is rather *who* you are, and *from where* you are speaking (and also in what language you speak). What and where are, so to speak, the material conditions of your knowing? The Anglophone community is still hegemonic today, and it has the power to impose specific models (through its language and institutions, first of all) and so to make it difficult, if not impossible, for others to do or propose something different.
The funding sources today are framed within a certain epistemological discourse, while it should be exactly the other way round: a competition of ideas. I'm speaking here from my personal experience. In 1996, with other people, I started Digital Variants, one of the first (and least known) projects on literary textual variation. The original sin, of course, was that the majority of its texts were in Italian. But at that time DH was a small world and there was still some space for diversity and pluralism. Later it became like any other academic field: a battle for hegemony in which the first victim is innovation, and the second is cultural and linguistic diversity.

That's also why I've always questioned the geographical expansion of ADHO (as if I'm not making enough enemies with this post...): under the present conditions of inequality and epistemological subordination, the foreseeable result will be another cultural colonization. Because when and where there is epistemic injustice, there is only one possibility, one culture, one vision of the world -- one "digital representation" tool and/or methodology. Epistemic injustice is not something you work out by inviting subalterns to join the winner's party.

The Anglophone and Northern European bias of this discussion has been so strong that so far nobody has mentioned the cultural and geopolitical bias of any of these tools. Is the place where a tool, a technological standard or a methodology is designed in fact culturally neutral? Is there any lesson we've learned from the history of technology? What does ASCII mean? Who's on the board of Unicode? Etc. So are you really telling us that TEI is free of geopolitical, cultural and linguistic bias? All of these aspects are completely absent from your discourses. Your epistemology seems culturally blind. This is the bug in the system -- not overlapping hierarchies. Did we ever ask ourselves how many expensive digital editions were produced outside Northern Europe, the USA, etc.?
I'm in India right now -- it's my sixth visit to this country -- and I can't think of anybody here (or in Africa, or in most of Latin America, but also in Southern Europe!) who can afford to spend a massive amount of time and money building a scholarly digital edition (let alone investing in "specialised technical training" to keep it going!) based on some epistemologically questionable and economically infeasible model(s) designed by a group of mostly monolingual Anglo-Saxons who seem to think their solutions/models/standards are eternal and universally good for all.

>> "We need better" [DS]

Indeed.

All the best / Saluti a tutt*

Domenico

---
Domenico Fiormonte
Dipartimento di Scienze Politiche
Università Roma Tre
http://www.digitalvariants.org
http://infolet.it
http://www.newhumanities.org

_______________________________________________
Unsubscribe at: http://dhhumanist.org/Restricted
List posts to: firstname.lastname@example.org
List info and archives at: http://dhhumanist.org
Listmember interface at: http://dhhumanist.org/Restricted/
Subscribe at: http://dhhumanist.org/membership_form.php
Editor: Willard McCarty (King's College London, U.K.; Western Sydney University, Australia)
Software designer: Malgosia Askanas (Mind-Crafts)
This site is maintained under a service level agreement by King's Digital Lab.