18.543 thoughts on writing (plain text)

From: Humanist Discussion Group (by way of Willard McCarty <willard.mccarty_at_kcl.ac.uk>)
Date: Mon, 31 Jan 2005 07:53:04 +0000

               Humanist Discussion Group, Vol. 18, No. 543.
       Centre for Computing in the Humanities, King's College London
                     Submit to: humanist_at_princeton.edu

         Date: Mon, 31 Jan 2005 07:38:24 +0000
         From: Alexandre Enkerli <aenkerli_at_indiana.edu>
         Subject: Thoughts on Writing (was: Plain Text)

[Disclaimer: I mostly work on dynamic *oral* traditions and think of
writing as only a specific mode of language transmission.]

Fascinating thread. A few clarifications on my previous mail and some random
thoughts. Apologies for the randomness.

As was probably clear from my message, by "Plain Text" I mean any type of
human-readable computer file format for textual content, including markup
formats. I was encouraged by messages here and elsewhere to think about
minimal markup requirements. Hence the reflections on WikiEdit etc. PDF
files, though "text" instead of binary, aren't in this category of "Plain
Text" as their content isn't "human-readable." Unless I'm wrong (entirely
possible), PDF isn't really a markup specification since, like PostScript,
it mostly contains processing instructions, not structural markup.
The original idea behind my post was to find a pedagogically-sound solution
to Russ Hunt's problem with HTML files. I stand by my claim that minimal
markup can be of more use than HTML for most of our students. I also
encourage him and others to think about what our students actually *do*
with text.

As "Patrick" said, text comprises both markup data and character data.
Willard's anecdote on a fellow French-Canadian's reaction to the
elimination of "accented characters" (including, I assume, c-cedilla and
such) carries this point forward. Capital letters *are* markup. So is
punctuation. The history of typography has a lot to say about this and what
we're witnessing now is another major step in the history of
"writing." Parallels abound in music transcription and notation. Current
computer technologies (a major theme on this list) do encourage us to think
of text in new ways. Not to *limit* text. To expand it.

In fact, thinking of simplistic compression algorithms may help in the
discussion. What *are* the minimal information requirements for text? As we
all know, text is extremely redundant in terms of pure information
processing. Thinking of "Plain Text" (ASCII or other encodings) might work:
we need "character data" and "markup data." (There doesn't seem to be a
significant difference between "markup data" and metadata.) We use
different methods to separate this type of data from "character data" but
we still use parts of the same character set. This practice is the basis of
some technical issues, certainly, but we can think of this in the abstract.
For instance, capital letters and periods mostly delimit sentences and if
it weren't for exceptions (in English: title case, the pronoun "I," etc.),
other markup methods could be used to represent sentence boundaries, making
Plain Text more regular. Then, capital letters could be used for
word-internal markup (as in WikiNames and other MixedCase practices). In
other words, capital letters and periods serve a purpose similar to that of
parentheses (and brackets, etc.).
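
To make the parenthesis analogy concrete, here is a minimal Python sketch that swaps the capital-plus-period convention for explicit brackets and back again. The function names are made up for illustration, and the sketch assumes the text contains no parentheses and uses periods only to end sentences; abbreviations such as "etc." would be misread as sentence ends, which is exactly the kind of exception mentioned above.

```python
import re

def to_explicit(text):
    """Mark each sentence with brackets instead of a capital and a period."""
    # A "sentence" here is naively a capital letter up to the next period.
    sentences = re.findall(r"[A-Z][^.]*\.", text)
    return " ".join("(" + s[0].lower() + s[1:-1] + ")" for s in sentences)

def to_conventional(marked):
    """Turn bracketed sentences back into capital-plus-period form."""
    bodies = re.findall(r"\(([^()]+)\)", marked)
    return " ".join(b[0].upper() + b[1:] + "." for b in bodies)

text = "People do read and write. Our goal could be to understand how they do it."
marked = to_explicit(text)
print(marked)
# (people do read and write) (our goal could be to understand how they do it)
print(to_conventional(marked) == text)
# True
```

Once sentence boundaries are carried by the brackets, the capitals inside a sentence are free to mean something else, as with WikiNames.
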
Character data can itself be reduced, and we certainly all have note-taking
practices (fewer vowels, for one thing) which considerably reduce the
number of characters we need to type. While some may frown upon these
practices when used in more formal communication, they certainly have an
impact on the way *people* think of text. Current computer users probably
write on average much more than scribes of old. The "intrinsic quality" of
their writing isn't the issue, nor is the "intrinsic quality" of what they
read. People *do* read and write. Our goal could be to understand how they
do it. Instructors in composition are now acknowledging these "new methods"
of writing and may more easily help their students think of different sets
of rules for different forms of writing.

The fact is that many computer users (including a lot of our students) are
adopting new writing practices. Abbreviations and acronyms are now
commonplace. While partisans of prescriptive grammars may frown upon them,
they serve a purpose. Some abbreviations are "loss-less" as they can
readily be converted back to "normal writing" by spelling them out. But
many acronyms have come to mean more than what their characters stand for.
Granted, they go with the increasing informality of computer-mediated
communication. But they represent a major shift in the way people think of
text. And yes, I do include smileys/smilies/emoticons in this. Creative
uses of the ASCII set are part of the current changes in writing.
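
Whether an abbreviation habit is "loss-less" can be checked mechanically. The Python sketch below takes the vowel-dropping habit mentioned earlier and looks for words that collapse onto the same short form. The word list and function names are only illustrative assumptions, not a description of anyone's actual practice.

```python
from collections import defaultdict

def drop_vowels(word):
    """Keep the first letter and drop every later vowel."""
    return word[0] + "".join(ch for ch in word[1:] if ch.lower() not in "aeiou")

def collisions(scheme, vocabulary):
    """Group the words that a scheme maps to the same short form."""
    groups = defaultdict(list)
    for word in vocabulary:
        groups[scheme(word)].append(word)
    return {short: words for short, words in groups.items() if len(words) > 1}

WORDS = ["text", "test", "taste", "markup", "plain"]
print(collisions(drop_vowels, WORDS))
# {'tst': ['test', 'taste']}
```

An empty result would mean the scheme is lossless for that vocabulary. Here "tst" cannot be spelled back out without context, so a human reader has to supply what the characters no longer carry.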

A word on "accented characters" and similar elements. As a
French speaker, I find that being unable to use them (on several mailing
lists as well as on some other communication systems) does change my own
writing practices. Say I need to write a short message in French using a keyboard
which makes accented characters inconvenient to type. I'll probably choose
words which don't contain accented characters (difficult for past
participles) and the tone of my message will be significantly altered. I'm
certainly not the only one who does this and it probably has an impact on
how writing is perceived.
Unlike punctuation and capital letters, accented characters are *not*
markup, at least not structural markup (they may work for morphological and
syntactic markup, though). They simply *work* as other elements in the
character set and do not typically represent metadata. While Unicode
can increase the size of a "Plain Text" file (considerably so in encodings
such as UTF-16), it's probably as compressible as ASCII.
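
The size question is easy to test. The Python sketch below encodes a few French sentences three ways (accents stripped to ASCII, UTF-8, UTF-16) and compares raw and zlib-compressed sizes. The sample sentences and helper names are my own assumptions, and a sample this short only gives a rough indication.

```python
import unicodedata
import zlib

def strip_accents(text):
    """Decompose accented letters and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

SAMPLE = (
    "Le garçon a déjà été étonné par la crêpe brûlée. "
    "J'ai écrit un message très bref à mon collègue. "
    "Elle est allée à la bibliothèque après le dîner. "
    "Nous avons préféré répondre sans caractères accentués."
)

def sizes(text):
    """Return {encoding: (raw bytes, zlib-compressed bytes)}."""
    variants = {
        "ascii (accents stripped)": strip_accents(text).encode("ascii"),
        "utf-8": text.encode("utf-8"),
        "utf-16-le": text.encode("utf-16-le"),
    }
    return {
        name: (len(raw), len(zlib.compress(raw, 9)))
        for name, raw in variants.items()
    }

for name, (raw, packed) in sizes(SAMPLE).items():
    print(f"{name:26} raw={raw:4d} bytes  zlib={packed:4d} bytes")
```

In UTF-8 only the accented letters cost an extra byte each, while UTF-16 spends two bytes on every character of this sample. The compressed column is the one to compare when judging how much the extra raw size matters.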

There are several practical issues here. Word processors have had a
tremendous impact (very negative, IMHO) on the way people think of computer
text. This impact was largely carried over to HTML and other formats. Of
course, HTML could have been used for structured text, to a certain extent,
but many users have difficulty understanding the potential of computer
texts. It's still significant that HTML is independent of page and
screen. It has forced people to think of labeled sections instead of page
numbers. Nice!
If one fully separates written form and content, there could be new ways to
write. LaTeX is a clear example of the separation of form and content. (To
answer Eric Homich, yes, LaTeX works on Linux and there are some good TeX
editors available for Linux, including LyX.) LaTeX has kept challenging the idea
of "word processors" and WYSIWYG. XML formats for textual content are also
on that side of the equation. But the revolution hasn't happened yet. It's
coming, though. Whether it's through a specific format (possibly XML-based
like DocBook, OPML, RSS, TEI...) or through changes in the way people
*write* is uncertain at this point.
For instance, typing notes may be quite efficient with special methods and
tools. Some tools are available already but they are relatively rare and
somewhat inconvenient to use. Still, shouldn't it be "natural" to be able
to type notes directly in a lossless shorthand and have them automatically
expanded into standard orthography?
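
As a rough idea of what such a tool could look like, here is a minimal Python sketch of a table-driven expander. The shorthand table is purely hypothetical, and the round trip is only lossless under the stated assumptions: each short form has exactly one expansion, the notes never contain the spelled-out forms, and words are separated by single spaces.

```python
# Hypothetical shorthand table: each short form has exactly one expansion.
SHORTHAND = {
    "w/": "with",
    "b/c": "because",
    "govt": "government",
    "ppl": "people",
}

TRAILING = ".,;:!?"

def _map_tokens(text, table):
    """Replace each whitespace-separated token found in the table."""
    out = []
    for token in text.split():
        core = token.rstrip(TRAILING)   # set trailing punctuation aside
        out.append(table.get(core, core) + token[len(core):])
    return " ".join(out)

def expand(notes):
    """Shorthand notes to standard orthography."""
    return _map_tokens(notes, SHORTHAND)

def contract(text):
    """Standard orthography back to shorthand, using the inverted table."""
    return _map_tokens(text, {long: short for short, long in SHORTHAND.items()})

notes = "ppl vote w/ care b/c govt matters."
full = expand(notes)
print(full)
# people vote with care because government matters.
print(contract(full) == notes)
# True
```

The table is the whole design: as long as it stays one-to-one, nothing is lost in either direction, and each writer could keep a personal one.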

Food for thought?


Alex Enkerli, Teaching Fellow, Visiting Lecturer
Department of Sociology and Anthropology, Indiana University South Bend, DW
1700 Mishawaka Ave., South Bend, IN 46634-7111
Office: (574)520-4102
Fax: (574)520-5031 (to: Enkerli, Anthropology)
Received on Mon Jan 31 2005 - 02:56:50 EST
