Humanist Discussion Group, Vol. 37, No. 23.
Department of Digital Humanities, University of Cologne
Hosted by DH-Cologne
www.dhhumanist.org
Submit to: humanist@dhhumanist.org

Date: 2023-05-13 13:00:01+00:00
From: maurizio lana <maurizio.lana@uniupo.it>
Subject: Re: [Humanist] 37.19: on scientising the humanities: texts as data

Hi Tim

re:
> I don't want to say treating human made texts as data gets us
> nothing. It does, and it's probably worth having, at least
> sometimes, but it's not, and I would say, cannot be, the same
> as we get when humans read and try to understand human made
> text made to say things to other humans in some human world.

I am aware that it is disturbing to say (or to read) that text can be
treated as data. But this is simply a way, suited to the year 2023, of
saying what Hugo de Saint Cher was thinking when he produced the first
concordance of the Holy Bible (at a time when no one was thinking in
the words "text is data").

The fact is that the main aim of "treating the text as data" is not to
count how many times the word "weferg" recurs, but to understand the
meaning of the word by reading and comparing the contexts in which
"weferg" recurs. This is interpretation, and it resorts to the best
tools available.

"Treating the text as data" allows us to dismiss an "authority
principle" ("the meaning of the word "weferg" is [abc]") and to adopt
an experimental principle (let's compare the uses of the word and try
to infer its meaning).

Inferring the meaning requires that you know the author, the author's
context, the scope of the text, etc., so it is not an automatable
activity; nevertheless it is an activity in which the text is
understood as data. It rests on the idea that texts also speak to you
when you read them "transversally", and not only when you read them
sequentially.

ciao!
Maurizio

On 13/05/23 08:48, Humanist wrote:
> Humanist Discussion Group, Vol. 37, No. 19.
> Department of Digital Humanities, University of Cologne
> Hosted by DH-Cologne
> www.dhhumanist.org
> Submit to: humanist@dhhumanist.org
>
> Date: 2023-05-12 09:00:22+00:00
> From: Tim Smithers <tim.smithers@cantab.net>
> Subject: Re: [Humanist] 37.5: on scientising the humanities
>
> Dear Maurizio,
>
> I think you make a useful, and important, distinction when you
> point out that much work in textual studies treats the texts
> as data, rather than as human writings made, by human
> authors, with intentions of saying things to other human
> readers, and listeners when the texts are read out loud.
>
> This kind of work is far from my expertise, but it does seem
> to me to be important to recognise the difference between
> trying to understand what texts may have been made to say, and
> may also be taken to say, to humans, by looking at,
> appreciating, understanding, and taking into account, that
> these texts are all somehow embedded in human worlds, usually
> in complicated ways.
>
> Ripping these same words from the worlds they are/were formed
> in, understood in, used to say things in, and [most often]
> converting them into numbers we can [much more conveniently]
> compute with, surely leaves vast amounts of these human worlds
> behind, and so, in return, can then tell us rather little
> about what's going on in the human languaging in written form
> we got the data from. No?
>
> It takes humans to say what the words generated by things like
> ChatGPT, and its ilk, can be read to say, why, and how.
> ChatGPT, et al, doesn't have anything like a human
> understanding of human made text. But it doesn't need this
> understanding to be able to generate human-like texts.
>
> I don't want to say treating human made texts as data gets us
> nothing.
> It does, and it's probably worth having, at least
> sometimes, but it's not, and I would say, cannot be, the same
> as we get when humans read and try to understand human made
> text made to say things to other humans in some human world.
>
> Using [massive amounts of] human made texts as data, to build
> representations of the statistical distributions of long
> sequences of words, can be used, as we now see, in a
> generative mode to make texts that are readable and
> understandable, and even well considered, by humans. But what
> were we expecting these systems to do, produce, gobbledygook?
> Why? Just as with using lots of data to build statistical
> distributions for other complicated things, weather patterns,
> for example, the more text-as-data used to program -- so
> called 'train' -- these large language models results in the
> statistical distribution model having higher resolution on the
> variations in the 'distribution surface,' and thus a greater
> probability of generating texts more often more like the human
> made texts used to make the data used here in the first place.
>
> This is, I agree, not uninteresting, and not, not useful, but
> it's not the same as trying to understand how human languaging
> in written forms works for humans, in all the ways it does,
> and, I suggest, doesn't, and can't, tell us much about all
> this. But, as I say, I'm not the expert here, so I'm sure you
> and others here will be able to correct my thinking on this.
> I'd be happy to be corrected.
>
> Best regards,
>
> Tim

------------------
I do not believe in anything that is easy, quick, spontaneous,
improvised, approximate. I believe in the strength of what is slow,
calm, stubborn, without fanaticisms or enthusiasms.
Italo Calvino
------------------------------------------------------------------------
Maurizio Lana
Università del Piemonte Orientale
Dipartimento di Studi Umanistici
Piazza Roma 36 - 13100 Vercelli

_______________________________________________
Unsubscribe at: http://dhhumanist.org/Restricted
List posts to: humanist@dhhumanist.org
List info and archives at: http://dhhumanist.org
Listmember interface at: http://dhhumanist.org/Restricted/
Subscribe at: http://dhhumanist.org/membership_form.php