Humanist Discussion Group

Humanist Archives: March 30, 2023, 6:36 a.m. Humanist 36.496 - numbers for words

              Humanist Discussion Group, Vol. 36, No. 496.
        Department of Digital Humanities, University of Cologne
                      Hosted by DH-Cologne

        Date: 2023-03-29 14:28:14+00:00
        From: James Rovira
        Subject: Re: [Humanist] 36.489: numbers for words

These posts have been interesting and informative to me, and I'm grateful
for them.

Michael --

Are there any models that take into account polysemy in your example no. 2
below? Not just that a word has multiple meanings, but that a single text
can exploit multiple meanings of the same word even within the same
sentence? The word "father" can be a noun or a verb, for example, and has
seven definitions as a noun and three as a verb just in Merriam-Webster.

Jim R

On Wed, Mar 29, 2023 at 1:36 AM Humanist wrote:

> 2. As a word => one number per word
> In most NLP applications, it makes more sense to train the model on the
> words in the data rather than the letters. The biggest reason for this is
> that the words are the main unit of meaning, and the sequence of data the
> computer has to analyse is shorter. So, for example, ‘the cat sat on the
> mat’ contains 6 words, but 22 characters (including spaces). The computer
> has to perform fewer calculations if it is examining only 6 words rather
> than 22 characters. In addition, in a word-encoding scheme, ‘cat’ and
> ‘sat’ are simply two different words. If you let the computer see ‘c-a-t’
> and ‘s-a-t’, it has a harder learning task, because it won’t know in
> advance that ‘cat’ and ‘sat’ are totally different: 2/3 of the letters are
> the same!
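
For readers who want to see the quoted scheme concretely, here is a minimal
sketch in Python. It is not taken from the quoted post: the variable names
(word_vocab, char_vocab and so on) and the choice to number items in order of
first appearance are assumptions made purely for illustration. It encodes the
same sentence once as words and once as characters, then compares the two
sequences.

```python
sentence = "the cat sat on the mat"

# Word-level encoding: one integer per distinct word,
# numbered in order of first appearance (an illustrative choice).
words = sentence.split()
word_vocab = {w: i for i, w in enumerate(dict.fromkeys(words))}
word_ids = [word_vocab[w] for w in words]

# Character-level encoding: one integer per distinct character,
# with the space counted as a character.
chars = list(sentence)
char_vocab = {c: i for i, c in enumerate(dict.fromkeys(chars))}
char_ids = [char_vocab[c] for c in chars]

print("word sequence length:", len(word_ids))
print("char sequence length:", len(char_ids))

# At word level, 'cat' and 'sat' are just two unrelated numbers.
print("cat ->", word_vocab["cat"], " sat ->", word_vocab["sat"])

# At character level, their encodings overlap in two of three positions.
cat_chars = [char_vocab[c] for c in "cat"]
sat_chars = [char_vocab[c] for c in "sat"]
shared = sum(a == b for a, b in zip(cat_chars, sat_chars))
print("cat ->", cat_chars, " sat ->", sat_chars, " shared positions:", shared)
```

Note that in a lookup table of this kind every occurrence of a word form
receives the same number, whatever sense or part of speech is intended, so
"father" as a noun and "father" as a verb would be indistinguishable at this
stage. That is the limitation the polysemy question above is pointing at.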
