Humanist Discussion Group

Humanist Archives: March 31, 2023, 8:13 a.m. Humanist 36.504 - numbers for words

              Humanist Discussion Group, Vol. 36, No. 504.
        Department of Digital Humanities, University of Cologne
                      Hosted by DH-Cologne
                Submit to:

    [1]    From: Michael Falk <>
           Subject: Re: [Humanist] 36.496: numbers for words (60)

    [2]    From: Maroussia Bednarkiewicz <>
           Subject: Re: [Humanist] 36.496: numbers for words (44)

        Date: 2023-03-30 10:04:55+00:00
        From: Michael Falk <>
        Subject: Re: [Humanist] 36.496: numbers for words

Great question re: polysemy. The short answer is “yes.” The slightly longer
answer is that there are different ways of representing polysemy.

I will just compare two examples: word vectors (e.g. Word2Vec or FastText)
and LDA topic models.

In a word vector model, each word is represented by a vector of numbers. A
typical model might assign a vector of 50, 100 or 150 numbers to each
word. The individual numbers in the vector have no human meaning. They are
basically just an arbitrary set of numbers that represent what the computer
has learned about how a particular word is used. If a word is used in
several different senses in the corpus, then this in principle will be
encoded in the set of numbers somehow.
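To make this concrete, here is a minimal sketch in Python, assuming a handful of made-up four-dimensional vectors in place of a trained Word2Vec or FastText model. Although no single number means anything on its own, whole vectors can be compared with one another. Cosine similarity is a common way to do this, and words used in similar ways should score higher.

```python
import math

# Hypothetical 4-dimensional vectors, invented for illustration only.
# A trained model would learn longer vectors (e.g. 50 to 300 numbers)
# from a corpus instead of having them chosen by hand.
vectors = {
    "needle": [0.8, 0.1, -0.3, 0.5],
    "thread": [0.7, 0.2, -0.2, 0.6],
    "oven": [-0.4, 0.9, 0.3, -0.1],
}

def cosine(u, v):
    """Cosine similarity: 1.0 for vectors pointing the same way, lower otherwise."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

print(round(cosine(vectors["needle"], vectors["thread"]), 3))
print(round(cosine(vectors["needle"], vectors["oven"]), 3))
```

With these invented numbers, "needle" comes out much closer to "thread" than to "oven"; in a real model the closeness would reflect how the words are actually used in the corpus.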

How does a word vector model realise that a word is used in several senses?
It depends on the training algorithm used, but a simple example is the
“skip-gram” model. In such a model, the computer tries to learn what words
appear on either side of a given word. Since a word will tend to have
different “neighbours” when used in different senses, the computer should
notice this and somehow encode the information in the vector of numbers for
that word.
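Here is a minimal sketch of how skip-gram training data might be assembled, assuming simple whitespace tokenisation and a window of three words on each side (both illustrative choices). Only the (target, context) pairs are produced here. The training step that turns them into vectors, a shallow neural network often trained with negative sampling, is not shown. Notice how "stitch" keeps different company in a sewing sentence and in a sporting one.

```python
def skipgram_pairs(tokens, window=3):
    """Pair each token with every neighbour up to `window` positions away."""
    pairs = []
    for i, target in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs

# Two invented sentences using "stitch" in different senses.
sewing = "she sewed a neat stitch in the hem".split()
running = "the runner stopped with a stitch in her side".split()

for sentence in (sewing, running):
    # Keep only the pairs whose target word is "stitch".
    print([context for target, context in skipgram_pairs(sentence) if target == "stitch"])
# ['sewed', 'a', 'neat', 'in', 'the', 'hem']
# ['stopped', 'with', 'a', 'in', 'her', 'side']
```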

A topic model stores information about polysemy in a different way. A topic
model also represents each word by a vector of numbers, but in this case
each number is tied to one topic: it is the probability that a word assigned
to that topic is this particular word. For example, imagine that you train a
topic model to find two topics in a given corpus (typically you would search
for tens or hundreds of topics, but let’s simplify the example). The question
is: if a word is assigned to topic x, how likely is it to be word y?
Imagine that your corpus contained many texts about embroidery and many
texts about cookery, and let’s say that the computer correctly managed to
distinguish embroidery discourse from cookery discourse. If a word were
assigned to the embroidery topic, then the probability that the word is
“appliqué” might be 0.005 (i.e. 5 in every 1000 words assigned to the
embroidery topic are “appliqué”). The probability of “appliqué” appearing
in a cookery text is presumably lower.
By contrast, you occasionally stitch food in cooking, so we might imagine
that “stitch” could have a probability of 0.007 in the embroidery topic and
0.0007 in the cookery topic, for example. Likewise, if there were sport
texts in the corpus, the model should be able to work out that the word
“stitch” appears in sporting discourse, in the sense of “cramp” (is that an

I guess the example of “stitch” does raise the question of what counts as
polysemy. You could say that “stitch” has the same meaning in both cases.
But topic models are certainly capable of encoding polysemy of other kinds,
as I hope the sport example can help you to imagine.
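Here is a minimal sketch using the illustrative probabilities for “stitch” given above, plus one labelled assumption: the two topics are taken to be equally common. The word’s vector is its list of P(word | topic) values, one per topic. Bayes’ rule then turns that vector into how a single occurrence of “stitch” is likely to split between the topics. A real LDA model also keeps per-document topic proportions, and those are ignored here.

```python
# Illustrative P(word | topic) values for "stitch", taken from the example above.
p_word_given_topic = {
    "embroidery": {"stitch": 0.007},
    "cookery": {"stitch": 0.0007},
}

# Assumption (hypothetical): both topics are equally common in the corpus.
p_topic = {"embroidery": 0.5, "cookery": 0.5}

def topic_given_word(word):
    """Invert P(word | topic) with Bayes' rule to obtain P(topic | word)."""
    joint = {t: p_topic[t] * p_word_given_topic[t].get(word, 0.0) for t in p_topic}
    total = sum(joint.values())
    return {t: joint[t] / total for t in joint}

# The word's "vector" in this model: one probability per topic.
print([p_word_given_topic[t]["stitch"] for t in p_topic])  # [0.007, 0.0007]
print(topic_given_word("stitch"))
```

Under these assumptions, about 10 in 11 occurrences of “stitch” would be attributed to embroidery. Adding a sport topic would move some of that probability to a third topic, and that extra topic is where a different sense would show up.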

Of course, if you mean polysemy in the original sense used by Dante, then I
know of no model that can do that! There may be one, but it is far beyond
my ken.

Michael Falk, PhD, FHEA
Postdoctoral Research Associate, Wikipedia and the Nation’s Story
University of Technology Sydney


        Date: 2023-03-30 08:20:28+00:00
        From: Maroussia Bednarkiewicz <>
        Subject: Re: [Humanist] 36.496: numbers for words


Thank you so much for this interesting discussion! To Michael's division of
word representations into strings, words and n-grams, one should add a
different division that prevails nowadays: the division between static
embeddings and dynamic embeddings.

---Static Embedding---
Word embeddings known as 'static embeddings', such as those produced by
Word2Vec, GloVe or FastText, do not account for polysemy. They represent
text units (whether strings, words, sub-words or characters) in a static
way, without their contexts.
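Here is a minimal sketch of what "static" means in practice. It assumes a hypothetical three-dimensional lookup table in place of a trained Word2Vec, GloVe or FastText model. The function receives the whole sentence but only ever looks at the word itself, so every occurrence of 'father' gets the same vector, whatever its sense.

```python
# Hypothetical 3-dimensional vectors, invented for illustration; a trained
# static model would supply longer, learned vectors.
static_table = {
    "father": [0.2, 0.7, -0.1],
    "child": [0.1, 0.8, -0.3],
    "confession": [0.6, -0.2, 0.5],
}

def embed_static(tokens, i):
    """Return the vector for tokens[i]. The rest of the sentence is passed in
    but never consulted, which is what makes the embedding static."""
    return static_table[tokens[i]]

family = "the father and his child".split()
church = "the father heard the confession".split()

# Both senses of "father" (index 1 in each sentence) receive one vector.
print(embed_static(family, 1) == embed_static(church, 1))  # True
```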

---Dynamic Embedding---
The more advanced embeddings I described, which represent a
word/sub-word/etc. *and* its context, will recognise different usages of
the unit in question if the training dataset provides the model with enough
examples of all the different contexts. Hence the embeddings, i.e. word and
context representations, will represent as many 'fathers' as there are
different contexts in which the word appears. This can be more than six:
there can be more embeddings for 'father' than its six entries in
Merriam-Webster. This is why they are called dynamic. In a post-processing
step, one could for example gather all the instances of 'father' and
perform a classification task to sort them into their different categories
or contexts.
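The post-processing step can be sketched as a nearest-centroid classifier, which is one simple choice among many. The sketch makes two assumptions. First, the contextual vectors below are invented two-dimensional stand-ins for the much longer vectors a model such as BERT would produce. Second, the sense labels 'parent' and 'priest' are hypothetical annotations supplied by a reader.

```python
import math

# Hypothetical contextual vectors for four occurrences of "father"; invented
# 2-D stand-ins for the vectors a dynamic-embedding model would output.
labelled = [
    ("my father taught me to fish", "parent", [0.9, 0.1]),
    ("her father retired last year", "parent", [0.8, 0.2]),
    ("the father blessed the congregation", "priest", [0.1, 0.9]),
    ("the father heard her confession", "priest", [0.2, 0.8]),
]

def centroid(vectors):
    """Component-wise mean of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

# One centroid per (hypothetical) sense label.
centroids = {
    label: centroid([vec for _, lab, vec in labelled if lab == label])
    for label in {lab for _, lab, _ in labelled}
}

def classify(vector):
    """Assign an occurrence to the sense whose centroid is closest."""
    return min(centroids, key=lambda label: math.dist(vector, centroids[label]))

# A new occurrence whose (invented) contextual vector lies near the parents.
print(classify([0.85, 0.3]))  # parent
```

With a static embedding this step would be pointless, because every occurrence of 'father' would sit at exactly the same point.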

(I prefer not to use 'n-gram' in the case of dynamic embeddings because the
term can give the false impression of a fixed number [n] of
characters/words/sub-words, which is not the case for the dynamic
embeddings used by BERT or GPT.)
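Here is a minimal sketch of the contrast between fixed-length character n-grams and the variable-length sub-word pieces that BERT-style tokenisers produce. The vocabulary is a tiny hypothetical one. The greedy longest-match-first procedure follows the one used by BERT's WordPiece tokeniser, simplified here: there is no word-length limit, lower-casing or punctuation splitting. GPT models use a related scheme, byte-pair encoding, which is not shown.

```python
def char_ngrams(word, n=3):
    """Fixed-length character n-grams, with < and > marking word boundaries
    (the boundary convention FastText uses; n is fixed at 3 for simplicity)."""
    padded = f"<{word}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

# Hypothetical tiny vocabulary; real BERT vocabularies hold roughly 30,000 pieces.
VOCAB = {"embroid", "##ery", "un", "##stitch", "##ed", "stitch"}

def wordpiece(word, vocab=VOCAB, unk="[UNK]"):
    """Greedy longest-match-first segmentation in the style of WordPiece;
    pieces after the first carry a '##' prefix."""
    pieces, start = [], 0
    while start < len(word):
        end, piece = len(word), None
        while start < end:  # try the longest remaining substring first
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:  # no piece matches: the whole word is unknown
            return [unk]
        pieces.append(piece)
        start = end
    return pieces

print(char_ngrams("stitch"))    # ['<st', 'sti', 'tit', 'itc', 'tch', 'ch>']
print(wordpiece("embroidery"))  # ['embroid', '##ery']
print(wordpiece("unstitched"))  # ['un', '##stitch', '##ed']
```

Every n-gram has exactly three characters. The sub-word pieces, by contrast, range from two to seven characters (ignoring the '##' marker), depending on what the vocabulary contains.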

This blog post by Stanford AI does a very good job of explaining contextual
embeddings and their advantages:

The distinction between static and dynamic embeddings is crucial with
today's latest models. For example, Michael's point about character
embeddings is valid only for static embeddings, as shown in this paper
about CharBERT, a model using dynamic character embeddings that produced
good results on several NLP tasks (question answering, sequence labeling
and text classification):

I look forward to reading about Michael's experience with text representations and
