Humanist Discussion Group

Humanist Archives: March 27, 2023, 7:53 a.m. Humanist 36.484 - numbers for words: advantages?

				
              Humanist Discussion Group, Vol. 36, No. 484.
        Department of Digital Humanities, University of Cologne
                      Hosted by DH-Cologne
                       www.dhhumanist.org
                Submit to: humanist@dhhumanist.org




        Date: 2023-03-26 14:25:06+00:00
        From: Henry Schaffer <hes@ncsu.edu>
        Subject: Using numbers for words?

I was at a workshop on large-scale computer processing with neural
networks/AI, and Natural Language Processing (NLP) came up briefly. The
presenter mentioned that words are typically replaced with numbers, but
didn't discuss why. She referred us to
https://www.tensorflow.org/tutorials/text/word2vec as one method, and there's
some more explanation at https://en.wikipedia.org/wiki/Word2vec

I can see an advantage in storage and processing speed when a word is
represented in perhaps 2 bytes rather than perhaps 10-20+ bytes, but I
don't see any additional advantage. Do you?
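
To make the storage point concrete, here is a minimal sketch in Python of
replacing words with integer IDs. The sample sentence, the variable names,
and the choice of an unsigned 16-bit integer (which caps the vocabulary at
65,536 distinct words) are assumptions made for illustration; they are not
taken from the workshop or from the word2vec tutorial.

```python
import struct

# Sample text, invented for illustration.
text = "the quick brown fox jumps over the lazy dog the end"
words = text.split()

# Build a vocabulary: each distinct word gets the next unused integer ID.
vocab = {}
for w in words:
    if w not in vocab:
        vocab[w] = len(vocab)

# Replace every word with its ID; repeated words share one ID.
ids = [vocab[w] for w in words]

# Pack each ID into 2 bytes (unsigned 16-bit, little-endian).
packed = struct.pack(f"<{len(ids)}H", *ids)

raw_bytes = len(text.encode("utf-8"))   # size of the text as characters
packed_bytes = len(packed)              # size of the same text as IDs

print(ids)
print(raw_bytes, "bytes as text;", packed_bytes, "bytes as IDs")
```

The vocabulary table has to be stored as well, so the saving only shows up
once a text is long enough for words to repeat often. Fixed-width IDs also
make equality tests and table lookups cheap, and an ID can serve directly
as a row index into a table of vectors.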

Representing a word as a vector (as in word2vec) allows more information
about the word to be kept, so that could give other advantages.
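
As a rough sketch of what a vector can carry that a bare ID cannot, the
Python below compares words by cosine similarity. The three-dimensional
vectors are hand-made for illustration and are not output from word2vec;
real embeddings are learned from a corpus and usually have a few hundred
dimensions. The function name `cosine` is likewise just a local helper.

```python
import math

# Hand-made toy vectors, invented for illustration only.
vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}

def cosine(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

sim_royal = cosine(vectors["king"], vectors["queen"])
sim_fruit = cosine(vectors["king"], vectors["apple"])

# With these toy values, "king" lies much closer to "queen" than to "apple".
print(round(sim_royal, 3), round(sim_fruit, 3))
```

Two integer IDs can only be equal or unequal, whereas two vectors can be
more or less alike, which is what lets a model treat related words in a
related way.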

Can anyone add more explanation/reasons?

--henry


_______________________________________________
Unsubscribe at: http://dhhumanist.org/Restricted
List posts to: humanist@dhhumanist.org
List info and archives at: http://dhhumanist.org
Listmember interface at: http://dhhumanist.org/Restricted/
Subscribe at: http://dhhumanist.org/membership_form.php