Humanist Discussion Group, Vol. 36, No. 484.
Department of Digital Humanities, University of Cologne
Hosted by DH-Cologne
www.dhhumanist.org
Submit to: humanist@dhhumanist.org

Date: 2023-03-26 14:25:06+00:00
From: Henry Schaffer <hes@ncsu.edu>
Subject: Using numbers for words?

I was at a workshop about large scale computer processing with neural networks/AI and Natural Language Processing (NLP) came up briefly. The presenter mentioned that typically numbers were substituted for words - but didn't discuss why. She referred us to https://www.tensorflow.org/tutorials/text/word2vec as a method, and there's some more explanation at https://en.wikipedia.org/wiki/Word2vec

I can see an advantage in storage and processing speed when dealing with a word represented as perhaps 2 bytes rather than using perhaps 10-20+ bytes per word, but I don't see any additional advantage. Do you?

Representing a word as a vector allows more information to be kept (as in word2vec) and so that could give other advantages.

Can anyone add more explanation/reasons?

--henry
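
To make the two representations in the question concrete, here is a minimal Python sketch, offered as an illustration rather than as how word2vec or the TensorFlow tutorial actually works. It first replaces each word with an integer ID and compares storage, assuming 2 bytes per ID as suggested above. It then compares words using small vectors. The sample sentence, the function names, and especially the three-dimensional vectors are invented for this example; real word2vec vectors are learned from a corpus and typically have many more dimensions.

```python
import math

def build_vocab(tokens):
    """Assign each distinct word an integer ID in order of first appearance."""
    vocab = {}
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab

def encode(tokens, vocab):
    """Replace every word with its integer ID."""
    return [vocab[tok] for tok in tokens]

def cosine(u, v):
    """Cosine similarity: near 1.0 for vectors pointing the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical sample sentence, split naively on whitespace.
text = "the king spoke and the queen listened"
tokens = text.split()

vocab = build_vocab(tokens)
ids = encode(tokens, vocab)

# Storage comparison: raw UTF-8 bytes of the words versus
# an assumed 2 bytes per ID (the figure used in the question).
text_bytes = sum(len(tok.encode("utf-8")) for tok in tokens)
id_bytes = 2 * len(ids)

# Hand-made vectors, invented for illustration only (not learned).
toy_vectors = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "listened": [0.1, 0.2, 0.9],
}
sim_royal = cosine(toy_vectors["king"], toy_vectors["queen"])
sim_other = cosine(toy_vectors["king"], toy_vectors["listened"])

print("IDs:", ids)
print("bytes as text:", text_bytes, "bytes as IDs:", id_bytes)
print("king~queen:", round(sim_royal, 3), "king~listened:", round(sim_other, 3))
```

The integer IDs here are arbitrary labels: "king" is 1 and "queen" is 4 only because of where they first appear, so the difference between the two numbers carries no meaning. The vectors, by contrast, can be compared arithmetically, which is what lets a model treat some words as closer to each other than others.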