Humanist Discussion Group, Vol. 36, No. 496.
Department of Digital Humanities, University of Cologne
Hosted by DH-Cologne
www.dhhumanist.org
Submit to: humanist@dhhumanist.org

Date: 2023-03-29 14:28:14+00:00
From: James Rovira <jamesrovira@gmail.com>
Subject: Re: [Humanist] 36.489: numbers for words

These posts have been interesting and informative to me, and I'm grateful for them.

Michael -- Are there any models that take into account polysemy in your example no. 2 below? Not just that a word has multiple meanings, but that a single text can exploit multiple meanings of the same word, even within the same sentence? The word "father" can be a noun or a verb, for example, and has seven definitions as a noun and three as a verb in Merriam-Webster alone.

Jim R

On Wed, Mar 29, 2023 at 1:36 AM Humanist <humanist@dhhumanist.org> wrote:

> 2. As a word => one number per word
>
> In most NLP applications, it makes more sense to train the model on the
> words in the data rather than the letters. The biggest reason for this is
> that the words are the main unit of meaning, and the sequence of data the
> computer has to analyse is shorter. So, for example, ‘the cat sat on the
> mat’ contains 6 words, but 22 characters (including spaces). The computer
> has to perform fewer calculations if it is examining only 6 words rather
> than 22 characters. In addition, in a word-encoding scheme, ‘cat’ and
> ‘sat’ are simply two different words. If you let the computer see ‘c-a-t’
> and ‘s-a-t’, it has a harder learning task, because it won’t know in
> advance that ‘cat’ and ‘sat’ are totally different: 2/3 of the letters are
> the same!
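
To make the quoted word-versus-character contrast concrete, here is a minimal Python sketch. It is not taken from Michael's post: the helper names build_vocab and encode, the numbering by order of first appearance, and the second example sentence are all assumptions made for illustration. It also suggests why the polysemy question matters for a one-number-per-word scheme, since both uses of "father" end up with the same number.

```python
def build_vocab(tokens):
    """Give each distinct token the next unused integer, in order of first appearance.

    This numbering rule is an assumption for illustration; any consistent
    one-to-one mapping from tokens to integers would serve the same purpose.
    """
    vocab = {}
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab


def encode(tokens, vocab):
    """Replace every token with its integer from the vocabulary."""
    return [vocab[tok] for tok in tokens]


sentence = "the cat sat on the mat"

# Word-level encoding: one number per whitespace-separated word.
words = sentence.split()
word_ids = encode(words, build_vocab(words))

# Character-level encoding: one number per character, spaces included.
chars = list(sentence)
char_ids = encode(chars, build_vocab(chars))

print("word sequence:", word_ids, "length", len(word_ids))
print("char sequence:", char_ids, "length", len(char_ids))

# Hypothetical sentence using "father" as a noun and then as a verb.
poly_words = "a father can father a child".split()
poly_ids = encode(poly_words, build_vocab(poly_words))

# Both occurrences of "father" map to the same integer.
print("polysemy example:", list(zip(poly_words, poly_ids)))
```

Under a scheme like this, the noun and the verb are indistinguishable at the input stage. Any distinction between senses would have to be recovered from the surrounding words, not from the number itself.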