Humanist Discussion Group

Humanist Archives: May 24, 2023, 6:57 a.m. Humanist 37.38 - pubs: large language models

              Humanist Discussion Group, Vol. 37, No. 38.
        Department of Digital Humanities, University of Cologne
                      Hosted by DH-Cologne
                       www.dhhumanist.org
                Submit to: humanist@dhhumanist.org




        Date: 2023-05-23 14:28:39+00:00
        From: Dhanaraj Thakur <dthakur@cdt.org>
        Subject: New CDT report + event tomorrow May 24, 2023 (10 am ET) - Can Large Language Models Analyze Non-English Content?

Hi everyone,

We are excited to announce the publication of our new CDT research
report, “Lost in Translation: Large Language Models in Non-English
Content Analysis
<https://cdt.org/insights/lost-in-translation-large-language-models-in-non-english-content-analysis/>.”

The report explains the capabilities of a new AI technology called
“multilingual language models” that technology companies claim can
understand content in over 100 languages by extrapolating linguistic
patterns from high-resource languages. We further describe how these
models work, and argue that they have significant limitations
<https://cdt.org/press/cdt-finds-key-shortcomings-when-large-language-models-analyze-non-english-languages/>,
particularly in “low-resource languages” — languages for which AI
developers have little text data available to train AI models,
regardless of the number of speakers around the world.

Companies, researchers, civil society advocates, and policymakers should
be aware of these limitations, as they can create real barriers to
information access and equitable online participation for individuals.
We also offer guidance on how to help close the gap between companies’
ability to moderate content in English versus the world’s other 7,000
languages.

The full report is available on CDT’s website, along with executive
summaries
<https://cdt.org/insights/lost-in-translation-large-language-models-in-non-english-content-analysis/>
in Spanish, French, and
Arabic <https://cdt.org/press/cdt-finds-key-shortcomings-when-large-language-models-analyze-non-english-languages/>.
Tomorrow, we’ll discuss the paper at an event called “Mind the Gap”
<https://cdt.org/event/mind-the-gap-can-large-language-models-analyze-non-english-content/> (see
below for more details) — we hope you can join us!

Finally, we have an article out in WIRED
<https://www.wired.com/story/content-moderation-language-artificial-intelligence/> about
how social media companies specifically use multilingual language models
to moderate content in languages other than English.


Feel free to share, and let us know if you have any questions or feedback.

Take care,
Dhanaraj

--

Dhanaraj Thakur (he/him) | Research Director
Center for Democracy & Technology | cdt.org 
E: dthakur@cdt.org | P: +1 202 407 8849



_______________________________________________
Unsubscribe at: http://dhhumanist.org/Restricted
List posts to: humanist@dhhumanist.org
List info and archives at: http://dhhumanist.org
Listmember interface at: http://dhhumanist.org/Restricted/
Subscribe at: http://dhhumanist.org/membership_form.php