Humanist Discussion Group, Vol. 37, No. 38.
Department of Digital Humanities, University of Cologne
Hosted by DH-Cologne
www.dhhumanist.org
Submit to: humanist@dhhumanist.org

Date: 2023-05-23 14:28:39+00:00
From: Dhanaraj Thakur <dthakur@cdt.org>
Subject: New CDT report + event tomorrow May 24, 2023 (10 am ET) - Can Large Language Models Analyze Non-English Content?

Hi everyone,

We are excited to announce the publication of our new CDT research report, “Lost in Translation: Large Language Models in Non-English Content Analysis <https://cdt.org/insights/lost-in-translation-large-language-models-in-non-english-content-analysis/>.”

The report explains the capabilities of a new AI technology called “multilingual language models” that technology companies claim can understand content in over 100 languages by extrapolating linguistic patterns from high-resource languages. We further describe how these models work, and argue that they have significant limitations <https://cdt.org/press/cdt-finds-key-shortcomings-when-large-language-models-analyze-non-english-languages/>, particularly in “low-resource languages”: languages for which AI developers have little text data available to train AI models, regardless of the number of speakers around the world.

Companies, researchers, civil society advocates, and policymakers should be aware of these limitations, as they can create real barriers to information access and equitable online participation for individuals. We also offer guidance on how to help close the gap between companies’ ability to moderate content in English versus the world’s other 7,000 languages.

The full report is available on CDT’s website, along with executive summaries <https://cdt.org/insights/lost-in-translation-large-language-models-in-non-english-content-analysis/> in Spanish, French, and Arabic <https://cdt.org/press/cdt-finds-key-shortcomings-when-large-language-models-analyze-non-english-languages/>.

Tomorrow, we’ll discuss the paper at an event called “Mind the Gap” <https://cdt.org/event/mind-the-gap-can-large-language-models-analyze-non-english-content/> (see below for more details). We hope you can join us!

Finally, we have an article out in WIRED <https://www.wired.com/story/content-moderation-language-artificial-intelligence/> about how social media companies specifically use multilingual language models to moderate content in languages other than English.

Feel free to share, and let us know if you have any questions or feedback.

Take care,
Dhanaraj

--
Dhanaraj Thakur (he/him) | Research Director
Center for Democracy & Technology | cdt.org
E: dthakur@cdt.org | P: +1 202 407 8849

_______________________________________________
Unsubscribe at: http://dhhumanist.org/Restricted
List posts to: humanist@dhhumanist.org
List info and archives at: http://dhhumanist.org
Listmember interface at: http://dhhumanist.org/Restricted/
Subscribe at: http://dhhumanist.org/membership_form.php