Humanist Discussion Group

Humanist Archives: Oct. 7, 2024, 10:13 a.m. Humanist 38.172 - PhD studentship in language tech (Groningen)

				
              Humanist Discussion Group, Vol. 38, No. 172.
        Department of Digital Humanities, University of Cologne
                      Hosted by DH-Cologne
                       www.dhhumanist.org
                Submit to: humanist@dhhumanist.org




        Date: 2024-10-04 12:58:32+00:00
        From: Federico Pianzola <f.pianzola@gmail.com>
        Subject: PhD in Language technology for cultural heritage (Groningen)

Deadline: 13th October

The Computational Linguistics group (GroNLP) of the Center for Language and
Cognition Groningen (CLCG) is looking for a PhD student in “Language
technology for cultural heritage: New discoveries with little data” within
the HAICu research project. HAICu is a large-scale Dutch research project
in which universities and cultural-heritage institutions investigate new
forms of Artificial Intelligence-based access to multimodal cultural-heritage
data, both contemporary and historical. Within HAICu, AI researchers,
Digital Humanities researchers and a wide range of public and private
partners will co-develop scientific solutions to unlock the true societal
potential of today's heterogeneous digital heritage collections. The
project aims to provide easier, richer and more reliable data access to
citizens, journalists, civic organisations, and various other
stakeholders.

HAICu is funded by the NWO National Science Agenda (NWA) and has a budget
of about EUR 10 million. HAICu started in January 2024 and will run for
six years (until January 2030). For more information about HAICu, please
see https://www.haicu.science/

The PhD Project
This PhD position focuses on effectively dealing with missing and sparse
labels in humanities datasets from fields such as literature, history and
philosophy. Cultural heritage institutions, and especially the National
Library of the Netherlands, offer access to large amounts of digitized data
that can be leveraged through computational approaches. However, these data
are often incomplete. This is a challenge for typical machine learning
methods, which rely on representative and complete training data; the
resulting systems cannot handle distribution shifts or extrapolate beyond
their training set.

Recent developments in artificial intelligence have shown that large
language models are able to learn from small amounts of training data, or
even none at all (few-shot and zero-shot learning). Combined with
increasingly accessible techniques for adapting existing models to specific
domains and tasks, these advances open up many new possibilities for
cultural heritage data, which this project will explore. Examples of
possible topics include:

- Investigating literary reception and prestige over time.
- Detecting and mapping intertextuality within texts.
- Uncovering the influences and biases over time in datasets.
- Monitoring the evolution of concepts in textual datasets.
- Improving the robustness of models to out-of-distribution data.

The project will be coordinated by Andreas van Cranenburgh, Tommaso
Caselli, and Malvina Nissim at the University of Groningen, in
collaboration with the National Library of the Netherlands. This is an
interdisciplinary project at the intersection of Computational
Linguistics/Natural Language Processing (NLP) and the humanities.

You will be asked to

- Develop a specific research proposal within the proposed theme.
- Review the academic literature relevant to the project’s goals.
- Carry out research, present your results and author scientific articles
on the topics above.
- Collaborate with members of the Computational Linguistics group at the
University of Groningen, the National Library, and the broader HAICu
consortium.
- Engage and collaborate with other researchers working on computational
humanities research.
- Complete a PhD thesis, written in English, within the specified timeframe
(4 years).
- Collaborate on outreach and public engagement activities.
- Gain teaching experience.

This PhD project offers a unique opportunity to work in an international
environment and to gain valuable research experience: you will carry out
research within the Computational Linguistics group of the Center for
Language and Cognition (CLCG) of the University of Groningen, and will
spend at least one day a month at the National Library in The Hague.

For more information, see
https://www.rug.nl/about-ug/work-with-us/job-opportunities/?details=00347-02S000AYDP


_______________________________________________
Unsubscribe at: http://dhhumanist.org/Restricted
List posts to: humanist@dhhumanist.org
List info and archives at: http://dhhumanist.org
Listmember interface at: http://dhhumanist.org/Restricted/
Subscribe at: http://dhhumanist.org/membership_form.php