The workshop is in Room #4
Accepted papers
- What is Truth in NLP? Reflecting on Progress, Lessons, and Open Challenges as NLPerspectives turns Five (Gavin Abercrombie)
- The Rashomon Wikipedia: A Data-Perspectivist Analysis of Divergent Historical Narratives (Claudiu Creanga, Liviu P. Dinu and Anca Dinu)
- Cognitive Interaction Layers as a Perspectivist Framework for Neuro-Symbolic AI (Yevheniia Babenko)
- GSI:detect – A Perspectivist Approach to Gender Stereotypes Identification in Italian (Davide Testa, Sofia Brenna, Manuela Speranza, Gloria Comandini, Stefania Cavagnoli and Bernardo Magnini)
- Quantifying and Predicting Disagreement in Graded Human Ratings (Leixin Zhang and Çağrı Çöltekin)
- HurtLens: A Perspectivist Corpus Analysis of Hurtful Language (Samuele D’Avenia, Eliana Di Palma, Marta Marchiori Manerba and Valerio Basile)
- A Measure of Systematic Disagreement (Valerio Basile)
- Fine-Grained Perspectives: Modeling Explanations with Annotator-Specific Rationales (Olufunke O. Sarumi, Charles Welch and Daniel Braun)
- Structured Disagreement in Health-Literacy Annotation: Epistemic Stability, Conceptual Difficulty, and Agreement-Stratified Inference (Olga Kellert, Sriya Kondury, Candice Koo, Nemika Tyagi and Steffen Eikenberry)
- SubData: Bridging Heterogeneous Datasets to Enable Theory-Driven Evaluation of Political and Demographic Perspectives in LLMs (Pietro Bernardelle, Leon Froehling, Stefano Civelli and Gianluca Demartini)
- Beyond WEIRD: Benchmarking Perspective Coverage of LLMs on Serbian Sociopolitical Topics (Vuk Dinic and Nikola Ljubešić)
- ChatGPT, why can’t anyone afford a house?: On the Effects of LLM pre-annotation on Annotator Subjectivity (Emilie Francis, Ricardo Muñoz Sánchez, Céline Leuzinger and Lee D. Gauthier)
- An Overview of Current Practices and Recommendations for Working with Stereotypes in NLP (Alessandra Teresa Cignarella and Matteo Pellegrini)
- Research Communication: Towards a Perspectivist Turn in Argument Quality Assessment (Julia Romberg, Maximilian Maurer, Henning Wachsmuth and Gabriella Lapesa)
- From Standpoints to Conflicts: Community Perspectives on AI Translation (Yujun Wang, Wei Zhao and Ehud Reiter)
- Modeling Perspectives in NLP: Parameter-Efficient Perspective Conditioning for Span Extraction and Summarization (Harikrishnan Gurushankar Saisudha and Sabine Bergler)
- Your Model is Overconfident, and Other Lies We Tell Ourselves (Aman Sinha, Timothee Mickus and Raul Vazquez)
- A Pilot Study Investigating Stakeholder Subjectivity in Collaborative Dialog Analysis (Ananya Ganesh, Martha Palmer and Katharina von der Wense)
Invited talk
Federico Cabitza

“Plurality and singularity. Ideas on less traveled paths of people’s interaction with their decision-making tools”
Abstract
In this talk, with a deliberately vaguely Calvino-esque title, I’d like to share some reflections on the importance of not neglecting the plural nature of the most recent applications of artificial intelligence, both predictive and generative.
What do I mean by plural nature or plurality? First, the fact that next-generation systems rely on data produced by a plurality of people, in a variety of different contexts and situations: consider, for example, medical cases. Paradoxically, AI is conceived as a “simplifier” of this plurality (for example, when it provides a single label plausibly considered the “correct” one to characterize a given instance or object). Taking this plurality seriously opens up very interesting research developments, which we could explore together, extending and evolving some seminal studies I’ve conducted in recent years.
For example, I will suggest how important it is, when involving a group of experts in the ground-truthing process (that is, in building the training data for a predictive system, for example one that supports diagnosis), to collect not only their interpretations but also some complementary information: how confident they are in their judgment (confidence), how difficult they consider the case, the degree of uncertainty surrounding each individual assessment, and the accuracy of each rater. These are valuable insights that can significantly improve the way we design, evaluate, and use AI systems, returning us to considering each training case as a singularity, that is, as unique and different from the others (which is indeed what it is).
I will also argue that, during the requirements gathering, readiness assessment, or deployment phase within an action research project, it is particularly useful to ask prospective users two specific questions, which I call the k-question and the alpha-question, respectively. The first, the “key” question, concerns the tolerance threshold for certain types of error; the second—where alpha is understood as a privative alpha—concerns the cases in which it is preferable not to use AI. The answers to these questions, if analyzed correctly, can help us make systems more calibrated and personalized to the cognitive preferences of the individual decision-makers who will use them. Here too, personalizing the system means caring for the individual who will use it.
I will then discuss machine semiotics, that is, what users understand about the visual cues produced by machines, such as saliency maps in diagnostic imaging; algorithmic theory of mind, that is, the mental models that users build to understand whether a machine is competent, reliable, or plausibly correct; and how these constructs can help us better understand the phenomenon of appropriate reliance, guiding more rigorously evidence-based system development and focusing more on optimizing the appropriateness of using AI systems—that is, reliance on them.
I’ll conclude with two provocative remarks. The first is that we should begin developing systems that optimize not only classification accuracy, but also—and perhaps above all—the actual utility of their performance in real-world contexts. The second is that we should design systems capable not only of getting close to the correct answer, but also of recognizing when the user is making a mistake: which is not the same thing at all.
Federico Cabitza is an Associate Professor at the University of Milano-Bicocca, where he teaches Human-Computer Interaction and Decision Support Systems. He is currently head of the MUDI Laboratory and coordinator of the local node of the national “Informatics and Society” laboratory of the CINI consortium; since 2026, he has also been Director of the Digital Health and Wellbeing Center at the Fondazione Bruno Kessler (FBK) in Trento (Italy). Since 2016, he has collaborated with several hospitals, including the IRCCS Galeazzi Sant’Ambrogio Hospital in Milan (Italy), with which he has held a formal affiliation for 10 years. He is also a founding partner and scientific director of the university spin-off Red Open srl, which specializes in AI impact assessment. To date, he has authored over 300 scientific publications and has been ranked for several years among the world’s top 2% of scientists, according to the Stanford University database. He is also the co-author, together with Luciano Floridi, of the book Intelligenza Artificiale. L’uso delle nuove macchine (2021), published by Bompiani.
Selected publications:
- Cabitza, F., Campagner, A., & Sconfienza, L. M. (2020). As if sand were stone. New concepts and metrics to probe the ground on which to build trustable AI. BMC Medical Informatics and Decision Making, 20(1), 1-21.
- Cabitza, F., Rasoini, R., & Gensini, G. F. (2017). Unintended consequences of machine learning in medicine. JAMA, 318(6), 517-518.
- Cabitza, F., Campagner, A., & Basile, V. (2023). Toward a perspectivist turn in ground truthing for predictive computing. Proceedings of the AAAI Conference on Artificial Intelligence, 37(6), 6860-6868.
- Cabitza, F., Fregosi, C., Campagner, A., & Natali, C. (2024). Explanations considered harmful: The impact of misleading explanations on accuracy in hybrid human-AI decision making. In World Conference on Explainable Artificial Intelligence (pp. 255-269). Cham: Springer Nature Switzerland.