Program at a Glance

Registration desk open on both days: 08:30-12:30 and 14:00-16:00

Thursday, April 16, 2026

Time           Session
09:30-10:00    Opening Session / Welcome Reception
10:00-10:30    Coffee Break
10:30-11:30    Keynote – Dr. Nasredine Semmar
11:30-12:30    Track 2: AI for Language Preservation
12:30-14:00    Lunch
14:00-16:00    Track 1: AI for Cultural Heritage
16:00-16:30    Coffee Break
16:30-18:00    Track 3: AI for Language Revitalization
19:45 (22:30)  Banquet (cruise on the Seine River), TBC

Friday, April 17, 2026

Time           Session
09:00-10:00    Keynote – Dr. Mélanie Jouitteau
10:00-10:30    Coffee Break
10:30-11:30    Track 5: AI for Community Empowerment and Sovereignty
11:30-12:30    Track 4: AI for Ethical Frameworks and Data Governance
12:30-14:00    Lunch
14:00-16:00    Track 6: AI for Environmental and Climate Challenges of Indigenous Communities
15:00-16:20    Coffee Break
16:20          Closing Session
18:00-21:00    The Arts et Métiers Museum (Extra Event)

Full Program

Thursday, April 16, 2026

09:30-10:00 — Opening Session / Welcome Reception
10:00-10:30 — Coffee Break
10:30-11:30 — Keynote

Multilinguality in Large Language Models

Presenter: Dr. Nasredine Semmar

Abstract (EN)

General-purpose Large Language Models (LLMs) have achieved impressive performance in a wide range of Natural Language Processing tasks and applications. However, the best-performing LLMs today are those built for resource-rich languages, for which both annotated and unannotated corpora are available. This talk discusses the main challenges of extending LLMs to new languages. It covers the fundamental concepts behind LLMs and their architectures, the current state of the art, and the different approaches used to extend their abilities to handle multiple languages, especially low-resource languages.

Speaker bio

Nasredine Semmar is a Director of Research at CEA List – Université Paris-Saclay. He obtained his PhD in computer science from the University of Paris-Sud (France) in 1995 and received an Accreditation to Supervise Research (HDR) from Paris-Saclay University in 2021. He worked in industry from 1996 to 2002, first at Lionbridge Technologies–Bowne Global Solutions as an R&D engineer and then at SAP-Business Objects as an expert in software internationalization and localization. Dr. Semmar joined CEA List in 2002, and his current research interests include emerging methods and technologies in cutting-edge areas of Natural Language Processing (NLP) and Artificial Intelligence (AI). His expertise centers on the use of Generative AI (GenAI) for inducing multilingual resources and tools for low-resource languages. He has supervised five completed PhD theses and is currently supervising four PhD students. He has published more than 120 papers in refereed journals and conferences, serves on the editorial board of the “Natural Language Processing” Journal, and is a member of the Scientific Committees of major NLP conferences (ACL, EMNLP, NAACL, IJCAI, COLING, LREC…). He participated in the EVALDA-ARCADE II evaluation campaign on sentence and word alignment of parallel corpora and in the shared task on discriminating between similar languages, varieties, and dialects at the VarDial workshops. He has coordinated and participated in more than 20 research projects under EU FP7, H2020, and other international and national programs. He was a keynote speaker at the 7th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT 2016) and at INFOL@NGUES 2019, and gave a tutorial at the 16th International Conference on Human System Interaction (HSI 2024). He has been co-chair of the NLP track of the ACS/IEEE International Conference on Computer Systems and Applications (AICCSA) since 2020.

11:30-12:30
Track 2 — AI for Language Preservation
Paper 1

Enabling Digital Documentation of Odia, a Low-Resource Indo-Aryan Language, through Efficient Bidirectional Odia-German Machine Translation

Abhinandan Samal and Tianxiang Lu

Abstract (EN)

Recent advances in multilingual neural machine translation have been driven by large Transformer-based models trained on web-scale data. Nonetheless, many languages remain underrepresented due to the scarcity of high-quality, ethically usable parallel corpora, a challenge especially acute for low-resource Indo-Aryan languages like Odia. This work presents Odia-German translation as a case study for AI-enabled language documentation, choosing this pair to investigate translation across a significant typological divide. An end-to-end framework is proposed that integrates corpus creation under real-world legal constraints with resource-efficient model adaptation. A bidirectional Odia-German parallel corpus is curated from contemporary journalistic sources using script-aware preprocessing and a hybrid human-in-the-loop translation pipeline. The study provides a reproducible blueprint for adapting generalist models to specialized domains. Experimental results show that parameter-efficient Low-Rank Adaptation achieves competitive translation quality while substantially reducing computational requirements, offering a sustainable infrastructure for preserving underdocumented languages.
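Editorial illustration (not taken from the paper): the saving behind Low-Rank Adaptation can be seen with simple arithmetic. Instead of updating a full weight matrix of size d_out x d_in, LoRA trains two small factors of rank r. The layer sizes and rank below are assumed values chosen only for illustration; the paper's actual configuration is not stated in the abstract.

```python
def full_params(d_out: int, d_in: int) -> int:
    """Number of weights in one dense d_out x d_in matrix."""
    return d_out * d_in


def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Weights in the two low-rank factors B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


# Assumed, illustrative sizes (not from the paper).
d_out, d_in, rank = 1024, 1024, 8

full = full_params(d_out, d_in)
lora = lora_trainable_params(d_out, d_in, rank)
print(f"full fine-tuning: {full} weights per matrix")
print(f"LoRA (rank {rank}): {lora} trainable weights per matrix")
print(f"fraction trained: {lora / full:.4f}")
```

Under these assumed sizes, the adapter trains roughly 1.6% of the weights of each adapted matrix, which is where the reduction in compute and memory comes from.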

Paper 2

Byte-Level Neural Machine Translation for Low-Resource Aramaic

Bechara Bou Abdo

Abstract (EN)

Aramaic is one of the world’s oldest living languages and remains central to the cultural, religious, and historical identity of several Middle Eastern communities. Despite its importance, Aramaic is severely underrepresented in contemporary Natural Language Processing research due to the scarcity of digital resources. This paper presents AramAI, an end-to-end system designed to support machine translation from Aramaic to English, focusing specifically on Classical Syriac Aramaic, the variety written in the Syriac script used in liturgical and biblical texts. While large multilingual language models provide broad coverage across many languages and scripts, they are not specifically optimized for Syriac-script Aramaic, and their subword tokenization can be brittle when faced with sparsely represented characters and orthographic variation. AramAI contributes a curated English-Syriac parallel corpus of approximately 31,000 sentence pairs and a systematic evaluation of a byte-level transformer. The model proves effective in a low-resource, non-Latin script setting through fine-tuning ByT5, which operates directly on raw bytes and avoids tokenizer adaptation. Experimental results demonstrate strong performance under constrained conditions for English-to-Syriac translation. Beyond model training, AramAI integrates an interactive web-based platform that enables real-time user interaction, structured feedback collection, and administrative monitoring. By combining dataset creation, byte-level MT evaluation, and community-oriented deployment, this work illustrates a practical path toward improved Aramaic NLP tools and sustained, community-driven language preservation.
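Editorial illustration (not taken from the paper): a byte-level model needs no learned vocabulary because any script can be written as UTF-8 bytes. The sketch below encodes four Syriac letters and shifts each byte by a fixed offset to leave room for special tokens. The offset of 3 mirrors the publicly documented ByT5 tokenizer convention but is treated here as an assumption; the helper name is hypothetical.

```python
from typing import List

# Assumption: the first three ids are reserved for padding, end-of-sequence,
# and unknown tokens, so byte value b is mapped to id b + 3.
SPECIAL_TOKEN_OFFSET = 3


def to_byte_ids(text: str) -> List[int]:
    """Map text to model ids, one id per UTF-8 byte (hypothetical helper)."""
    return [b + SPECIAL_TOKEN_OFFSET for b in text.encode("utf-8")]


# Four letters from the Syriac Unicode block (shin, lamadh, mim, alaph).
word = "\u072b\u0720\u0721\u0710"
ids = to_byte_ids(word)

print("characters:", len(word))
print("byte ids:  ", len(ids))
print(ids)
```

Each Syriac letter occupies two bytes in UTF-8, so sequences are about twice as long as the character count. That is the cost of avoiding a subword tokenizer that may handle rare characters poorly.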

Paper 3

Enabling OCR for Low-Resource Languages Through Synthetic Data: A Case Study on Kabyle

Lydia Lazib, Salem Hammadi, Anis Hachi, Youakim Badr, and Samia Bouzefrane

Abstracts (EN / Kabyle)

The digitization of written materials for language preservation increasingly relies on Optical Character Recognition technologies to transform printed and handwritten texts into machine-readable formats. While modern OCR systems achieve high performance for well-resourced languages, their applicability to low-resource and minority languages remains limited by scarce annotated data, non-standardized writing systems, and diverse visual conditions in real-world documents. This paper presents an OCR system designed for low-resource settings through a data-centric and system-oriented approach. The proposed solution integrates document or scene classification, DBNet-based text detection, and SAR-based text recognition within a unified workflow, supported by large-scale synthetic data generation to compensate for the lack of annotated corpora. The system is designed to handle both document images and natural scene text under realistic digitization constraints. The approach is evaluated through a case study on Kabyle, a Berber language written in a Latin-based script with extended characters. Experiments conducted on manually annotated benchmarks show that synthetic data plays a critical role in improving detection robustness and word-level recognition accuracy, and that the proposed system outperforms a traditional OCR baseline under challenging scene text conditions. Beyond Kabyle, this work provides transferable insights and practical design strategies for developing OCR systems for other low-resource languages facing similar challenges in language documentation and preservation.

Asmiḍen n yiḍrisen yettwarun, i uḥraz n tutlayt, ibedd s waṭas s titiknulujiyin n uɛqal n yisekkilen (OCR) i wakken iselkimen ad ɣṛen iḍrisen iqbuṛen. Ma yella inagrawen n OCR imaynuten teddun akken i wata lḥal i tutlayin yesɛan iɣbula, aseqdec-nsen deg tutlayin yenḥarfen mazal-it d ugur imi xuṣṣen isefka d inagrawen n tira yeṭṭafaṛen ilugan. Amagrad-a yettmeslay-d ɣef unagraw OCR yettwaxedmen i tutlayin ixuṣṣen iɣbula. Tifrat i d-newwi tgellu s usemxallef n yiḍsen/timuɣliwin, s tifin n uḍris s ttawil n DBNet, d uɛqal n uḍris s ttawil n SAR. Annect-a akk s ttawil n yisefka iragmawanen imi ikurpisen yettwarun xuṣṣen. Anagraw-a yettwassebded-d akken ad yesseqdec ama d tugniwin n yiḍrisen, ama d aḍris daxel n tmuɣliwin tiɣelnawin. Tarrayt-a nesqedc-it i tutlayt taqbaylit, yellan d tutlayt tamaziɣt yettwarun s yisekkilen ilatiniyen. Tirmitin yellan ɣef tferkiwin n yisefka ay d-yettwasbedden s ufus sseknent-d d akken isefka iragmawanen sɛan azal muqqren i usnerni n uswir n uɛqal. Dɣa, iban-d d akken anagraw nneɣ yugar OCR aqbur (Tesseract) deg tmuɣliwin tiɣelnawin. Nnig n Teqbaylit, leqdic-a yefka-d timsirin a nelmed i wakken a nessali anagraw OCR i yal tutlayt yenḥerfen i wumi xuṣṣen ikurpisen

12:30-14:00 — Lunch
14:00-16:00
Track 1 — AI for Cultural Heritage
Paper 1

Historical Tibetan Normalisation: Rule-Based vs Neural & n-Gram LM Methods for Extremely Low-Resource Languages

Marieke Meelen and Rachael M. Griffiths

Abstract (EN / Tibetan)

Historical Tibetan manuscripts present significant normalisation challenges due to extensive abbreviations, non-standard orthography, and the complete lack of established gold-standard data. This paper presents a hybrid approach combining rule-based methods with character-level encoder-decoder transformer models enhanced with n-gram-based language models to normalise extremely difficult diplomatic Tibetan texts into Standard Classical Tibetan. The study addresses the scarcity of parallel training data through data augmentation, compares tokenised and non-tokenised approaches, and evaluates performance on different types of test sets. This work contributes to the understudied task of historical text normalisation, with implications beyond Tibetan for digital humanities and no- or low-resource language work.

དེའང་བོད་ཀྱི་ལོ་རྒྱུས་ཀྱི་དུས་རིམ་སོ་སོར་བྱུང་བའི་དཔེ་རྙིང་བྲིས་མ་མང་ཆེ་བའི་ནང་། བསྡུས་པ་དང་བསྐུང་བའི་ཆ་མང་བ་དང་། ཚད་གཞི་གཅིག་མཐུན་དུ་གྱུར་བའི་དག་ཡིག་གི་འགྲོས་དང་མཐུན་པའི་ཐ་སྙད་དང་ཚིག་སྦྱོར་ཤིན་ཏུ་དཀོན་ཞིང་། རྩ་བ་ནས་མ་ལོན་པའི་རིགས་ཀྱང་མང་དུ་ཡོད་པའི་ཕྱིར། སྤྱིའི་ཆ་ནས་བསྐུང་ཡིག་དེ་དག་ཚད་ལྡན་གྱི་དག་ཡིག་ཏུ་ལམ་ནས་བཀྲལ་ཐུབ་པར་དཀའ་ངལ་མི་ཉུང་བ་ཞིག་བྱུང་ཡོད། རྩོམ་ཡིག་འདིའི་ནང་འཕྲུལ་རིག་གི་མ་ལག་ལ་བརྟེན་པའི་ཐབས་ལམ་གཉིས་ཏེ། བསྐུང་ཡིག་ངོས་འཛིན་བསྡུས་དགྲོལ་(encoder-decoder transformer models) ཞེས་པའི་མ་ལག་བེད་སྤྱད་དེ་བསྐུང་ཡིག་རྣམས་རིམ་པར་བསྒྱུར་བཅོས་བྱེད་པའི་ཐབས་ལམ་དང་། ན་སྲང་(N-gram models) ཞེས་པའི་མ་ལག་བེད་སྤྱད་དེ་ཡིག་འབྲུ་གང་མངོན་པ་གསལ་བར་མ་བྱུང་ན་ཚོད་དཔག་གིས་གཏན་འབེབས་བྱས་པའི་ཐབས་ལམ་དེ་གཉིས་མཉམ་དུ་སྤྱད་དེ། དཀའ་ངལ་ཤིན་ཏུ་ཆེ་བའི་བསྡུས་བསྐུང་གི་ཡིག་འབྲུའི་རིགས་རྣམས། ཚིག་རྐང་རེ་རེ་བཞིན་རྒྱུན་སྤྱོད་དག་ཡིག་གི་འགྲོས་དང་མཐུན་པར་གསལ་བོར་བཀྲལ་ཐུབ་པའི་ཐབས་ལམ་ཞིག་བསྟན་ཡོད། དེ་བཞིན་དཀའ་ངལ་དེ་དག་སེལ་བའི་ཆེད་དུ་འཕྲུལ་རིག་སྦྱོང་བརྡར་གྱི་ཐབས་ལམ་དེ་རིགས་ངོ་སྤྲོད་དང་། མཚན་གཞིར་འཛིན་འོས་པའི་དཔེ་མཚོན་ཡིག་ཚོགས་ཁ་སྣོན། བྲིས་ཤོག་གི་ངོས་ནས་མཚོན་རྟགས་ཡོད་མེད་དང་། མིང་ཚིག་དང་ཚིག་ཕྲད་བར་གྱི་སྟོང་ཆ་ཡོད་མེད་ཀྱི་ཁྱད་པར་སོགས་ལ་དབྱེ་ཞིབ་བྱས་ཏེ། དཔྱད་ཐབས་མི་འདྲ་བའི་སྒོ་ནས་ཚོད་ལྟ་མི་འདྲ་བ་བྱས་པ་ལས་དཔྱད་འབྲས་ཇི་ལྟར་བྱུང་བའི་གནས་ཚུལ་རེ་རེ་བཞིན་གསལ་པོར་བསྟན་ཡོད། མདོར་ན། རྩོམ་ཡིག་འདིར་མཚམས་སྦྱོར་ཞུས་པའི་བསྐུང་ཡིག་རྒྱས་དགྲོལ་བྱེད་ཐབས་ཀྱི་དཔྱད་ཞིབ་འདི་ནི། བོད་ཡིག་ཙམ་ལས་མ་ཡིན་པར། སྐད་རིགས་གཞན་གྱི་ཐོག་ཏུ་གནས་པའི་ལོ་རྒྱུས་ཡིག་ཆ་སོགས་གྲངས་འཛིན་ཅན་དུ་གཏོང་བ་དང་། མི་ཆོས་རིག་གནས་ཀྱི་ངོས་ནས་གྲངས་ཉུང་སྐད་རིགས་ཀྱི་ཡིག་ཆའི་རིགས་ལ་དཔྱད་ཞིབ་བྱེད་པར་ཕན་ཐོགས་མི་དམན་པ་ཞིག་བྱུང་ཡོད།.

Paper 2

Tsi Ionkwanikòn: Ra Otekhnótshera Ionhwéntsare Ne Kahnhó: A Research-Creation Praxis for Building Land-Based AI Ecologies

Jackson Leween (Two Bears), Tanya Doody, and Cassie Packham

Abstract (EN)

This paper articulates an Onkwehonwe framework for rethinking artificial intelligence, data, and technology as relational systems embedded within land-based ecologies of kinship, responsibility, and reciprocity. Writing from within Haudenosaunee cosmologies, it argues that mainstream AI is shaped by a colonial metaphysics that abstracts intelligence from land, bodies, and relations, reducing knowledge to extractable data and reproducing logics of hierarchy, control, and elimination. Against this, ionhwéntsare is understood as communal, distributed, and emergent, circulating through human and more-than-human worlds as a living field of relations sustained by obligation and care. The paper maps these commitments through three interrelated research-creation projects: a pit-fire project that treats fire, clay, and sensors as co-thinking agents; a digital wampum project that frames data as treaty and responsibility; and virtual environments where AI operates as a storytelling presence rather than an abstract machine. Together, these projects propose land-based AI ecologies that refuse abstraction and foreground kinship as the condition of intelligence itself.

Paper 3

When AI Had a Dream: A Novel LLM to Generate Speeches from Martin Luther King Jr. – Perspectives and Limitations

Kate James, Antoine Vacavant, Florian Pelerin, and Eric Agbessi

Abstract (EN)

This article introduces a novel historical large language model specifically trained on the speeches of Martin Luther King Jr., complemented by a curated collection of documents related to the American civil rights movement. The objective is twofold: to explore the capacity of contemporary language models to emulate historically situated rhetoric, and to assess the authenticity of the generated content through a responsible and transparent evaluation framework. Neural text generation is combined with methods from corpus linguistics, using statistical stylistic features to compare generated speeches with the original source texts. Results show that the proposed model can produce linguistically convincing outputs that capture key rhetorical patterns characteristic of MLK’s discourse, including lexical choices, syntactic structures, and thematic coherence. At the same time, the analysis reveals measurable discrepancies between generated texts and the authentic corpus, highlighting limitations in stylistic consistency and rhetorical balance. These divergences provide useful insights into the boundaries of historical text generation and suggest directions for improving model training, data selection, and evaluation protocols.

Paper 4

Asylum Processing Algorithms and Epistemic Violence: A Review of AI’s Role in Refugee Status Determination

Loso Judijanto and Sri Nurhayati

Abstract (EN)

The increasing adoption of artificial intelligence technologies in refugee status determination processes has sparked critical debates regarding fairness, credibility, and transparency in asylum adjudication. This study investigates how AI-based decision-making systems in asylum governance impact epistemic justice, particularly in terms of how refugee narratives are interpreted, evaluated, and potentially silenced through algorithmic processes. The paper systematically analyses existing academic literature to identify patterns of epistemic violence and structural bias embedded in automated asylum processing systems. Employing a systematic literature review as a qualitative research method, the study follows the PRISMA protocol to ensure methodological rigour. Results reveal that while AI may enhance administrative efficiency, it often reproduces testimonial injustice, misclassifies applicants, and lacks mechanisms for transparent justification. These systems risk marginalising refugees as knowers and compromising procedural fairness.

Paper 5

The Meta-Body of Artworks: Sequence-Based AI Storytelling for Culturally Situated Heritage Narratives

Abol Froushan, Cristine H. Legare, Marianne Magnin, and Marco Cappellini

Abstract (EN)

Cultural heritage AI systems face a central challenge: how to support narrative meaning-making without flattening cultural specificity or imposing external interpretive frames. Many existing approaches treat artworks as isolated objects and narratives as generated outputs, risking the loss of relational meaning developed through curatorial practice, exhibition contexts, and community interpretation over time. This paper proposes sequence-based AI storytelling grounded in the meta-body of artworks, understood as the evolving relational field through which artworks acquire meaning across exhibitions, narratives, and interpretive encounters. In computational terms, the meta-body can be represented as a semantic infrastructure linking artworks, motifs, curatorial framings, and narrative sequences. Within this architecture, AI facilitates traversal across culturally governed relational configurations rather than acting as an autonomous narrative author. A system blueprint and exploratory empirical findings from a pilot study are presented. From these findings, the authors derive design principles for culturally situated narrative AI, including sequence-first storytelling, metadata as architectural boundary, calibrated visual anchoring, and governed co-construction with refusal mechanisms.

Paper 6

From Lost Melodies to Living Soundscapes: Open Generative Pipelines for Reconstructing Indic Musical Heritage as Contemporary Electronic Media

Sai Gattupalli, Poulomi Chakravarty, and Ivon Arroyo

Abstract (EN)

Intangible musical traditions transmitted through oral pedagogy and lineage practice face displacement within platform-mediated digital ecologies dominated by electronic genres. When original musical settings are lost and survive only as text, the challenge shifts from preservation to reconstruction. This paper presents a modular, tool-agnostic generative digitization pipeline employing text-to-music systems as bounded reconstruction engines, transforming Indic intangible musical heritage into contemporary electronic media formats while preserving linguistic fidelity, instrumental intentionality, and cultural recognizability. Using a seventeenth-century Telugu devotional composition as an anchor case, the authors conduct cross-platform evaluation across three text-to-music systems under identical prompt conditions. A staged human-in-the-loop verification protocol governs iterative refinement across lyric orthography, pronunciation, rhythm-lyric alignment, and instrumental recognizability. Professional Carnatic music educators confirmed strong linguistic clarity and structural retention while noting expected limitations in microtonal ornamentation. The paper contributes a reproducible digitization pipeline enabling students to become active creators of culturally grounded digital heritage media.

16:00-16:30 — Coffee Break
16:30-18:00
Track 3 — AI for Language Revitalization
Paper 1

Community-Governed AI Platform for Indigenous Language Revitalization and Cultural Knowledge Access

Karunya Srinivasan and Jaswitha Krovi

Abstract (EN)

Recent advances in large language models and multilingual artificial intelligence present both a challenge and an opportunity for improving the availability of Indigenous languages and knowledge systems. Yet nearly all currently available conversational AI capabilities are based on datasets that do not represent Indigenous worldviews and linguistic systems effectively. This paper proposes a community-managed bilingual conversational AI system developed to facilitate the preservation and revitalization of, and culturally appropriate access to, knowledge for an Indigenous community. The system combines a bilingual language understanding component that handles both English and Mohawk, a community-managed knowledge base organized according to culturally identified management guidelines, and a retrieval-augmented generation system that aligns responses with approved cultural content while following community access rules. It supports both text and speech interaction and explicitly signals cultural constraints to users. The paper presents the architecture, data curation processes, and a governance-conscious evaluation framework.
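Editorial illustration (not taken from the paper): one way retrieval can follow community access rules is to filter the knowledge base by the user's role before anything is passed to the generator, and to report how many matching entries were withheld so the interface can signal the constraint. The access labels, roles, sample entries, and keyword matching below are all hypothetical placeholders, not the authors' design.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class Entry:
    text: str
    access: str  # hypothetical label: "public", "community", or "restricted"


# Hypothetical mapping from user role to the access labels that role may see.
ROLE_ACCESS: Dict[str, Set[str]] = {
    "guest": {"public"},
    "member": {"public", "community"},
}


def words(text: str) -> Set[str]:
    """Lowercased whitespace tokens; a stand-in for a real retriever."""
    return set(text.lower().split())


def retrieve(query: str, entries: List[Entry], role: str) -> Tuple[List[Entry], int]:
    """Return matching entries the role may see, plus the number withheld."""
    allowed = ROLE_ACCESS.get(role, set())
    matches = [e for e in entries if words(query) & words(e.text)]
    visible = [e for e in matches if e.access in allowed]
    return visible, len(matches) - len(visible)


# Placeholder entries for illustration only.
entries = [
    Entry("seasonal greeting phrases", "public"),
    Entry("greeting protocol for gatherings", "community"),
    Entry("restricted greeting formula", "restricted"),
]

hits, withheld = retrieve("greeting", entries, role="guest")
print([e.text for e in hits], "withheld:", withheld)
```

Filtering before generation means restricted content never reaches the language model, and the withheld count lets the system tell the user that more exists without revealing it.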

Paper 2

Probing Discrete Units in Unit-Based Speech-to-Speech Translation: A Case Study on Central Kurdish

Lu Zuo, Mohammad Mohammadamini, and Aghilas Sini

Abstract (EN)

This paper presents the first systematic study of speech-to-unit translation for Central Kurdish, a low-resource Indo-Iranian language. While this paradigm has performed well in high-resource settings, its generalizability to distant languages remains an open question. The study evaluates different encoder architectures and target-speech synthesis strategies to bridge the data scarcity gap. Results reveal a critical dependence on acoustic consistency: while pretrained encoders and single-speaker target data enable fluent translation, multi-speaker configurations trigger phonetic instability during quantization. Through phonetic mapping and latent space visualization, the paper provides empirical evidence that speaker variability disrupts the formation of a stable phonetic-level representation. These findings clarify the mechanistic necessity of speaker normalization in unit-based models and establish a benchmark for expanding speech-to-speech translation to broader low-resource linguistic settings.
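Editorial illustration (not taken from the paper): in unit-based speech translation, continuous speech features are typically quantized by assigning each frame to its nearest cluster centroid, giving a sequence of discrete unit ids. The sketch below uses toy 2-D frames and three centroids, all made up for illustration. Collapsing consecutive repeats is a common step in such pipelines and is assumed here, not confirmed by the abstract.

```python
from typing import List, Sequence


def nearest_unit(frame: Sequence[float], centroids: List[Sequence[float]]) -> int:
    """Index of the centroid with the smallest squared Euclidean distance."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((f - c) ** 2 for f, c in zip(frame, centroids[i])),
    )


def to_units(frames: List[Sequence[float]], centroids: List[Sequence[float]]) -> List[int]:
    """Quantize frames to unit ids, then collapse consecutive duplicates."""
    units = [nearest_unit(f, centroids) for f in frames]
    return [u for i, u in enumerate(units) if i == 0 or u != units[i - 1]]


# Toy data (assumed): three centroids and five feature frames.
centroids = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
frames = [(0.1, 0.0), (0.0, 0.1), (0.9, 1.1), (1.0, 0.8), (0.1, 0.9)]

print(to_units(frames, centroids))
```

If different speakers shift the same sound toward different centroids, the same phone maps to different unit ids. This is one way to picture the instability that the abstract attributes to multi-speaker target data.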

Paper 3

Constructing Hybrid Remote Co-Design Spaces for AI Tools with Indigenous Communities

Raquel Cordeiro, Claudio Pinhanez, João Paulo Bento, Nicole Grell, and Thomas Finbow

Abstract (EN)

This paper describes the construction of geographically distant, hybrid design spaces able to support a co-design process between IT professionals, designers, teachers, and students in two Indigenous communities in the Amazon. The goal is to co-design AI-based tools that support the digital writing of the local Indigenous language, Nheengatu. Two digital classrooms were installed in the communities, enabling weekly remote workshops with teachers and students. Initial challenges included infrastructure limitations, logistical constraints, and digital literacy. The authors observed that the creation of these third-space digital environments fostered community participation and emerging collaborative practices. The paper concludes by discussing ethical issues and reflecting on how this case informs broader approaches to remote co-design.

Paper 4

Assessing Open-Source LLMs for Anishinaabemowin: A Rapid Evaluation Framework for Low-Resource Indigenous Languages

Andrew McConnell and Jasmine Ly

Abstract (EN)

This paper presents two complementary methodologies for rapidly assessing the capacity of open-source large language models to work with Anishinaabemowin. The first methodology uses the GROQ cloud platform to compare multiple open models on common tasks such as basic translation, sentence completion, cultural vocabulary, and grammar sensitivity. The second methodology evaluates local deployment using Ollama, focusing on smaller-footprint models suitable for offline or community-controlled use. The framework emphasizes both practical accessibility and cultural caution, making it possible for communities and researchers to identify useful entry points before investing in fine-tuning or dataset development. Results suggest that most general-purpose open models remain inconsistent and often hallucinate, but some show promising partial competence, especially when prompted carefully. The paper offers a low-cost, repeatable evaluation protocol for early-stage LLM assessment in Indigenous language contexts.

17:50-18:00 — Track discussion
18:30-22:30 — Symposium Banquet

Friday, April 17, 2026

09:00-10:00 — Keynote

Working with and from the speaking communities

Presenter: Dr. Mélanie Jouitteau

Abstract (EN)

In the pressing age of AI, reducing the digital gap is a matter of survival for most of the world's languages. In practice, reducing this gap means that mostly minoritized and economically challenged communities must provide NLP developers with suitably licensed and accurately diverse linguistic data, enriched by community-made metadata labeling. Dr. Jouitteau presents two citizen science projects that address these challenges and are deployed for Breton, a highly endangered Celtic language whose 110,000 speakers are bilingual with French. The ARBRES project supports a descriptive wikigrammar of the language. By design, the illustrative examples of the grammar constitute a corpus of exceptionally high structural diversity, which is a high-quality resource for fine-tuning translation AI models (Grobol & Jouitteau 2024). The YAR project addresses aligned sound/text corpora. It consists of a phone and a web application that collect sound clips to be geotagged on a map, and a platform to collectively transcribe them. It addresses several scientific and social needs at once. First, the mapping of the recordings makes this highly minoritized language visible in public use and supports specific cultural practices, including teaching. In doing so, it also addresses variation at its source, because the collected Breton varieties will reflect those in actual contemporary use, providing raw data of direct interest for transcription practices (Jouitteau, Antoine, Grobol & Millour 2025).

Speaker bio

Mélanie Jouitteau has been a researcher on the Breton language at the CNRS in France since 2007. She specializes in grassroots collaborative science projects in minoritized contexts, in formal and descriptive linguistics, and in multidisciplinary bridges to both sociolinguistics and NLP.

10:00-10:30 — Coffee Break
10:30-11:30
Track 5 — AI for Community Empowerment and Sovereignty
Paper 1

AI and Project Leadership in Palm Oil Community Development Projects: A Review of Technology-Enabled Social Transformation and Stakeholder Engagement

Loso Judijanto and Sri Nurhayati

Abstract (EN)

Artificial Intelligence is increasingly transforming leadership practices and stakeholder dynamics in community development projects, particularly within the palm oil sector, where complex social, environmental, and economic interactions demand adaptive and data-driven governance. This study investigates how AI reshapes project leadership and enables social transformation and stakeholder engagement in palm oil community development initiatives. Using a systematic literature review grounded in PRISMA guidelines, the paper analyses peer-reviewed articles published between 2020 and 2025. The findings reveal that AI applications such as predictive analytics, stakeholder sentiment analysis, participatory platforms, and decision support systems can enhance project leaders’ ability to manage social risk, foster trust, and support inclusive planning. However, the study also identifies key challenges, including digital inequality, ethical concerns, and the need for leadership capabilities that combine technical literacy with social sensitivity.

Paper 2

By What Right and by Whose Land? Rethinking the Problem of Data Sovereignty Through Infrastructure, Community Empowerment and Ally-Ship

Micheal Ziegler

Abstract (EN)

Data is valuable and Indigenous data is sometimes more valuable than most. This paper argues that data sovereignty is not properly understood and therefore not always taken seriously because of limitations in conceptual and practical framings. Looking at geospatial knowledge infrastructure and related sciences, it identifies shortcomings in current data management strategies and argues that allyship and infrastructural justice are necessary conditions for meaningful implementation of Indigenous data sovereignty. Rather than treating sovereignty as a technical compliance problem, the paper repositions it as a question of rights, land, infrastructure, and material control over information systems.

Paper 3

IKTRACE: A Natural Language Processing Framework for Identifying Indigenous Knowledge Representation in Academic Literature

Tara Azin, Christy Caudill, Mako Sorensen, and Peter Pulsifer

Abstract (EN)

As artificial intelligence systems increasingly rely on vast textual data, respectful and accurate representation of Indigenous Knowledge has become critical. This paper introduces IKTRACE, a natural language processing framework for identifying, analyzing, and organizing Indigenous Knowledge representations in academic literature while adhering to Indigenous Data Sovereignty principles. The authors present a community-informed taxonomy co-developed with Inuit partners, comprising nine knowledge domains. Building on this taxonomy, they implement a multi-method knowledge identification pipeline using complementary NLP approaches. Outputs are integrated into a knowledge graph to support future expert and community validation. Applied to a corpus of Canadian Arctic research documents, IKTRACE demonstrated high accuracy and strong community relevance in expert validation.

11:30-12:30
Track 4 — AI for Ethical Frameworks and Data Governance
Paper 1

Multi-Agents System and Large Language Models for Studying Writers’ Creative Processes

Farès Fadili, Fatiha Idmhand, and Paolo D’Iorio

Abstract (EN)

Genetic criticism aims to shed light on creative processes by analyzing archives, manuscripts, and traces of the author’s work such as drafts, variants, deletions, additions, and rearrangements. These traces are essential to understanding thinking and writing methods, yet they are often scattered and heterogeneous, making them difficult to access and exploit within digital humanities. To address this challenge, the paper proposes an architecture combining large language models and multi-agent systems to support the study of writers’ creative processes. The system models specialized agents responsible for document analysis, chronology reconstruction, thematic grouping, and interpretive assistance. The paper presents the conceptual architecture, identifies the methodological stakes of using AI in genetic criticism, and outlines how such a system could assist researchers while preserving scholarly interpretability.

Paper 2

Ecological Extraction, Administrative Violence, and Discourse Mutation: Indigenous Sovereignty in the Landscape of Artificial Intelligence and Digital Democracy

Faisal Hamdan

Abstract (EN)

As Artificial Intelligence increasingly shapes digital publics through engagement-driven recommender systems, dominant narratives frame platforms as expanding autonomy through participation, visibility, and expression. This paper contests that assumption by asking how Indigenous sovereignty is transformed when environmental conflict becomes mediated through algorithmic visibility and administrative extraction. Focusing on a case of Indigenous struggle against fossil fuel pipeline expansion, the study investigates how claims to land, jurisdiction, and consent are reconfigured through digital discourse mutation. Drawing on an evolved netnographic design, the study analyses digital artifacts, interviews, and state documents to trace how territorial struggles are filtered by administrative verification regimes and transformed within comment fields shaped by platform amplification. The analysis identifies procedural recursion, metric capture, and counterframe escalation as recurring mechanisms. The paper concludes by outlining implications for ethical AI governance and Indigenous Data Sovereignty.

Paper 3

A Multi-Model Framework for Bias Analysis in Large Language Models

Sara Brahiti, Rachid Rebiha, Samia Bouzefrane, and Ryma Boussaha

Abstract (EN)

This paper presents a multi-model framework for the systematic analysis of bias in large language model outputs. The approach relies on a tailored set of metrics including gender polarity, toxicity, sentiment, regard, and stereotypical association. Adapted models specifically fine-tuned for toxicity detection and sentiment analysis are integrated, and novel formulations such as an advanced gender polarity analysis and a newly introduced stereotype association score are proposed. The analysis reveals the uniformity of systemic biases across different LLMs and enables the identification of a bias hierarchy across categories. The framework also supports model-to-model comparisons along specific bias dimensions and highlights overcorrection phenomena where LLMs exhibit compensatory behaviors that may introduce new distortions.

12:30-14:00 — Lunch
14:00-16:00
Track 6 — AI for Environmental and Climate Challenges of Indigenous Communities
Paper 1

AI as a Tool for Community Empowerment: Rethinking Technology Through Indigenous Perspectives in Climate Crisis Response

Alexandra Okada, Thuareag Santos, Eila Oliveira and Giseli Vaz

Abstract (EN)

This paper advances a framework for participatory climate research conducted with and for Indigenous and riverside communities in the Amazon. Grounded in the 2023–2024 Rio Negro drought crisis—one of the most severe on record—the study examines how open schooling partnerships, supported by Generative AI (GAI) mapping tools, can enable communities to understand, document, and respond to climate challenges in culturally grounded, ethically responsible, and politically meaningful ways. The central argument is that AI’s most important contribution lies not in technical sophistication, but in its potential to support relational, participatory, and protection-oriented approaches to knowledge: rather than functioning as a purely analytical or extractive tool, AI can be mobilised to strengthen community agency, intercultural collaboration, and the integration of diverse knowledge systems. The study is structured through the CARE-KNOW-DO framework that connects affective engagement and relational responsibility (CARE), intercultural and intergenerational knowledge coproduction (KNOW), and community-led action, advocacy, and artivism (DO). AI mapping tools are critically appropriated to integrate ancestral, local, and scientific knowledge in real time, supporting epistemic pluralism, Indigenous data sovereignty, critical emancipatory education, and principles of Responsible AI. Methodologically, the research combines narrative inquiry, participatory mapping, and transdisciplinary collaboration among a school, two universities (education and Amazon oceanography researchers), and the Tupé Indigenous community. 
Key contributions include: (1) epistemically plural documentation of climate impacts in the Tupé territory; (2) methodological innovation in AI-supported participatory mapping; (3) transdisciplinary reconceptualisation of AI in education toward protection-centered environmental knowledge; and (4) practical implications for climate resilience, Indigenous data governance, biocultural autonomy, and critical emancipatory pedagogy. By reconceptualising AI as a relational, protection-oriented, and educational tool, this study advances a transferable model for AI ethics, participatory climate resilience, and inclusive environmental governance in Amazonian contexts of historical marginalisation and ecological vulnerability.

Paper 2

Technology-Enhanced Climate Resilience Education for Indigenous Communities: A Frugal Approach

Imran S. A. Khan, Emmanuel G. Blanchard, and Sébastien George

Abstract (EN)

Indigenous communities are among the most affected by climate change because their livelihoods and cultures are closely tied to local ecosystems. Indigenous Knowledge has long supported environmental understanding and climate adaptation, but the increasing speed and scale of climate change place growing pressure on these knowledge systems, making it more difficult to rely on them alone. As a result, there is a growing need to support and strengthen Indigenous Knowledge by carefully integrating scientific knowledge and transmitting it through appropriate educational approaches. Delivering such climate-change education remains challenging in many Indigenous contexts due to limited access to trained educators and climate experts. In such contexts, well-designed technology can act as a supportive tool, helping to extend educational capacity where human expertise is scarce. This paper presents a frugal learning system that uses generative AI to power a virtual expert supporting climate resilience education. The system grounds AI-generated content in a curated knowledge base and integrates local atmospheric data to provide context-specific explanations. Designed to run on low-cost, locally deployable hardware with offline capabilities, it aims to complement educators rather than replace them.

15:00-15:30 — Coffee Break
Paper 3

Indigenous Environmental Knowledges as Governance Infrastructure for Climate AI in Climate Resilience and Adaptation

Sai Gattupalli, Poulomi Chakravarty, Urjani Chakravarty, Gulab Chand, and Ivon Arroyo

Abstract (EN)

Climate AI is rapidly expanding across forecasting, monitoring, and adaptation, yet deployments in Indigenous territories often treat Indigenous Knowledges as supplemental data and prioritize centralized, energy-intensive automation. This paper reframes Indigenous Environmental Knowledges not as data inputs, but as foundational governance infrastructure for climate AI. It synthesizes literature on Indigenous data sovereignty, community-based observing networks, and the material footprint of AI to identify structural mismatches between standard AI pipelines and place-based stewardship systems. The paper contributes a normative governance framework centered on the Human Interpretation Before Automation principle, which requires model outputs to be routed through Indigenous institutional oversight prior to any operational action. It combines this with the constraint of computational sufficiency to embed consent, energy sovereignty, and ecological limits directly into system architecture.

Paper 4

An Audit of Residual Social Associations in EO-Based AI Models for Wildfire Prediction

Rachid Rebiha and Mikael Chobert

Abstract (EN)

Earth Observation-based machine learning models are increasingly used to characterize wildfire activity and support climate risk assessment. Although typically framed as physically driven, wildfire dynamics unfold within socially structured landscapes shaped by development, governance, and vulnerability gradients. Whether such systems encode residual social associations after conditioning on environmental drivers remains underexplored. This paper introduces a counterfactual audit framework to evaluate representational sensitivity of wildfire prediction models to social context under strong physical conditioning. Using a continental-scale dataset built from wildfire detections in the contiguous United States, the study enriches events with landcover composition, vegetation and fuels, wildland-urban interface indicators, population density, and pre-event meteorology. Social attributes are used exclusively as audit dimensions. Across targets, residual social associations are statistically detectable but small relative to model error. The findings delineate the representational limits of conventional EO feature stacks and provide a reproducible audit methodology for climate and hazard AI systems where physical and social processes are spatially entangled.

Paper 5

AI-Supported Habitat Protection and Climate Adaptation in Indigenous Agricultural Territories

Pattharaporn Thongnim

Abstract (EN)

Indigenous agroforestry systems in tropical regions are increasingly affected by climate variability, complicating irrigation management and ecosystem stewardship. This study presents a climate-adaptive decision-support framework for agroforestry systems in Chanthaburi, Eastern Thailand. The framework integrates environmental monitoring with machine learning to forecast vapor pressure deficit (VPD), a key indicator linking atmospheric conditions to plant water stress and microclimate dynamics. Hourly VPD observations were modeled using a Temporal Convolutional Network (TCN) with dilated causal convolutions to capture multi-scale temporal dependencies while preventing information leakage. The model generated one-hour-ahead predictions using a sliding window of lagged VPD inputs. Model performance was evaluated using sequential training, validation, and testing datasets, achieving stable predictive accuracy (test RMSE = 0.243 kPa; R² = 0.820). To support operational decision-making, forecasted VPD values were translated into an interpretable zone-based classification representing optimal, warning, and critical atmospheric moisture-demand conditions. The results demonstrate that TCN-based one-hour-ahead VPD forecasting combined with zone-based interpretation can provide practical short-term guidance for irrigation management while supporting agroforestry habitat conditions in Indigenous agricultural territories.
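
Editorial illustration (not taken from the paper): the abstract above names three building blocks, namely a sliding window of lagged inputs, dilated causal convolutions, and a zone-based reading of the forecast. The sketch below shows one plausible form of each in plain Python. The function names, the kernel weights, and the zone thresholds (1.2 and 2.0 kPa) are hypothetical placeholders chosen for illustration; the authors' actual architecture and thresholds are not given in the abstract.

```python
from typing import List, Sequence, Tuple


def sliding_windows(series: Sequence[float], window: int) -> Tuple[List[List[float]], List[float]]:
    """Pair each run of `window` lagged values with the value one step ahead."""
    inputs, targets = [], []
    for end in range(window, len(series)):
        inputs.append(list(series[end - window:end]))
        targets.append(series[end])
    return inputs, targets


def causal_dilated_conv(x: Sequence[float], kernel: Sequence[float], dilation: int) -> List[float]:
    """1-D causal convolution: output[t] uses only x[t], x[t-d], x[t-2d], ...

    Positions before the start of the series count as zero (left padding),
    so no future value can leak into an output.
    """
    out = []
    for t in range(len(x)):
        total = 0.0
        for k, weight in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:
                total += weight * x[idx]
        out.append(total)
    return out


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Time steps visible after stacking one convolution per dilation."""
    return 1 + (kernel_size - 1) * sum(dilations)


def vpd_zone(vpd_kpa: float, warning: float = 1.2, critical: float = 2.0) -> str:
    """Map a forecast VPD (kPa) to a zone. Thresholds are placeholders."""
    if vpd_kpa >= critical:
        return "critical"
    if vpd_kpa >= warning:
        return "warning"
    return "optimal"
```

Stacking layers with dilations 1, 2, 4 and a kernel of size 2 lets the last output see 8 past steps (`receptive_field(2, [1, 2, 4])`), which is how a TCN of this kind can cover several time scales with few layers while each output still depends only on the present and the past.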

Paper 6

From Multilingual Corpus to Tokens: A Practical Pipeline for Kabyle LLaMA 3 Adaptation

Cylia Messar, Elissa Tagzirt, Samia Saad-Bouzefrane, Badr Youakim, and Lydia Lazib

Abstracts (EN / Kabyle)

Low-resource languages remain largely underrepresented in modern AI, particularly in large language models, due to limited data and a lack of basic NLP tools. This work focuses on adapting LLaMA 3 to the low-resource Kabyle language rather than developing a dedicated monolingual model. It highlights the importance of rigorous adaptation based on a systematic framework that compensates for the lack of data by collecting and preserving heterogeneous sources while taking into account legal and ethical aspects. Adaptation is achieved primarily through a carefully designed tokenization strategy, an often overlooked but critical step for morphologically rich languages whose orthography and writing system differ from multilingual tokenizer standards. The framework is evaluated using quantitative metrics and principled design choices, demonstrating improved linguistic coverage for Kabyle and providing a reproducible methodology for integrating low-resource languages into open-source multilingual LLMs.

Tutlayin yesɛan cwiṭ n yiɣbula d ifecka umḍinen qqiment ur tetḥuza yara tigzi n tmacint (AI), ladɣa ayen yaɛnan timudmiwin n tutlayin timuqranin (LLM), imi tutlayin-a xuṣṣent isefka d wallalen NLP. Deg umagrad-a, nettmuqul i usnerni n LLaMA 3 s tutlayt taqbaylit, wala ad d-nesnulfu LLM amaynut. Leqdic-a yesskanay-d s uqader n usaḍuf amek anessali aseɣẓan LLM a yeddun xas mayella xuṣṣen isefka. Asemres yella-d s ttawil n useqzuzem d tarrayt iwulmen ladɣa i tutlayin timaṛkantiyin i yesɛan tira d tilɣa yemmxallafen d ilugan n tutlayin yesɛan iɣbula. Tiɣerɣart-a tettwasqerdec s iferdisen d tarrayin iṣeḥḥan. Dɣa, teskanay-d d akken tegla yakk s tutlayt taqbaylit. Leqdic-a yella-d d asurif amenzu i usemres n Teqbaylit s usekcem n LRL di LLM.

16:20 — Closing Session
18:00-21:00 — The Arts et Métiers Museum (Extra Event)