Multilingual Word Sense Disambiguation under Resource Constraint

Dr. Pushpak Bhattacharyya / LIG

Word Sense Disambiguation (WSD) is a fundamental problem in Natural Language Processing (NLP). Among the various approaches to WSD, supervised machine learning (ML) is the dominant paradigm today. However, ML-based techniques need a significant amount of resources in the form of sense-annotated corpora, which take time, energy and manpower to create. Not all languages have such resources, and many cannot afford to build them.

In this presentation, we discuss ways of making use of whatever resources have been created for WSD. First, we describe a novel scoring function and an iterative WSD algorithm based on it. The function separates the influence of the annotated corpus (corpus parameters) from the influence of the wordnet (wordnet parameters) in deciding the sense. Next, we describe how the corpus of one language can help WSD in another language, i.e., LANGUAGE ADAPTATION. This is presented in three settings: "complete", "some" and "no" annotation. From there we move on to DOMAIN ADAPTATION, where active learning and injection are used to do WSD in a domain with little or no annotated corpora. The extensive evaluation and good accuracy figures lend credence to the viability of our approach, which points to the possibility of expanding from one language-domain combination to all language-domain combinations for WSD, i.e., multilingual general-domain WSD, a long-standing dream of NLP.
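To make the idea of separating corpus parameters from wordnet parameters more concrete, here is a minimal sketch in Python. It is not the scoring function from the talk, whose exact form the abstract does not give. The additive score, the toy sense inventory, every number in the tables, and the "least ambiguous word first" ordering are all assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical sense inventory (in practice this would come from a wordnet).
SENSES: Dict[str, List[str]] = {
    "bank": ["bank#river", "bank#finance"],
    "deposit": ["deposit#sediment", "deposit#money"],
    "money": ["money#currency"],
}

# Corpus parameters (invented numbers): sense priors and sense co-occurrence,
# both of which would be estimated from a sense-annotated corpus.
SENSE_PRIOR: Dict[str, float] = {
    "bank#river": 0.6, "bank#finance": 0.4,
    "deposit#sediment": 0.55, "deposit#money": 0.45,
    "money#currency": 1.0,
}
COOCCURRENCE: Dict[Tuple[str, str], float] = {
    ("bank#finance", "money#currency"): 0.9,
    ("bank#river", "money#currency"): 0.1,
    ("deposit#money", "money#currency"): 0.8,
    ("deposit#money", "bank#finance"): 0.7,
    ("deposit#sediment", "bank#river"): 0.6,
}

# Wordnet parameters (invented numbers): similarity between senses,
# which would be derived from the wordnet graph rather than from a corpus.
WORDNET_SIMILARITY: Dict[Tuple[str, str], float] = {
    ("bank#finance", "money#currency"): 0.8,
    ("bank#river", "money#currency"): 0.1,
    ("deposit#money", "money#currency"): 0.9,
    ("deposit#money", "bank#finance"): 0.8,
    ("deposit#sediment", "bank#river"): 0.7,
}


def pair(table: Dict[Tuple[str, str], float], a: str, b: str) -> float:
    """Symmetric lookup; unseen pairs contribute nothing."""
    return table.get((a, b), table.get((b, a), 0.0))


def score(candidate: str, fixed_senses: List[str]) -> float:
    """Assumed score: corpus prior plus support from already-fixed senses."""
    support = sum(
        pair(COOCCURRENCE, candidate, s) * pair(WORDNET_SIMILARITY, candidate, s)
        for s in fixed_senses
    )
    return SENSE_PRIOR[candidate] + support


def disambiguate(words: List[str]) -> Dict[str, str]:
    """Fix the least ambiguous words first, then use them as context."""
    chosen: Dict[str, str] = {}
    unique_words = list(dict.fromkeys(words))  # drop repeats, keep order
    for word in sorted(unique_words, key=lambda w: len(SENSES[w])):
        fixed = list(chosen.values())
        chosen[word] = max(SENSES[word], key=lambda s: score(s, fixed))
    return chosen


print(disambiguate(["bank", "deposit", "money"]))
```

In this toy setup the corpus priors alone would pick the river and sediment senses, but once the monosemous word "money" is fixed, the combined corpus and wordnet support tips both "bank" and "deposit" towards their financial senses. Keeping the two parameter tables separate is also what would let one of them be swapped out independently, for example replacing the corpus tables when little annotated data is available.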

The talk is presented in a multilingual setting of Indian languages. India has 22 official languages, with a strong need for machine translation and cross-lingual search. Our languages of focus in this talk are Hindi and Marathi, along with English, and the domains of focus are Tourism and Health, which are important to India.

The presentation is based on work done with PhD and Masters students Mitesh, Salil, Saurabh, Anup, Sapan and Piyush, published at ACL10, COLING10, EMNLP09 and GWC10.

Keywords: lig

Added by: Gricad Vidéos
Updated on: 1 January 2021 00:00
Channel:
- Research
Type: Conferences
Main language: French
