The PARSEME-FR project aims at improving linguistic representativeness, precision, robustness and computational efficiency of Natural Language Processing (NLP) applications, notably parsing of French. The project focuses on a major bottleneck of these applications: MultiWord Expressions (MWEs), that is, groups of words that must be treated as units at some level of linguistic processing, such as hot dog, hard disk, kick the bucket, United Nations and pay attention.
Despite recent advances, the state of the art in MWE representation and processing remains largely unsatisfactory. Current research concentrates either on creating MWE lexicons or on the automatic recognition of MWEs in running text. Only a few approaches address the links between MWEs and a comprehensive linguistic analysis of text. These approaches confirm that proper MWE treatment increases both the linguistic precision and the robustness of NLP systems. They are, however, mostly limited to specific MWE classes and to syntactic parsing. This unsatisfactory state is mainly due to the lack of linguistic knowledge bases encoding MWE information that could be fed into linguistic analyzers. For French, such resources exist, but they are incomplete in terms of syntactic and semantic representation, coverage, and suitability for NLP tools.
We propose to bridge the gap between linguistic precision and computational efficiency in NLP applications by investigating the syntactic and semantic representation of MWEs in language resources, the integration of MWE analysis into syntactic parsing, and its links to semantic processing. Expected deliverables include enhanced language resources (lexicons, grammars and annotated corpora), MWE-aware statistical and symbolic parsers, and tools linking predicted MWEs to knowledge bases. This proposal is a spin-off of PARSEME, a European COST action (IC1207) on the same topic.
For more information, see the project website.
Project dates: 2016-2020
Consulting: Agnès Tutin, Université Grenoble Alpes