The ODIL project aims to build a French corpus of spontaneous speech that is representative of a wide variety of linguistic practices. 150 hours of spoken French will be recorded in the Centre-Val de Loire Region in order to reach a critical mass of 10 million transcribed words and to cover new registers, based on an analysis of linguistic variation.
Within this general framework, the LIFAT, LIFO and LLL laboratories will develop, in a dedicated subproject (Temporal@ODIL), the largest corpus of spoken French annotated with temporal relations. The annotation is based on an adaptation of the ISO TimeML standard in which the annotation is grounded on a treebank rather than on raw text, both to ease manual annotation and to reflect more accurately the complexity of describing temporal eventualities (events, states...). Our annotation scheme nevertheless maintains an operational correspondence with the TimeML standard.
The outcomes of the project will be the Temporal@ODIL corpus, freely distributed under a Creative Commons licence, and Contemplata, a generic open-source tool for treebank annotation.
The project will run from 2016 to 2020. It is funded under the APR-IA programme of the Centre-Val de Loire Region.