The joint MWE-LEX workshop addresses two domains – multiword expressions and (electronic) lexicons – with partly overlapping communities and research interests, but divergent practices and terminologies.
Multiword expressions (MWEs) are word combinations, such as by and large, hot dog, pay a visit or pull one’s leg, which exhibit lexical, syntactic, semantic, pragmatic or statistical idiosyncrasies. MWEs encompass closely related linguistic objects: idioms, compounds, light-verb constructions, rhetorical figures, institutionalised phrases and collocations. Because of their unpredictable behavior, notably their non-compositional semantics, MWEs pose problems in linguistic modelling (e.g. treebank annotation, grammar engineering), NLP pipelines (notably when orchestrated with parsing), and end-user applications (e.g. information extraction). Modelling and processing of MWEs has been the topic of the MWE workshop, organised over the past years by the MWE section of SIGLEX.
Because MWE-hood is a largely lexical phenomenon, appropriately built electronic MWE lexicons turn out to be quite important for NLP. Their conception opens up, among others, the issues of lemmatization and of standardised representation of morphological, syntactic and semantic properties of MWEs. Large standardised multilingual, possibly interconnected, NLP-oriented MWE lexicons prove indispensable for NLP tasks such as MWE identification, due to its critical sensitivity to unseen data. But the development of such lexicons is challenging and calls for tools which would leverage, on the one hand, MWEs encoded in pre-existing NLP-unaware lexicons and, on the other hand, automatic MWE discovery in large non-annotated corpora.