NLP meets Lexicography: Sense Categorization for Dictionaries

Luis Espinosa Anke is an NLP expert who is currently working on software to support lexicographers’ work. He thinks ‘outside the box’, exploring and exploiting new ways of creating and compiling dictionaries.

© Luis Espinosa Anke, 2019

How did you learn about the ELEXIS travel grants?

I was a recipient of a travel grant (for Short Term Scientific Missions) in the ENEL (European Network of Electronic Lexicography) COST action.
I spent 3 months at the Sapienza University of Rome, hosted by Roberto Navigli, and since then I have attended several meetings where I met many of the people involved in organizing ELEXIS. I decided to apply a second time, firstly because it is a great honor, as a Spanish computational linguist, to be able to work at the Royal Spanish Academy and get a sense of their invaluable lexicographic data, and secondly because my research in NLP (Natural Language Processing) mostly concerns the meaning of words, which is central to what we find in dictionaries.

Which hosting institution did you apply to?

I applied to the Royal Spanish Academy (Real Academia Española, RAE) because, as I said, having the opportunity to look into RAE’s corpora and dictionaries is similar to liking football and getting to know your favourite player in person. Moreover, I have often spoken with my host there (Jordi Porta), and I am sure I will learn a lot from his expertise and his enthusiasm. In fact, we are both excited about this stay and intend to deepen our collaboration and do great research together in computational lexicography and NLP.

What is your project about?

I am developing a piece of NLP-based software to help lexicographers write dictionaries.
The main idea is to group semantically similar senses, so that when lexicographers need to write a new or updated definition of a term, they can consult previously written definitions of similar concepts and quickly find the recurrent or prototypical expressions used in definitions in that particular domain.
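To make the idea concrete, here is a minimal sketch of retrieving previously written definitions that resemble a new draft. It is not the software described in the interview: it assumes a plain bag-of-words cosine similarity in place of whatever sense representation the project actually uses, and the function names and the tiny English sample dictionary are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similar_definitions(draft, existing, top_k=2):
    """Return the headwords whose definitions are closest to the draft.

    `existing` maps headword -> definition text (hypothetical layout).
    Entries sharing no words with the draft are left out.
    """
    draft_vec = bag_of_words(draft)
    scored = [
        (cosine(draft_vec, bag_of_words(definition)), headword)
        for headword, definition in existing.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [headword for score, headword in scored[:top_k] if score > 0]


# Hypothetical sample entries, invented for illustration only.
EXISTING = {
    "violin": "string instrument with four strings played with a bow",
    "cello": "large string instrument with four strings played with a bow",
    "oak": "large tree that produces acorns",
}

draft = "small string instrument with four strings played with a bow"
print(similar_definitions(draft, EXISTING))  # ['violin', 'cello']
```

A lexicographer drafting the new entry would see the "violin" and "cello" definitions and could reuse their shared phrasing. A realistic system would likely replace the word counts with distributional representations, so that definitions with similar meaning but different wording are also grouped.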

 

What is your background that brought you up to this point?

I have a background in English Philology and in my previous life I was a language teacher.
Now I am a lecturer in the School of Computer Science and Informatics at Cardiff University, where I conduct research on computational semantics (at the word, phrase and sentence level) and teach modules on data analysis, natural language processing and databases. I strongly believe that to build better, more social and more accurate AI (Artificial Intelligence) models, we should keep a close eye on the expert knowledge that has been developed over the years, and use it to refine the organic datasets we can easily acquire from social media and the web.

For building better social and more accurate AI models, we should keep a close eye on already established expert knowledge to refine the organic datasets we can easily acquire from social media and the web.

Where does your interest in languages/lexicography come from and what keeps you motivated?

My PhD focused on developing NLP models which, in addition to plain text corpora, had access to expert knowledge from sources such as Wikipedia or Wikidata, lexicographic resources such as WordNet, and knowledge bases such as MusicBrainz or BabelNet.
I am excited about a future in AI where we can account for the decisions that algorithms make based on data, and I believe one way to achieve this could be the interaction between lexicographic information and big data. I also believe that, from a historical linguistics perspective, developing tools for maintaining and extending the number or quality of current lexicographic resources can have a dramatic impact on the development of low-resource languages.

Profile: Luis Espinosa Anke
Travel grant: Call 3
Period of stay: 9.12.2019 – 20.12.2019
Project title: Sense Categorization in the Diccionario de la Real Academia Española with Distributional and Lexicographic Supervision
Home institution: Cardiff University #elexis_uk
Hosting institution: Real Academia Española (RAE, Spain) #elexis_es