#elexis_es – RAE

Visiting grants contact:

José Luis Sancho

sancho@rae.es

  • Felipe IV, 4, 28014 Madrid, Spain

Methods for detection and evaluation of neologisms for the Croatian language – Denis Gaščić

Denis Gaščić is an ambitious computer and language enthusiast with a strong interest in neologisms. He applied for a research grant to visit the Institute of the Estonian Language in order to be introduced to tools and methods used for dictionary compilation and automatic detection of neologisms.

New travel grant reports available!

We have published four new reports by research grant holders describing their research visits and the tasks carried out.

A Study on legal texts & terminological databases in Dutch – Dóra Mária Tamás

Dóra Mária Tamás is an experienced researcher focusing on legal terminology. Her background in legal translation and interpretation brought her to lexicographic and terminological work in her home country of Hungary.

Find out more about ELEXIS visiting grants and former winning projects:

Real Academia Española (RAE)

provides access to the following resources, tools, infrastructure or research facilities:

Advanced Search Interface to the DLE 23

The advanced DLE 23 interface aims to provide an easy way to perform onomasiological, semasiological and paradigmatic searches of the Diccionario de la lengua española (DLE). The combination of textual and faceted searches allows for both intentional and accidental discovery and exploration of lexical information.

This advanced interface allows users to start a query by typing one or more terms into a search box and to progressively narrow the result set by selecting facet values or the textual blocks of the entries in which the information is to be searched.
Alternatively, facet values alone can be selected to obtain a list of definitions matching the selected criteria. Either way, users are no longer limited to the usual lemma-based access to dictionaries.
Facet values are an orthogonal set of categories representing the metainformation conveyed by explicit (abbreviations, etymology, PoS, etc.) and implicit (subject, language families, etc.) tagging.
The textual indexing is aware of structural divisions of entries and results can be restricted accordingly.
The facet system and the textual indexing, as well as other aspects of the interface, could be extended in several directions.
Facet values are represented in RDF and the search engine is implemented on top of SWI Prolog, which provides core packages for RDF storage and querying and XML indexing.
SWI Prolog also supports HTTP and JSON, which are used to create a RESTful web service to access the dictionary.
The Advanced DLE is available within the Enclave Platform.
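The combination of textual and faceted search described above can be illustrated with a minimal sketch. It is written in Python for readability and is not the RAE implementation, which according to the text is built on SWI Prolog and RDF. The `Sense` class, the facet names and the toy entries are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Sense:
    lemma: str
    definition: str
    # Facet name -> set of facet values (hypothetical names such as "pos").
    facets: dict = field(default_factory=dict)


# Hypothetical toy data for illustration; not taken from the DLE.
SENSES = [
    Sense("gato", "mamífero felino doméstico",
          {"pos": {"noun"}, "subject": {"zoology"}}),
    Sense("maullar", "dar maullidos el gato",
          {"pos": {"verb"}, "subject": {"zoology"}}),
    Sense("bit", "unidad de medida de información",
          {"pos": {"noun"}, "subject": {"computing"}}),
]


def search(senses, terms=(), facets=None):
    """Return senses whose definition contains every term
    and that carry every selected facet value."""
    facets = facets or {}
    results = []
    for sense in senses:
        words = set(sense.definition.lower().split())
        if not all(term.lower() in words for term in terms):
            continue
        if not all(value in sense.facets.get(name, set())
                   for name, value in facets.items()):
            continue
        results.append(sense)
    return results


def facet_counts(senses, name):
    """Count how many senses carry each value of one facet,
    which is what a UI would show next to each facet value."""
    counts = {}
    for sense in senses:
        for value in sense.facets.get(name, set()):
            counts[value] = counts.get(value, 0) + 1
    return counts
```

Calling `search(SENSES, terms=["gato"])` looks inside definitions rather than headwords, so it returns the sense for "maullar"; this is the sense in which lemma-based access is bypassed. Calling `search(SENSES, facets={"pos": "noun"})` shows the facet-only route.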

CORPORA

  • CORPES (Corpus del español del siglo XXI):

It is the current reference corpus for the Spanish language.

It is a continuously growing corpus that by 2016 contained 237,678 texts and 225 million words from different geographical areas.
CORPES contains written and spoken material produced since 2000 and has a rich variety of text types, genres and topics.

The CORPES has been morphosyntactically annotated and lemmatised.

Public access to this corpus is available here.

  • CREA (Corpus de referencia del español actual):

It is a morphosyntactically annotated and lemmatised balanced reference corpus comprising a wide variety of written texts and spoken transcriptions produced in all the Spanish-speaking countries from 1975 to 2004.

This corpus can be accessed here or here.

  • CORDE (Corpus diacrónico del español):

It is a diachronic corpus with written texts ranging from the origins of the Spanish language to 1974.

It contains 250 million written words from different genres, types and geographical origins. The CORDE has been morphosyntactically annotated and lemmatised.

The public textual version of this corpus can be accessed here.

  • CNDHE (Corpus del nuevo diccionario histórico):

It is the corpus used for the New Historical Dictionary of the Spanish Language (NDHE).

It has more than 350 million words, many of them extracted from the CREA and the CORDE, with texts from the 12th century to the year 2000.

This corpus can be accessed here.

  • DLE 23 Access Log:

The online DLE 23 receives on average sixty million lookups per month from both the web and mobile device apps.

Access log records are preprocessed and stored in a NoSQL database to extract information or analyse trends.

In addition to the information provided by the web server, processed log records contain information on search term(s), GeoIP (country, city, coordinates, etc.), whether search terms are present in the DLE or not, corresponding lemma(s) of the searched terms, etc.

These data provide very useful hints on lexical use, evolution or sociologically motivated lexical trends. The log has an interface that can be accessed within the Enclave Platform.
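As a rough illustration of the enrichment step described above, the sketch below adds an "in dictionary" flag and a lemma to each raw lookup record and then counts lookups of missing terms per country. The field names (`term`, `country`, `in_dictionary`, `lemma`), the form-to-lemma table and the sample records are hypothetical; the actual record schema and database are not described in the text.

```python
from collections import Counter

# Hypothetical form-to-lemma table standing in for the dictionary's lemmatiser.
FORM_TO_LEMMA = {
    "casa": "casa",
    "casas": "casa",
    "cantar": "cantar",
    "cantó": "cantar",
}


def enrich(record):
    """Return a copy of a raw lookup record with a normalised term,
    an 'in_dictionary' flag and the corresponding lemma (or None)."""
    term = record["term"].strip().lower()
    lemma = FORM_TO_LEMMA.get(term)
    return {**record, "term": term,
            "in_dictionary": lemma is not None, "lemma": lemma}


def missing_terms_by_country(records):
    """Count lookups of terms absent from the dictionary,
    grouped by (country, term)."""
    counts = Counter()
    for rec in map(enrich, records):
        if not rec["in_dictionary"]:
            counts[(rec["country"], rec["term"])] += 1
    return counts


# Invented sample records; "blorpear" is a made-up non-word.
RAW = [
    {"term": "Casas", "country": "ES"},
    {"term": "blorpear", "country": "MX"},
    {"term": "blorpear ", "country": "MX"},
    {"term": "cantó", "country": "AR"},
]
```

Aggregations of this kind, for example `missing_terms_by_country(RAW)`, are one way such a log could surface frequently searched terms that the dictionary does not yet contain.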

  • DLE 23:

Rooted in the first dictionary ever published by RAE, the DLE has been published periodically since 1780.

It describes the general vocabulary of Spanish while also recording regional, terminological and obsolete usages.
It is conceived as a decoding (semasiological) tool for native speakers and is collectively updated by all national academies from Spanish-speaking countries.

The latest version to date (23rd edition, 2014) contains more than 93,000 entries, 26,000 multiword expressions and 195,000 senses and can be accessed here.

In addition to the semantic information characteristic of dictionaries, it provides hints on regional, obsolete or classical, register-specific or domain-specific uses. Etymology, variants, and spelling or morphology guidance are also given when appropriate. More than 16,000 senses include examples that show the behaviour of the word in context.
It is stored in a relational database with XML, HTML and printable exporting capabilities.

A friendly, tailor-made Dictionary Writing System works on top of the database to ease lexicographic work.
The DLE has been freely available online since 2001.
It currently receives an average of sixty million lookups per month.

The interface provides several linguistically motivated search facilities such as term autocompletion, lookup of inflected, derived or affixed forms, and orthophonographic neutralisation.

In addition, definitions and examples have been lemmatised to provide intratextual navigation.
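Two of the search facilities mentioned above, term autocompletion and neutralisation of spelling differences, can be sketched as follows. This is an illustration under stated assumptions, not the DLE's actual algorithm: it assumes that neutralisation means ignoring case and accents and treating b/v and silent h as equivalent, and the lemma list and function names are hypothetical.

```python
import re
import unicodedata
from bisect import bisect_left


def fold(text):
    """Lower-case and strip diacritics, keeping ñ as a distinct letter."""
    out = []
    for ch in unicodedata.normalize("NFC", text.lower()):
        if ch == "ñ":
            out.append(ch)
            continue
        decomposed = unicodedata.normalize("NFD", ch)
        out.append("".join(c for c in decomposed
                           if not unicodedata.combining(c)))
    return "".join(out)


def phonetic_key(text):
    """Assumed neutralisation: fold accents, merge v into b,
    and drop h unless it is part of 'ch'."""
    key = fold(text).replace("v", "b")
    return re.sub(r"(?<!c)h", "", key)


# Hypothetical lemma list for illustration.
LEMMAS = ["camión", "camino", "campo", "haber", "vaca"]

# Sorted (folded form, lemma) pairs allow prefix search by bisection.
INDEX = sorted((fold(lemma), lemma) for lemma in LEMMAS)

# Map each neutralised key to the lemmas sharing it.
PHONETIC = {}
for lemma in LEMMAS:
    PHONETIC.setdefault(phonetic_key(lemma), []).append(lemma)


def autocomplete(prefix, limit=5):
    """Return up to `limit` lemmas whose folded form starts with the prefix."""
    key = fold(prefix)
    matches = []
    for folded, lemma in INDEX[bisect_left(INDEX, (key,)):]:
        if not folded.startswith(key) or len(matches) == limit:
            break
        matches.append(lemma)
    return matches


def lookup(term):
    """Return lemmas that match the term after neutralisation."""
    return PHONETIC.get(phonetic_key(term), [])
```

With this toy data, `autocomplete("cami")` suggests both "camino" and "camión", and `lookup("aber")` finds "haber" despite the missing h.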