Tools and services

The ELEXIS project provides free access to tools and infrastructure developed by the project partners. Access is open not only to academic institutions in the EU but also to researchers, teachers and scholars in lexicography, linguistics, terminology, natural language processing and related fields, as well as to any other entities interested in the tools and services provided. The number of available resources will grow over the course of the project. Institutions, researchers, scholars and teachers incur no costs for accessing them.

Graphic Guide to ELEXIS Dictionary Tools

If you need help deciding which tools to use for your dictionary, please refer to the following diagram.
Note that certain help material and tools are still under development.

Tools & services available

for Lexicographers

Terminologists, Linguists, Translators, Teachers

Sketch Engine

The Sketch Engine corpus query, corpus building and corpus management system allows users to build and work with 550+ text corpora in over 90 languages and 30 scripts. Sketch Engine contains a number of unique tools to analyse large corpora of up to 60 billion words. Each user can benefit from fully automated dictionary-building functionality.

Access to Sketch Engine is funded by the EU through the ELEXIS project from 2018 to 2022. Access is provided at no cost to academic institutions and ELEXIS observers and applies to non-commercial use only. More than 450 institutions currently use the tool.

Access after ELEXIS

The ELEXIS funding of access to Sketch Engine terminates on 31 March 2022 for academic users from the EU.
Contact your institution about the future of your access.

Lexonomy

Lexonomy is a cloud-based system for writing dictionaries and publishing them online. It scales from large dictionary projects down to small lexicographic works, such as editing and publishing domain-specific glossaries or terminology resources online. Lexonomy already interacts with Sketch Engine, and the project aims to develop and expand this interaction further. Sketch Engine can push lexicographic data into Lexonomy to create automatically generated dictionary drafts, and Lexonomy can pull data from Sketch Engine’s corpora during the entry editing process.

OneClick Dictionary

OneClick Dictionary (OCD) is a dictionary drafting module. It connects a corpus management system (e.g. Sketch Engine, NoSketch Engine), or even Excel sheets, with our dictionary writing and online dictionary publishing system Lexonomy. It provides an automatically created dictionary draft (e.g. headwords, word forms, collocations, examples) to be post-edited in Lexonomy by the lexicographer.
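
As a rough illustration of what such a draft could look like, the sketch below groups spreadsheet-style rows into entries with a headword, word forms, collocations and examples. The column names, the `DraftEntry` class and the `build_draft` function are assumptions made for this example; they are not taken from the OneClick Dictionary code or from Lexonomy's data format.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class DraftEntry:
    """One automatically drafted entry, to be post-edited by a lexicographer."""
    headword: str
    wordforms: list = field(default_factory=list)
    collocations: list = field(default_factory=list)
    examples: list = field(default_factory=list)


def build_draft(csv_text):
    """Group spreadsheet-style rows into draft entries keyed by headword.

    Assumed (hypothetical) columns: headword, kind, value, where kind is
    one of "wordform", "collocation" or "example".
    """
    fields = {
        "wordform": "wordforms",
        "collocation": "collocations",
        "example": "examples",
    }
    entries = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        entry = entries.setdefault(row["headword"], DraftEntry(row["headword"]))
        target = fields.get(row["kind"])
        if target is not None:  # rows of an unknown kind are ignored
            getattr(entry, target).append(row["value"])
    # Return entries in alphabetical order of headword.
    return sorted(entries.values(), key=lambda e: e.headword)


# Illustrative input standing in for an exported spreadsheet.
SAMPLE = """headword,kind,value
dictionary,wordform,dictionaries
dictionary,collocation,compile a dictionary
dictionary,example,She looked the word up in a dictionary.
corpus,wordform,corpora
"""

draft = build_draft(SAMPLE)
for entry in draft:
    print(entry.headword, entry.wordforms, entry.collocations, entry.examples)
```

Everything in the resulting draft is still open to correction, which is the point of moving the lexicographer's effort into post-editing.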

OneClick Dictionary enables lexicographers to shift all of their work and intellectual input into the post-editing phase, instead of manually analyzing the input data before creating a dictionary draft.

Hence, the tool is not limited to professionals but is also designed for spontaneous lexicography: small projects of a lexicographic nature, such as glossaries and domain-specific wordlists and dictionaries, often prepared by teachers or other professionals without formal training in lexicography.
The source code for the OneClick Dictionary module is available on GitHub; additional information is available in Deliverable 4.2.

Elexifier

Elexifier is a cloud-based dictionary conversion service. It uses advanced XML parsing and machine learning techniques to help you convert your PDF and XML dictionaries into a standardized machine-readable format. Users can upload their PDF and custom XML dictionaries to Elexifier, define mapping rules for XML transformation or create a machine learning training set for PDF conversion, and download the transformed XML or PDF dictionary in a TEI-compliant file format based on the ELEXIS Data Model.

EDiE: ELEXIS Dictionary Evaluator

This tool evaluates the availability and usability of linked lexical resources and dictionaries that are published using the ELEXIS dictionary API and are accessible through the ELEXIS infrastructure.
It allows users to assess different aspects of dictionaries based on their metadata and entries. Furthermore, metrics aggregated over the dictionaries of interest or over a given context let users compare different dictionaries for their specific use cases.

Tools & services available

for NLP researchers

natural language processing, machine learning, computational linguistics

Clusty

Clusty is an innovative algorithm designed to perform lexical-semantic analytics for NLP: sense clustering.
The team at the Linguistic Computing Laboratory of the Sapienza University of Rome investigated clustering approaches that scale effectively and easily across languages while dropping the requirement for the large amounts of data typically needed when employing neural networks.
Clusty’s results can be used for improving word sense disambiguation systems.

The demonstration of the efficacy of Clusty for performing one of the most challenging tasks in natural language processing, sense clustering, is presented in D3.1 (below).

For installation, we provide a link to the GitHub repository.

VerbAtlas

VerbAtlas is a novel large-scale manually-crafted semantic resource for wide-coverage, intelligible & scalable Semantic Role Labeling. The goal of VerbAtlas is to manually cluster WordNet synsets that share similar semantics into sets of semantically-coherent frames. The main features are:

  • 466 semantically-coherent frames using 26 cross-frame VerbNet-inspired semantic roles for their argument structure.
  • Available both for download and via RESTful API.
  • Full coverage of WordNet 3.0 verb synsets (13,000+).
  • Complete linkage to BabelNet 4.0, which supports 280+ languages (new version to come later this year!).
  • Manual mapping to PropBank of all CoNLL-2009 and CoNLL-2012 dataset occurrences (5000+ mappings).
  • Selectional preferences: the superconcept most probably associated with a semantic role in a frame (e.g. food for the patient role of the EAT frame).
  • Default/shadow arguments: arguments logically implied or already incorporated into a verb.
  • Implicit arguments: arguments that are implicit in the argument structure of a verb.
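
To make the frame structure above more concrete, here is a minimal sketch of how frames, their member synsets and selectional preferences could be represented. The `Frame` class, the `frame_of` helper and the synset identifiers are hypothetical placeholders, not the VerbAtlas download format or its RESTful API; only the EAT frame with `food` as the preferred patient comes from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """A cluster of verb synsets that share similar semantics."""
    name: str
    synsets: set = field(default_factory=set)
    # Maps a semantic role to the superconcept most probably filling it.
    selectional_preferences: dict = field(default_factory=dict)


def frame_of(synset_id, frames):
    """Return the frame containing the given synset, or None if there is none."""
    for frame in frames:
        if synset_id in frame.synsets:
            return frame
    return None


# Placeholder synset identifiers, invented for illustration only.
frames = [
    Frame(
        name="EAT",
        synsets={"eat#placeholder-1", "devour#placeholder-1"},
        selectional_preferences={"Patient": "food"},
    ),
]

frame = frame_of("devour#placeholder-1", frames)
print(frame.name, frame.selectional_preferences.get("Patient"))
```

Because each synset belongs to a frame, a lookup of this kind is enough to move from a disambiguated verb to its frame and role preferences.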

SyntagNet

SyntagNet is a manually-curated large-scale lexical-semantic combination database which associates pairs of concepts with pairs of co-occurring words. The goal of SyntagNet is to capture sense distinctions evoked by syntagmatic relations, hence providing information which complements the essentially paradigmatic knowledge shared by currently available Lexical Knowledge Bases such as WordNet. Its main features are:

  • Wide coverage, with 78,000 noun-verb and noun-noun lexical combinations extracted from the English Wikipedia and the British National Corpus.
  • High-quality, fully manual disambiguation for all of the lexical combinations, according to the WordNet 3.0 sense inventory.
  • A resulting Lexical Knowledge Base made up of 88,019 semantic combinations linking 20,626 WordNet 3.0 unique synsets with a relation edge.
  • A user-friendly web interface for looking up terms and their lexical-semantic combinations, with complete linkage to BabelNet 4.0.

NAISC

‘NAISC’ means ‘links’ in Irish and is pronounced ‘nashk’.

NAISC 1.0 is a tool for linking datasets, created by the SFI Insight Centre for Data Analytics and the ELEXIS project. NAISC serves as a system for aligning RDF datasets: it takes two RDF documents as input (referred to as ‘left’ and ‘right’) and outputs an alignment (a set of RDF triples) between them. NAISC typically relies on a configuration, which is a JSON document.
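
The input and output shapes described here can be illustrated with a toy example. The sketch below is not NAISC's algorithm or interface: it stands in a naive exact-label match for the real linking method, represents triples as plain Python tuples, and assumes `skos:exactMatch` as the linking property. The `example.org` URIs and the function names are invented for illustration.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
SKOS_EXACT_MATCH = "http://www.w3.org/2004/02/skos/core#exactMatch"


def labels(triples):
    """Map each normalised label to the set of subjects that carry it."""
    index = {}
    for subject, predicate, obj in triples:
        if predicate == RDFS_LABEL:
            index.setdefault(obj.strip().lower(), set()).add(subject)
    return index


def align(left, right):
    """Return alignment triples linking left and right subjects with equal labels."""
    left_index, right_index = labels(left), labels(right)
    alignment = set()
    # Only labels present in both datasets can produce a link.
    for label in left_index.keys() & right_index.keys():
        for left_subject in left_index[label]:
            for right_subject in right_index[label]:
                alignment.add((left_subject, SKOS_EXACT_MATCH, right_subject))
    return alignment


# Two tiny datasets with hypothetical URIs.
left = {
    ("http://example.org/left/cat", RDFS_LABEL, "cat"),
    ("http://example.org/left/dog", RDFS_LABEL, "Dog"),
}
right = {
    ("http://example.org/right/1", RDFS_LABEL, "dog"),
    ("http://example.org/right/2", RDFS_LABEL, "bird"),
}

alignment = align(left, right)
for triple in sorted(alignment):
    print(triple)
```

The result is itself a set of triples, so it can be stored and queried alongside the two datasets it connects.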

For installation and usage please see our short intro video.

MultiMirror: Neural Cross-lingual Word Alignment for Multilingual Word Sense Disambiguation

MultiMirror is a cross-lingual sense projection approach for multilingual word sense disambiguation (WSD). It is based on a novel discriminative word alignment model that jointly aligns all source and target tokens with each other and surpasses its competitors across several language combinations. The sense-tagged datasets it produces lead a standard WSD classifier to achieve state-of-the-art performance on established benchmarks in French, German, Italian, Spanish and Japanese.

MultiMirror was developed by the Sapienza Natural Language Processing Group (Sapienza NLP) and the ELEXIS project.

BabelNet Linker

The BabelNet Linker is a linking web service which produces a mapping between two dictionary definitions in a cross-lingual scenario.

The BabelNet Linker API allows a dictionary to be linked to BabelNet at definition level. Specifically, it maps a definition in any language to a semantically equivalent English definition in BabelNet by relying on state-of-the-art Transformer-based architectures. Importantly, this API will make it possible to map the dictionaries made available within the ELEXIS Consortium at definition level by pivoting through BabelNet.

BabelNet Linker was developed by the Sapienza Natural Language Processing Group (Sapienza NLP) and the ELEXIS project.

Services available

for everybody

Elexifinder

The search tool Elexifinder helps lexicographers and other researchers find scientific output in lexicography and related fields. Users can search through papers and videos using concepts, i.e. words or sets of words that have a Wikipedia page, and various other conditions, e.g. source (conference etc.), author or language. Each paper or video is linked to a page where users can download or view it.

News feed

The lexicographic news feed is an ELEXIS service that uses the Event Registry API to retrieve the latest news articles identified as related to lexicography. News articles are drawn from 30,000 news sources, and over 35 languages are currently supported.

Cross The Word

CrossTheWord is a crossword puzzle game for Android with small and large crossword puzzles, available as a free download from the Google Play Store.

Features:
  • Hundreds of automatically generated crosswords (in constant growth!)
  • Power-ups to boost your game experience and help you solve the unsolvable!
  • A dynamic tap and swipe interface to surf through crosswords
  • A subgame of lexical substitution to earn extra points!

Crosswords are currently available in English only, but more languages are on their way!

More resources will be provided during the lifetime of the project and will be listed here as soon as they are available. Follow us and subscribe to our newsletter to get notified.