Tools and services

The ELEXIS project gives academic institutions in the EU cost-free access to the infrastructures developed by the project partners. The number of available resources will grow in the course of the project. Accessing them has no financial implications for the institutions.

Graphic Guide to ELEXIS Dictionary Tools

If you need help regarding which tools to use for your dictionary, please refer to the following diagram.
Note that certain help material and tools are still under development.

Tools & services available

for Lexicographers

Terminologists, Linguists, Translators, Teachers

Sketch Engine

The Sketch Engine corpus query, corpus building and corpus management system allows users to build and work with 300+ text corpora in over 90 languages and 20 scripts. Sketch Engine contains a number of unique tools to analyse large corpora of up to 30 billion words. Each user can benefit from fully automated dictionary-building functionality.

Access to Sketch Engine is funded by the EU through the ELEXIS project from 2018 to 2022. It is provided at no cost to academic institutions and ELEXIS observers and applies to non-commercial use only. More than 350 institutions currently use the tool.

Lexonomy

Lexonomy is a cloud-based system for writing dictionaries and publishing them online. It scales from large dictionary projects down to small lexicographic works, such as editing and publishing domain-specific glossaries or terminology resources online. Lexonomy already interacts with Sketch Engine, and the project aims to develop and expand this interaction further: Sketch Engine can push lexicographic data into Lexonomy to create automatically generated dictionary drafts, and Lexonomy can pull data from Sketch Engine’s corpora during the entry editing process.

OneClick Dictionary

OneClick Dictionary (OCD) is a dictionary drafting module. It connects a corpus management system (e.g. Sketch Engine, NoSketch Engine), or even Excel sheets, with our dictionary writing and online dictionary publishing system Lexonomy. It provides an automatically created dictionary draft (e.g. headwords, wordforms, collocations, examples) to be post-edited in Lexonomy by the lexicographer.

OneClick Dictionary lets lexicographers shift all of their work and intellectual input into the post-editing phase, instead of manually analyzing the input data before creating a dictionary draft.

Hence, the tool is not limited to professionals but is also designed for spontaneous lexicography – small projects of a lexicographic nature, such as glossaries and domain-specific wordlists and dictionaries, often prepared by teachers or other professionals without formal training in lexicography.
The source code for the OneClick Dictionary module is available on GitHub; additional information is available in Deliverable 4.2.
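
The drafting idea described in this section can be pictured with a minimal sketch. It is not the OneClick Dictionary code: the spreadsheet layout, the column names and the `build_draft` function are assumptions made for illustration only. It groups spreadsheet rows into one draft entry per headword, which a lexicographer would then post-edit.

```python
import csv
import io
from collections import defaultdict

# Hypothetical spreadsheet export; the column names are assumptions.
SHEET = """headword,wordform,collocation,example
run,runs,run a program,She runs the program every morning.
run,ran,run a risk,He ran a risk by waiting.
walk,walked,walk the dog,They walked the dog.
"""

# Spreadsheet column -> field of the draft entry.
COLUMNS = (
    ("wordform", "wordforms"),
    ("collocation", "collocations"),
    ("example", "examples"),
)


def build_draft(sheet_text):
    """Group rows by headword into draft entries, dropping duplicate values."""
    draft = defaultdict(
        lambda: {"wordforms": [], "collocations": [], "examples": []}
    )
    for row in csv.DictReader(io.StringIO(sheet_text)):
        entry = draft[row["headword"]]
        for column, key in COLUMNS:
            value = row[column].strip()
            if value and value not in entry[key]:
                entry[key].append(value)
    return dict(draft)


draft = build_draft(SHEET)
for headword, entry in draft.items():
    print(headword, entry["wordforms"], entry["collocations"])
```

In this sketch all analysis happens automatically, and the human input starts only once the draft exists, which mirrors the post-editing workflow described above.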

Elexifier

Elexifier is a cloud-based dictionary conversion service. It uses advanced XML parsing and machine learning techniques to help you convert your PDF and XML dictionaries into a standardized machine-readable format. Users can upload their PDF and custom XML dictionaries to Elexifier, define mapping rules for XML transformation or create a machine learning training set for PDF conversion, and download the transformed XML or PDF dictionary in a TEI-compliant file format based on the ELEXIS Data Model.
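
To give a feel for what a mapping rule for XML transformation does, here is a minimal sketch. It does not use Elexifier or its rule format: the input markup, the `MAPPING` table and the function names are hypothetical, and the output uses TEI-style element names (`entry`, `form/orth`, `gramGrp/gram`, `sense/def`) only as an approximation of a TEI-compliant result.

```python
import xml.etree.ElementTree as ET

# Hypothetical custom dictionary markup, invented for this sketch.
CUSTOM_XML = """
<dictionary>
  <article>
    <hw>cat</hw>
    <pos>noun</pos>
    <def>a small domesticated feline</def>
  </article>
</dictionary>
"""

# Hypothetical mapping rules: custom tag -> path of TEI-style elements.
MAPPING = {
    "hw": ("form", "orth"),
    "pos": ("gramGrp", "gram"),
    "def": ("sense", "def"),
}


def convert_entry(article):
    """Rebuild one custom <article> as a TEI-style <entry>."""
    entry = ET.Element("entry")
    for child in article:
        path = MAPPING.get(child.tag)
        if path is None:
            continue  # elements without a rule are skipped in this sketch
        parent = entry
        for tag in path:
            parent = ET.SubElement(parent, tag)
        parent.text = (child.text or "").strip()
    return entry


def convert(xml_text):
    """Convert every <article> in the custom document."""
    root = ET.fromstring(xml_text.strip())
    return [convert_entry(article) for article in root.iter("article")]


entries = convert(CUSTOM_XML)
print(ET.tostring(entries[0], encoding="unicode"))
```

The point of keeping the rules in a table is that a new source dictionary only needs a new mapping, not new conversion code.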

Tools & services available

for NLP researchers

natural language processing, machine learning, computational linguistics

Clusty

Clusty is an innovative algorithm designed to perform a lexical-semantic analysis task for NLP: sense clustering.
The team at the Linguistic Computing Laboratory of the Sapienza University of Rome investigated clustering approaches that scale effectively and easily across languages without requiring the large amounts of data typically needed by neural networks.
Clusty’s results can be used to improve word sense disambiguation systems.

Clusty’s efficacy at sense clustering, one of the most challenging tasks in natural language processing, is demonstrated in D3.1 (below).

Installation instructions are available in the GitHub repository.

VerbAtlas

VerbAtlas is a novel, large-scale, manually crafted semantic resource for wide-coverage, intelligible and scalable Semantic Role Labeling. The goal of VerbAtlas is to manually cluster WordNet synsets that share similar semantics into sets of semantically coherent frames. Its main features are:

  • 466 semantically-coherent frames using 26 cross-frame VerbNet-inspired semantic roles for their argument structure.
  • Available both for download and via RESTful API.
  • Full coverage of WordNet 3.0 verb synsets (13,000+).
  • Complete linkage to BabelNet 4.0, which supports 280+ languages (new version to come later this year!).
  • Manual mapping to PropBank of all CoNLL-2009 and CoNLL-2012 dataset occurrences (5,000+ mappings).
  • Selectional preferences: the superconcept most probably associated with a semantic role in a frame (e.g. food for the patient role of the EAT frame).
  • Default/shadow arguments: arguments logically implied or already incorporated into a verb.
  • Implicit arguments: arguments that are implicit in the argument structure of a verb.
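
The frame, role and selectional-preference features listed above can be pictured as a small data structure. This is a sketch for illustration, not the VerbAtlas data format or API: the `Frame` class, the synset identifiers and the `Agent` role are placeholders. Only the EAT frame and the food preference for its patient role come from the feature list. The sketch also assumes that each synset belongs to exactly one frame.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """One frame: a cluster of verb synsets that share semantic roles."""
    name: str
    synsets: frozenset
    roles: tuple
    # role name -> superconcept most probably filling that role
    selectional_preferences: dict = field(default_factory=dict)


# Synset identifiers and the Agent role are placeholders for this sketch.
EAT = Frame(
    name="EAT",
    synsets=frozenset({"synset:eat-1", "synset:devour-1"}),
    roles=("Agent", "Patient"),
    selectional_preferences={"Patient": "food"},
)


def build_index(frames):
    """Map every synset to the frame that contains it."""
    index = {}
    for frame in frames:
        for synset in frame.synsets:
            index[synset] = frame
    return index


def preferred_filler(index, synset, role):
    """Return the preferred superconcept for a role, or None if unknown."""
    frame = index.get(synset)
    if frame is None:
        return None
    return frame.selectional_preferences.get(role)


index = build_index([EAT])
print(preferred_filler(index, "synset:devour-1", "Patient"))  # food
```

Because roles are shared across frames, a lookup like this works the same way for any verb once its synset has been assigned to a frame.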

SyntagNet

SyntagNet is a manually-curated large-scale lexical-semantic combination database which associates pairs of concepts with pairs of co-occurring words. The goal of SyntagNet is to capture sense distinctions evoked by syntagmatic relations, hence providing information which complements the essentially paradigmatic knowledge shared by currently available Lexical Knowledge Bases such as WordNet. Its main features are:

  • Wide coverage, with 78,000 noun-verb and noun-noun lexical combinations extracted from the English Wikipedia and the British National Corpus.
  • High-quality, fully manual disambiguation for all of the lexical combinations, according to the WordNet 3.0 sense inventory.
  • A resulting Lexical Knowledge Base made up of 88,019 semantic combinations linking 20,626 WordNet 3.0 unique synsets with a relation edge.
  • A user-friendly web interface for looking up terms and their lexical-semantic combinations, with complete linkage to BabelNet 4.0.
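
The idea of associating pairs of concepts with pairs of co-occurring words can be sketched as follows. This is not SyntagNet data or its interface: the word pairs, the synset identifiers and the function names are placeholders chosen for illustration. The sketch shows how the same word receives a different sense depending on its syntagmatic partner, and how the sense pairs form a graph of relation edges.

```python
# (word, word) -> (synset, synset); all identifiers are placeholders.
COMBINATIONS = {
    ("run", "program"): ("synset:run-execute", "synset:program-software"),
    ("run", "race"): ("synset:run-compete", "synset:race-contest"),
}


def senses_for(word_a, word_b):
    """Look up a word pair in either order.

    The returned senses follow the order of the arguments.
    """
    if (word_a, word_b) in COMBINATIONS:
        return COMBINATIONS[(word_a, word_b)]
    if (word_b, word_a) in COMBINATIONS:
        sense_b, sense_a = COMBINATIONS[(word_b, word_a)]
        return (sense_a, sense_b)
    return None


def build_graph(combinations):
    """Undirected graph: synset -> set of synsets it combines with."""
    graph = {}
    for sense_a, sense_b in combinations.values():
        graph.setdefault(sense_a, set()).add(sense_b)
        graph.setdefault(sense_b, set()).add(sense_a)
    return graph


graph = build_graph(COMBINATIONS)
print(senses_for("run", "program"))
print(senses_for("race", "run"))
```

A word sense disambiguation system could use such a graph as extra evidence: if two words co-occur and a sense pair is connected by an edge, that pair becomes a more likely reading.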

NAISC

‘NAISC’ means ‘links’ in Irish and is pronounced ‘nashk’.

NAISC 1.0 is a tool for linking datasets and was created by the SFI Insight Centre for Data Analytics and the ELEXIS project. NAISC serves as a system for aligning RDF datasets: it takes as input two RDF documents (referred to as ‘left’ and ‘right’) and outputs an alignment (a set of RDF triples) between these two documents. NAISC typically relies on a configuration, which is a JSON document.
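
The input and output shape described here can be illustrated with a toy aligner. This sketch does not use NAISC and does not reproduce its linking method or its configuration schema: the configuration keys, the example URIs and the `align` function are assumptions, and the matching rule (identical labels, ignoring case) is a deliberately naive stand-in. Triples are plain tuples rather than parsed RDF.

```python
import json

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
SKOS_EXACT_MATCH = "http://www.w3.org/2004/02/skos/core#exactMatch"

# Hypothetical configuration for this sketch only; not NAISC's schema.
CONFIG = json.loads(json.dumps({
    "match_property": RDFS_LABEL,
    "link_property": SKOS_EXACT_MATCH,
}))

# Two tiny datasets as (subject, predicate, object) triples.
LEFT = [
    ("http://example.org/left/cat", RDFS_LABEL, "Cat"),
    ("http://example.org/left/dog", RDFS_LABEL, "Dog"),
]
RIGHT = [
    ("http://example.org/right/felis", RDFS_LABEL, "cat"),
    ("http://example.org/right/bird", RDFS_LABEL, "Bird"),
]


def align(left, right, config):
    """Link left and right subjects whose labels match, ignoring case."""
    match_property = config["match_property"]
    link_property = config["link_property"]

    # Index the right dataset by normalised label.
    by_label = {}
    for subject, predicate, obj in right:
        if predicate == match_property:
            by_label.setdefault(obj.casefold(), []).append(subject)

    alignment = []
    for subject, predicate, obj in left:
        if predicate == match_property:
            for target in by_label.get(obj.casefold(), []):
                alignment.append((subject, link_property, target))
    return alignment


alignment = align(LEFT, RIGHT, CONFIG)
for triple in alignment:
    print(triple)
```

The output is itself a set of triples, so it can be merged with either input dataset, which is the property that makes an alignment expressed in RDF convenient to reuse.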

For installation and usage please see our short intro video.

Services available

for everybody

Elexifinder

The search tool ELEXIFINDER is dedicated to helping lexicographers and other researchers find scientific output in lexicography and related fields. It enables users to search through papers and videos using concepts, i.e. words or sets of words that have a Wikipedia page, and various other conditions, e.g. source (conference etc.), author or language. Each paper or video is linked to the page where users can download or view it.

News feed

The lexicographic news feed is an ELEXIS service that uses the Event Registry API to extract the latest news articles identified as related to lexicography. News articles are extracted from 30,000 news sources, and over 35 languages are currently supported.

More resources will be provided during the lifetime of the project and will be listed here as soon as they are available. Follow us and subscribe to our newsletter to get notified.