#elexis_dk – DSL

Visiting grants contact:

Sanni Nimb

sn@dsl.dk 

  • Christians Brygge 1, 1219 København K, Denmark

Find out more about ELEXIS visiting grants and former winning projects:

Methods for detection and evaluation of neologisms for the Croatian language – Denis Gaščić

Denis Gaščić is an ambitious computer and language enthusiast with a strong interest in neologisms. He applied for a research grant to visit the Institute of the Estonian Language in order to be introduced to tools and methods used for dictionary compilation and automatic detection of neologisms.

New travel grant reports available!

We published four new reports by research grant holders describing their research visits and the tasks carried out.

A Study on legal texts & terminological databases in Dutch – Dóra Mária Tamás

Dóra Mária Tamás is an experienced researcher focusing on legal terminology. Her background in legal translation and interpretation brought her to lexicographic and terminological work in her home country of Hungary.

Integration of lexicographic data: the diachronic plane – Dorota Mika

Dorota Mika is a passionate aspiring lexicographer, focusing on the etymology of the Polish language. She has worked on a variety of lexicographic projects, including the electronic Conceptual Dictionary of Old Polish. In order to establish a clear workflow for integrating and merging diachronic lexicographic data from electronic dictionaries, she applied to visit the Instituut voor de Nederlandse Taal to benefit from its long expertise.

How is the corpus base influencing collocation sets in dictionaries?

Carolin Müller-Spitzer is an experienced linguist and lexicographer, working on the influence of the corpus base on collocation sets in dictionaries. She applied for an ELEXIS research grant to visit the Instituut voor de Nederlandse Taal in Leiden (the Netherlands) to conduct a contrastive German-Dutch study on the influence of the corpus base on collocation sets in dictionaries.

Adapting dictionary writing systems and other platforms to online dictionaries of idioms – Jelena Parizoska

Being an avid dictionary user herself, Jelena Parizoska wrote her PhD on the ‘variability of verbal idioms in English and Croatian within the cognitive linguistic framework.’ To learn how to incorporate certain features into the Online Dictionary of Croatian Idioms, she applied for a research grant to visit the Jožef Stefan Institute in Slovenia.

Annika Simonsen: Report on ELEXIS research visit out now

Annika Simonsen was awarded a grant to visit ELEXIS infrastructures in Denmark, one of the most important research centres for Natural Language Processing (NLP) in the North, specialising in language technology for the West Nordic languages: a perfect match for taking her grant-winning project Ravnur, a Faroese speech recogniser, to the next level.

Det Danske Sprog- og Litteraturselskab (DSL)

Can only be visited in combination with a stay at University of Copenhagen (UCPH)!

Det Danske Sprog- og Litteraturselskab provides access to the following resources, tools, infrastructure and research facilities:

Tools

Tools for internal use at DSL, to be used for investigations on Danish texts and dictionaries:

  • Corpus tool, statistical tool (Word2Vec model).
  • Access to the xml editing system ‘iLEX’ and the Danish resources which are edited in this system at DSL.

Resources (dictionaries, thesauri, lexicons)

A dictionary, a thesaurus and two computational lexicons for modern Danish sharing sense id numbers.
The four resources are described in detail below.

The first two are only available for research at DSL.

1) Traditional dictionary, modern Danish:
The Danish Dictionary, DDO (ongoing project, DSL only)

The DDO dictionary (Den Danske Ordbog) is a comprehensive monolingual dictionary of contemporary Danish, edited on a scholarly basis.
It was originally published in print in six volumes in 2003-2005, but is now published online at ordnet.dk/ddo (and also as the app ”Den Danske Ordbog”).
Currently it involves seven editors/computational linguists and is being extended with the full description of 10,000 lemmas (2015-2018).
The dictionary is edited in xml in a custom-designed structure and provides information on form, meaning and use of words belonging to the general vocabulary of Danish.
The dictionary-making process is based on corpus inspection, and the development of the corpus as well as the corpus tools is part of the project. The dictionary covers 81,000 lemmas and 13,000 fixed expressions which are described by 120,000 sense definitions with identified genus proximum, and furthermore 14,000 lemmas without definition.
The sense inventory, the fixed expressions, the collocations and the valency patterns have been used to compile a series of other lexical resources at DSL: the Danish Thesaurus and, together with CST/the University of Copenhagen, the two computational lexicons the Danish Wordnet DanNet and the Danish frame lexicon.
All resources share sense id numbers with the dictionary.
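
Because the genus proximum of each sense definition is identified in the xml, it can be extracted mechanically. The following is a minimal sketch only: the element names (entry, lemma, sense, definition, genus), the id attribute and the sample entry are hypothetical illustrations, not the actual DDO structure.

```python
import xml.etree.ElementTree as ET

# Invented sample entry; tag names and content are assumptions for illustration.
SAMPLE_ENTRY = """
<entry>
  <lemma>puddel</lemma>
  <sense id="1">
    <definition><genus>hund</genus> med krøllet pels</definition>
  </sense>
</entry>
"""


def genus_by_sense(xml_text):
    """Map each sense id to the text of its tagged genus proximum."""
    root = ET.fromstring(xml_text.strip())
    result = {}
    for sense in root.iter("sense"):
        genus = sense.find("definition/genus")
        if genus is not None:
            result[sense.get("id")] = genus.text
    return result


genus_map = genus_by_sense(SAMPLE_ENTRY)
print(genus_map)
```

A mapping of this kind is the sort of information a wordnet compilation could use when looking for the closest hypernym of a sense.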

We aim at linking the DDO dictionary at lemma and/or sense level to older Danish dictionaries published by DSL, especially the Dictionary of the Danish Language (ODS) and the Old Danish Dictionary (GO).
Based on the shared id numbers, we recently identified and integrated relevant data on related words from the Danish Thesaurus into the dictionary entries. Both the linking from DDO to the other resources and its detailed xml structure, which allows for the identification of very specific types of lexical information, open up many types of data combinations and lexical studies.

Published: online 2008 (ordnet.dk/ddo), in print 2003-2005
Language: modern Danish (1955-present) / general language
Type: monolingual, corpus-based, traditional dictionary
Data: xml-structure
Dictionary-making process: corpus, corpus tools, lemma selection, editing, digital publishing
Entries: 95,000 (of which 14,000 have no sense definition yet)
Fixed expressions: 13,000
Sense definitions: 120,000
Linked at sense level: Yes. The Danish Thesaurus, Danish WordNet, Danish Frame Lexicon

Visiting period: all year except 20 December – 2 January and 1 July – 15 August
Currently involved editors: 7 lexicographers/computational linguists
Contact:
Lars Trap-Jensen
Henrik Lorentzen
Sanni Nimb
Thomas Troelsgård

2) Thesaurus, modern Danish:
The Danish Thesaurus “Den Danske Begrebsordbog”
(Ongoing project, DSL only)

The Danish Thesaurus was published in print in 2015.
It is based on the lemmas and fixed expressions in the DDO dictionary, but also on the many collocations described in the dictionary.
It organises most of the word senses described in DDO in 22 chapters and 888 sections and presents the words and expressions in semantic groups with keywords.
In each group the words are presented in semantic order. The underlying xml-document contains formal information on the semantic groups allowing for the identification of e.g. persons, acts etc.
The thesaurus, which is edited in xml, is currently being extended with the DDO senses which are not yet integrated.
We hope to be able to publish an online version of the book in the coming years, depending on funding.
The most relevant parts of the thesaurus sections have already been automatically identified and integrated in the DDO online dictionary in the form of related words for many senses, based on the shared id numbers in the two resources. Its organisation of the Danish vocabulary into annotated semantic groups was used to compile the Danish FrameNet lexicon, and the two resources share sense id numbers with DDO.
The linking from the thesaurus data to the other resources opens up many types of data combinations and lexical studies.
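
The idea of deriving related words from shared sense id numbers can be sketched as follows. This is an illustration under stated assumptions, not the procedure used at DSL: the record layout (sense_id, lemma, keyword, sense_ids) and the toy data are invented, and here every other member of a semantic group counts as related.

```python
from collections import defaultdict

# Invented toy data; field names and id values are assumptions.
ddo_senses = [
    {"sense_id": 1, "lemma": "hund"},
    {"sense_id": 2, "lemma": "kat"},
    {"sense_id": 3, "lemma": "hus"},
]
thesaurus_groups = [
    {"keyword": "pets", "sense_ids": [1, 2]},
    {"keyword": "buildings", "sense_ids": [3, 99]},  # 99 has no dictionary sense
]


def related_words(senses, groups):
    """For each sense id, list the lemmas that share a thesaurus group with it."""
    lemma_by_id = {s["sense_id"]: s["lemma"] for s in senses}
    related = defaultdict(list)
    for group in groups:
        # Keep only group members that exist in the dictionary.
        members = [sid for sid in group["sense_ids"] if sid in lemma_by_id]
        for sid in members:
            for other in members:
                if other != sid and lemma_by_id[other] not in related[sid]:
                    related[sid].append(lemma_by_id[other])
    return dict(related)


related = related_words(ddo_senses, thesaurus_groups)
print(related)
```

Senses whose group contains no other dictionary sense (here sense 3) simply get no related words.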

Published: in print 2015
Language: modern Danish (1955-present) / general language
Type: monolingual, corpus-based, traditional dictionary
Data: xml-structure
Dictionary-making process: lemma selection, editing, semantic annotation
Words and expressions: 204,000 (119,000 unique)
Linked at sense level: Yes. The Danish Dictionary (DDO), Danish WordNet DanNet, Danish FrameNet Lexicon

Visiting period: all year except 20 December – 2 January and 1 July – 15 August
Currently involved editors: 2 lexicographers/computational linguists
Contact:
Sanni Nimb
Thomas Troelsgård

3) Computational lexicon, modern Danish:
The Danish FrameNet Lexicon
(also available for research outside DSL)

The Danish FrameNet Lexicon describes 12,142 Danish lemmas with one or more frame values from the Berkeley FrameNet model.
Furthermore it gives information on the type of phrases and multiword units that would typically evoke the different frames of a lemma.
It was compiled in 2016-2017 in collaboration with the University of Copenhagen on the basis of the vocabulary from the Danish Thesaurus. It aims to supply semantic annotators of Danish texts with a reduced set of frame values, typically 3-4 per verb and 1-2 per verbal noun, namely those most likely to be relevant out of the more than 1,000 values in Berkeley FrameNet.
671 different frames were used to describe the lemmas which represents 80 % of the DDO dictionary. It is available as a comma-separated file on GitHub.

The lexicon allows the study of different semantic groups of especially Danish verbs (and deverbal nouns), e.g. Danish motion verbs.
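
Since the lexicon is distributed as a comma-separated file, a study of one semantic group could start by grouping lemmas by frame. The sketch below is hedged: the column names (lemma, pos, frame) and the sample rows are assumptions made for illustration and do not reflect the actual file layout or its frame assignments.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows; the real file's columns and contents may differ.
SAMPLE = """lemma,pos,frame
gå,verb,Self_motion
løbe,verb,Self_motion
spise,verb,Ingestion
løb,noun,Self_motion
"""


def lemmas_by_frame(csv_file, pos=None):
    """Group lemmas by frame value, optionally restricted to one part of speech."""
    groups = defaultdict(list)
    for row in csv.DictReader(csv_file):
        if pos is None or row["pos"] == pos:
            groups[row["frame"]].append(row["lemma"])
    return dict(groups)


verbs = lemmas_by_frame(io.StringIO(SAMPLE), pos="verb")
print(verbs["Self_motion"])
```

With the real file, the io.StringIO object would be replaced by an opened file handle, and the column names adjusted to whatever the header row actually contains.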

Published: 2017 
Language: modern Danish (1955-present) / general language
Type: monolingual, corpus-based, computational lexicon
Data: comma-separated file (spreadsheet)
Dictionary-making process: linked data (thesaurus/dictionary), assignment of English frames
Entries: 12,142
Verbs: 5,300, Nouns: 6,490
Linked at sense level: Yes. The Danish Dictionary, The Danish Thesaurus, the Danish WordNet
Visiting period: all year except 20 December – 2 January and 1 July – 15 August
Currently involved editors: 1 lexicographer/computational linguist
Contact: Sanni Nimb

4) Computational lexicon, modern Danish:
The Danish WordNet DanNet

(Ongoing project at the University of Copenhagen, also available for research outside DSL)

DanNet was compiled between 2004 and 2013 together with the University of Copenhagen. The compilation was based on the DDO lexical data and its sense inventory, including the sense definitions and especially the fact that the genus proximum of each sense is tagged in the xml structure of the dictionary.
The WordNet contains 65,000 synsets which are provided with an ontological type and a link to the closest hypernym. The synset members are linked to DDO senses. 5,000 Danish synsets are furthermore linked to the equivalent English synset in Princeton WordNet (by the relation eq_has_synonym), also labelled Princeton Core.
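
Because each synset carries a link to its closest hypernym, the full chain of increasingly general concepts can be recovered by following those links. A minimal sketch, assuming the links have been loaded into a plain dictionary; the synset names and the mapping are invented and this is not DanNet's actual data format:

```python
# Invented toy mapping from a synset to its closest hypernym (None at the top).
hypernym_of = {
    "puddel": "hund",
    "hund": "dyr",
    "dyr": None,
}


def hypernym_chain(synset_id, links):
    """Follow closest-hypernym links upwards; stop at the top or on a cycle."""
    chain = []
    seen = {synset_id}
    current = links.get(synset_id)
    while current is not None and current not in seen:
        chain.append(current)
        seen.add(current)
        current = links.get(current)
    return chain


chain = hypernym_chain("puddel", hypernym_of)
print(chain)
```

The seen set is only a safeguard against accidental cycles in the link data.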

Published: 2013
Language: modern Danish (1955-present) / general language
Type: monolingual, computational lexicon
Synsets: 65,000
Linked at synset member level: Yes. The Danish Dictionary, The Danish Thesaurus, the Danish FrameNet Lexicon
Visiting period: all year except 20 December – 2 January and 1 July – 15 August
Contact:
Sanni Nimb
Nicolai H. Sørensen

5) Other resources, also available for research outside DSL

– A number of Danish corpora and word lists
– Manually POS-tagged corpus of Danish (PAROLE)
– SemDaX: manually tagged semantic corpus.

Research facilities / infrastructure

DSL provides access to two research infrastructures, one focusing on dictionary editing and one focusing on the digitisation of historic dictionaries:

1) The process of editing and publishing corpus-based dictionaries on a scholarly basis and the linking of lexical resources:

DSL is currently editing a modern Danish dictionary which is published online.
The dictionary-making process includes all steps: corpus creation, the creation of corpus tools and statistical investigations, corpus-based lemma selection, editing based on corpus investigations, and finally digital publishing.
The sense inventory of the dictionary constitutes the skeleton to which all other DSL resources of modern Danish are linked by shared id numbers: the Danish thesaurus, the Danish WordNet and the Danish FrameNet lexicon.
The thesaurus is currently being extended with senses from the dictionary in order to cover the full sense inventory. The linked data is already used for different purposes, one of which is the integration of related words from the thesaurus into the dictionary sense descriptions.
In the ELEXIS project, we carry out research on how to link dictionaries of older Danish to the modern resources.

2) Digitisation and online publishing of historic dictionaries:

Since 2008 DSL has specialized in digitising and online publishing historic Danish dictionaries, as well as dictionaries that have been published by DSL only in print.
In some cases the online publishing includes an app version.
Two computational linguists are involved in this work.
The first (and most comprehensive) online published Danish dictionary is still being developed with even more fine-grained tags based on the typography of the printed version.
A new Swedish-Danish dictionary published in 2010 is also to be published online in 2018 (and we plan to publish the Danish Thesaurus in the years to come, probably before 2021, depending on funding).

The dictionaries are:

– Dictionary of the Danish Language (ODS)
– Meyer’s Loanword Dictionary
– Moth’s Dictionary
– Holberg Dictionary
– Dictionary of older Danish (Kalkar’s Dictionary)
– Swedish-Danish Dictionary
– Jensen & Goldschmidt

Detailed information on the dictionaries can be found here.
