#elexis_dk – UCPH

Visiting grants contact:

Bolette Sandford Pedersen

bspedersen@hum.ku.dk

Sussi Olsen

saolsen@hum.ku.dk 

  • Njalsgade 136, building 27, 2300 Copenhagen S

Find out more about ELEXIS visiting grants and former winning projects:

Creating a German-Latvian LSP glossary – Silga Sviķe

Silga Sviķe is a teacher, linguist and researcher focusing on special domain dictionaries. Her research visit takes her to the ACDH-CH, where she aims to create a bilingual digital LSP corpus in order to translate specialized literature.

Building a specialized Corpus in Macedonian – Nikolche Mickoski

As an experienced translator, interpreter and terminologist, Nikolche Mickoski has set himself the goal of tackling the lack of a national Macedonian corpus by building one from specialized scientific publications issued by the Macedonian Academy of Sciences and Arts.

A corpus-based method for extracting polylexical units – Eglantina Gishti

In order to overcome the methodological, quantitative and qualitative gaps between the various dictionaries she works with on a daily basis, Eglantina Gishti decided to visit ELEXIS infrastructures in Denmark to learn how to develop and improve the tools and services needed for her work.

University of Copenhagen (UCPH, Denmark)

Can only be visited in combination with a stay at Det Danske Sprog- og Litteraturselskab (DSL)!

At the University of Copenhagen, the Centre for Language Technology and the Department of Nordic Studies and Linguistics provide resources to the ELEXIS-DK Infrastructure.

Visits are welcome all year round, but for visits between July 1 and August 15, a specific appointment with one or more researchers must be made.

Resources

at the Centre for Language Technology, University of Copenhagen:

STO

STO is a lexicon intended for computational applications, as opposed to dictionaries intended primarily for human users.
It comprises 81,000 words of modern Danish, mostly from the general language vocabulary and primarily based on a newspaper corpus, but 13,000 words originate from the subject fields computer terminology, environment, health, finance, administration and trade & industry. All the words have a thorough morphological description, about half of the vocabulary has syntactic information, and about 10% has semantic information.

The information types of STO are:
– Morphology: part of speech, inflection, spelling variants and, for nouns, information about compounds.
– Syntax: construction possibilities of the word and, for verbs, specification of auxiliary verbs. For each construction pattern, a prototypical corpus example is also given.
– Semantics: descriptions have different levels of detail, including ontological type, semantic relations, argument structure, selectional restrictions etc.

The STO data are stored in a relational database and are available in either the Lexical Markup Language format or a CSV format.

Published: 2004, updated 2017
Contact: Sussi Olsen

The Danish WordNet DanNet
(with the Society of Danish Language and Literature DSL)

DanNet was compiled from 2004 to 2013 together with the Society of Danish Language and Literature. The compilation was based on the lexical data and sense inventory of the Danish Dictionary (DDO), including the sense definitions and, in particular, the fact that the genus proximum of each sense is tagged in the XML structure of the dictionary.
The WordNet contains 65,000 synsets which are provided with an ontological type and a link to the closest hypernym.
The synset members are linked to DDO senses. 5,000 Danish synsets are furthermore linked to the equivalent English synset in Princeton WordNet (by the relation eq_has_synonym), also labelled Princeton Core.
A subpart of DanNet is linked to other wordnets in the Baltic and Nordic countries through WordTies, an initiative developed in the MetaNord project.

Published: 2013
Language: modern Danish (1955-present) / general language
Type: monolingual, computational lexicon
Synsets: 65,000
Contact: Bolette S. Pedersen, bspedersen@hum.ku.dk

Tools

For a list of tools please see here.

Resources

at the Department of Nordic Studies and Linguistics, University of Copenhagen:

ONP

A Dictionary of Old Norse Prose (ONP) is a dictionary project based at the Department of Nordic Studies and Linguistics at the University of Copenhagen.
ONP records the vocabulary of prose writing in Old Norse, as transmitted in Norwegian and Icelandic manuscripts, the earliest of which date from the middle of the 12th century.
For some decades, work on the dictionary consisted of selective excerption of texts covering all Old Norse prose genres.
This citation corpus constituted the basis for subsequent editorial work and has now been digitized and integrated into the online version of ONP. More info can be found here.

Lexicon Poeticum

Lexicon Poeticum (LP) is a dictionary project based at the Department of Nordic Studies and Linguistics at the University of Copenhagen.
LP aims to supplement ONP by covering the poetic corpus in Old Norse.

The project is based on the Skaldic Project’s corpus, which is a rich digital edition of the majority of poetry in Old Norse in the period 800-1400, consisting of about 150,000 words, 75% of which has been published.

The entire corpus is linked to a supplemented version of ONP’s wordlist (which contains poetic words found in existing dictionaries plus a good number of proper nouns), and the project also adds lexical variants from the poetic manuscripts.
Currently, a concordance of the published Skaldic corpus is available, and a draft concordance of the Eddic corpus (the remaining part of the project) has also been prepared based on the Menota TEI edition by Haraldur Bernharðsson. A facility for organising entries into senses has been implemented in the database, but only trial entries have been produced. More information available here.

Ømålsordbogen
(ØMO, the Dictionary of Danish Insular Dialects, henceforth DID) 

DID is a historical dictionary giving thorough descriptions of the dialects of the Danish islands of Zealand and Funen and the surrounding islands.

It covers the period from 1750 to 1950, the core period being 1850 to 1920.
Publishing began in 1992, and the latest volume (11, kurv-lindorm) appeared in 2013, but the project was initiated in 1909 and data collection dates back to the 1920s and 1930s.

The DID project and the underlying collections of data are an important part of Danish cultural heritage and cultural preservation. First, the collections and DID contain unique information about the Danish language, covering not only the spoken vernaculars but also Danish in a historical context. Second, DID gives thorough descriptions of the culture and life world of the dialect-speaking peasants and fishermen, along with detailed linguistic information about pronunciation, morphology, syntax and semantics.
More information available here.
