Resources at the Centre for Language Technology, University of Copenhagen:
STO is a computational lexicon intended for computational applications, as opposed to dictionaries intended primarily for human users.
It comprises 81,000 words of modern Danish, mostly from the general-language vocabulary and primarily based on a newspaper corpus, but 13,000 words originate from the subject fields computer terminology, environment, health, finance, administration, and trade & industry. All the words have a thorough morphological description, about half of the vocabulary has syntactic information, and about 10% has semantic information.
The information types of STO are:
– Morphology: part of speech, inflection, spelling variants and for nouns also information about compounds.
– Syntax: construction possibilities of the word and for verbs also specification of auxiliary verbs. For each construction pattern a prototypical corpus example is also given.
– Semantics: descriptions have different levels of detail, including ontological type, semantic relations, argument structure, selectional restrictions etc.
The STO data are stored in a relational database and are available in either the Lexical Markup Framework (LMF) format or a CSV format.
Published: 2004, updated 2017
Contact: Sussi Olsen
The Danish WordNet DanNet
(with the Society of Danish Language and Literature DSL)
DanNet was compiled between 2004 and 2013 together with the Society of Danish Language and Literature. The compilation was based on the lexical data and sense inventory of The Danish Dictionary (DDO), including the sense definitions and, in particular, the fact that the genus proximum of each sense is tagged in the dictionary's XML structure.
The WordNet contains 65,000 synsets, each of which is provided with an ontological type and a link to its closest hypernym.
The synset members are linked to DDO senses. 5,000 Danish synsets are furthermore linked to the equivalent English synset in Princeton WordNet (by the relation eq_has_synonym), also labelled Princeton Core.
A subpart of DanNet is linked to other wordnets in the Baltic and Nordic countries through WordTies, an initiative developed in the MetaNord project.
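The synset structure described above (members, an ontological type, a link to the closest hypernym, and an optional eq_has_synonym link) can be pictured with a minimal Python sketch. This is an illustration only, not DanNet's actual schema or API: the class name, field names, identifiers and the three toy entries are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Synset:
    """Toy synset record; every field name here is an assumption."""
    synset_id: str                         # hypothetical identifier
    members: List[str]                     # Danish words grouped in the synset
    ontological_type: str                  # illustrative label, not a real DanNet type
    hypernym: Optional[str] = None         # id of the closest hypernym, if any
    eq_has_synonym: Optional[str] = None   # placeholder for a Princeton WordNet link


def hypernym_chain(synsets: Dict[str, Synset], start: str) -> List[str]:
    """Follow closest-hypernym links upwards from `start`.

    Returns the ids of the ancestors, nearest first. A `seen` set guards
    against accidental cycles in the (toy) data.
    """
    chain: List[str] = []
    seen = {start}
    current = synsets[start].hypernym
    while current is not None and current not in seen:
        chain.append(current)
        seen.add(current)
        current = synsets[current].hypernym
    return chain


# Invented toy data: "puddel" (poodle) -> "hund" (dog) -> "dyr" (animal).
TOY = {
    "s1": Synset("s1", ["dyr"], "Animal"),
    "s2": Synset("s2", ["hund"], "Animal", hypernym="s1",
                 eq_has_synonym="dog (illustrative label)"),
    "s3": Synset("s3", ["puddel"], "Animal", hypernym="s2"),
}

print(hypernym_chain(TOY, "s3"))  # ['s2', 's1']
```

Storing only the closest hypernym on each synset keeps the records small; the full chain of ancestors is recovered by walking the links, as the function does.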
Language: modern Danish (1955-present) / general language
Type: monolingual, computational lexicon
Contact: Bolette S. Pedersen, firstname.lastname@example.org
For a list of tools please see here.
A Dictionary of Old Norse Prose (ONP) is a dictionary project based at the Department of Nordic Studies and Linguistics at the University of Copenhagen.
ONP records the vocabulary of prose writing in Old Norse, as transmitted in Norwegian and Icelandic manuscripts, the earliest of which date from the middle of the 12th century.
For some decades, work on the dictionary consisted of selective excerption of texts covering all Old Norse prose genres.
This citation corpus constituted the basis for subsequent editorial work and has now been digitized and integrated into the online version of ONP. More info can be found here.
Lexicon Poeticum (LP) is a dictionary project based at the Department of Nordic Studies and Linguistics at the University of Copenhagen.
LP aims to supplement ONP by covering the poetic corpus in Old Norse.
The project is based on the Skaldic Project’s corpus, which is a rich digital edition of the majority of poetry in Old Norse in the period 800-1400, consisting of about 150,000 words, 75% of which has been published.
The entire corpus is linked to a supplemented version of ONP’s wordlist (which contains poetic words found in existing dictionaries plus a good number of proper nouns), and the project also adds lexical variants from the poetic manuscripts.
Currently, a concordance of the published Skaldic corpus is available, and a draft concordance of the Eddic corpus (the remaining part of the project) has also been prepared, based on the Menota TEI edition by Haraldur Bernharðsson. A facility for organising entries into senses has been implemented in the database, but only trial entries have been produced. More information available here.
Ømålsordbogen (ØMO, the Dictionary of Danish Insular Dialects, henceforth DID)
is a historical dictionary giving thorough descriptions of the dialects of the Danish islands of Zealand, Funen and the surrounding islands.
It covers the period from 1750 to 1950, the core period being 1850 to 1920.
The project was initiated in 1909, and data collection dates back to the 1920s and 1930s. Publishing began in 1992, and the latest volume (11, kurv-lindorm) appeared in 2013.
The DID project and the underlying data collections are an important part of Danish cultural heritage and cultural preservation. First, the collections and DID contain unique information about the Danish language, covering not only the spoken vernaculars but also the language in a historical context. Second, DID gives thorough descriptions of the culture and lifeworld of the dialect-speaking peasants and fishermen, along with detailed linguistic information about pronunciation, morphology, syntax and semantics.
More information available here.