#elexis_dk – UCPH

Visiting grants contact:

Bolette Sandford Pedersen

bspedersen@hum.ku.dk

Sussi Olsen

saolsen@hum.ku.dk 

  • Njalsgade 136, building 27, 2300 Copenhagen S

Find out more about ELEXIS visiting grants and former winning projects:


Integration of lexicographic data: the diachronic plane – Dorota Mika

Dorota Mika is a passionate aspiring lexicographer focusing on the etymology of the Polish language. She has worked on a variety of lexicographic projects, including the electronic Conceptual Dictionary of Old Polish. To develop a clear workflow for integrating and merging diachronic lexicographic data from electronic dictionaries, she applied to visit the Instituut voor de Nederlandse Taal to benefit from its long-standing expertise.

How is the corpus base influencing collocation sets in dictionaries? – Carolin Müller-Spitzer

Carolin Müller-Spitzer is an experienced linguist and lexicographer working on the influence of the corpus base on collocation sets in dictionaries. She applied for an ELEXIS research grant to visit the Instituut voor de Nederlandse Taal in Leiden (the Netherlands) to conduct a contrastive German-Dutch study on this question.

Adapting dictionary writing systems and other platforms to online dictionaries of idioms – Jelena Parizoska

Being an avid dictionary user herself, Jelena Parizoska wrote her PhD on the ‘variability of verbal idioms in English and Croatian within the cognitive linguistic framework.’ To learn how to incorporate certain features into the Online Dictionary of Croatian Idioms, she applied for a research grant to visit the Jožef Stefan Institute in Slovenia.

University of Copenhagen (UCPH, Denmark)

Can only be visited in combination with a stay at Det Danske Sprog- og Litteraturselskab (DSL)!

At the University of Copenhagen, the Centre for Language Technology and the Department of Nordic Studies and Linguistics provide resources to the ELEXIS-DK Infrastructure.

Visits are welcome all year round, but visits between 1 July and 15 August require a specific appointment with one or more researchers.

Resources

at the Centre for Language Technology, University of Copenhagen:

STO

STO is a computational lexicon intended for computational applications, unlike dictionaries intended primarily for human users.
It comprises 81,000 words of modern Danish. Most belong to the general-language vocabulary and are drawn primarily from a newspaper corpus, while 13,000 words come from the subject fields of computer terminology, environment, health, finance, administration, and trade & industry. All words have a thorough morphological description; about half of the vocabulary also has syntactic information, and about 10% has semantic information.

The information types of STO are:
– Morphology: part of speech, inflection, spelling variants and, for nouns, information about compounds.
– Syntax: the constructions the word can occur in and, for verbs, the auxiliary verbs they take. Each construction pattern is illustrated with a prototypical corpus example.
– Semantics: descriptions at different levels of detail, including ontological type, semantic relations, argument structure, selection restrictions, etc.

The STO data are stored in a relational database and are available either in the Lexical Markup Framework (LMF) format or in CSV format.

Published: 2004, updated 2017
Contact: Sussi Olsen

The Danish WordNet DanNet
(with the Society of Danish Language and Literature DSL)

DanNet was compiled from 2004 to 2013 together with the Society of Danish Language and Literature. The compilation drew on the lexical data and sense inventory of The Danish Dictionary (Den Danske Ordbog, DDO), including its sense definitions and, in particular, the fact that the genus proximum of each sense is tagged in the dictionary's XML structure.
The wordnet contains 65,000 synsets, each provided with an ontological type and a link to its closest hypernym.
The synset members are linked to DDO senses. In addition, 5,000 Danish synsets, also labelled Princeton Core, are linked to the equivalent English synsets in Princeton WordNet by the relation eq_has_synonym.
A subpart of DanNet is linked to other wordnets in the Baltic and Nordic countries through WordTies, an initiative developed in the MetaNord project.

Published: 2013
Language: modern Danish (1955-present) / general language
Type: monolingual, computational lexicon
Synsets: 65,000
Contact: Bolette S. Pedersen, bspedersen@hum.ku.dk
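To make the DanNet structure more concrete, the following Python sketch models a synset record with an ontological type, a pointer to its closest hypernym and an optional eq_has_synonym link to Princeton WordNet, and then walks the hypernym chain upwards. The record layout, the synset IDs, the ontological type labels and the Princeton synset name are all hypothetical illustrations; they do not reproduce DanNet's actual distribution format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Synset:
    # Hypothetical record layout, not DanNet's actual schema.
    synset_id: str
    lemmas: list[str]
    ontological_type: str
    hypernym: Optional[str] = None        # ID of the closest hypernym; None at the top
    eq_has_synonym: Optional[str] = None  # linked Princeton WordNet synset, if any


# Hypothetical toy data, not actual DanNet records.
SYNSETS = {
    "s1": Synset("s1", ["ting"], "Object"),
    "s2": Synset("s2", ["dyr"], "Animal", hypernym="s1"),
    "s3": Synset("s3", ["hund"], "Animal", hypernym="s2", eq_has_synonym="dog.n.01"),
}


def hypernym_chain(synset_id: str, synsets: dict[str, Synset]) -> list[str]:
    """Follow closest-hypernym links from a synset up to the top of its hierarchy."""
    chain: list[str] = []
    seen: set[str] = set()  # guards against accidental cycles in the data
    current: Optional[str] = synset_id
    while current is not None and current not in seen:
        seen.add(current)
        chain.append(current)
        current = synsets[current].hypernym
    return chain
```

Calling `hypernym_chain("s3", SYNSETS)` returns `["s3", "s2", "s1"]`, i.e. the toy "hund" synset followed by its increasingly general hypernyms. Because each synset stores only its closest hypernym, the full hierarchy is recovered by repeatedly following that single link.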

Tools

For a list of tools please see here.

Resources

at the Department of Nordic Studies and Linguistics, University of Copenhagen:

 ONP

A Dictionary of Old Norse Prose (ONP) is a dictionary project based at the Department of Nordic Studies and Linguistics at the University of Copenhagen.
ONP records the vocabulary of prose writing in Old Norse, as transmitted in Norwegian and Icelandic manuscripts, the earliest of which date from the middle of the 12th century.
For several decades, work on the dictionary consisted of selective excerption of texts covering all Old Norse prose genres.
This citation corpus constituted the basis for subsequent editorial work and has now been digitized and integrated into the online version of ONP. More info can be found here.

Lexicon Poeticum

Lexicon Poeticum (LP) is a dictionary project based at the Department of Nordic Studies and Linguistics at the University of Copenhagen.
LP aims to supplement ONP by covering the poetic corpus in Old Norse.

The project is based on the Skaldic Project's corpus, a rich digital edition of most Old Norse poetry from the period 800-1400. The corpus consists of about 150,000 words, 75% of which have been published.

The entire corpus is linked to an expanded version of ONP's wordlist (which adds the poetic words found in existing dictionaries and a considerable number of proper nouns), and the project also adds lexical variants from the poetic manuscripts.
A concordance of the published Skaldic corpus is currently available, and a draft concordance of the Eddic corpus (the remaining part of the project) has also been prepared, based on the Menota TEI edition by Haraldur Bernharðsson. A facility for organising entries into senses has been implemented in the database, but only trial entries have been produced so far. More information available here.

Ømålsordbogen
(ØMO, the Dictionary of Danish Insular Dialects, henceforth DID) 

DID is a historical dictionary giving thorough descriptions of the dialects of the Danish islands of Zealand, Funen and the surrounding islands.

It covers the period from 1750 to 1950, the core period being 1850 to 1920.
The project was initiated in 1909, and data collection dates back to the 1920s and 1930s. Publication began in 1992, and the latest volume (11, kurv-lindorm) appeared in 2013.

The DID project and its underlying data collections are an important part of Danish cultural heritage and cultural preservation. First, the collections and DID contain unique information about the Danish language, covering not only the spoken vernaculars but also Danish in a historical context. Second, DID gives thorough descriptions of the culture and lifeworld of the dialect-speaking peasants and fishermen, alongside detailed linguistic information about pronunciation, morphology, syntax and semantics.
More information available here.
