How does the corpus base influence collocation sets in dictionaries?

Carolin Müller-Spitzer is an experienced linguist and lexicographer who studies how the corpus base influences collocation sets in dictionaries. She applied for an ELEXIS research grant to visit the Instituut voor de Nederlandse Taal in Leiden (the Netherlands) and conduct a contrastive German-Dutch study on this topic.

© Carolin Müller-Spitzer, 2022

How did you learn about the ELEXIS travel grants?

I learned about the available research grants from the ELEXIS newsletter.

What is your project about?

I would like to work together with Carole Tiberius and other colleagues at the Instituut voor de Nederlandse Taal (INT) on a contrastive German-Dutch study. Its topic is the influence of the corpus base on collocation sets in dictionaries, using the entries for man and woman as examples.
The starting point of our case study is the observation that even in modern corpus-based dictionaries of German, e.g. elexiko, the descriptions in entries such as man or woman are more strongly shaped by stereotypes than we expected.
In elexiko, for example, collocation sets are listed for each keyword. In the case of Mann (man) and Frau (woman), selecting the most frequent collocation partners leads to markedly different descriptions. It is particularly striking that in the article Mann the agent role forms the second collocation set (“What does a man do?”), whereas for Frau the patient role (“What happens to a woman?”) is listed second, an imbalance that some researchers have already criticised as doing gender.
The dictionary presents it this way because of how frequent the groups are: in the texts of the elexiko corpus, women appear far more often in the patient role than in the agent role, while for men it is the other way round. The corpora underlying the dictionary are, like the large linguistic corpora of German in general, dominated by newspaper texts.
In our case study for German, we show how the linguistic contexts of use of man and woman obtained from newspaper texts differ from those in other language collections, e.g. fiction or popular magazines, and how different the “reality” shown in the dictionary would therefore look if the corpus were composed differently. In doing so, we address the fundamental question of what effect the choice of the empirical dictionary base has on semantic descriptions in dictionary entries and what consequences can be drawn from this for empirically oriented lexicographic work. The visit to Leiden will be used to work together on replicating this study for Dutch.

What in your background brought you to this point?

I studied linguistics with a focus on lexicography under Herbert Ernst Wiegand in Heidelberg, where I also obtained my doctorate. I wrote my habilitation on the topic of “Empirical research into dictionary use”. Even though I am now more involved with gender linguistics, I still teach regularly in the “European Master in Lexicography” and so remain in contact with lexicography.

Which hosting institution did you apply to and why?

I applied to visit the Instituut voor de Nederlandse Taal because it has the corpora, the corpus analysis systems and the knowledge necessary to conduct the contrastive study.

Where does your interest in languages/lexicography come from and what keeps you motivated?

The current study combines the societal debates of gender linguistics (e.g., how do we want to represent man and woman in the dictionary?) with empirical research, and I think that could yield very interesting insights.

Profile: Carolin Müller-Spitzer
Travel Grant: Call 5
Period of stay: 12.6.–22.6.2022
Project title: The influence of the corpus base on collocation sets in dictionaries
Home institution: Leibniz-Institut für Deutsche Sprache (Leibniz Institute for the German Language)
#elexis_de
Hosting institution: Instituut voor de Nederlandse Taal (INT, the Netherlands)
#elexis_nl