Michal Křen

Director of the Institute


Institute of the Czech National Corpus, Faculty of Arts, Charles University

Ústav Českého národního korpusu

The Institute of the Czech National Corpus is one of the departments of the Faculty of Arts of Charles University in Prague. The Institute’s principal task is to maintain and further develop the Czech National Corpus (CNC), a project that aims to continuously map the Czech language by building large electronic language corpora and providing access to them. CNC focuses on broad-scale, comprehensive data collection, including contemporary written Czech in all its genres and varieties, spoken Czech (covering the whole territory of the Czech Republic), older Czech, and translated Czech (the InterCorp parallel corpus and the database of translation equivalents accessible via Treq).

Given its large scope, diverse and balanced design, high processing standards, reliable metadata and high-quality linguistic annotation, the CNC language data can compete with similar resources for major world languages. Crucially, the continuity of data collection enables researchers to carry out longitudinal studies of the language’s development and to study changes in language awareness and public discourse across different periods. CNC language corpora serve as primary research material for a wide range of research topics, mainly within the social sciences and humanities but also in natural language processing.

CNC provides user access to the corpora through specialized analytical tools in the form of web-based applications, which make working with language data both user-friendly and effective. Together with comprehensive user support (an online user forum, documentation and a knowledge base for corpus linguistics), these applications are located at the CNC research web portal. The CNC portal is open access: the only requirement for using all the applications and their features is a free online registration.

CNC is recognized as one of the Large Research Infrastructures of the Czech Republic. It actively cooperates with the CLARIN European research infrastructure and with its Czech national node, LINDAT/CLARIAH-CZ. CNC is an associated member of the CLARIN-CZ consortium with K-centre status and maintains active contacts with many foreign research institutions with a similar focus.