The corpus of Pomak language is online

    Our first language resource is ready and openly available: a morphologically annotated corpus of the Pomak language comprising about 86,000 words. We have uploaded the corpus to the Universal Dependencies treebank repository for international visibility and availability to researchers. You can follow the progress of the project and the corpus at the treebank repository page.
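    For readers who want to explore the corpus programmatically: Universal Dependencies treebanks are distributed as CoNLL-U files, with ten tab-separated columns per token. The sketch below shows one way to count words and part-of-speech tags in such a file. The function name and the sample sentence are illustrative assumptions; the sample uses placeholder tokens (`w1`, `w2`), not actual Pomak data.

```python
from collections import Counter
from typing import Iterable, Tuple


def count_words_and_upos(lines: Iterable[str]) -> Tuple[int, Counter]:
    """Count syntactic words and UPOS tags in CoNLL-U formatted lines.

    Hypothetical helper for illustration; not part of any official tooling.
    """
    n_words = 0
    upos_counts: Counter = Counter()
    for raw in lines:
        line = raw.rstrip("\n")
        # Skip blank lines (sentence boundaries) and comment lines.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        # A CoNLL-U token line has exactly ten columns:
        # ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
        if len(cols) != 10:
            continue
        token_id = cols[0]
        # Skip multiword-token ranges (e.g. "1-2") and empty nodes (e.g. "1.1").
        if "-" in token_id or "." in token_id:
            continue
        n_words += 1
        upos_counts[cols[3]] += 1
    return n_words, upos_counts


# Placeholder sentence (assumption): tokens w1 and w2 are not real Pomak words.
SAMPLE = """\
# sent_id = example-1
# text = w1 w2 .
1\tw1\tw1\tNOUN\t_\tCase=Nom|Number=Sing\t_\t_\t_\t_
2\tw2\tw2\tVERB\t_\tNumber=Sing|Person=3\t_\t_\t_\t_
3\t.\t.\tPUNCT\t_\t_\t_\t_\t_\t_
"""

if __name__ == "__main__":
    total, by_tag = count_words_and_upos(SAMPLE.splitlines())
    print(total, dict(by_tag))
```

    To run this on the downloaded treebank, pass an open file handle instead of the sample, for example `count_words_and_upos(open(path, encoding="utf-8"))`, where `path` points to whichever `.conllu` file you obtained from the repository.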

    Furthermore, a paper describing the corpus was accepted at the Fifth Workshop on the Use of Computational Methods in the Study of Endangered Languages (ComputEL-5), an Association for Computational Linguistics (ACL) workshop taking place in Dublin, Ireland, on 26-27 May 2022. The paper, titled Morphologically Annotated Corpora of Pomak, was written by Ritván Jusúf Karahóǧa, Panagiotis Krimpas, Vivian Stamou, Vasileios Arampatzakis, Dimitrios Karamatskos, Vasileios Sevetlidis, Nikolaos Constantinides, Nikolaos Kokkas, George Pavlidis, and Stella Markantonatou.

    The resource and the announcement are the result of a collaboration between the ATHENA Research Centre, the community of Greek researchers of the Pomak language, and native speakers of the language. They demonstrate the great potential of applying artificial intelligence to the study and preservation of languages, as well as the dynamism of the researchers at the ATHENA Research Centre.

    The Philotis action aims to facilitate the development of written and narrated linguistic resources, especially for living languages and dialects at the limits of survival. Philotis is a research and infrastructure action implemented at the ATHENA Research Centre under the “Action for the Support of Regional Excellence”, funded by the Operational Programme “Competitiveness, Entrepreneurship and Innovation” (NSRF 2014-2020) and co-financed by Greece and the European Union (European Regional Development Fund).

    The treebank is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 license (CC BY-NC-SA 3.0).


    We wish to thank all contributors to the original annotation efforts. Morphological annotation was carried out by Ritvan Karahoǧa and Nicolaos Constantinides. Panagiotis Krimpas supported the annotation with expertise in Slavic languages and Stella Markantonatou with expertise in formal grammatical frameworks. Nicolaos Kokkas contributed to the collection of Pomak texts.


    • Karahóǧa, R., Krimpas, P., Stamou, V., Arampatzakis, V., Karamatskos, D., Sevetlidis, V., Constantinides, N., Kokkas, N., Pavlidis, G., Markantonatou, S. (2022). Morphologically annotated corpora of Pomak. 5th Workshop on the Use of Computational Methods in the Study of Endangered Languages. Association for Computational Linguistics. Dublin, May 26-27, 2022.