Detail of publication


Kolář, J. and Švec, J. and Strassel, S. and Walker, Ch. and Kozlíková, D. and Psutka, J.: Czech spontaneous speech corpus with structural metadata. Interspeech Lisboa 2005, p. 1165-1168, ISCA, Bonn, 2005.


This paper describes a Czech spontaneous speech corpus consisting of radio talk show recordings. As the first complete non-English MDE corpus, it has been annotated with structural metadata: information beyond the words that is critical both to increasing transcript readability and to allowing the application of downstream NLP methods. Metadata annotation involves partitioning verbatim transcripts into syntactic/semantic units (SUs) that each express a complete idea, and identifying fillers and edit disfluencies. Annotation guidelines for English metadata developed by the Linguistic Data Consortium were taken as the starting point, with changes applied to accommodate specific phenomena of Czech. In addition to the necessary language-dependent modifications, we further propose some language-independent modifications, including limited prosodic labeling at SU boundaries. Statistics about the structural metadata annotation in the corpus and inter-annotator agreement figures are also presented.

Title: Czech spontaneous speech corpus with structural metadata
Author: Kolář, J. ; Švec, J. ; Strassel, S. ; Walker, Ch. ; Kozlíková, D. ; Psutka, J.
Language: English
Date of publication: 4 Sep 2005
Year: 2005
Type of publication: Papers in proceedings of reviewed conferences
Title of journal or book: Interspeech Lisboa 2005
Page: 1165 - 1168
Publisher: ISCA
Address: Bonn
Date: 4 Sep 2005 - 8 Sep 2005


Keywords: SUs, structural metadata, spontaneous speech, disfluencies, fillers


 author = {Kol\'{a}\v{r}, J. and \v{S}vec, J. and Strassel, S. and Walker, Ch. and Kozl\'{i}kov\'{a}, D. and Psutka, J.},
 title = {Czech spontaneous speech corpus with structural metadata},
 year = {2005},
 publisher = {ISCA},
 journal = {Interspeech Lisboa 2005},
 address = {Bonn},
 pages = {1165-1168},
 url = {},