
Detail of publication


Kolář, J., Švec, J. and Psutka, J.: Automatic punctuation annotation in Czech broadcast news speech. SPECOM'2004, p. 319-325, SPIIRAS, Saint-Petersburg, 2004.




This paper reports our initial experiments with automatic punctuation annotation from speech, focusing on Czech broadcast news. We employed two statistical models: a prosodic model and a language model. The prosodic model captures relationships between prosodic quantities (such as pitch, speaking rate, or loudness) and punctuation marks. We tested two implementations of this model: a decision tree and a multi-layer perceptron. Hidden-event N-gram models were employed for language modeling. Instead of using an ordinary word-based model, we replaced infrequent word forms with their morphological tags and trained a mixed model. Scores from the two models can be combined. The combination of the language model with the decision tree yielded the best results. Testing on true words, we achieved a classification accuracy of 95.2% and an F-measure of 78.2%.
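
As a rough illustration of three ideas mentioned in the abstract (the mixed word/tag vocabulary, score combination, and the F-measure), here is a minimal Python sketch. It is not the authors' implementation. The frequency threshold `min_count`, the `<TAG>` token format, the linear interpolation with `weight`, and the no-punctuation label `"O"` are all assumptions made for illustration; the abstract does not specify how rare words were selected, how scores were combined, or how the F-measure was computed.

```python
from collections import Counter


def build_mixed_corpus(sentences, tags, min_count=2):
    """Replace word forms seen fewer than min_count times by their tag.

    min_count and the "<TAG>" token format are illustrative assumptions.
    """
    counts = Counter(word for sent in sentences for word in sent)
    mixed = []
    for sent, sent_tags in zip(sentences, tags):
        mixed.append([
            word if counts[word] >= min_count else f"<{tag}>"
            for word, tag in zip(sent, sent_tags)
        ])
    return mixed


def combine_scores(p_lm, p_prosody, weight=0.5):
    """Linearly interpolate two probabilities for one punctuation decision.

    Linear interpolation is one common choice, assumed here; the abstract
    only states that the scores can be combined.
    """
    return weight * p_lm + (1.0 - weight) * p_prosody


def f_measure(reference, hypothesis, none_label="O"):
    """F-measure over punctuation marks, ignoring the no-punctuation label."""
    correct = sum(
        1 for ref, hyp in zip(reference, hypothesis)
        if ref == hyp and ref != none_label
    )
    n_hyp = sum(1 for hyp in hypothesis if hyp != none_label)
    n_ref = sum(1 for ref in reference if ref != none_label)
    if correct == 0 or n_hyp == 0 or n_ref == 0:
        return 0.0
    precision = correct / n_hyp
    recall = correct / n_ref
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    sents = [["the", "news", "today"], ["the", "news", "tonight"]]
    pos = [["DET", "NOUN", "ADV"], ["DET", "NOUN", "ADV"]]
    print(build_mixed_corpus(sents, pos))
    print(combine_scores(0.8, 0.4))
    print(f_measure(["O", ",", "O", "."], ["O", ",", ".", "O"]))
```

The mixed corpus produced this way could then be used to train an N-gram model in which rare word forms share statistics through their tags, which is the motivation the abstract gives for the mixed model.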


Title: Automatic punctuation annotation in Czech broadcast news speech
Author: Kolář, J. ; Švec, J. ; Psutka, J.
Language: English
Date of publication: 20 Sep 2004
Year: 2004
Type of publication: Papers in proceedings of reviewed conferences
Title of journal or book: SPECOM'2004
Page: 319 - 325
ISBN: 5-7452-0110-X
Publisher: SPIIRAS
Address: Saint-Petersburg
Date: 20 Sep 2004 - 22 Sep 2004


Keywords: automatic punctuation, prosody, hidden-event n-gram model, sentence boundary, broadcast news, tag-based models


 author = {Kol\'{a}\v{r}, J. and \v{S}vec, J. and Psutka, J.},
 title = {Automatic punctuation annotation in Czech broadcast news speech},
 year = {2004},
 publisher = {SPIIRAS},
 journal = {SPECOM'2004},
 address = {Saint-Petersburg},
 pages = {319-325},
 ISBN = {5-7452-0110-X},
 url = {},