
Publication detail


J. Kolář and J. Švec: Czech Broadcast Conversation MDE Transcripts. Vol. LDC2009T20, Linguistic Data Consortium, Philadelphia, USA, 2009.

Further information

Corpus Description at the LDC Catalog


Czech Broadcast Conversation MDE Transcripts was created to extend Metadata Extraction (MDE) research to conversational Czech. The goal of MDE is to take raw speech recognition output and refine it into forms that are more useful to human readers and to downstream automatic processing. In short, the aim is to produce automatic transcripts that are as readable as possible. Readability can be improved in several ways: removing non-content words such as filled pauses and discourse markers; removing disfluent stretches of speech; and marking boundaries at natural breakpoints in the flow of speech, so that each sentence or other meaningful unit can appear on its own line in the resulting transcript. Natural capitalization, punctuation, and standardized spelling, together with sensible conventions for representing speaker turns and speaker identity, further contribute to a readable transcript.
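As a rough illustration of the readability steps listed above, the following sketch turns a token stream into one capitalized, period-terminated line per sentence-like unit, dropping filled pauses, discourse markers, and disfluent (edited) words. It is a toy example under loudly stated assumptions: the labels `FP`, `DM`, and `EDIT`, the `boundary` flag, and the `Token`/`readable_lines` names are hypothetical and do not reflect the annotation format of LDC2009T20. A real MDE system would have to predict these labels automatically from speech recognition output, and punctuation here is simplified to a final period.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical labels (not the LDC format): filled pause, discourse marker,
# and a disfluent word that is later repaired (the "edit" region).
REMOVABLE = {"FP", "DM", "EDIT"}


@dataclass
class Token:
    word: str
    tag: Optional[str] = None   # hypothetical label, or None for a content word
    boundary: bool = False      # True if a sentence-like unit ends after this token


def _finish(words: List[str]) -> str:
    """Join words, capitalize the first letter, and add a final period."""
    sentence = " ".join(words)
    return sentence[0].upper() + sentence[1:] + "."


def readable_lines(tokens: List[Token]) -> List[str]:
    """Return one readable line per sentence-like unit."""
    lines: List[str] = []
    current: List[str] = []
    for tok in tokens:
        if tok.tag not in REMOVABLE:
            current.append(tok.word)
        # Close the unit at a boundary, but never emit an empty line.
        if tok.boundary and current:
            lines.append(_finish(current))
            current = []
    if current:  # flush any trailing words without an explicit boundary
        lines.append(_finish(current))
    return lines


# Illustrative Czech input: "ehm" (filled pause), "tak" (discourse marker),
# and a repeated "to" whose first occurrence is marked as disfluent.
example = [
    Token("ehm", "FP"), Token("tak", "DM"), Token("to", "EDIT"),
    Token("to"), Token("je"), Token("pravda", boundary=True),
    Token("byli"), Token("jsme"), Token("tam", boundary=True),
]

for line in readable_lines(example):
    print(line)
```

Keeping the removal decision (`tag`) separate from the segmentation decision (`boundary`) mirrors the way the description above treats them as distinct readability steps; each could be predicted by a different model.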

Publication details

Title: Czech Broadcast Conversation MDE Transcripts
Authors: J. Kolář; J. Švec
Publication language: English
Release date: 17 July 2009
Year of publication: 2009
Publication type: Prototype, applied methodology, authorized software
Catalog number: LDC2009T20
ISBN: 1-58563-520-0
Publisher: Linguistic Data Consortium
Place of publication: Philadelphia, USA


 author = {J. Kol\'{a}\v{r} and J. \v{S}vec},
 title = {Czech Broadcast Conversation MDE Transcripts},
 year = {2009},
 publisher = {Linguistic Data Consortium},
 address = {Philadelphia, USA},
 volume = {LDC2009T20},
 ISBN = {1-58563-520-0},
 url = {http://www.kky.zcu.cz/en/publications/JKolar_2009_CzechBroadcast_1},