
Detail of publication


Krňoul, Z.: Visual Speech Synthesis - Talking Head. University of West Bohemia, 2008.




This PhD thesis describes research in the field of visual speech synthesis. Its main aim is to create a complete system for automatic visual speech synthesis that converts written text into an animation of a talking head (a talking head synthesis system). To meet this objective, the thesis summarizes current knowledge in the field, analyzes the different approaches and methods, and presents a solution divided into several separate parts.

The first part concerns creating an image of the human face and animating it so that the resulting visual speech is intelligible. It analyzes possible methods of facial animation and discusses the advantages and disadvantages of the different approaches. A new approach to talking head animation is designed and implemented. This face animation method is suitable for expressing articulatory movements of the lips and tongue as well as deformations observed in the upper half of the face.

Another part addresses the preparation, recording and processing of audio-visual data. A new approach to three-dimensional reconstruction of human faces, based on scanning with a strip of light, is designed. To capture visual speech, two new methods of tracking the movements of the lips and chin are proposed. Within the thesis, two audio-visual databases of Czech speech suitable for visual speech synthesis are created. The databases also include speech segmentation and articulatory trajectories describing the shape and movement of the lips.

The research also deals with the control of the animation. The thesis summarizes existing methods for automatically creating articulatory trajectories from arbitrary input text. With a focus on lip coarticulation, one existing approach is selected and trained on the speech recorded in the audio-visual databases. In addition, a new method for synthesizing articulatory trajectories is proposed and implemented, which solves the lip coarticulation problem in a different way.

The automatic synthesis of visual speech has been tested at two levels. The first level compares articulatory trajectories synthesized by the newly proposed method, the selection of articulatory targets, with those synthesized by the existing method. The outcome does not indicate a significant difference between the two. The second level verifies the overall intelligibility of the talking head. Two studies of visual speech perception with 19 normal-hearing subjects are designed and carried out. The results confirm that the proposed talking head system makes a significant visual contribution to speech perception, and also that there is room for further improvement. The thesis closes with several applications of the talking head.


Title: Visual Speech Synthesis - Talking Head
Author: Krňoul, Z.
Language: Czech
Date of publication: 9 Oct 2008
Year: 2008
Type of publication: Habilitation and dissertation theses
Publisher: University of West Bohemia
/ 2011-12-16 15:52:17 /


Keywords: talking head, visual speech synthesis, selection of articulatory targets, face animation, perception tests


 author = {Kr\v{n}oul, Z.},
 title = {Visual Speech Synthesis - Talking Head},
 year = {2008},
 publisher = {University of West Bohemia},
 url = {},