Identification:
ZCU/KKY/2012/042



Year: 2012
Author: Lukáš Machlica

gmm-estimator

In speech processing for recognition and classification (as well as in other classification tasks), databases containing huge amounts of training data are available, and Gaussian mixture models are often used to represent these data. The developed software trains the parameters of a Gaussian mixture model on the input data so that the probability of the data given the model is maximized. The software is designed for processing large amounts of data: the estimation algorithm supports SSE instructions on the CPU as well as modern graphics cards, which yields very fast calculations. The software includes several training methods, which can be selected via command-line parameters. The algorithms are implemented robustly, with a focus on the stability of the calculations.

VERIFICATION AND VALIDATION

The software has been verified and validated at the Department of Cybernetics, University of West Bohemia in Pilsen. Verification was performed on a suitably chosen dataset, for which the parameters of a Gaussian mixture model were estimated and the probability of the dataset given the model was calculated. The models were then used in a classification task in which the goal was to determine the correct class of an unknown input vector/image. Each class was represented by a single Gaussian mixture model, and classification was based on the maximum likelihood of the unknown vector given the model of each class.
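
The maximum-likelihood classification described above can be illustrated with a minimal Python sketch. It is not taken from the software itself: it assumes diagonal covariance matrices (the covariance type is not stated here), and the function names and the model representation as a list of (weight, mean, variance) tuples are hypothetical. The log-sum-exp step shows one common way to keep such calculations stable.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density at x of a Gaussian with diagonal covariance (assumed)."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x, gmm):
    """Return log p(x | gmm); gmm is a list of (weight, mean, var) tuples.

    Log-sum-exp: subtracting the largest term before exponentiating
    avoids underflow when all component densities are tiny.
    """
    terms = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def classify(x, class_models):
    """Return the label whose GMM gives x the highest likelihood."""
    return max(
        class_models,
        key=lambda label: gmm_log_likelihood(x, class_models[label]),
    )


# Hypothetical two-class example with hand-picked model parameters.
models = {
    "a": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "b": [(1.0, [5.0, 5.0], [1.0, 1.0])],
}
label = classify([0.2, 0.1], models)
```

Working in the log domain means a vector far from every component still yields a finite score rather than a zero probability.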

Validation was performed by using the software in practical tasks solved at the Department of Cybernetics, namely speech, speaker and image recognition. The software significantly sped up the calculations and kept them stable across inputs coming from different areas of recognition.

USER MANUAL

The software is operated from the command line on 32-bit and 64-bit Linux and Windows systems. The input is a set of vectors stored in one or more input files. Each input file consists of a header followed by the data itself. The format of the input file is:

  • first, the string SV-ES-PARAM, which identifies the type of the file
  • next, the dimension and the count of feature vectors, each written as an int32 data type
  • then, for each vector, one number from the interval < 0, 1 > representing its certainty, written as a float data type, i.e. N numbers are written successively into the file, where N is the number of feature vectors; since this software does not use this information, the values can be chosen arbitrarily
  • finally, the feature vectors themselves, written successively as float data types: all dimensions of the first feature vector, all dimensions of the second feature vector, etc.
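
The layout described above could be written and read back with a short Python sketch such as the following. Several details are assumptions, since the manual does not state them: little-endian byte order, 32-bit floats, and the identifier stored as plain ASCII bytes without a length prefix or terminator. The function names are hypothetical.

```python
import struct

MAGIC = b"SV-ES-PARAM"  # file-type identifier from the format description


def write_param_file(path, vectors, certainty=1.0):
    """Write feature vectors in the described layout.

    Assumed, not confirmed by the manual: little-endian byte order,
    32-bit floats, identifier written as raw ASCII with no terminator.
    """
    count = len(vectors)
    dim = len(vectors[0]) if count else 0
    if any(len(v) != dim for v in vectors):
        raise ValueError("all feature vectors must have the same dimension")
    with open(path, "wb") as f:
        f.write(MAGIC)
        # Header: dimension, then number of vectors, both int32.
        f.write(struct.pack("<ii", dim, count))
        # One certainty value per vector; the trainer ignores these.
        f.write(struct.pack(f"<{count}f", *([certainty] * count)))
        # Feature vectors, one after another.
        for v in vectors:
            f.write(struct.pack(f"<{dim}f", *v))


def read_param_file(path):
    """Read a file written by write_param_file; return (certainties, vectors)."""
    with open(path, "rb") as f:
        if f.read(len(MAGIC)) != MAGIC:
            raise ValueError("not an SV-ES-PARAM file")
        dim, count = struct.unpack("<ii", f.read(8))
        certainties = list(struct.unpack(f"<{count}f", f.read(4 * count)))
        vectors = [
            list(struct.unpack(f"<{dim}f", f.read(4 * dim)))
            for _ in range(count)
        ]
    return certainties, vectors
```

For example, `write_param_file("features.prm", [[1.0, 2.0], [3.5, -4.0]])` (the file name is illustrative) would produce an 11-byte identifier, 8 bytes of header, 8 bytes of certainties and 16 bytes of feature data under these assumptions.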

The Gaussian mixture model estimation is performed from the command line with syntax:

> trainGMM.exe -i input_file -I input_directory_with_files -o output_model_filename [optional_parameters]

optional_parameters:

  • -g N : N represents the number of Gaussians in the Gaussian mixture model
  • -t X : X represents the type of estimation of the model, and its value is chosen from the set {1,2,3,4}, where MLLR=1, MAP=2, fMLLR=3, MLLR+MAP=4 are possible methods (see below)
  • -u S : S represents the filename of an existing input Gaussian mixture model
  • if the parameter -t X is not specified, the parameter -g N with N > 0 has to be specified; the model is then trained by sequentially splitting the Gaussians with the smallest weight
  • if the parameter -t X is specified, the initial model also has to be specified via the parameter -u S, because the individual methods require prior information in the form of a trained model

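
The rules for combining the parameters above can be sketched as a small Python helper that assembles the argument list before calling the tool. The helper and its argument names are hypothetical, and it makes two simplifying assumptions: at least one of -i and -I must be given, and -g is passed only when -t is absent.

```python
def build_train_command(output_model, input_file=None, input_dir=None,
                        n_gaussians=None, method=None, initial_model=None,
                        executable="trainGMM.exe"):
    """Assemble a trainGMM argument list following the documented rules."""
    # Method codes as listed for the -t parameter.
    methods = {"MLLR": 1, "MAP": 2, "fMLLR": 3, "MLLR+MAP": 4}
    if input_file is None and input_dir is None:
        raise ValueError("give an input file (-i) or an input directory (-I)")
    cmd = [executable]
    if input_file is not None:
        cmd += ["-i", input_file]
    if input_dir is not None:
        cmd += ["-I", input_dir]
    cmd += ["-o", output_model]
    if method is None:
        # Without -t, the number of Gaussians is mandatory.
        if n_gaussians is None or n_gaussians <= 0:
            raise ValueError("without -t, -g N with N > 0 is required")
        cmd += ["-g", str(n_gaussians)]
    else:
        if method not in methods:
            raise ValueError("method must be one of " + ", ".join(methods))
        # With -t, an existing model must be supplied via -u.
        if initial_model is None:
            raise ValueError("-t requires an initial model via -u")
        cmd += ["-t", str(methods[method]), "-u", initial_model]
    return cmd
```

The returned list can be handed to `subprocess.run`; for instance, `build_train_command("model.gmm", input_file="train.prm", n_gaussians=64)` (file names are illustrative) yields the arguments for training a 64-component model from scratch.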
The output is the estimated Gaussian mixture model, which maximizes the likelihood of the input data. A detailed description of the estimation methods that can be chosen via the parameter -t can be found in this publication. The software, along with a more detailed description, can be found here.




Licence

For information please contact:

Jan Vaněk

Univerzitní 8

30614 Plzeň

Email: vanekyj(at)kky.zcu.cz

Tel: +420 377 632 529



Confirmation of usage

Department of Cybernetics, Faculty of Applied Sciences, University of West Bohemia, Pilsen


Contact form

This software is protected by license. To download or get more information, please fill in the form below: