TY - JOUR

T1 - Maximum Margin Hidden Markov Models for Sequence Classification

AU - Mutsam, Nikolaus

AU - Pernkopf, Franz

PY - 2016

Y1 - 2016

AB - Discriminative learning methods are known to work well in pattern classification tasks and often show benefits compared to generative learning. This is particularly true in the case of model mismatch, i.e., when the model cannot represent the true data distribution. In this paper, we derive discriminative maximum margin learning for hidden Markov models (HMMs) with emission probabilities represented by Gaussian mixture models (GMMs). The focus is on single-label sequence classification, where the margin objective is specified by the probabilistic gap between the true class and the strongest competing class. In particular, we use the extended Baum-Welch (EBW) framework to optimize this probabilistic margin embedded in a hinge loss function. Approximations of the margin objective and its derivatives are necessary. In the experiments, we compare maximum margin HMMs to generative maximum likelihood and discriminative conditional log-likelihood (CLL) HMM training. We present results on classifying trajectories of handwritten characters, Australian sign language data, spoken digit data, and UCR time-series data. In many cases, maximum margin HMMs outperform CLL-HMMs. Furthermore, maximum margin HMMs achieve significantly better performance than generative maximum likelihood HMMs.

U2 - 10.1016/j.patrec.2016.03.017

DO - 10.1016/j.patrec.2016.03.017

M3 - Article

SN - 1872-7344

VL - 77

SP - 14

EP - 20

JO - Pattern Recognition Letters

JF - Pattern Recognition Letters

ER -