Hidden Markov models (HMMs) are an efficient tool to describe and model the underlying
behaviour of many phenomena. HMMs assume that the observed data are generated
independently from a parametric distribution, conditional on an unobserved process that
satisfies the Markov property. Model selection, that is, determining the number of hidden states
for these models, is an important issue and the main interest of this thesis.
Applying likelihood-based criteria for HMMs is a challenging task as the likelihood function
of these models is not available in a closed form. Using the data augmentation approach, we
derive two forms of the likelihood function of an HMM in closed form, namely the observed
and the conditional likelihoods. Subsequently, we develop several modified versions of the
Akaike information criterion (AIC) and Bayesian information criterion (BIC) approximated
under the Bayesian principle. We also develop several versions of the deviance information
criterion (DIC). These proposed versions are based on the type of likelihood, i.e. conditional
or observed likelihood, and also on whether the hidden states are dealt with as missing data or
additional parameters in the model. This latter point is referred to as the concept of focus.
Finally, we consider model selection from a predictive viewpoint. To this end, we develop the
so-called widely applicable information criterion (WAIC). We assess the performance of these
various proposed criteria via simulation studies and real-data applications.
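
As a point of reference for the criteria described above, the following is a minimal sketch of the textbook AIC and BIC for a Poisson HMM, with the observed-data log-likelihood evaluated numerically by the standard scaled forward recursion. It is not the modified, Bayesian-approximated versions developed in the thesis. The function names, the example parameter values, and the choice to count the initial distribution among the free parameters are all assumptions made for illustration.

```python
import math


def poisson_logpmf(k, lam):
    """Log of the Poisson probability mass function at count k with rate lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def poisson_hmm_loglik(counts, rates, trans, init):
    """Observed-data log-likelihood of a Poisson HMM (scaled forward recursion).

    counts: observed counts y_1..y_n
    rates:  Poisson rate for each hidden state (hypothetical values)
    trans:  transition matrix, trans[i][j] = P(state j | previous state i)
    init:   initial state distribution
    """
    m = len(rates)
    # alpha[j] is proportional to P(y_1..y_t, state_t = j)
    alpha = [init[j] * math.exp(poisson_logpmf(counts[0], rates[j])) for j in range(m)]
    loglik = 0.0
    for t in range(len(counts)):
        if t > 0:
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(m))
                * math.exp(poisson_logpmf(counts[t], rates[j]))
                for j in range(m)
            ]
        # Rescale at every step to avoid underflow; the log scales sum to the log-likelihood.
        scale = sum(alpha)
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return loglik


def n_free_params(m):
    """Assumed count: m rates, m*(m-1) transition probs, m-1 initial probs."""
    return m + m * (m - 1) + (m - 1)


def aic_bic(loglik, m, n_obs):
    """Textbook AIC and BIC for an m-state model fitted to n_obs observations."""
    k = n_free_params(m)
    return -2 * loglik + 2 * k, -2 * loglik + k * math.log(n_obs)


if __name__ == "__main__":
    # Made-up counts and parameters, purely for illustration.
    y = [0, 3, 1, 7, 2, 9, 8, 1]
    ll = poisson_hmm_loglik(y, [1.5, 8.0], [[0.8, 0.2], [0.3, 0.7]], [0.5, 0.5])
    print(aic_bic(ll, 2, len(y)))
```

In practice the parameters would be estimated for each candidate number of states, and the model with the smallest criterion value would be preferred.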
In this thesis, we apply Poisson HMMs to model spatial dependence in count data,
through an application to traffic crash counts on three highways in the UK. The ultimate interest
is in identifying highway segments which have distinctly higher crash rates. Selecting an
optimal number of states is an important part of the interpretation. For this purpose, we
employ model selection criteria to determine the optimal number of states. We also use several
goodness-of-fit checks to assess the model fitted to the data. We implement an MCMC
algorithm and check its convergence. We examine the sensitivity of the results to the prior
specification, a potential problem given small sample sizes. The Poisson HMMs adopted can
provide an alternative model for analysing spatial dependence on networks. It is possible to
identify segments with a higher posterior probability of classification in a high-risk state, a task
that could prioritise management action.
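
The idea of flagging segments by their posterior probability of being in a high-risk state can be sketched as follows. This is an illustrative forward-backward smoother for a Poisson HMM with fixed, made-up parameters; the thesis itself works with MCMC output, so the code below is an assumption-laden stand-in and not the author's procedure. The names `state_posteriors` and `flag_segments`, the 0.5 threshold, and all numeric values are hypothetical.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability mass function at count k with rate lam."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def state_posteriors(counts, rates, trans, init):
    """Smoothed probabilities P(state_t = j | all counts) via forward-backward."""
    n, m = len(counts), len(rates)
    emis = [[poisson_pmf(y, lam) for lam in rates] for y in counts]

    # Forward pass: normalised filtering probabilities.
    fwd, prev = [], None
    for t in range(n):
        if t == 0:
            a = [init[j] * emis[0][j] for j in range(m)]
        else:
            a = [sum(prev[i] * trans[i][j] for i in range(m)) * emis[t][j]
                 for j in range(m)]
        s = sum(a)
        prev = [x / s for x in a]
        fwd.append(prev)

    # Backward pass: rescaled at each step for numerical stability.
    bwd = [[1.0] * m for _ in range(n)]
    for t in range(n - 2, -1, -1):
        b = [sum(trans[i][j] * emis[t + 1][j] * bwd[t + 1][j] for j in range(m))
             for i in range(m)]
        s = sum(b)
        bwd[t] = [x / s for x in b]

    # Combine and normalise to get the smoothed state probabilities.
    post = []
    for t in range(n):
        g = [fwd[t][j] * bwd[t][j] for j in range(m)]
        s = sum(g)
        post.append([x / s for x in g])
    return post


def flag_segments(post, high_state, threshold=0.5):
    """Indices of segments whose posterior probability of the high state exceeds threshold."""
    return [t for t, p in enumerate(post) if p[high_state] > threshold]


if __name__ == "__main__":
    # Hypothetical crash counts along consecutive highway segments.
    crashes = [1, 0, 2, 15, 14, 1]
    post = state_posteriors(crashes, [1.0, 12.0],
                            [[0.9, 0.1], [0.1, 0.9]], [0.5, 0.5])
    print(flag_segments(post, high_state=1))
```

Ranking segments by the posterior probability itself, instead of applying a fixed threshold, would be an equally reasonable way to prioritise them.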
| Date of Award | 2017 |
|---|---|
| Original language | English |
| Awarding Institution | |
| Supervisor | David McMullan (Other Supervisor) |
Model Fit Diagnostics for Hidden Markov Models
Kadhem, S. K. (Author). 2017
Student thesis: PhD