TY - JOUR
T1 - Kullback-Leibler Divergence and Akaike Information Criterion in General Hidden Markov Models
AU - Fuh, Cheng-Der
AU - Kao, Chu-Lan Michael
AU - Pang, Tianxiao
N1 - Publisher Copyright:
© 1963-2012 IEEE.
PY - 2024
Y1 - 2024
N2 - To characterize the Kullback-Leibler divergence and the Fisher information in general parametrized hidden Markov models, in this paper we first show that the log likelihood and its derivatives can be represented as additive functionals of a Markovian iterated function system, and then provide explicit characterizations of these two quantities through this representation. Moreover, we show that the Kullback-Leibler divergence can be locally approximated by a quadratic function determined by the Fisher information. Results relating to the Cramér-Rao lower bound and the Hájek-Le Cam local asymptotic minimax theorem are also given. As an application of our results, we provide a theoretical justification for using the Akaike information criterion (AIC) for model selection in general hidden Markov models. Finally, we study three concrete models to illustrate our theory: a Gaussian vector autoregressive-moving average model of order (p, q), a recurrent neural network, and a temporal restricted Boltzmann machine.
AB - To characterize the Kullback-Leibler divergence and the Fisher information in general parametrized hidden Markov models, in this paper we first show that the log likelihood and its derivatives can be represented as additive functionals of a Markovian iterated function system, and then provide explicit characterizations of these two quantities through this representation. Moreover, we show that the Kullback-Leibler divergence can be locally approximated by a quadratic function determined by the Fisher information. Results relating to the Cramér-Rao lower bound and the Hájek-Le Cam local asymptotic minimax theorem are also given. As an application of our results, we provide a theoretical justification for using the Akaike information criterion (AIC) for model selection in general hidden Markov models. Finally, we study three concrete models to illustrate our theory: a Gaussian vector autoregressive-moving average model of order (p, q), a recurrent neural network, and a temporal restricted Boltzmann machine.
KW - AIC
KW - Boltzmann machine
KW - Cramér-Rao lower bound
KW - Fisher information
KW - hidden Markov model
KW - Hájek-Le Cam theorem
KW - Kullback-Leibler divergence
KW - Markovian iterated function system
KW - recurrent neural network
UR - http://www.scopus.com/inward/record.url?scp=85191314270&partnerID=8YFLogxK
U2 - 10.1109/TIT.2024.3392983
DO - 10.1109/TIT.2024.3392983
M3 - Article
AN - SCOPUS:85191314270
SN - 0018-9448
VL - 70
SP - 5888
EP - 5909
JO - IEEE Transactions on Information Theory
JF - IEEE Transactions on Information Theory
IS - 8
ER -