Static finite mixture model of multivariate skew-normal distributions to cluster multivariate time series based on generalized autoregressive score approach

Document Type : Research Paper

Authors

1 Science and Research Branch, Islamic Azad University, Tehran, Iran

2 School of Mathematics, Iran University of Science and Technology, Tehran, Iran

Abstract

This paper proposes an observation-driven finite mixture model for clustering high-dimensional data. A simple algorithm based on static hidden variables assigns the data to separate model components. The model accommodates normal and skew-normal mixture components with time-varying means, covariance matrices and skewness coefficients. These parameters are estimated using the EM algorithm and updated over time with the Generalized Autoregressive Score (GAS) approach. For real data that may be skewed and asymmetric, skew-normal components are preferable to normal ones. Finally, the proposed model is evaluated in a simulation study, and the results are discussed using a real data set.
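
As a rough illustration of what a score-driven (GAS) update looks like, the sketch below filters the time-varying mean of a univariate Gaussian with fixed variance. It is a simplified stand-in, not the multivariate skew-normal mixture of the paper: the function name `gas_mean_filter` and the parameters `omega`, `a`, `b` and `f0` are hypothetical, and the score is assumed to be scaled by the inverse Fisher information, so that it reduces to the prediction error `y_t - f_t`.

```python
def gas_mean_filter(y, omega, a, b, f0):
    """Score-driven filter for the mean of a Gaussian with fixed variance.

    Recursion (assumed GAS(1,1) form): f[t+1] = omega + a * s[t] + b * f[t],
    where s[t] is the score of the Gaussian log-density with respect to the
    mean, scaled by the inverse Fisher information. With that scaling the
    variance cancels and s[t] = y[t] - f[t].
    """
    f = [f0]
    for obs in y:
        scaled_score = obs - f[-1]  # prediction error drives the update
        f.append(omega + a * scaled_score + b * f[-1])
    return f


# Illustrative call with made-up values: the filtered mean moves toward
# the constant observation 1.0 at a rate set by `a`.
path = gas_mean_filter([1.0, 1.0, 1.0], omega=0.0, a=0.5, b=1.0, f0=0.0)
```

In a mixture setting one would presumably run a recursion of this kind per component, with each observation's score weighted by its component membership; that extension is not shown here.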

Keywords

Volume 16, Issue 4
April 2025
Pages 27-39
  • Receive Date: 17 May 2021
  • Revise Date: 25 September 2021
  • Accept Date: 24 October 2021