Fitting a Gaussian mixture model through the Gini index

López-Lobato, Adriana Laura; Avendaño-Garrido, Martha Lorena

International Journal of Applied Mathematics and Computer Science, Volume 31 (2021) no. 3, pp. 487-500.

See the article record from the source Library of Science

Abstract

A weighted sum of Gaussian components, with nonnegative weights that sum to one, is known as a Gaussian mixture model. It is widely used in data mining and pattern recognition. In this paper, we propose a method to estimate the parameters of the density function given by a Gaussian mixture model. Our proposal is based on the Gini index, a measure of the degree of inequality between two probability distributions, and consists of minimizing the Gini index between an empirical distribution for the data and a Gaussian mixture model. We present several examples with simulated and real data and observe some of the properties of the proposed method.
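The idea of fitting by minimizing a discrepancy between the empirical distribution and the mixture can be illustrated with a small sketch. The abstract does not give the formula for the Gini index, so the code below assumes a stand-in: the L1 distance between the empirical CDF and the mixture CDF, which in one dimension coincides with the Kantorovich distance. The optimizer is a generic compass (pattern) search chosen for brevity, not the authors' algorithm, and all function names (mixture_cdf, l1_cdf_distance, objective, compass_search) are hypothetical.

```python
import math
import random


def mixture_cdf(x, weights, means, sds):
    """CDF of a one-dimensional Gaussian mixture evaluated at x."""
    return sum(
        w * 0.5 * (1.0 + math.erf((x - m) / (s * math.sqrt(2.0))))
        for w, m, s in zip(weights, means, sds)
    )


def l1_cdf_distance(sorted_data, weights, means, sds, grid_size=400, pad=3.0):
    """Midpoint-rule approximation of the integral of |F_n(x) - F_mix(x)|.

    Assumption: this L1 distance between CDFs stands in for the Gini index;
    the tails beyond `pad` units outside the data range are ignored.
    """
    n = len(sorted_data)
    lo, hi = sorted_data[0] - pad, sorted_data[-1] + pad
    step = (hi - lo) / grid_size
    total, idx = 0.0, 0
    for k in range(grid_size):
        x = lo + (k + 0.5) * step
        while idx < n and sorted_data[idx] <= x:
            idx += 1  # idx / n is the empirical CDF at x
        total += abs(idx / n - mixture_cdf(x, weights, means, sds)) * step
    return total


def objective(theta, sorted_data):
    """Discrepancy for a two-component mixture, theta = [w, m1, m2, s1, s2]."""
    w, m1, m2, s1, s2 = theta
    if not (0.0 < w < 1.0) or min(s1, s2) <= 0.05:
        return math.inf  # reject invalid weights and degenerate variances
    return l1_cdf_distance(sorted_data, [w, 1.0 - w], [m1, m2], [s1, s2])


def compass_search(theta, sorted_data, step=0.5, tol=1e-3, max_sweeps=500):
    """Derivative-free search: try +/- step on each coordinate, keep any
    strict improvement, and halve the step when a full sweep finds none."""
    best = objective(theta, sorted_data)
    for _ in range(max_sweeps):
        if step <= tol:
            break
        improved = False
        for i in range(len(theta)):
            for delta in (step, -step):
                trial = list(theta)
                trial[i] += delta
                value = objective(trial, sorted_data)
                if value < best:
                    theta, best, improved = trial, value, True
        if not improved:
            step /= 2.0
    return theta, best


# Simulated data from two Gaussian components (illustrative values only).
rng = random.Random(0)
data = sorted(
    [rng.gauss(-2.0, 0.5) for _ in range(100)]
    + [rng.gauss(3.0, 1.0) for _ in range(100)]
)
start = [0.5, -1.0, 1.0, 1.0, 1.0]
start_value = objective(start, data)
fitted, fitted_value = compass_search(start, data)
print("start objective:", round(start_value, 4))
print("fitted objective:", round(fitted_value, 4))
print("fitted [w, m1, m2, s1, s2]:", [round(t, 3) for t in fitted])
```

The search only accepts strict improvements, so the fitted objective is never worse than the starting one, but it may stop at a local minimum. A practical implementation would likely use several starting points and a more careful treatment of the tails.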

Keywords: Gini index problem, Gaussian mixture model, clustering
Keywords (translated from Polish): Gini index, Gaussian mixture model, clustering

@article{IJAMCS_2021_31_3_a8,
     author = {L\'opez-Lobato, Adriana Laura and Avenda\~no-Garrido, Martha Lorena},
     title = {Fitting a {Gaussian} mixture model through the {Gini} index},
     journal = {International Journal of Applied Mathematics and Computer Science},
     pages = {487--500},
     publisher = {mathdoc},
     volume = {31},
     number = {3},
     year = {2021},
     language = {en},
     url = {https://geodesic-test.mathdoc.fr/item/IJAMCS_2021_31_3_a8/}
}

TY  - JOUR
AU  - López-Lobato, Adriana Laura
AU  - Avendaño-Garrido, Martha Lorena
TI  - Fitting a Gaussian mixture model through the Gini index
JO  - International Journal of Applied Mathematics and Computer Science
PY  - 2021
SP  - 487
EP  - 500
VL  - 31
IS  - 3
PB  - mathdoc
UR  - https://geodesic-test.mathdoc.fr/item/IJAMCS_2021_31_3_a8/
LA  - en
ID  - IJAMCS_2021_31_3_a8
ER  -

%0 Journal Article
%A López-Lobato, Adriana Laura
%A Avendaño-Garrido, Martha Lorena
%T Fitting a Gaussian mixture model through the Gini index
%J International Journal of Applied Mathematics and Computer Science
%D 2021
%P 487-500
%V 31
%N 3
%I mathdoc
%U https://geodesic-test.mathdoc.fr/item/IJAMCS_2021_31_3_a8/
%G en
%F IJAMCS_2021_31_3_a8

López-Lobato, Adriana Laura; Avendaño-Garrido, Martha Lorena. Fitting a Gaussian mixture model through the Gini index. International Journal of Applied Mathematics and Computer Science, Volume 31 (2021) no. 3, pp. 487-500. https://geodesic-test.mathdoc.fr/item/IJAMCS_2021_31_3_a8/

Bibliography

[1] Bassetti, F., Bodini, A. and Regazzini, E. (2006). On minimum Kantorovich distance estimators, Statistics and Probability Letters 76(12): 1298–1302.

[2] Bishop, C.M. (2006). Pattern Recognition and Machine Learning, Springer, New York.

[3] Dempster, A.P., Laird, N.M. and Rubin, D.B. (1977). Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society: Series B (Methodological) 39(1): 1–22.

[4] Elkan, C. (1997). Boosting and naive Bayesian learning, Proceedings of the International Conference on Knowledge Discovery and Data Mining, Newport Beach, USA.

[5] Flach, P.A. and Lachiche, N. (2004). Naive Bayesian classification of structured data, Machine Learning 57(3): 233–269.

[6] Giorgi, G.M. and Gigliarano, C. (2017). The Gini concentration index: A review of the inference literature, Journal of Economic Surveys 31(4): 1130–1148.

[7] Greenspan, H., Ruf, A. and Goldberger, J. (2006). Constrained Gaussian mixture model framework for automatic segmentation of MR brain images, IEEE Transactions on Medical Imaging 25(9): 1233–1245.

[8] Kłopotek, R., Kłopotek, M. and Wierzchoń, S. (2020). A feasible k-means kernel trick under non-Euclidean feature space, International Journal of Applied Mathematics and Computer Science 30(4): 703–715, DOI: 10.34768/amcs-2020-0052.

[9] Kulczycki, P. (2018). Kernel estimators for data analysis, in M. Ram and J.P. Davim (Eds), Advanced Mathematical Techniques in Engineering Sciences, CRC/Taylor & Francis, Boca Raton, pp. 177–202.

[10] López-Lobato, A.L. and Avendaño-Garrido, M.L. (2020). Using the Gini index for a Gaussian mixture model, in L. Martínez-Villaseñor et al. (Eds), Advances in Computational Intelligence. MICAI 2020, Lecture Notes in Computer Science, Vol. 12469, Springer, Cham, pp. 403–418.

[11] Mao, C., Lu, L. and Hu, B. (2020). Local probabilistic model for Bayesian classification: A generalized local classification model, Applied Soft Computing 93: 106379.

[12] Meng, X.-L. and Rubin, D.B. (1994). On the global and componentwise rates of convergence of the EM algorithm, Linear Algebra and its Applications 199(Supp. 1): 413–425.

[13] Povey, D., Burget, L., Agarwal, M., Akyazi, P., Kai, F., Ghoshal, A., Glembek, O., Goel, N., Karafiát, M., Rastrow, A., Rose, R., Schwarz, P. and Thomas, S. (2011). The subspace Gaussian mixture model: A structured model for speech recognition, Computer Speech & Language 25(2): 404–439.

[14] Rachev, S., Klebanov, L., Stoyanov, S. and Fabozzi, F. (2013). The Methods of Distances in the Theory of Probability and Statistics, Springer, New York.

[15] Reynolds, D.A. (2009). Gaussian mixture models, in S.Z. Li (Ed.), Encyclopedia of Biometrics, Springer, New York, pp. 659–663.

[16] Rubner, Y., Tomasi, C. and Guibas, L.J. (2000). The Earth mover’s distance as a metric for image retrieval, International Journal of Computer Vision 40(2): 99–121.

[17] Singh, R., Pal, B.C. and Jabr, R.A. (2009). Statistical representation of distribution system loads using Gaussian mixture model, IEEE Transactions on Power Systems 25(1): 29–37.

[18] Torres-Carrasquillo, P.A., Reynolds, D.A. and Deller, J.R. (2002). Language identification using Gaussian mixture model tokenization, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing, Orlando, USA, pp. 1–757.

[19] Ultsch, A. and Lötsch, J. (2017). A data science based standardized Gini index as a Lorenz dominance preserving measure of the inequality of distributions, PLoS ONE 12(8): e0181572.

[20] Vaida, F. (2005). Parameter convergence for EM and MM algorithms, Statistica Sinica 15(2005): 831–840.

[21] Villani, C. (2003). Topics in Optimal Transportation, American Mathematical Society, Providence.

[22] Xu, L. and Jordan, M.I. (1996). On convergence properties of the EM algorithm for Gaussian mixtures, Neural Computation 8(1): 129–151.