Reducing explicit word vector dimensions using a BPSO-based labeling algorithm and a voting method

Document Type: Research Paper


School of Computer Engineering, Iran University of Science and Technology, Tehran, Iran.


Interpretability of word vector components is important for uncovering conceptual relations. Word vectors derived from counting models are interpretable but suffer from high dimensionality. Our goal in this study is to obtain interpretable, low-dimensional word vectors with the smallest possible loss of accuracy. To this end, we propose an approach that reduces the dimensionality of word vectors using a labeling method based on the binary particle swarm optimization (BPSO) algorithm together with a voting method for selecting the final context words. In this approach, we define several base models that solve the labeling problem with different data and different objective functions. We train each base model and keep its three best solutions, and we build the target word vectors of the dictionary from the context words labeled "1". The three best solutions of each base model then form an ensemble, and a voting method assigns the final label to the primary context words and selects the N final context words. We construct word vectors from the ukWaC corpus and evaluate them on the MEN, RG-65, and SimLex-999 test sets. The evaluation shows that, when the word vector dimensionality is reduced from 5000 to 1507, the Spearman correlation coefficient of the proposed approach drops less than that of any individual base model. This small accuracy loss is a reasonable price to pay, because the resulting word vectors are both low-dimensional and interpretable.
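The labeling step described above can be sketched with a generic binary PSO in the style of Kennedy and Eberhart: each particle is a 0/1 mask over the candidate context words, velocities are mapped to bit-flip probabilities through a sigmoid transfer function, and the fitness function scores a mask. The paper's actual objective functions and data are not reproduced here, so the `fitness` used below is a toy stand-in that rewards keeping "informative" dimensions while penalizing the number kept; all names and parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def bpso(fitness, n_bits, n_particles=20, n_iters=50, w=0.7, c1=1.5, c2=1.5):
    """Generic binary PSO: each particle is a 0/1 mask over candidate
    context words (1 = keep that dimension). Maximizes `fitness`."""
    pos = rng.integers(0, 2, (n_particles, n_bits))
    vel = rng.uniform(-1.0, 1.0, (n_particles, n_bits))
    pbest = pos.copy()
    pbest_fit = np.array([fitness(p) for p in pos])
    gbest = pbest[pbest_fit.argmax()].copy()
    gbest_fit = pbest_fit.max()
    for _ in range(n_iters):
        r1, r2 = rng.random((2, n_particles, n_bits))
        # Standard PSO velocity update with inertia w and pulls toward
        # the personal best (c1) and global best (c2).
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        prob = 1.0 / (1.0 + np.exp(-vel))  # sigmoid transfer function
        pos = (rng.random((n_particles, n_bits)) < prob).astype(int)
        fit = np.array([fitness(p) for p in pos])
        improved = fit > pbest_fit
        pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
        if fit.max() > gbest_fit:
            gbest, gbest_fit = pos[fit.argmax()].copy(), fit.max()
    return gbest, gbest_fit

# Toy objective (a stand-in for the paper's objective functions):
# reward informative context words, penalize the number of words kept.
info = rng.random(30)
fit = lambda mask: float(info @ mask - 0.3 * mask.sum())
mask, score = bpso(fit, n_bits=30)
```

In the paper's setting, the bit vector would range over the 5000 primary context words, and each base model would run this search with its own data and objective.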
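The ensemble's voting step can be illustrated as follows: each of the retained solutions (three per base model) contributes a 0/1 label vector over the primary context words, the votes for "1" are tallied per word, and the N words with the most votes become the final context words. This is a minimal sketch; the tie-breaking rule and the value of N are assumptions, not taken from the paper.

```python
import numpy as np

def vote_select(label_matrix, n_final):
    """label_matrix: (n_solutions, n_context_words) array of 0/1 labels
    from the ensemble members. Keeps the n_final context words that
    received the most '1' votes (ties broken by column order)."""
    votes = np.asarray(label_matrix).sum(axis=0)
    keep = np.argsort(-votes, kind="stable")[:n_final]
    return np.sort(keep)

# e.g. 3 solutions from each of 2 hypothetical base models -> 6 label vectors
labels = [[1, 0, 1, 1, 0],
          [1, 1, 1, 0, 0],
          [1, 0, 1, 1, 0],
          [0, 0, 1, 1, 1],
          [1, 0, 1, 0, 0],
          [1, 0, 0, 1, 0]]
print(vote_select(labels, n_final=2))  # → [0 2]
```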
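The evaluation protocol on MEN, RG-65, and SimLex-999 follows the usual word-similarity recipe: score each word pair by the cosine similarity of its vectors, then compute Spearman's rank correlation against the human judgments. The sketch below implements Spearman's rho as the Pearson correlation of ranks (assuming no ties); the tiny vectors and gold scores are fabricated for illustration only.

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def spearman(x, y):
    """Spearman's rho as the Pearson correlation of ranks (no ties assumed)."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    rx -= rx.mean()
    ry -= ry.mean()
    return float(rx @ ry / np.sqrt((rx @ rx) * (ry @ ry)))

# Hypothetical reduced word vectors and MEN-style gold similarity judgments:
vecs = {"car": np.array([1.0, 0.0, 1.0]),
        "truck": np.array([1.0, 0.0, 0.8]),
        "banana": np.array([0.1, 1.0, 0.0])}
pairs = [("car", "truck", 9.0), ("car", "banana", 1.5), ("truck", "banana", 1.0)]
model = [cosine(vecs[a], vecs[b]) for a, b, _ in pairs]
gold = [g for _, _, g in pairs]
print(round(spearman(model, gold), 3))  # → 0.5
```

The paper's comparison reduces to running this correlation once for the 5000-dimensional vectors and once for the 1507-dimensional vectors and measuring the drop.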


Volume 12, Issue 2, November 2021, Pages 2161-2178
  • Receive Date: 15 February 2021
  • Revise Date: 19 July 2021
  • Accept Date: 01 August 2021