A bankruptcy based approach to solving multi-agent credit assignment problem

Document Type: Research Paper

Authors

1 Department of Computer Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran

2 Department of Mathematics and Computer Science, Amirkabir University of Technology, Tehran, Iran

3 Department of Mathematics and Computer Science, Shahed University, Tehran, Iran

Abstract

Multi-agent systems (MAS) are one of the prominent manifestations of artificial intelligence (AI). Although they are built from smaller entities (agents), they have many applications in software development, complex system modeling, intelligent traffic control, and other areas. Learning in MAS, which is commonly based on Reinforcement Learning (RL), plays an essential role in how well such systems perform in an unknown environment. A major challenge in Multi-Agent Reinforcement Learning (MARL) is the credit assignment problem. In this paper, to solve the Multi-agent Credit Assignment (MCA) problem, we present a bottom-up method based on the bankruptcy concept that distributes the credit received from the environment among the agents of a MAS effectively, so that system performance improves. Taking the Task Start Threshold (TST) of the agents as a new constraint, assuming a multi-score environment, and giving priority to agents with lower TST, we present three methods, PTST, T-MAS, and T-KAg, all based on the bankruptcy concept, a sub-branch of game theory. The methods were evaluated on seven criteria, of which density is a new one. The simulation results indicate that the proposed methods outperform the existing methods on six criteria and perform worse on only one.
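To make the bankruptcy concept mentioned in the abstract concrete, the following is a minimal sketch of two classical division rules from bankruptcy games, the proportional rule and the constrained equal awards rule, applied to a global reward shared among agents. It is not the PTST, T-MAS, or T-KAg method, whose details the abstract does not give. The function names and the modeling of each agent's claim as a single number are assumptions made purely for illustration.

```python
from typing import List


def proportional_rule(estate: float, claims: List[float]) -> List[float]:
    """Split `estate` among agents in proportion to their claims.

    `estate` stands in for the global reward and `claims` for what each
    agent asks for; both are illustrative assumptions.
    """
    total = sum(claims)
    if total <= 0:
        return [0.0 for _ in claims]
    # If the estate covers every claim, each agent simply gets its claim.
    if estate >= total:
        return [float(c) for c in claims]
    return [estate * c / total for c in claims]


def constrained_equal_awards(estate: float, claims: List[float]) -> List[float]:
    """Give all agents an equal award, capped at each agent's own claim."""
    awards = [0.0] * len(claims)
    remaining = min(estate, sum(claims))
    # Visit agents from smallest to largest claim so that capped agents
    # release their unused share to the agents that come after them.
    order = sorted(range(len(claims)), key=lambda i: claims[i])
    for pos, i in enumerate(order):
        equal_share = remaining / (len(order) - pos)
        awards[i] = min(claims[i], equal_share)
        remaining -= awards[i]
    return awards


if __name__ == "__main__":
    reward = 300.0                    # hypothetical global reward
    agent_claims = [100.0, 200.0, 300.0]  # hypothetical agent claims
    print(proportional_rule(reward, agent_claims))
    print(constrained_equal_awards(reward, agent_claims))
```

Both rules hand out exactly the available reward when the claims exceed it, but they differ in who bears the shortfall: the proportional rule scales every claim by the same factor, while constrained equal awards favors agents with small claims.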

Volume 12, Special Issue
December 2021
Pages 1987-2018
  • Receive Date: 06 August 2021
  • Revise Date: 30 November 2021
  • Accept Date: 04 December 2021