A Causal View for Reward Redistribution in Reinforcement Learning

Source: School of Electronic Engineering

Lecture title: A Causal View for Reward Redistribution in Reinforcement Learning

Speaker: Assistant Professor Yali Du (杜雅丽)

Time: July 9, 14:30

Venue: Room 306, Building 100, North Campus

About the speaker:

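Yali Du (杜雅丽) is an Assistant Professor at King's College London. She was previously a postdoctoral researcher at the Centre for Artificial Intelligence, University College London. Her main research interests are reinforcement learning and cooperative multi-agent learning. Her work has been published widely in top conferences and journals, including ICLR, ICML, NeurIPS, and the AI Journal. She presented tutorials on cooperative multi-agent learning at ACML 2022 and AAAI 2023. She has served many times as an editor for well-known international journals and as a reviewer or program committee member for conferences, including as a member of the AAMAS 2023 organizing committee, guest editor-in-chief of a special issue of the Journal of AAMAS (CCF rank B), associate editor of IEEE Transactions on AI, and senior program committee member for AAAI 2022/2023. For her contributions to cooperative reinforcement learning, she was selected for the AAAI New Faculty Highlights programme (2023) and Rising Stars in AI (KAUST 2023), and received the WAIC Yunfan Award (2023) and the KCL annual academic contribution award (2022). Her research is also funded by the UK Engineering and Physical Sciences Research Council (UKRI EPSRC).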

Abstract:

In reinforcement learning, a significant challenge lies in identifying the state-action pairs responsible for delayed future rewards. Return Decomposition addresses this challenge by redistributing rewards from observed sequences while maintaining policy invariance. However, existing approaches lack interpretability. In this talk, we propose a novel framework called Generative Return Decomposition (GRD) that explicitly models the contributions of state and action from a causal perspective. GRD utilizes causal generative models to characterize the generation of Markovian rewards and trajectory-wise long-term return. By identifying the causal relations and unobservable Markovian rewards, GRD provides a compact representation for policy optimization within the most favorable subspace of the agent's state space. Theoretical analysis confirms the identifiability of the unobservable Markovian reward function and causal structure. Experimental results demonstrate the superior performance of GRD compared to state-of-the-art methods, while the provided visualization showcases the interpretability of our approach.

Organizer: School of Electronic Engineering

