Title: Methods and Algorithms for Knowledge Reuse in Multiagent Reinforcement Learning
Speaker: Felipe Leno da Silva (EP-USP)
Date and venue: Wednesday, September 25, 2019, at 14:00, Santo André, Bloco A, Room S-213-0
Abstract:
Reinforcement Learning (RL) is a well-known technique for training autonomous agents through interactions with the environment. However, the learning process has high sample complexity: many interactions are needed to infer an effective policy, especially when multiple agents are acting simultaneously in the environment. We propose to take advantage of previous knowledge to accelerate learning in multiagent RL problems. Agents may reuse knowledge gathered from previously solved tasks, and they may also receive guidance from more experienced friendly agents to learn faster. However, specifying a framework that integrates knowledge reuse into the learning process requires answering challenging research questions, such as: How can task solutions be abstracted so they can be reused in similar yet different tasks? When should advice be given? How can the previous task most similar to the new one be selected, and how can correspondences between them be mapped? And how can an agent decide whether received advice is trustworthy? Although many methods exist for reusing knowledge from a specific source, the literature consists of methods so specialized to their own scenarios that they are incompatible with one another. In this thesis, we propose to reuse knowledge both from previously solved tasks and from communication with other agents. To accomplish this goal, we propose several flexible methods enabling each of these two types of knowledge reuse. Our proposed methods include: Ad Hoc Advising, an inter-agent advising framework in which agents share knowledge through action suggestions; an extension of the object-oriented representation to multiagent RL, together with methods that leverage it for knowledge reuse; and a method specialized for adversarial games, in which the agent models its opponent and learns to play against the model in a simulated version of the task, significantly reducing the number of samples required against the real opponent. Combined, our methods provide ways to reuse knowledge from both previously solved tasks and other agents with state-of-the-art performance. Our contributions are first steps towards more flexible and broadly applicable multiagent transfer learning methods, in which agents will be able to consistently combine knowledge reused from multiple sources, including solved tasks and other learning agents.
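
To make the inter-agent advising idea concrete, the sketch below shows a minimal Q-learning agent that asks its peers for action suggestions when its own confidence in a state is low, and gives advice only for states it knows well. The confidence measure (state visit counts), the voting rule, and all thresholds are illustrative assumptions for this sketch, not the exact mechanisms of the Ad Hoc Advising framework presented in the talk.

    import random
    from collections import defaultdict

    class AdvisingAgent:
        """Q-learning agent that can ask for and give action advice.

        Confidence via visit counts and majority voting over suggestions
        are assumed simplifications, not the thesis's exact formulas.
        """

        def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1):
            self.actions = actions
            self.q = defaultdict(float)     # (state, action) -> value estimate
            self.visits = defaultdict(int)  # state -> visit count
            self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

        def confidence(self, state):
            # More visits -> more confidence in the agent's own policy.
            return 1.0 - 1.0 / (1.0 + self.visits[state])

        def act(self, state, peers):
            self.visits[state] += 1
            # Ask peers for advice only when own confidence is low.
            if random.random() > self.confidence(state):
                suggestions = [p.advise(state) for p in peers]
                suggestions = [a for a in suggestions if a is not None]
                if suggestions:
                    # Follow the most common suggestion (simple majority vote).
                    return max(set(suggestions), key=suggestions.count)
            # Otherwise act epsilon-greedily on own Q-values.
            if random.random() < self.epsilon:
                return random.choice(self.actions)
            return max(self.actions, key=lambda a: self.q[(state, a)])

        def advise(self, state):
            # Give advice only for states this advisor knows well.
            if self.confidence(state) > 0.5:
                return max(self.actions, key=lambda a: self.q[(state, a)])
            return None

        def update(self, state, action, reward, next_state):
            # Standard one-step Q-learning update.
            best_next = max(self.q[(next_state, a)] for a in self.actions)
            target = reward + self.gamma * best_next
            self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])

In a training loop, each agent would call act(state, peers) with the other learning agents as peers, so that early in training agents lean on more experienced neighbors and gradually fall back on their own learned policy as their visit counts grow.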