Policy Search for the Optimal Control of Markov Decision Processes: A Novel Particle-Based Iterative Scheme

Manganini, Giorgio
2016-01-01

Abstract

Classical approximate dynamic programming techniques based on state-space gridding become computationally impracticable for high-dimensional problems. Policy search techniques cope with this curse of dimensionality by searching for the optimal control policy in a restricted, parameterized policy space. We here focus on the case of a discrete action space and introduce a novel policy parametrization that adopts particles to describe the map from the state space to the action space, each particle representing a region of the state space that is mapped into a certain action. The locations and actions associated with the particles describing a policy can be tuned by means of a recently introduced policy gradient method with parameter-based exploration. The task of selecting an appropriately sized set of particles is solved through an iterative policy building scheme that adds new particles to improve the policy performance and is also capable of removing redundant particles. Experiments demonstrate the scalability of the proposed approach as the dimensionality of the state space grows.
Publication year: 2016
Keywords: Approximate dynamic programming (ADP), Markov decision processes (MDPs), policy search, reinforcement learning (RL), stochastic optimal control.
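To make the abstract's idea concrete, the following is a minimal sketch (not the authors' code) of a particle-based policy for a discrete-action MDP: each particle carries a location in the state space and an action, and a state is mapped to the action of its nearest particle. The nearest-particle assignment rule, the class and method names (ParticlePolicy, act, add_particle, remove_particle), and the use of NumPy are illustrative assumptions; the paper's exact parametrization and the PGPE-based tuning of particle locations and actions are not reproduced here.

    import numpy as np


    class ParticlePolicy:
        def __init__(self, locations, actions):
            # locations: (n_particles, state_dim) array of particle positions
            # actions:   (n_particles,) array of discrete actions
            self.locations = np.asarray(locations, dtype=float)
            self.actions = np.asarray(actions)

        def act(self, state):
            # Assign the state to the closest particle (Euclidean distance)
            # and return that particle's action.
            d = np.linalg.norm(self.locations - np.asarray(state, dtype=float), axis=1)
            return self.actions[np.argmin(d)]

        def add_particle(self, location, action):
            # Iterative policy building: grow the particle set with a new
            # (location, action) pair intended to improve performance.
            self.locations = np.vstack([self.locations, np.asarray(location, dtype=float)])
            self.actions = np.append(self.actions, action)

        def remove_particle(self, idx):
            # Prune a particle judged redundant.
            self.locations = np.delete(self.locations, idx, axis=0)
            self.actions = np.delete(self.actions, idx)


    # Usage: a 2-D state space with two actions {0, 1}.
    policy = ParticlePolicy(locations=[[0.0, 0.0], [1.0, 1.0]], actions=[0, 1])
    print(policy.act([0.2, 0.1]))   # -> 0 (closest to the first particle)
    policy.add_particle([0.5, 0.9], 1)
    print(policy.act([0.4, 0.8]))   # -> 1 (closest to the new particle)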
Files in this record:
File: 2016_IEEETCyb_46_Manganini.pdf (not available)
Type: Publisher's version (PDF)
License: Not public
Size: 1.61 MB
Format: Unknown

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12571/7993
Citations
  • PMC: ND
  • Scopus: 11
  • Web of Science (ISI): 12