
A particle-based policy for the optimal control of Markov decision processes

Giorgio Manganini;
2014

Abstract

When the state dimension is large, classical approximate dynamic programming techniques may become computationally infeasible, since their complexity grows exponentially with the size of the state space (the curse of dimensionality). Policy search techniques can overcome this problem because, instead of estimating the value function over the entire state space, they search for the optimal control policy in a restricted, parameterized policy space. This paper presents a new policy parametrization that exploits a single point (particle) to represent an entire region of the state space and can be tuned through a recently introduced policy gradient method with parameter-based exploration. Experiments demonstrate the superior performance of the proposed approach in high-dimensional environments.
ISBN: 9783902823625
Keywords: Markov decision processes, Stochastic optimal control, Approximate dynamic programming, Reinforcement learning, Policy search.
Files in this product:
2014_IFACWC_Pirotta.pdf

Open access

License: Free access
Size: 1.06 MB, Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: http://hdl.handle.net/20.500.12571/7660
Citations
  • PMC: n/a
  • Scopus: 2
  • Web of Science (ISI): n/a