This paper investigates the use of second-order methods to solve Markov Decision Processes (MDPs). Despite the popularity of second-order methods in optimization literature, so far little attention has been paid to the extension of such techniques to face sequential decision problems. Here we provide a model-free Reinforcement Learning method that estimates the Newton direction by sampling directly in the parameter space. In order to compute the Newton direction we provide the formulation of the Hessian of the expected return, a technique for variance reduction in the sample-based estimation and a finite sample analysis in the case of Normal distribution. Beside discussing the theoretical properties, we empirically evaluate the method on an instructional linear-quadratic regulator and on a complex dynamical quadrotor system.

Following Newton direction in Policy Gradient with parameter exploration

MANGANINI, GIORGIO;
2015

Abstract

This paper investigates the use of second-order methods to solve Markov Decision Processes (MDPs). Despite the popularity of second-order methods in optimization literature, so far little attention has been paid to the extension of such techniques to face sequential decision problems. Here we provide a model-free Reinforcement Learning method that estimates the Newton direction by sampling directly in the parameter space. In order to compute the Newton direction we provide the formulation of the Hessian of the expected return, a technique for variance reduction in the sample-based estimation and a finite sample analysis in the case of Normal distribution. Beside discussing the theoretical properties, we empirically evaluate the method on an instructional linear-quadratic regulator and on a complex dynamical quadrotor system.
9781479919604
INF
AUT
File in questo prodotto:
File Dimensione Formato  
2015_IJCNN_Manganini.pdf

non disponibili

Licenza: Non pubblico
1.2 MB Adobe PDF   Visualizza/Apri   Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: http://hdl.handle.net/20.500.12571/7711
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 4
  • ???jsp.display-item.citation.isi??? 0
social impact