Bhatnagar, Shalabh and Kumar, Shishir (2004) A Simultaneous Perturbation Stochastic Approximation-Based Actor-Critic Algorithm for Markov Decision Processes. In: IEEE Transactions on Automatic Control, 49 (4). pp. 592-598.
A two-timescale simulation-based actor-critic algorithm is proposed for solving infinite-horizon Markov decision processes with finite state and compact action spaces under the discounted-cost criterion. On the slower timescale, the algorithm performs a gradient search in the space of deterministic policies, using simultaneous perturbation stochastic approximation (SPSA) based gradient estimates. On the faster timescale, the value function corresponding to the current stationary policy is updated and averaged over a fixed number of epochs for enhanced performance. A proof of convergence to a locally optimal policy is presented. Finally, numerical experiments applying the proposed algorithm to flow control in a bottleneck link, modeled as a continuous-time queueing system, are shown.
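The two-timescale structure described in the abstract can be sketched in a few lines of Python. This is only an illustrative toy, not the paper's algorithm or its queueing model: the MDP below (three states, quadratic cost around hypothetical per-state target actions, uniformly random transitions) and all parameter names (`TARGET`, `GAMMA`, `delta`, `lr`) are assumptions for the example. The critic runs TD(0) value updates on the faster timescale for a fixed number of epochs; the actor then takes one SPSA step, perturbing all policy components simultaneously with a single Bernoulli vector and forming a two-sided gradient estimate.

```python
import numpy as np

N_STATES = 3                         # toy finite state space (illustrative, not the paper's model)
GAMMA = 0.9                          # discount factor
TARGET = np.array([0.2, 0.5, 0.8])  # hypothetical per-state optimal actions

def step(rng, s, a):
    """One MDP transition: quadratic cost around a per-state target action,
    uniformly random next state (a stand-in for real queueing dynamics)."""
    cost = (a - TARGET[s]) ** 2
    return cost, rng.integers(N_STATES)

def critic(rng, theta, epochs=50, alpha=0.1):
    """Faster timescale: TD(0) estimate of the value function of the
    deterministic policy theta, run over a fixed number of epochs."""
    v, s = np.zeros(N_STATES), 0
    for _ in range(epochs * N_STATES):
        c, s_next = step(rng, s, theta[s])
        v[s] += alpha * (c + GAMMA * v[s_next] - v[s])
        s = s_next
    return v.mean()  # scalar performance proxy: value averaged over states

def spsa_actor_critic(iters=200, delta=0.1, lr=0.1, seed=0):
    """Slower timescale: SPSA gradient descent on the policy parameters."""
    rng = np.random.default_rng(seed)
    theta = np.full(N_STATES, 0.5)   # deterministic policy: one action per state
    for n in range(iters):
        Delta = rng.choice([-1.0, 1.0], size=N_STATES)  # Bernoulli perturbation
        j_plus = critic(rng, np.clip(theta + delta * Delta, 0.0, 1.0))
        j_minus = critic(rng, np.clip(theta - delta * Delta, 0.0, 1.0))
        grad = (j_plus - j_minus) / (2 * delta * Delta)  # one two-sided SPSA estimate
        theta = np.clip(theta - lr / (1 + n) ** 0.6 * grad, 0.0, 1.0)
    return theta
```

Note one simplification: here the critic is run to near-convergence between actor steps (a nested-loop rendition of the timescale separation), whereas the paper interleaves both updates with step sizes of different orders, which is what the two-timescale convergence analysis actually covers.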
|Item Type:||Journal Article|
|Additional Information:||© 2004 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.|
|Keywords:||Actor-critic algorithms;Markov decision processes;simultaneous perturbation stochastic approximation (SPSA);Two timescale stochastic approximation|
|Department/Centre:||Division of Electrical Sciences > Computer Science & Automation (Formerly, School of Automation)|
|Date Deposited:||30 Nov 2005|
|Last Modified:||19 Sep 2010 04:21|