Abdulla, Mohammed Shahid and Bhatnagar, Shalabh (2007) Solving MDPs using two-timescale simulated annealing with multiplicative weights. In: American Control Conference 2007, JUL 09-13, 2007, New York, NY.
We develop extensions of the Simulated Annealing with Multiplicative Weights (SAMW) algorithm, which was proposed as a method for solving Finite-Horizon Markov Decision Processes (FH-MDPs). The extensions developed are in three directions: a) use of the dynamic programming principle in the policy update step of SAMW, b) a two-timescale actor-critic algorithm that uses simulated transitions alone, and c) extension of the algorithm to the infinite-horizon discounted-reward setting. In particular, a) reduces the storage required from exponential to linear in the number of actions per stage-state pair. On the faster timescale, a 'critic' recursion performs policy evaluation, while on the slower timescale an 'actor' recursion performs policy improvement using SAMW. We give a proof outlining convergence w.p. 1 and show experimental results in two settings: semiconductor fabrication and flow control in communication networks.
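The multiplicative-weights policy-improvement idea at the core of SAMW can be illustrated with a minimal sketch: a probability distribution over candidate policies is reweighted so that policies with higher estimated value gain mass. This is an illustrative toy, not the paper's exact recursion; the `beta` annealing parameter and the `value_estimates` array are hypothetical stand-ins for the quantities the actor recursion would compute.

```python
import numpy as np

def samw_update(weights, value_estimates, beta):
    """One multiplicative-weights step (illustrative sketch only).

    weights         : current probability distribution over candidate policies
    value_estimates : estimated value of each candidate policy (e.g. from a
                      critic recursion); hypothetical inputs for this example
    beta            : annealing parameter > 1; larger values concentrate
                      mass on high-value policies more aggressively
    """
    new_w = weights * beta ** value_estimates  # exponential reweighting
    return new_w / new_w.sum()                 # renormalize to a distribution

# Usage: three candidate policies, uniform initial distribution.
w = np.ones(3) / 3
w = samw_update(w, value_estimates=np.array([1.0, 0.5, 2.0]), beta=1.5)
# The third policy, having the highest estimated value, now carries
# the largest probability mass.
```

In the paper's two-timescale scheme, an update of this flavor would run on the slower (actor) timescale, with the value estimates supplied by the faster critic recursion.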
|Item Type:||Conference Paper|
|Additional Information:||Copyright 2007 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.|
|Department/Centre:||Division of Electrical Sciences > Computer Science & Automation (Formerly, School of Automation)|
|Date Deposited:||30 Apr 2010 09:15|
|Last Modified:||22 Feb 2012 06:51|