Bhatnagar, Shalabh and Borkar, Vivek S and Akarapu, Madhukar (2006) A Simulation-Based Algorithm for Ergodic Control of Markov Chains Conditioned on Rare Events. In: Journal of Machine Learning Research, 7, pp. 1937-1962.
We study the problem of long-run average cost control of Markov chains conditioned on a rare event. In a related recent work, a simulation-based algorithm for estimating performance measures associated with a Markov chain conditioned on a rare event has been developed. We extend ideas from this work and develop an adaptive algorithm for obtaining, online, optimal control policies conditioned on a rare event. Our algorithm uses three timescales or step-size schedules. On the slowest timescale, a gradient search algorithm for policy updates is used, based on one-simulation simultaneous perturbation stochastic approximation (SPSA) type estimates. Deterministic perturbation sequences obtained from appropriate normalized Hadamard matrices are used here. The fast timescale recursions compute the conditional transition probabilities of an associated chain by obtaining solutions to the multiplicative Poisson equation (for a given policy estimate). Further, the risk parameter associated with the value function for a given policy estimate is updated on a timescale that lies in between the two scales above. We briefly sketch the convergence analysis of our algorithm and present a numerical application in the setting of routing multiple flows in communication networks.
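The one-simulation SPSA estimate with Hadamard-based deterministic perturbations mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' three-timescale algorithm: it only shows how perturbation vectors can be read off a normalized Hadamard matrix and used in a single-evaluation gradient estimate. The function names (`hadamard`, `perturbation_cycle`, `spsa_gradient`), the Sylvester construction, the noiseless toy cost, and the averaging over one full perturbation cycle are all assumptions made for illustration.

```python
def hadamard(order):
    """Sylvester construction of a Hadamard matrix (assumed here for
    illustration); `order` must be a power of two. The first row and
    first column are all ones, i.e. the matrix is normalized."""
    h = [[1]]
    while len(h) < order:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def perturbation_cycle(dim):
    """Return one cycle of +/-1 perturbation vectors of length `dim`.

    Uses the smallest power-of-two order >= dim + 1 and drops the
    all-ones first column, keeping the next `dim` columns."""
    order = 1
    while order < dim + 1:
        order *= 2
    return [row[1:dim + 1] for row in hadamard(order)]


def spsa_gradient(cost, theta, delta=0.1):
    """One-simulation SPSA gradient estimate averaged over one cycle.

    Each perturbation uses a single cost evaluation at
    theta + delta * pert; component i is estimated as
    cost / (delta * pert[i])."""
    cycle = perturbation_cycle(len(theta))
    grad = [0.0] * len(theta)
    for pert in cycle:
        value = cost([t + delta * p for t, p in zip(theta, pert)])
        for i, p in enumerate(pert):
            grad[i] += value / (delta * p)
    return [g / len(cycle) for g in grad]


if __name__ == "__main__":
    # Hypothetical linear cost standing in for the simulated average cost.
    toy_cost = lambda th: 2.0 + 3.0 * th[0] - th[1] + 0.5 * th[2]
    print(spsa_gradient(toy_cost, [1.0, 2.0, 3.0]))
```

Because the retained Hadamard columns are orthogonal to each other and to the all-ones column, the bias terms of the one-simulation estimate cancel over a full cycle, so for a linear cost the sketch recovers the gradient up to floating-point rounding. In an online setting the estimate would instead feed an incremental stochastic approximation update rather than a batch average.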
Item Type: Journal Article
Additional Information: Copyright of this article belongs to Journal of Machine Learning Research.
Keywords: Markov decision processes; optimal control conditioned on a rare event; simulation based algorithms; SPSA with deterministic perturbations; reinforcement learning
Department/Centre: Division of Electrical Sciences > Computer Science & Automation (Formerly, School of Automation)
Date Deposited: 30 May 2008
Last Modified: 19 Sep 2010 04:45