Bhatnagar, Shalabh and Babu, K Mohan (2008) New algorithms of the Q-learning type. In: Automatica, 44 (4). pp. 1111-1119.
We propose two algorithms for Q-learning that use the two-timescale stochastic approximation methodology. The first updates the Q-values of all feasible state–action pairs at each instant, while the second updates the Q-values of states with actions chosen according to the 'current' randomized policy updates. A proof of convergence of the algorithms is given. Finally, numerical experiments using the proposed algorithms on an application of routing in communication networks are presented for a few different settings.
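The two-timescale idea in the abstract can be illustrated with a generic sketch: Q-values are updated with a faster-decaying step size, while a randomized (softmax) policy is adjusted on a slower timescale. The toy 2-state MDP, the softmax parameterization, and the specific step-size schedules below are illustrative assumptions, not the paper's exact recursions; the update over all state–action pairs mirrors the flavor of the first algorithm.

```python
import numpy as np

# Hypothetical toy 2-state, 2-action MDP (not from the paper).
# P[s, a, s'] = transition probability, R[s, a] = expected reward.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.3, 0.7], [0.6, 0.4]]])
R = np.array([[1.0, -1.0],
              [-1.0, 1.0]])
gamma = 0.6
rng = np.random.default_rng(0)

nS, nA = 2, 2
Q = np.zeros((nS, nA))
theta = np.zeros((nS, nA))  # softmax policy parameters (slow timescale)

def policy(s):
    """Randomized (softmax) policy over actions in state s."""
    z = np.exp(theta[s] - theta[s].max())
    return z / z.sum()

for n in range(1, 5001):
    a_fast = 1.0 / n**0.6   # faster step-size schedule for Q-values
    b_slow = 1.0 / n**0.9   # slower step-size schedule for the policy
    # Fast timescale: update Q(s, a) for ALL feasible state-action pairs,
    # each with its own simulated transition.
    for s in range(nS):
        for a in range(nA):
            s2 = rng.choice(nS, p=P[s, a])          # simulated next state
            target = R[s, a] + gamma * Q[s2].max()  # Q-learning target
            Q[s, a] += a_fast * (target - Q[s, a])
    # Slow timescale: nudge the randomized policy toward actions whose
    # Q-value exceeds the policy's current average value.
    for s in range(nS):
        theta[s] += b_slow * (Q[s] - Q[s] @ policy(s))

greedy = Q.argmax(axis=1)  # learned greedy action in each state
```

Because the policy evolves much more slowly than the Q-values, the fast recursion sees a quasi-static policy, which is the separation-of-timescales argument underlying convergence proofs for such schemes.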
Item Type: Journal Article
Additional Information: Copyright of this article belongs to Elsevier Science.
Keywords: Q-learning; Reinforcement learning; Markov decision processes; Two-timescale stochastic approximation; SPSA
Department/Centre: Division of Electrical Sciences > Computer Science & Automation (Formerly, School of Automation)
Date Deposited: 25 Mar 2010 11:19
Last Modified: 19 Sep 2010 05:58