Click any of the experiment buttons to get started. Press F12 to open your browser's developer tools and see extra details for each game.
1a: Q-Learning; 500 steps of PRANDOM, 8500 steps of PRANDOM (α = 0.3, γ = 0.5)
1b: Q-Learning; 500 steps of PRANDOM, 8500 steps of PGREEDY (α = 0.3, γ = 0.5)
1c: Q-Learning; 500 steps of PRANDOM, 8500 steps of PEXPLOIT (α = 0.3, γ = 0.5)
2: SARSA; 500 steps of PRANDOM, 8500 steps of PEXPLOIT (α = 0.3, γ = 0.5)
3a: SARSA or Q-Learning; 500 steps of PRANDOM, 8500 steps of PEXPLOIT (α = 0.15, γ = 0.5)
3b: SARSA or Q-Learning; 500 steps of PRANDOM, 8500 steps of PEXPLOIT (α = 0.45, γ = 0.5)
4a: SARSA or Q-Learning; 500 steps of PRANDOM, then PEXPLOIT; terminate after 6 games (α = 0.3, γ = 0.5)
4b: SARSA or Q-Learning; 500 steps of PRANDOM, then PEXPLOIT; after the first 3 games, change the location of the pickups and run 3 more games (α = 0.3, γ = 0.5)
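The experiments above combine two update rules (Q-Learning and SARSA) with three action-selection policies. As a rough orientation, here is a minimal Python sketch of one common way to write these pieces; it is not this project's code. The behaviour of PRANDOM, PGREEDY and PEXPLOIT is assumed here (uniform random; greedy with random tie-breaking; greedy with a hypothetical probability `exploit_prob`, otherwise a random other action), and any game-specific rules, such as always taking an available pickup or drop-off, are left out. All function names, state/action labels and the `exploit_prob` default are hypothetical.

```python
import random
from collections import defaultdict

# Q-table: (state, action) -> estimated value, defaulting to 0.0.


def p_random(state, actions, rng=random):
    """PRANDOM (assumed): pick uniformly among applicable actions."""
    return rng.choice(actions)


def p_greedy(state, actions, q, rng=random):
    """PGREEDY (assumed): take a highest-valued action, ties broken randomly."""
    best = max(q[(state, a)] for a in actions)
    return rng.choice([a for a in actions if q[(state, a)] == best])


def p_exploit(state, actions, q, exploit_prob=0.8, rng=random):
    """PEXPLOIT (assumed): greedy with probability exploit_prob, otherwise a
    random non-greedy action. exploit_prob=0.8 is a hypothetical default."""
    greedy = p_greedy(state, actions, q, rng)
    others = [a for a in actions if a != greedy]
    if not others or rng.random() < exploit_prob:
        return greedy
    return rng.choice(others)


def q_learning_update(q, s, a, r, s_next, next_actions, alpha=0.3, gamma=0.5):
    """Off-policy update: bootstrap from the best action available in s_next."""
    best_next = max((q[(s_next, b)] for b in next_actions), default=0.0)
    q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])


def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.3, gamma=0.5):
    """On-policy update: bootstrap from the action the policy actually chose."""
    q[(s, a)] += alpha * (r + gamma * q[(s_next, a_next)] - q[(s, a)])


def policy_name(step, warmup=500, later="PEXPLOIT"):
    """Schedule shared by the experiments: PRANDOM for the first `warmup` steps."""
    return "PRANDOM" if step < warmup else later


if __name__ == "__main__":
    q = defaultdict(float)
    # One Q-Learning step with alpha = 0.3, gamma = 0.5 and reward -1.
    q_learning_update(q, "s0", "north", -1.0, "s1", ["north", "south"])
    print(q[("s0", "north")])
    print(policy_name(499), policy_name(500))
```

The only difference between the two learners is the bootstrap target: Q-Learning uses the best next action, while SARSA uses the action its policy actually takes next. That is why the policy used after the first 500 steps can affect what each learner ends up estimating, and why α (varied in experiments 3a and 3b) controls how far each update moves the old estimate toward the new target.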