Path Discovery 5x5 World Experiment

Click any of the experiment buttons to get started. Press F12 to open the browser's developer tools, where extra details for each game are shown.

Delay per step (ms; higher is slower). Note: the delay cannot be changed once an experiment has started.
Enable Q Visualization

1a: Q-Learning; 500 steps of PRANDOM, 8500 steps of PRANDOM (α = 0.3, γ = 0.5)
1b: Q-Learning; 500 steps of PRANDOM, 8500 steps of PGREEDY (α = 0.3, γ = 0.5)
1c: Q-Learning; 500 steps of PRANDOM, 8500 steps of PEXPLOIT (α = 0.3, γ = 0.5)
2: SARSA; 500 steps of PRANDOM, 8500 steps of PEXPLOIT (α = 0.3, γ = 0.5)
3a: SARSA or Q-Learning; 500 steps of PRANDOM, 8500 steps of PEXPLOIT (α = 0.15, γ = 0.5)
3b: SARSA or Q-Learning; 500 steps of PRANDOM, 8500 steps of PEXPLOIT (α = 0.45, γ = 0.5)
4a: SARSA or Q-Learning; 500 steps of PRANDOM, then PEXPLOIT until 6 games have finished (α = 0.3, γ = 0.5)
4b: SARSA or Q-Learning; 500 steps of PRANDOM, then PEXPLOIT; after the first 3 games, the pickup locations are moved and the run continues for 3 more games (α = 0.3, γ = 0.5)
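The experiments vary three things: the learning rule (Q-Learning vs. SARSA), the policy used after the 500-step PRANDOM warm-up, and the learning rate α. The following is a minimal Python sketch of both update rules and the three policies, not the page's actual implementation. The page does not define the policies, so their behaviour here is an assumption. Pickup and dropoff are always taken when applicable. PRANDOM picks uniformly among the remaining applicable moves. PGREEDY picks the highest-Q move and breaks ties randomly. PEXPLOIT does the same with probability 0.8 and otherwise picks a different applicable move. All function names and the `(state, action)`-keyed table are hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical Q-table keyed by (state, action); unseen pairs default to 0.0.
Q = defaultdict(float)


def q_learning_update(Q, s, a, r, s_next, actions_next, alpha=0.3, gamma=0.5):
    """Off-policy update: bootstrap from the best applicable action in s_next."""
    best_next = max((Q[(s_next, a2)] for a2 in actions_next), default=0.0)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.3, gamma=0.5):
    """On-policy update: bootstrap from the action the policy actually chose next."""
    next_value = Q[(s_next, a_next)] if a_next is not None else 0.0
    Q[(s, a)] += alpha * (r + gamma * next_value - Q[(s, a)])


def choose_action(Q, s, actions, policy, rng=random):
    """Assumed PRANDOM / PGREEDY / PEXPLOIT behaviour (not defined on the page)."""
    # Assumption: pickup or dropoff is always taken when it is applicable.
    for special in ("pickup", "dropoff"):
        if special in actions:
            return special
    if policy == "PRANDOM":
        return rng.choice(actions)
    best_value = max(Q[(s, a)] for a in actions)
    best = [a for a in actions if Q[(s, a)] == best_value]
    if policy == "PGREEDY":
        return rng.choice(best)  # ties broken randomly
    if policy == "PEXPLOIT":
        # Assumption: 20% of the time, explore a non-best applicable move.
        others = [a for a in actions if a not in best]
        if others and rng.random() < 0.2:
            return rng.choice(others)
        return rng.choice(best)
    raise ValueError(f"unknown policy: {policy}")


def policy_for_step(step, later_policy, warmup=500):
    """Schedule used by every experiment: PRANDOM first, then the later policy."""
    return "PRANDOM" if step < warmup else later_policy
```

The only difference between the two learners is the bootstrap term. Q-Learning uses the maximum Q-value of the next state, while SARSA uses the value of the move the policy actually takes. Under PEXPLOIT's occasional random moves, SARSA's estimates therefore reflect that exploration and Q-Learning's do not. With γ = 0.5, a reward one step ahead counts half as much as an immediate one.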