TicTacToe Not Learning to Win?
I've been playing with the TicTacToe example. The example trains rapidly on my CPU (great A3C implementation, @peastman!), but the rewards indicate that the learned model isn't doing well. In the TicTacToe environment, the rewards are +10 for a win, +5 for a draw, 0.1 for an unfinished game, and -3 for a loss or an illegal move. The average rewards I'm seeing (evaluated on 5K games, at intervals of 50K rollouts) are mostly in the 3-5 range, indicating that the system is getting around a draw on average:
[{50000: 3.5432999999999999}, {100000: 5.9225000000000003}, {150000: 3.1412}, {200000: 3.2425999999999999}, {250000: 3.3361999999999998}, {300000: 3.4727000000000001}, {350000: 5.0313999999999997}, {400000: 3.5625}, {450000: 2.9565999999999999}, {500000: 3.7029999999999998}, {550000: 2.6173000000000002}, {600000: 3.5272999999999999}, {650000: 4.0795000000000003}, {700000: 3.6585999999999999}, {750000: 3.1958000000000002}, {800000: 5.8152999999999997}, {850000: 3.9902000000000002}, {900000: 4.1486999999999998}, {950000: 3.3544}, {1000000: 3.3946999999999998}]
Naively, I would expect that we should be able to train RL to win nearly always on TicTacToe (average reward close to 10). What are we possibly missing?
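To make the lack of progress easier to see, here is a minimal sketch that summarizes the averages reported above. The values are copied from the list and rounded to four decimals. Splitting the run into a first and second half is just one simple way to check for a trend; it is an illustrative choice, not part of the original evaluation.

```python
from statistics import mean

# Average reward per evaluation (rollouts -> mean reward over 5K games),
# copied from the results above and rounded to four decimals.
results = {
    50000: 3.5433, 100000: 5.9225, 150000: 3.1412, 200000: 3.2426,
    250000: 3.3362, 300000: 3.4727, 350000: 5.0314, 400000: 3.5625,
    450000: 2.9566, 500000: 3.7030, 550000: 2.6173, 600000: 3.5273,
    650000: 4.0795, 700000: 3.6586, 750000: 3.1958, 800000: 5.8153,
    850000: 3.9902, 900000: 4.1487, 950000: 3.3544, 1000000: 3.3947,
}

# Order the evaluations by training progress.
rollouts = sorted(results)
values = [results[r] for r in rollouts]

# Overall mean, plus a crude trend check: first half vs. second half.
overall = mean(values)
half = len(values) // 2
first_half = mean(values[:half])
second_half = mean(values[half:])

# Checkpoints with the best and worst average reward.
best = max(results, key=results.get)
worst = min(results, key=results.get)

print(f"overall mean reward: {overall:.3f}")
print(f"first half mean:     {first_half:.3f}")
print(f"second half mean:    {second_half:.3f}")
print(f"best checkpoint:     {best} ({results[best]:.4f})")
print(f"worst checkpoint:    {worst} ({results[worst]:.4f})")
```

If the two half-run means come out nearly equal, the average reward is fluctuating rather than improving with more rollouts, which is consistent with the concern raised here.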
Issue Analytics
- State:
- Created 6 years ago
- Comments:5 (5 by maintainers)
Top GitHub Comments
I think the main problem is just that your network is much bigger and more complicated than it needs to be. I simplified it down to this:
I also reduced value_weight back to 1.0 (the default value). With those changes, I get much better results:

Might be a good time to try to get https://github.com/deepchem/deepchem/issues/720 to work (at least in contrib)