[CRITICAL PERFORMANCE ISSUE] Move List Allocation
Chess Programming Background
I’m the lead developer of the C# StockNemo Chess Engine (~ 3.2K elo & TCEC Swiss Participant), as well as the lead developer of the (in-development) C++ rewrite StockDory (> 3.3K elo & TCEC Swiss Participant). Neither of those engines is a derivative or copy of Stockfish (Stockfish was an inspiration for the name).
Summary
As was expressed through Discord, move generation is far too slow with the current state of the API. Among other things, it performs a heap allocation (for 218 moves) per traversed node in the tree. Other problems could also be discussed, but their fixes aren't as simple. For this problem, by contrast, the fix is simple: use C#'s stackalloc feature to allocate move lists on the stack (considerably faster allocation, and it doesn't trigger the Garbage Collector). The code for the majority of developers would change…
From:
Move[] moves = board.GetLegalMoves();
To:
Span<Move> moves = stackalloc Move[218];
board.GetLegalMoves(ref moves);
Now, GetLegalMoves would accept a Span<Move> reference argument, filling that stack-allocated span instead of allocating an array on the heap. As can be seen, this adds near-zero complexity and is not hard to understand for most developers. It is understood that this project is aimed at the novice side of the chess engine development community; however, it's important that correct practices are pushed forward rather than misconceptions.
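To make the pattern concrete without depending on the challenge framework, here is a minimal, self-contained sketch. Everything in it is an assumption for illustration: the `Move` struct is a hypothetical stand-in (stackalloc needs an unmanaged struct), and `GetKnightMoves` is a toy generator for a lone knight on an empty board, not the real `GetLegalMoves`. What it shows is the shape of the call: the caller owns a stack buffer, and the generator fills it and slices it down to the number of moves written.

```csharp
using System;

// Hypothetical stand-in for the framework's Move type (an assumption).
// stackalloc requires an unmanaged struct, so the move is packed into a ushort.
public readonly struct Move
{
    public readonly ushort Raw;
    public Move(int from, int to) => Raw = (ushort)((from << 6) | to);
    public int From => Raw >> 6;
    public int To => Raw & 0x3F;
}

public static class MoveGen
{
    // Buffer size used in the issue for a full move list.
    public const int MaxMoves = 218;

    private static readonly (int df, int dr)[] KnightDeltas =
    {
        (1, 2), (2, 1), (2, -1), (1, -2), (-1, -2), (-2, -1), (-2, 1), (-1, 2)
    };

    // Toy generator (hypothetical): knight moves on an empty board.
    // Writes into the caller's buffer, then shrinks the span to the count written.
    public static void GetKnightMoves(int square, ref Span<Move> moves)
    {
        int file = square & 7, rank = square >> 3;
        int count = 0;
        foreach (var (df, dr) in KnightDeltas)
        {
            int f = file + df, r = rank + dr;
            if (f < 0 || f > 7 || r < 0 || r > 7) continue; // off the board
            moves[count++] = new Move(square, r * 8 + f);
        }
        moves = moves.Slice(0, count);
    }
}

public static class Demo
{
    public static void Main()
    {
        // The buffer lives on the stack: no heap allocation, nothing for the GC to collect.
        Span<Move> moves = stackalloc Move[MoveGen.MaxMoves];
        MoveGen.GetKnightMoves(0, ref moves); // knight on a1 (square 0)
        Console.WriteLine(moves.Length);      // prints 2
        foreach (Move m in moves) Console.WriteLine($"{m.From} -> {m.To}");
    }
}
```

Because the span is sliced inside the generator, the caller can iterate `moves` directly without tracking a separate count.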
Not to mention, the speed-up due to this would be quite insane. See PR: https://github.com/SebLague/Chess-Challenge/pull/190 (includes benchmarks)
It is a problem, not just a fair disadvantage:
So far, though, I've argued this on principle. However, it's important to back it up with a case where it is a real problem:
Consider the FEN:
r2q1rk1/bppb1pp1/p2p2np/2PPp3/1P2P1n1/P3BN2/2Q1BPPP/RN3RK1 w - - 2 15
The above FEN is a position where, after one move, a certain phenomenon is possible: a Capture Train. This is a series of noisy positions (positions with capture moves available) that come one after another. In simpler words, a position is reached where one side can capture a piece, then the other side can capture a piece on the next turn, and so forth, for numerous moves (without a gap in between). Eventually, a quiet position (one where no more captures are available) is reached. The above FEN, after that one move, has a capture train that is at least 17 ply deep.
Oh, cool. But how is that relevant?
A popular search feature that pretty much every strong alpha-beta chess engine has is called Quiescence Search (QSearch). It deals with the Horizon Effect, which Sebastian discussed in one of his videos (if my memory serves me right).
A capture train is also commonly known as a QSearch Explosion, where QSearch goes extremely deep to ensure it misses nothing over the horizon. With the current move generation speed (heap allocation & everything), even a Depth 1 Alpha-beta Search, which uses QSearch to handle the horizon effect, does not finish within 800 ms (on my i9-11900H).
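For readers who haven't seen QSearch, here is a rough sketch of the idea on a toy model, not on a real board. The assumptions are all mine: the position is reduced to a material balance plus an array `train` of pending capture gains, with at most one capture available per position. A real engine would generate every capture at every node, which is exactly where a per-node heap allocation gets multiplied. The search is a standard fail-hard negamax: take the static evaluation as a "stand pat" score, then keep following captures until the position is quiet.

```csharp
using System;

public static class ToyQuiescence
{
    // Large finite bound, so negating it cannot overflow like int.MinValue would.
    public const int Inf = 1_000_000;

    // train[i]  : material won by the side making the i-th capture (toy model, an assumption)
    // next      : index of the next available capture; next == train.Length means quiet
    // balance   : material balance from the side to move's point of view
    public static int QSearch(int[] train, int next, int balance, int alpha, int beta)
    {
        // Stand pat: the side to move may decline to capture and keep the static score.
        int standPat = balance;
        if (standPat >= beta) return beta;        // fail-hard beta cutoff
        if (standPat > alpha) alpha = standPat;

        // Only captures are searched. A real engine would call its move generator
        // here (captures only) once per node.
        if (next < train.Length)
        {
            int childBalance = -(balance + train[next]); // flip to the opponent's view
            int score = -QSearch(train, next + 1, childBalance, -beta, -alpha);
            if (score >= beta) return beta;
            if (score > alpha) alpha = score;
        }
        return alpha;
    }
}

public static class QDemo
{
    public static void Main()
    {
        // Win a knight (3), lose a pawn (1) to the recapture: capturing is worth +2.
        Console.WriteLine(ToyQuiescence.QSearch(new[] { 3, 1 }, 0, 0,
            -ToyQuiescence.Inf, ToyQuiescence.Inf)); // prints 2

        // Win a pawn (1), lose a knight (3): standing pat at 0 is better.
        Console.WriteLine(ToyQuiescence.QSearch(new[] { 1, 3 }, 0, 0,
            -ToyQuiescence.Inf, ToyQuiescence.Inf)); // prints 0
    }
}
```

In this toy the search visits one node per ply of the train. In a real position each node can have several captures, so a 17-ply train means a large number of move generation calls before the search settles.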
Surely, this is a serious issue and should be resolved.
To add:
Over 16GB heap allocation in just a single game.
Issue Analytics
- Created 2 months ago
- Reactions:11
- Comments:13 (1 by maintainers)
Top GitHub Comments
I don’t entirely agree. This issue doesn’t present an interesting challenge to the programmer, it just makes some reasonable approaches unviable for reasons outside their control. If we had the space to implement this stuff ourselves it wouldn’t be so much of a problem, but we simply don’t.
I have added board.GetLegalMovesNonAlloc as an alternate way to get moves in V1.13 (see docs for more info).