[CRITICAL PERFORMANCE ISSUE] Move List Allocation

Chess Programming Background

I’m the lead developer of the C# StockNemo Chess Engine (~ 3.2K elo & TCEC Swiss Participant), as well as the lead developer of the (in-development) C++ rewrite StockDory (> 3.3K elo & TCEC Swiss Participant). Neither of those engines is a derivative or copy of Stockfish (Stockfish was an inspiration for the name).

Summary

As was expressed through Discord, move generation is far too slow with the current state of the API. Among other things, it does a heap allocation (for 218 moves) at every node traversed in the search tree. Other problems could be discussed too, but their fixes aren’t as simple. For this one, by contrast, the fix is simple: use C#'s stackalloc feature to allocate move lists on the stack, which is considerably faster and doesn’t trigger the garbage collector. The code for the majority of developers would change…

From:

Move[] moves = board.GetLegalMoves();

To:

Span<Move> moves = stackalloc Move[218];
board.GetLegalMoves(ref moves);

Now, GetLegalMoves would accept a Span<Move> reference argument and fill that stack-allocated span instead of allocating an array on the heap. As can be seen, this adds near-zero complexity and is not hard for most developers to understand. It is understood that this project is aimed at the novice side of the chess engine development community; however, it’s important that correct practices are promoted rather than misconceptions.
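
To make the pattern concrete, here is a minimal, self-contained sketch. The Move struct and ToyGenerator.FillMoves are hypothetical stand-ins written for illustration; they are not the Chess-Challenge API, and the "moves" are dummy values. The sketch only shows a caller-owned, stack-allocated buffer being filled and then shrunk to the number of entries actually written.

```csharp
using System;

// Hypothetical stand-in for the engine's move type (not the real API struct).
public readonly struct Move
{
    public readonly byte From;
    public readonly byte To;
    public Move(byte from, byte to) { From = from; To = to; }
}

public static class ToyGenerator
{
    // 218 is the move-list size used in the issue.
    public const int MaxMoves = 218;

    // Writes dummy moves into the caller's buffer, then shrinks the span to
    // the number written. No array is allocated on the heap per call.
    public static void FillMoves(ref Span<Move> moves, int available)
    {
        int count = Math.Min(available, moves.Length);
        for (int i = 0; i < count; i++)
            moves[i] = new Move((byte)i, (byte)(i + 8));
        moves = moves.Slice(0, count);
    }

    // Example caller: the buffer lives on this method's stack frame and is
    // released automatically when the method returns.
    public static int CountMoves(int available)
    {
        Span<Move> moves = stackalloc Move[MaxMoves];
        FillMoves(ref moves, available);
        return moves.Length;
    }
}
```

Passing the span by ref lets the generator report how many moves it wrote by reassigning the span to a shorter slice, so the caller can iterate over moves directly without a separate count.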

Not to mention, the speed-up due to this would be quite insane. See PR: https://github.com/SebLague/Chess-Challenge/pull/190 (includes benchmarks)

It is a problem, not just a fair disadvantage:

So far, though, I’ve only argued this in principle. It’s important to back that up with a case where it is a real problem:

Consider the FEN:

r2q1rk1/bppb1pp1/p2p2np/2PPp3/1P2P1n1/P3BN2/2Q1BPPP/RN3RK1 w - - 2 15

The above FEN is a position where, after one move, a certain phenomenon becomes possible: a capture train. This is a series of noisy positions (positions with capture moves available) that come one after another. In simpler words, a position is reached where one side can capture a piece, the other side can capture a piece on the next turn, and so forth for numerous moves without a gap in between. Eventually, a quiet position (one where no more captures are available) is reached. The above FEN, after that one move, has a capture train that is at least 17 plies deep.

Oh, cool. But how is that relevant?

A popular search feature found in pretty much every strong alpha-beta chess engine is Quiescence Search (QSearch). It deals with the Horizon Effect, which Sebastian discussed in one of his videos (if my memory serves me right).

A capture train is also popularly known as a QSearch Explosion, where QSearch goes extremely deep to ensure it misses nothing over the horizon. With the current move generation speed (heap allocation and everything), even a depth-1 alpha-beta search that uses QSearch to handle the horizon effect does not finish within 800 ms (on my i9-11900H).
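
For readers unfamiliar with QSearch, the following is a rough sketch of its shape, with one move buffer per node placed on the stack. ToyPosition is a made-up model (a static evaluation plus a remaining capture-train length, with a single 100-point capture available per node); it is an assumption for illustration only and has nothing to do with the real board representation.

```csharp
using System;

// Hypothetical toy model of a position, for illustration only.
public readonly struct ToyPosition
{
    public readonly int Eval;          // static evaluation from the side to move's view
    public readonly int CapturesLeft;  // remaining length of the capture train

    public ToyPosition(int eval, int capturesLeft)
    {
        Eval = eval;
        CapturesLeft = capturesLeft;
    }

    // Fills the buffer with the material gain of each available capture
    // and returns how many were written (toy: at most one, worth 100).
    public int GenerateCaptures(Span<int> gains)
    {
        if (CapturesLeft == 0) return 0;
        gains[0] = 100;
        return 1;
    }

    // After a capture the opponent is to move, so the evaluation flips sign.
    public ToyPosition Play(int gain) => new ToyPosition(-(Eval + gain), CapturesLeft - 1);
}

public static class ToySearch
{
    // Negamax-style quiescence search: stand pat on the static evaluation,
    // then try captures only, until a quiet position is reached.
    public static int Quiesce(ToyPosition pos, int alpha, int beta)
    {
        int standPat = pos.Eval;
        if (standPat >= beta) return beta;
        if (standPat > alpha) alpha = standPat;

        // One buffer per node, on the stack: no garbage is produced
        // no matter how deep the capture train goes.
        Span<int> gains = stackalloc int[218];
        int count = pos.GenerateCaptures(gains);

        for (int i = 0; i < count; i++)
        {
            int score = -Quiesce(pos.Play(gains[i]), -beta, -alpha);
            if (score >= beta) return beta;
            if (score > alpha) alpha = score;
        }
        return alpha;
    }
}
```

Because every node in a long capture train needs its own move list, whatever allocation happens in move generation is repeated at each of those nodes, which is why the per-node cost matters so much here.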

Surely, this is a serious issue and should be resolved.

To add:

Over 16 GB of heap allocation in just a single game.

Issue Analytics

State:
Created 2 months ago
Reactions: 11
Comments: 13 (1 by maintainers)

Top GitHub Comments

6 reactions

Eldriitch commented, Jul 22, 2023

It’s only an “issue” if you are trying to make a “serious” engine, but that’s not possible with just 1024 tokens anyway. If the base program has issues then I think it’s up to you to work around them entirely within the confines of MyBot.cs; that is the challenge.

I feel that if we start digging into flaws and optimisations of the base program like this, it will never end and participants will have to deal with endless changes. You could even argue that making this in C# is a “serious issue” and that the whole thing needs to be made in C++.

I think it’s best if we just stick with the current base program, warts and all (unless it’s a UI bug or something like that).

I don’t entirely agree. This issue doesn’t present an interesting challenge to the programmer; it just makes some reasonable approaches unviable for reasons outside their control. If we had the space to implement this stuff ourselves it wouldn’t be so much of a problem, but we simply don’t.

5 reactions

SebLague commented, Jul 23, 2023

I have added board.GetLegalMovesNonAlloc as an alternate way to get moves in V1.13 (see docs for more info).
