[Bug] plots beyond ~4400 = harvester 100.0 load, cache_hit: false, plots check hangs before challenges
What happened?
Noted that for the last few releases, chia_harvester was pegging a thread continuously while farming.
Info:
- System has >20k plots direct attached. Single harvester.
- plot_refresh_callback completes in 15 seconds and proof checks are typically 0.4-1 sec.
- Aside from chia_harvester constantly pegging its thread, all else appears to function normally.
Elaboration:
- Reinstalled chia_blockchain from scratch, only importing keys and the mainnet/wallet DBs. No change.
- Experimented with varying numbers of plots and noted that at below ~4400 plots, chia_harvester no longer pegs a thread (dropped to 0.0 load). Added 200 plots back and load jumped back to 100.0 indefinitely.
- Experimented with various harvester config settings (num_threads, parallel_reads, batch_size). No change.
- Noted that upon startup, and with >4400 plots, the found_plot messages from the harvester transition from `cache_hit: True` to `cache_hit: False`.
- Also noted that attempting to run a `chia plots check` on any of the drives/plots with `cache_hit: False` results in an indefinite hang of that check before it issues a single challenge.
- Rewards are tracking for my total plot count (not 4400), so while `cache_hit: False` causes high harvester CPU usage and an inability to check those plots, they are still successfully farming. (A small log-scanning sketch for listing the affected plots follows this list.)
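To pin down which plots are affected, the harvester's debug.log can be scanned for the found_plot entries mentioned above. This is only a rough sketch: the log location and the exact shape of the log line (plot path plus a `cache_hit: ...` field) are assumptions and may need adjusting for your version and CHIA_ROOT.

```python
import re
from collections import Counter
from pathlib import Path

# Assumed default log location; adjust if CHIA_ROOT is not ~/.chia/mainnet.
LOG_PATH = Path.home() / ".chia" / "mainnet" / "log" / "debug.log"

# Assumption: the found_plot line contains the plot path and a
# "cache_hit: True"/"cache_hit: False" field roughly in this shape.
FOUND_PLOT_RE = re.compile(
    r"found_plot.*?(?P<path>/\S+\.plot).*?cache_hit:\s*(?P<hit>True|False)"
)


def scan(log_path: Path = LOG_PATH) -> None:
    counts: Counter[str] = Counter()
    misses: list[str] = []
    for line in log_path.read_text(errors="replace").splitlines():
        match = FOUND_PLOT_RE.search(line)
        if match is None:
            continue
        counts[match.group("hit")] += 1
        if match.group("hit") == "False":
            misses.append(match.group("path"))
    print(f"cache_hit counts: {dict(counts)}")
    print(f"{len(misses)} plots logged with cache_hit: False, first few:")
    for path in misses[:10]:
        print(f"  {path}")


if __name__ == "__main__":
    scan()
```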
Possible causes:
- This feels like high plot counts not playing nicely with plot_refresh / chia.plotting.cache, resulting in one of the harvester threads pegging indefinitely while attempting to cache some portion of plots over some maximum, and perhaps that same thread fails to respond to a plots check of those same plots?
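One low-effort way to probe the cache theory is to watch the harvester's on-disk plot cache across refreshes and restarts. The path below (`cache/plot_manager.dat` under the chia root) is an assumption based on how `chia.plotting` stores its cache and may differ between versions; the sketch only reports the file's size and modification time, so you can see whether the cache stops being rewritten once the plot count passes the threshold.

```python
import datetime
from pathlib import Path

# Assumed location of the harvester's serialized plot cache; adjust if
# CHIA_ROOT points somewhere other than ~/.chia/mainnet.
CACHE_PATH = Path.home() / ".chia" / "mainnet" / "cache" / "plot_manager.dat"


def report(cache_path: Path = CACHE_PATH) -> None:
    if not cache_path.exists():
        print(f"no cache file at {cache_path}")
        return
    stat = cache_path.stat()
    mtime = datetime.datetime.fromtimestamp(stat.st_mtime)
    print(cache_path)
    print(f"  size:     {stat.st_size:,} bytes")
    print(f"  modified: {mtime.isoformat(timespec='seconds')}")


if __name__ == "__main__":
    report()
```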
Version
1.5.0
What platform are you using?
Linux
What ui mode are you using?
CLI
Relevant log output
No response
Top GitHub Comments
Okay so… turned out that the reason for all this is plots created via the bladebit RAM plotter, where the `DiskProver` serializes into 524,659 bytes, which leads to a `uint32` overflow (`Value 5794656522 does not fit into uint32`) while we serialize the length of the bytes. The reason the `DiskProver` serializes into such a huge blob is that those plots seem to have 65,536 `C2` entries.
Table pointers from a plot in question: `table_begin_pointers[10] - table_begin_pointers[9]` -> 262,144.
Table pointers from a normally working plot: `table_begin_pointers[10] - table_begin_pointers[9]` -> 176.
I'm going to talk with @harold-b about this and will post an update once we've figured this out.
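For anyone who wants to check their own plots for this, here is a minimal sketch that reads the plot file header directly and reports the size of the C2 table (the `table_begin_pointers[10] - table_begin_pointers[9]` difference mentioned above). The header layout used here (19-byte magic, 32-byte plot id, k, format description, memo, then ten big-endian uint64 table pointers) is my reading of the chiapos on-disk format, so treat the offsets as assumptions rather than a reference implementation.

```python
import struct
import sys
from pathlib import Path


def c2_table_size(plot_path: Path) -> int:
    """Return the byte span of the C2 table in a chia plot file.

    Assumed header layout: 19-byte magic "Proof of Space Plot",
    32-byte plot id, 1-byte k, uint16 format-description length plus
    description, uint16 memo length plus memo, then 10 big-endian
    uint64 table pointers (tables 1-7, C1, C2, C3).
    """
    with plot_path.open("rb") as f:
        if f.read(19) != b"Proof of Space Plot":
            raise ValueError(f"{plot_path} does not look like a chia plot")
        f.read(32)  # plot id
        f.read(1)   # k
        (desc_len,) = struct.unpack(">H", f.read(2))
        f.read(desc_len)  # format description
        (memo_len,) = struct.unpack(">H", f.read(2))
        f.read(memo_len)  # memo
        pointers = struct.unpack(">10Q", f.read(80))
    # pointers[8] is the start of C2 and pointers[9] the start of C3,
    # so their difference is the space taken by the C2 table.
    return pointers[9] - pointers[8]


if __name__ == "__main__":
    for arg in sys.argv[1:]:
        size = c2_table_size(Path(arg))
        # 10_000 is an arbitrary cutoff between the ~176-byte spans seen on
        # normal plots and the ~262,144-byte spans on the affected ones.
        flag = "  <-- unusually large C2 table" if size > 10_000 else ""
        print(f"{arg}: C2 span {size} bytes{flag}")
```

Save the sketch and run it as, e.g., `python check_c2.py /path/to/plots/*.plot`; per the numbers above, a span in the low hundreds is typical, while the affected bladebit-RAM plots show spans in the hundreds of thousands.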
It could still be a caching-related issue since it would create a new cache on the next startup (and the cache is then used while the harvester runs). Either way, we won’t know unless we can figure out a way to tell what those pegged harvester threads are doing.