Unable to use copy from (fails with CircuitBreakingException)
CrateDB version
4.6.4
CrateDB setup information
26 data nodes with 60GB heap
The data nodes share an NFS mount where the data we plan to import resides.
Our refresh_interval is set to -1
Steps to Reproduce
We’re attempting to migrate from CrateDB 4.1.3 to 4.6.4 and are running the copy from command to import the data.
The command we’re running:
crash -c "copy mytable from 'file:///path/to/dir/*' with (shared=true, compression='gzip', format='json')"
We originally set our heap to 30 GB but increased it, hoping that would help.
When we increased the heap to 60 GB, we also updated the settings below:
indices.breaker.query.limit: 90%
indices.breaker.request.limit: 85%
Expected Result
We expect the import to complete without errors.
Actual Result
The error we receive:
CircuitBreakingException[[query] Data too large, data for [copyFrom: 0] would be [57984155648/54gb], which is larger than the limit of [57982058496/54gb]]
We do have some shards that are ~90GB
Assuming the files are newline-delimited JSON, maybe the bulk operation is trying to do too much? If they aren’t newline-delimited, then I suppose we’d need to make these machines beefier and allocate more heap for the import process?
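As a sanity check on the numbers in the error message, here is a small sketch of the arithmetic. It assumes the heap is exactly 60 GiB and that the query breaker limit is a plain percentage of the heap; the helper name `breaker_limit_bytes` is made up for illustration and is not part of CrateDB.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def breaker_limit_bytes(heap_gib: int, limit_percent: int) -> int:
    """Breaker limit in bytes, assuming it is a simple percentage of the heap."""
    return heap_gib * GIB * limit_percent // 100


# Values from the issue: 60 GB heap, indices.breaker.query.limit: 90%
limit = breaker_limit_bytes(60, 90)

# "would be" value copied from the CircuitBreakingException message
requested = 57_984_155_648
overshoot = requested - limit

print(limit)             # 57982058496, the limit quoted in the error
print(overshoot // MIB)  # 2
```

Under those assumptions the computed limit matches the one in the exception, and the reservation that tripped the breaker pushed usage only about 2 MiB past it. That would suggest the breaker was already nearly full when the failing request arrived, rather than a single oversized allocation, though the error message alone does not prove this.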
Top GitHub Comments
bputt is my better half in case you’re wondering who that is
Thx - we’ll try to reproduce and come up with a fix for the root cause.
Meanwhile you could also try the num_readers option with some value lower than 26:
copy mytable from 'file:///path/to/dir/*' with (shared=true, compression='gzip', format='json', num_readers = 10)
or something like that.
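If you want to try a few different num_readers values, a small helper can generate the statement for you. This is only a sketch: `build_copy_from` and its `node_count` argument are hypothetical names, the other options are copied from the command in the issue, and the resulting string still has to be run through crash or another client.

```python
def build_copy_from(table: str, uri: str, num_readers: int, node_count: int) -> str:
    """Build the COPY FROM statement from the issue with an explicit num_readers.

    node_count is the number of data nodes (26 in the issue); it is only
    used here to reject values that make no sense for the cluster.
    """
    if not 1 <= num_readers <= node_count:
        raise ValueError(f"num_readers must be between 1 and {node_count}")
    escaped_uri = uri.replace("'", "''")  # double single quotes inside the SQL string literal
    return (
        f"copy {table} from '{escaped_uri}' with "
        f"(shared=true, compression='gzip', format='json', num_readers={num_readers})"
    )


stmt = build_copy_from("mytable", "file:///path/to/dir/*", num_readers=10, node_count=26)
print(stmt)
```

The statement can then be passed to crash the same way as in the original report, for example crash -c "<statement>".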