
Unable to use copy from (fails with CircuitBreakingException)

See original GitHub issue

CrateDB version

4.6.4

CrateDB setup information

26 data nodes, each with a 60 GB heap

The data nodes share an NFS mount where the data we plan to import resides.

Our refresh_interval is set to -1.

Steps to Reproduce

We’re attempting to migrate from CrateDB 4.1.3 to 4.6.4 and are running the COPY FROM command to import the data.

The command we’re running:

crash -c "copy mytable from 'file:///path/to/dir/*' with (shared=true, compression='gzip', format='json')"

We originally set our heap to 30 GB but increased it to 60 GB, hoping that would help.

When we increased the heap, we also updated the settings below:

indices.breaker.query.limit: 90%
indices.breaker.request.limit: 85%
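
As a side note, CrateDB also allows adjusting these breaker limits at runtime via SET GLOBAL, without restarting the nodes (a sketch; the percentages shown are the same values as above):

```sql
-- Adjust circuit-breaker limits cluster-wide without a restart.
-- PERSISTENT survives node restarts; TRANSIENT does not.
SET GLOBAL PERSISTENT "indices.breaker.query.limit" = '90%';
SET GLOBAL PERSISTENT "indices.breaker.request.limit" = '85%';
```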

Expected Result

We expect the COPY FROM to complete successfully.

Actual Result

The error we receive:

CircuitBreakingException[[query] Data too large, data for [copyFrom: 0] would be [57984155648/54gb], which is larger than the limit of [57982058496/54gb]]
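
The limit in that error lines up exactly with the settings above: 90% of a 60 GB heap. A quick sanity check (assuming 1 GB = 1024³ bytes, which is how the breaker reports sizes):

```python
# Breaker limit reported in the error, in bytes.
reported_limit = 57_982_058_496

heap_gb = 60
query_breaker = 0.90  # indices.breaker.query.limit

# 90% of a 60 GB heap, with 1 GB = 1024**3 bytes.
expected = int(query_breaker * heap_gb * 1024**3)

print(expected)                   # 57982058496
print(expected == reported_limit) # True
```

So the query breaker is behaving as configured; the COPY FROM is simply accumulating more than 54 GB of tracked memory.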

We do have some shards that are ~90 GB.

Assuming the files are newline-delimited JSON, maybe the bulk operation is trying to do too much at once? If they aren’t newline-delimited, then I suppose we’d need to make these machines beefier and allocate more heap for the import process?
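
One quick way to verify the newline-delimited assumption is to parse the first few lines of one of the gzipped files (a sketch; `looks_like_ndjson` and the sample path are hypothetical, not part of CrateDB):

```python
import gzip
import json

def looks_like_ndjson(path, sample_lines=5):
    """Return True if the first few lines of a gzipped file each
    parse as a standalone JSON document (one document per line).
    Raises ValueError if a sampled line is not valid JSON on its own."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for i, line in enumerate(f):
            if i >= sample_lines:
                break
            json.loads(line)  # fails if documents span multiple lines
    return True
```

Running it against one of the import files (e.g. `looks_like_ndjson('part-0000.json.gz')`, a hypothetical filename) would raise immediately if the files are not one document per line.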

Issue Analytics

  • State: open
  • Created: 2 years ago
  • Comments: 9 (3 by maintainers)

Top GitHub Comments

1 reaction
bputt-e commented, Oct 5, 2021

bputt is my better half in case you’re wondering who that is

0 reactions
mfussenegger commented, Oct 27, 2021

Thx - we’ll try to reproduce and come up with a fix for the root cause.

Meanwhile you could also try the num_readers option with some value lower than 26:

copy mytable from 'file:///path/to/dir/*' with (shared=true, compression='gzip', format='json', num_readers = 10)

or something like that.
