NEST removes null value from after_key Dictionary when aggregating

We have a piece of code that runs a Composite Aggregation over two fields of our data, both with missing_bucket set to true.

Our issue is that when one of the fields becomes null in the data, the after_key is serialized incorrectly on the next request.

Note: There is an absolute minimal reproduction at the bottom.

Our code (boiled down):

static void Main(string[] args)
{
    IConnectionPool pool = new SingleNodeConnectionPool(new Uri("http://localhost:9200"));
    IConnection connection = new HttpConnection();

    ConnectionSettings connSettings = new ConnectionSettings(pool, connection);
    connSettings.ThrowExceptions();
    connSettings.DisableDirectStreaming();

    ElasticClient client = new ElasticClient(connSettings);

    // Grouping
    SearchRequest<JObject> search = new SearchRequest<JObject>("some_index", "_doc");
    search.Size = 0;

    List<ICompositeAggregationSource> aggregateList = new List<ICompositeAggregationSource>();
    aggregateList.Add(new TermsCompositeAggregationSource("1")
    {
        Field = "PropertyA.keyword",
        MissingBucket = true
    });
    aggregateList.Add(new TermsCompositeAggregationSource("2")
    {
        Field = "PropertyB.keyword",
        MissingBucket = true
    });

    CompositeAggregation compositeAggregation = new CompositeAggregation("composite")
    {
        Sources = aggregateList
    };

    search.Aggregations = compositeAggregation;

    while (true)
    {
        int pageSize = 10; // We use 1000, 10 is for testing
        compositeAggregation.Size = pageSize;

        ISearchResponse<JObject> result = client.Search<JObject>(search);

        BucketAggregate aggA = (BucketAggregate)result.Aggregations["composite"];

        if (!aggA.Items.Any())
            break;

        // Prepare next request
        // This is what fails the next round
        compositeAggregation.After = aggA.AfterKey;

        // .. work with data ..
    }
}

With the code above, ES rejects our second (or some subsequent) request with:

DebugInformation

# FailureReason: BadResponse while attempting POST on http://localhost:9200/some_index/_doc/_search?typed_keys=true
# Audit trail of this API call:
 - [1] BadResponse: Node: http://localhost:9200/ Took: 00:00:00.0494512
# OriginalException: Elasticsearch.Net.ElasticsearchClientException: Request failed to execute. Call: Status code 400 from: POST /some_index/_doc/_search?typed_keys=true. ServerError: Type: search_phase_execution_exception Reason: "all shards failed" CausedBy: "Type: illegal_argument_exception Reason: "[after] has 1 value(s) but [sources] has 2" CausedBy: "Type: illegal_argument_exception Reason: "[after] has 1 value(s) but [sources] has 2"""
   at Elasticsearch.Net.Transport`1.HandleElasticsearchClientException(RequestData data, Exception clientException, IElasticsearchResponse response)
   at Elasticsearch.Net.Transport`1.FinalizeResponse[TResponse](RequestData requestData, IRequestPipeline pipeline, List`1 seenExceptions, TResponse response)
   at Elasticsearch.Net.Transport`1.Request[TResponse](HttpMethod method, String path, PostData data, IRequestParameters requestParameters)
   at Nest.LowLevelDispatch.SearchDispatch[TResponse](IRequest`1 p, SerializableData`1 body)
   at Nest.ElasticClient.Nest.IHighLevelToLowLevelDispatcher.Dispatch[TRequest,TQueryString,TResponse](TRequest request, Func`3 responseGenerator, Func`3 dispatch)
   at ConsoleApp10.Program.Main(String[] args) in C:\Users\MichaelBisbjerg\source\repos\ConsoleApp10\ConsoleApp10\Program.cs:line 89
# Request:
{
	"aggs": {
		"composite": {
			"composite": {
				"after": {
					"1": "value1"
				},
				"size": 10,
				"sources": [{
						"1": {
							"terms": {
								"field": "PropertyA.keyword",
								"missing_bucket": true
							}
						}
					}, {
						"2": {
							"terms": {
								"field": "PropertyB.keyword",
								"missing_bucket": true
							}
						}
					}
				]
			}
		}
	},
	"size": 0
}
# Response:
{
	"error": {
		"root_cause": [{
				"type": "illegal_argument_exception",
				"reason": "[after] has 1 value(s) but [sources] has 2"
			}
		],
		"type": "search_phase_execution_exception",
		"reason": "all shards failed",
		"phase": "query",
		"grouped": true,
		"failed_shards": [{
				"shard": 0,
				"index": "some_index",
				"node": "Z7iIXKGMQZSRN6MDZ1h3Jg",
				"reason": {
					"type": "illegal_argument_exception",
					"reason": "[after] has 1 value(s) but [sources] has 2"
				}
			}
		],
		"caused_by": {
			"type": "illegal_argument_exception",
			"reason": "[after] has 1 value(s) but [sources] has 2",
			"caused_by": {
				"type": "illegal_argument_exception",
				"reason": "[after] has 1 value(s) but [sources] has 2"
			}
		}
	},
	"status": 400
}
# Exception:
Elasticsearch.Net.ElasticsearchClientException: Request failed to execute. Call: Status code 400 from: POST /some_index/_doc/_search?typed_keys=true. ServerError: Type: search_phase_execution_exception Reason: "all shards failed" CausedBy: "Type: illegal_argument_exception Reason: "[after] has 1 value(s) but [sources] has 2" CausedBy: "Type: illegal_argument_exception Reason: "[after] has 1 value(s) but [sources] has 2"""
   at Elasticsearch.Net.Transport`1.HandleElasticsearchClientException(RequestData data, Exception clientException, IElasticsearchResponse response)
   at Elasticsearch.Net.Transport`1.FinalizeResponse[TResponse](RequestData requestData, IRequestPipeline pipeline, List`1 seenExceptions, TResponse response)
   at Elasticsearch.Net.Transport`1.Request[TResponse](HttpMethod method, String path, PostData data, IRequestParameters requestParameters)
   at Nest.LowLevelDispatch.SearchDispatch[TResponse](IRequest`1 p, SerializableData`1 body)
   at Nest.ElasticClient.Nest.IHighLevelToLowLevelDispatcher.Dispatch[TRequest,TQueryString,TResponse](TRequest request, Func`3 responseGenerator, Func`3 dispatch)
   at ConsoleApp10.Program.Main(String[] args) in C:\Users\MichaelBisbjerg\source\repos\ConsoleApp10\ConsoleApp10\Program.cs:line 89

When debugging, I can clearly see that aggA.AfterKey is a dictionary with two entries, but when it is sent back to ES, only one entry is serialized.

I’ve narrowed the issue down to the serializer alone, using this code:

void ReproduceSerializer()
{
    IConnectionPool pool = new SingleNodeConnectionPool(new Uri("http://localhost:9200"));
    ConnectionSettings connSettings = new ConnectionSettings(pool);

    ElasticClient client = new ElasticClient(connSettings);

    using (MemoryStream ms = new MemoryStream())
    {
        Dictionary<string, object> dictionary = new Dictionary<string, object>
        {
            {"1", "C:\\" },
            {"2", null }
        };
        client.RequestResponseSerializer.Serialize(dictionary, ms);

        byte[] d = ms.ToArray();
        string p = Encoding.UTF8.GetString(d);

        /*
         Issue: "p" is just
         {
          "1": "C:\\"
         }

        Rather than:
         {
          "1": "C:\\",
          "2": null
         }
         */
    }
}

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 7 (5 by maintainers)

Top GitHub Comments

1 reaction
russcam commented, Jun 6, 2019

I’ve merged in https://github.com/elastic/elasticsearch-net/pull/3800 to mark AfterKey as obsolete on BucketAggregate to discourage its usage, and introduced a CompositeAfterKey property which is of type CompositeKey and will honour null values when being passed into subsequent composite aggregation calls.

0 reactions
russcam commented, Jun 6, 2019

So instead of (BucketAggregate)result.Aggregations["composite"], I do result.Aggregations.Composite("composite")?

Yes, this will work. The first way should be discouraged because BucketAggregate is an intermediate type used internally to hold the data for a number of different aggregations. At the very least, AfterKey on BucketAggregate should be of type CompositeKey. Will open a PR now to fix this in 7.x, and a PR to obsolete it in 6.x, and use a different property.
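
As a rough sketch of the workaround confirmed above, the paging loop from the question could be rewritten around the typed Composite(...) accessor instead of casting to BucketAggregate. The index, type and field names are copied from the question. The class and method names (CompositePaging, PageAll) are made up for illustration, and the snippet assumes a NEST 6.x client with Newtonsoft.Json available for JObject. It has not been run against a live cluster, so treat it as a starting point rather than a verified fix.

```csharp
using System.Collections.Generic;
using System.Linq;
using Nest;
using Newtonsoft.Json.Linq;

// Hypothetical helper class; the name is illustrative, not part of NEST.
static class CompositePaging
{
    public static void PageAll(IElasticClient client)
    {
        var compositeAggregation = new CompositeAggregation("composite")
        {
            Size = 10, // the question uses 1000 in production, 10 for testing
            Sources = new List<ICompositeAggregationSource>
            {
                new TermsCompositeAggregationSource("1") { Field = "PropertyA.keyword", MissingBucket = true },
                new TermsCompositeAggregationSource("2") { Field = "PropertyB.keyword", MissingBucket = true }
            }
        };

        var search = new SearchRequest<JObject>("some_index", "_doc")
        {
            Size = 0,
            Aggregations = compositeAggregation
        };

        while (true)
        {
            var result = client.Search<JObject>(search);

            // Typed accessor suggested in the comment above; AfterKey here is a
            // CompositeKey rather than a plain dictionary.
            var composite = result.Aggregations.Composite("composite");

            if (composite == null || !composite.Buckets.Any())
                break;

            // .. work with composite.Buckets ..

            // Prepare the next request with the typed after key, which per the
            // maintainer's comment honours null values.
            compositeAggregation.After = composite.AfterKey;
        }
    }
}
```

The only change from the code in the question is the source of the after key: it comes from the typed composite aggregate rather than from the intermediate BucketAggregate, whose dictionary-based AfterKey loses the null entry when serialized.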
