JSON deserialization fails "sometimes" without notice

Bug Report

When querying our ES indices from es-hadoop, we sometimes get empty result documents.

Issue description

We tracked the problem down to the deserialization of the documents. It fails intermittently for the same document, leaving us with an empty document.

Steps to reproduce

A simple test program queries the same document a hundred times, with and without deserialization, and checks for a specific field that we know the document contains.

Code:

import org.elasticsearch.spark._

import org.apache.log4j.Level
import org.apache.log4j.LogManager

val options = com.freiheit.ev.ElasticSearchOptions.getOptions() // nodes, credentials
val optionsWithJson = scala.collection.mutable.Map[String, String](options.toSeq: _*)

LogManager.getRootLogger.setLevel(Level.ERROR)

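// Count the reads in which the expected field is missing from the result.
var broken = 0
// "false": es-hadoop deserializes each hit into a Map.
optionsWithJson("es.output.json") = "false"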

for (i <- 1 to 100) {
   val result = sc.esRDD("euv_landingpages*/acquisitionLead", "?q=_id:6947a956c1f2f675a9e3acd0e714dc97c81beb5c8bc2ed6ce151c8598b2fea30",optionsWithJson).collect()
   if ( !result(0)._2.contains("sessionid") )
      broken+=1
}
LogManager.getRootLogger().error("broken with parsing: " + broken)

// "true": es-hadoop returns each hit as a raw JSON string, skipping deserialization.
optionsWithJson("es.output.json") = "true"
broken = 0

for (i <- 1 to 100) {
   val result = sc.esRDD("euv_landingpages*/acquisitionLead", "?q=_id:6947a956c1f2f675a9e3acd0e714dc97c81beb5c8bc2ed6ce151c8598b2fea30",optionsWithJson).collect()
   if ( !result(0)._2.asInstanceOf[String].contains("sessionid") )
      broken+=1
}

LogManager.getRootLogger().error("broken without parsing: " + broken)

Output:

16/09/22 12:29:20 ERROR root: broken with parsing: 44
16/09/22 12:29:41 ERROR root: broken without parsing: 0

Example document:

curl -k "https://xyz:9200/euv_landingpages*/acquisitionLead/_search?q=_id:6947a956c1f2f675a9e3acd0e714dc97c81beb5c8bc2ed6ce151c8598b2fea30&pretty"
{
  "took" : 8,
  "timed_out" : false,
  "_shards" : {
    "total" : 60,
    "successful" : 60,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "euv_landingpages-2016-03",
      "_type" : "acquisitionLead",
      "_id" : "6947a956c1f2f675a9e3acd0e714dc97c81beb5c8bc2ed6ce151c8598b2fea30",
      "_score" : 1.0,
      "_source" : {
        "logFilename" : "tracking.2016-03-18-16.log",
        "useragent" : "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.87 Safari/537.36",
        "rawTimestamp" : 1458316388123,
        "sessionid" : "15b423df-49be-4e51-b4a6-d5886e40969d",
        "timestamp" : "2016-03-18T16:53:08.123+0100",
        "index" : "2016-03",
        "payload" : {
          "platform" : "webcms",
          "submitStatus" : "SUCCESS",
          "pagePath" : "bogota/offer-a-property/",
          "country" : "CO",
          "landingpagePath" : "bogota/",
          "hash" : "s6HWSMhrVSyNMuiw3O91/g==",
          "pageType" : "aquisitionpage",
          "contactFieldsSet" : "FN|LN|ST|SN|CI|PH|EM|ME|",
          "language" : "EN"
        },
        "eventType" : "acquisitionLead",
        "id" : "6947a956c1f2f675a9e3acd0e714dc97c81beb5c8bc2ed6ce151c8598b2fea30"
      }
    } ]
  }
}

Corresponding mapping:

curl -k "https://xyz:9200/euv_landingpages*/acquisitionLead/_mapping?pretty"
{
  "euv_landingpages-2016-06" : {
    "mappings" : {
      "acquisitionLead" : {
        "properties" : {
          "eventType" : {
            "type" : "string"
          },
          "id" : {
            "type" : "string"
          },
          "index" : {
            "type" : "string"
          },
          "logFilename" : {
            "type" : "string"
          },
          "payload" : {
            "properties" : {
              "condition" : {
                "type" : "string"
              },
              "constructionYear" : {
                "type" : "string"
              },
              "contactFieldsSet" : {
                "type" : "string"
              },
              "country" : {
                "type" : "string"
              },
              "hash" : {
                "type" : "string"
              },
              "imagesUploaded" : {
                "type" : "boolean"
              },
              "landingpagePath" : {
                "type" : "string"
              },
              "language" : {
                "type" : "string"
              },
              "livingArea" : {
                "type" : "string"
              },
              "offertype" : {
                "type" : "string"
              },
              "pagePath" : {
                "type" : "string"
              },
              "pageType" : {
                "type" : "string"
              },
              "platform" : {
                "type" : "string"
              },
              "plotSize" : {
                "type" : "string"
              },
              "referer" : {
                "type" : "string"
              },
              "selectedPropertyType" : {
                "type" : "string"
              },
              "submitStatus" : {
                "type" : "string"
              }
            }
          },
          "rawTimestamp" : {
            "type" : "long"
          },
          "sessionid" : {
            "type" : "string"
          },
          "timestamp" : {
            "type" : "date",
            "format" : "strict_date_optional_time||epoch_millis"
          },
          "useragent" : {
            "type" : "string"
          }
        }
      }
    }
  }, ...other indices...

Version Info

  • OS: Debian GNU/Linux 8.4 (jessie)
  • JVM: Scala version 2.10.5 (OpenJDK 64-Bit Server VM, Java 1.8.0_91)
  • Spark: 1.6.2
  • ES-Hadoop: 2.3.4
  • ES: 2.3.4

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

2 reactions
anopheles commented, Oct 6, 2016

@sebastianmueller and I analyzed this issue further and came to the conclusion that #648 is responsible for this behavior, i.e. fields that are not mapped are ignored by default. Interestingly, we were pretty confident that our mappings were (at least partly) correct.

TL;DR: Setting .set("es.read.unmapped.fields.ignore", "false") on the SparkConf makes es-hadoop read all fields, even those not defined in your indices' mappings.
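
A minimal sketch of the same workaround applied per read instead of on the SparkConf, reusing the options-map pattern from the reproduction script. The object name, the helper `withUnmappedFields`, and the `es.nodes` value are placeholders made up for illustration. That the per-read configuration map honors this key is an assumption, based on the script already passing `es.output.json` that way.

```scala
import scala.collection.mutable

object UnmappedFieldsOption {
  // Placeholder connection settings (assumption); the real ones come from the reporter's helper.
  val baseOptions: Map[String, String] = Map("es.nodes" -> "localhost:9200")

  // Copy the base options and ask es-hadoop not to drop fields missing from the resolved mapping.
  def withUnmappedFields(base: Map[String, String]): mutable.Map[String, String] = {
    val opts = mutable.Map[String, String](base.toSeq: _*)
    opts("es.read.unmapped.fields.ignore") = "false"
    opts
  }

  def main(args: Array[String]): Unit = {
    val opts = withUnmappedFields(baseOptions)
    println(opts("es.read.unmapped.fields.ignore"))
    // With a SparkContext `sc` in scope, the map would be passed as in the reproduction script:
    // sc.esRDD("euv_landingpages*/acquisitionLead", query, opts)
  }
}
```

The helper returns a fresh mutable map, so the base options stay untouched and the same base can be reused for reads with and without the setting.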

I went down the rabbit hole and checked where exactly the mapping is read from. This seems to happen here: https://github.com/elastic/elasticsearch-hadoop/blob/dd1d6036fa9ee21d9a3082226a152d40cce9ede7/mr/src/main/java/org/elasticsearch/hadoop/rest/RestService.java#L266

Which in turn calls https://github.com/elastic/elasticsearch-hadoop/blob/dd1d6036fa9ee21d9a3082226a152d40cce9ede7/mr/src/main/java/org/elasticsearch/hadoop/rest/RestRepository.java#L436

IMHO the bug lies within the function Field.parseField(). The return value of client.getMapping(resourceR.mapping()) seems fine. We get a map from index-name to the respective mappings. (We access the resource over an alias: euv_landingpages fans out to multiple indexes)

Furthermore, Field.parseField() returns ONLY the first type (!!!) of the returned mappings, as seen here: https://github.com/elastic/elasticsearch-hadoop/blob/dd1d6036fa9ee21d9a3082226a152d40cce9ede7/mr/src/main/java/org/elasticsearch/hadoop/serialization/dto/mapping/Field.java#L77

The extraction of the first type is nondeterministic, since Elasticsearch returns the mappings in no specific order. This results in the observed behavior: sometimes a valid map is produced and sometimes not.
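
To illustrate why the order matters, here is a small self-contained simulation. It is not es-hadoop's code: the index names, the `Mapping` alias, and the functions `firstOnly`, `merged` and `read` are all hypothetical, and a mapping is reduced to a set of field names. It only models "use the first mapping in the response" against "union the mappings of every index behind the alias".

```scala
object FirstMappingSketch {
  type Mapping = Set[String] // field names known to one index (simplification)

  // Mimics the behavior described above: only the first mapping in the response is used.
  def firstOnly(response: Seq[(String, Mapping)]): Mapping =
    response.headOption.map(_._2).getOrElse(Set.empty)

  // Alternative: union the fields of every index behind the alias.
  def merged(response: Seq[(String, Mapping)]): Mapping =
    response.flatMap(_._2).toSet

  // Drop every document field that the chosen mapping does not know about.
  def read(doc: Map[String, String], mapping: Mapping): Map[String, String] =
    doc.filter { case (field, _) => mapping.contains(field) }

  def main(args: Array[String]): Unit = {
    val full: (String, Mapping) = "index-a" -> Set("sessionid", "eventType")
    val sparse: (String, Mapping) = "index-b" -> Set.empty[String]
    val doc = Map("sessionid" -> "15b4", "eventType" -> "acquisitionLead")

    println(read(doc, firstOnly(Seq(full, sparse))).contains("sessionid")) // true
    println(read(doc, firstOnly(Seq(sparse, full))).contains("sessionid")) // false
    println(read(doc, merged(Seq(sparse, full))).contains("sessionid"))    // true
  }
}
```

The same document comes back complete or empty depending only on which index happens to be listed first, which matches the roughly half-broken reads in the reproduction output. Merging is order-independent in this model.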

I hope that my analysis can further help you with fixing this bug.

0 reactions
jbaiera commented, Mar 28, 2017

Thanks to everyone who has posted super helpful analyses on this to help narrow the issue down. I’m going to close this ticket in favor of #938. If there are any more developments or feedback for this, please post it there to keep it consolidated. Cheers!
