Dots in field names exception
See original GitHub issue.

Environment: spark-1.6.2-bin-hadoop2.6, elasticsearch-5.0.0-beta1, elasticsearch-hadoop-5.0.0-beta1
Index a document whose field name contains a dot, then read the index from PySpark:

curl -XPOST localhost:9200/test4/test -d '{"b":0,"e":{"f.g":"hello"}}'
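The dotted key is the root of the problem: once Elasticsearch flattens a document into field paths for its mapping, a literal key "f.g" and a genuinely nested object f → g become indistinguishable, which is why the connector infers the nested-struct schema shown below but then cannot locate the value at read time. A minimal sketch (not part of the original report) of that flattening:

```python
# Sketch (assumption, not connector code): flatten JSON objects into
# dotted field paths the way an Elasticsearch mapping does.
def flatten(obj, prefix=""):
    out = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            out.update(flatten(value, path))
        else:
            out[path] = value
    return out

dotted = {"b": 0, "e": {"f.g": "hello"}}       # literal dot in the key
nested = {"b": 0, "e": {"f": {"g": "hello"}}}  # real nested objects

# Both shapes collapse to the same field paths, so the mapping (and the
# Spark schema derived from it) cannot tell them apart.
assert flatten(dotted) == flatten(nested) == {"b": 0, "e.f.g": "hello"}
```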
./bin/pyspark --driver-class-path=../elasticsearch-hadoop-5.0.0-beta1/dist/elasticsearch-hadoop-5.0.0-beta1.jar
>>> df1 = sqlContext.read.format("org.elasticsearch.spark.sql").load("test4/test")
>>> df1.printSchema()
root
|-- b: long (nullable = true)
|-- e: struct (nullable = true)
| |-- f: struct (nullable = true)
| | |-- g: string (nullable = true)
>>> df1.show()
---8<--- snip ---8<---
org.elasticsearch.hadoop.EsHadoopIllegalStateException: Position for 'e.f.g' not found in row; typically this is caused by a mapping inconsistency
at org.elasticsearch.spark.sql.RowValueReader$class.addToBuffer(RowValueReader.scala:45)
at org.elasticsearch.spark.sql.ScalaRowValueReader.addToBuffer(ScalaEsRowValueReader.scala:14)
at org.elasticsearch.spark.sql.ScalaRowValueReader.addToMap(ScalaEsRowValueReader.scala:94)
at org.elasticsearch.hadoop.serialization.ScrollReader.map(ScrollReader.java:806)
at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:696)
at org.elasticsearch.hadoop.serialization.ScrollReader.map(ScrollReader.java:806)
at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:696)
at org.elasticsearch.hadoop.serialization.ScrollReader.readHitAsMap(ScrollReader.java:466)
at org.elasticsearch.hadoop.serialization.ScrollReader.readHit(ScrollReader.java:391)
at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:286)
at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:259)
at org.elasticsearch.hadoop.rest.RestRepository.scroll(RestRepository.java:365)
at org.elasticsearch.hadoop.rest.ScrollQuery.hasNext(ScrollQuery.java:92)
at org.elasticsearch.spark.rdd.AbstractEsRDDIterator.hasNext(AbstractEsRDDIterator.scala:43)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:308)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
at scala.collection.AbstractIterator.to(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$5.apply(SparkPlan.scala:212)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$5.apply(SparkPlan.scala:212)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
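Until the connector handles dotted names, one client-side mitigation (a sketch under assumptions, not an es-hadoop feature) is to normalize documents before indexing so the stored `_source` actually has the nested structure the inferred schema expects:

```python
# Hypothetical helper: expand keys containing dots into nested objects
# before the document is sent to Elasticsearch.
def undot(obj):
    """Recursively rewrite {"f.g": v} into {"f": {"g": v}}."""
    if not isinstance(obj, dict):
        return obj
    out = {}
    for key, value in obj.items():
        parts = key.split(".")
        node = out
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = undot(value)
    return out

doc = {"b": 0, "e": {"f.g": "hello"}}
assert undot(doc) == {"b": 0, "e": {"f": {"g": "hello"}}}
```

A document normalized this way round-trips cleanly through the schema shown by printSchema() above, at the cost of changing what is stored in `_source`.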
Issue Analytics
- Created: 7 years ago
- Reactions: 8
- Comments: 21 (8 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
I am going to go ahead and re-open this, since it seems like this "problem" of dots in field names is less of a "problem" and more just where things are trending in the data integration space. It would be unwise of us to ignore this issue given recent developments across existing solutions.

That said, this issue is not an easy fix and requires adjusting invariants that we have treated very carefully over the years - most notably that _source is sacred and should only be changed judiciously. Additionally, the document update logic will likely need looking at (just try running a partial document update using normalized JSON in the request against a document containing dotted field names).

But then all related SIEM functionality in Elastic Security would need to be converted as well, including rules, detections, alerts, the .siem-signals index, etc.
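The partial-update hazard raised in the comments can be illustrated with a naive recursive merge (a hypothetical sketch, not Elasticsearch's actual update logic): merging a normalized update into a document that stores a dotted key leaves two distinct fields that both map to the path e.f.g.

```python
# Hypothetical deep merge, illustrating the partial-update ambiguity.
def deep_merge(base, update):
    merged = dict(base)
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

stored = {"e": {"f.g": "hello"}}        # document with a literal dotted key
partial = {"e": {"f": {"g": "world"}}}  # normalized partial update

result = deep_merge(stored, partial)
# The dotted key and the nested object now coexist in one document:
assert result == {"e": {"f.g": "hello", "f": {"g": "world"}}}
```

Any fix for dotted names has to decide which of the two shapes such an update should produce, which is why the comment calls this an adjustment of long-held invariants rather than an easy fix.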