Dots in field names exception
See original GitHub issue.

Environment: spark-1.6.2-bin-hadoop2.6, elasticsearch-5.0.0-beta1, elasticsearch-hadoop-5.0.0-beta1
Index a document whose field name contains a dot, then read the index from PySpark:

curl -XPOST localhost:9200/test4/test -d '{"b":0,"e":{"f.g":"hello"}}'
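The dotted key is the root of the problem: once Elasticsearch flattens a document into field paths for its mapping, a literal key "f.g" and a genuinely nested object f → g become indistinguishable, which is why the connector infers the nested-struct schema shown below but then cannot locate the value at read time. A minimal sketch (not part of the original report) of that flattening:

```python
# Sketch (assumption, not connector code): flatten JSON objects into
# dotted field paths the way an Elasticsearch mapping does.
def flatten(obj, prefix=""):
    out = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            out.update(flatten(value, path))
        else:
            out[path] = value
    return out

dotted = {"b": 0, "e": {"f.g": "hello"}}       # literal dot in the key
nested = {"b": 0, "e": {"f": {"g": "hello"}}}  # real nested objects

# Both shapes collapse to the same field paths, so the mapping (and the
# Spark schema derived from it) cannot tell them apart.
assert flatten(dotted) == flatten(nested) == {"b": 0, "e.f.g": "hello"}
```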
./bin/pyspark --driver-class-path=../elasticsearch-hadoop-5.0.0-beta1/dist/elasticsearch-hadoop-5.0.0-beta1.jar
>>> df1 = sqlContext.read.format("org.elasticsearch.spark.sql").load("test4/test")
>>> df1.printSchema()
root
|-- b: long (nullable = true)
|-- e: struct (nullable = true)
| |-- f: struct (nullable = true)
| | |-- g: string (nullable = true)
>>> df1.show()
---8<--- snip ---8<---
org.elasticsearch.hadoop.EsHadoopIllegalStateException: Position for 'e.f.g' not found in row; typically this is caused by a mapping inconsistency
at org.elasticsearch.spark.sql.RowValueReader$class.addToBuffer(RowValueReader.scala:45)
at org.elasticsearch.spark.sql.ScalaRowValueReader.addToBuffer(ScalaEsRowValueReader.scala:14)
at org.elasticsearch.spark.sql.ScalaRowValueReader.addToMap(ScalaEsRowValueReader.scala:94)
at org.elasticsearch.hadoop.serialization.ScrollReader.map(ScrollReader.java:806)
at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:696)
at org.elasticsearch.hadoop.serialization.ScrollReader.map(ScrollReader.java:806)
at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:696)
at org.elasticsearch.hadoop.serialization.ScrollReader.readHitAsMap(ScrollReader.java:466)
at org.elasticsearch.hadoop.serialization.ScrollReader.readHit(ScrollReader.java:391)
at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:286)
at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:259)
at org.elasticsearch.hadoop.rest.RestRepository.scroll(RestRepository.java:365)
at org.elasticsearch.hadoop.rest.ScrollQuery.hasNext(ScrollQuery.java:92)
at org.elasticsearch.spark.rdd.AbstractEsRDDIterator.hasNext(AbstractEsRDDIterator.scala:43)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:308)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
at scala.collection.AbstractIterator.to(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$5.apply(SparkPlan.scala:212)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$5.apply(SparkPlan.scala:212)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
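Until the connector handles dotted names, one client-side mitigation (a sketch under assumptions, not an es-hadoop feature) is to normalize documents before indexing so the stored `_source` actually has the nested structure the inferred schema expects:

```python
# Hypothetical helper: expand keys containing dots into nested objects
# before the document is sent to Elasticsearch.
def undot(obj):
    """Recursively rewrite {"f.g": v} into {"f": {"g": v}}."""
    if not isinstance(obj, dict):
        return obj
    out = {}
    for key, value in obj.items():
        parts = key.split(".")
        node = out
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = undot(value)
    return out

doc = {"b": 0, "e": {"f.g": "hello"}}
assert undot(doc) == {"b": 0, "e": {"f": {"g": "hello"}}}
```

A document normalized this way round-trips cleanly through the schema shown by printSchema() above, at the cost of changing what is stored in `_source`.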
Issue Analytics
- Created: 7 years ago
- Reactions: 8
- Comments: 21 (8 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
I am going to go ahead and re-open this, since it seems like this "problem" of dots in field names is less of a "problem" and more just where things are trending in the data integration space. It would be unwise of us to ignore this issue given recent developments across existing solutions.

That said, this issue is not an easy fix and requires adjusting invariants that we have treated very carefully over the years - most notably that _source is sacred and should only be changed judiciously. Additionally, the document update logic will likely need looking at (just try running a partial document update using normalized JSON in the request against a document containing dotted field names).

But then all related SIEM functionality in Elastic Security would need to be converted as well, including rules, detections, alerts, the .siem-signals index, etc.
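The partial-update hazard raised in the comments can be illustrated with a naive recursive merge (a hypothetical sketch, not Elasticsearch's actual update logic): merging a normalized update into a document that stores a dotted key leaves two distinct fields that both map to the path e.f.g.

```python
# Hypothetical deep merge, illustrating the partial-update ambiguity.
def deep_merge(base, update):
    merged = dict(base)
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

stored = {"e": {"f.g": "hello"}}        # document with a literal dotted key
partial = {"e": {"f": {"g": "world"}}}  # normalized partial update

result = deep_merge(stored, partial)
# The dotted key and the nested object now coexist in one document:
assert result == {"e": {"f.g": "hello", "f": {"g": "world"}}}
```

Any fix for dotted names has to decide which of the two shapes such an update should produce, which is why the comment calls this an adjustment of long-held invariants rather than an easy fix.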