EsHadoopIllegalStateException reading Geo-Shape into DataFrame - SparkSQL
See original GitHub issue.

Steps to reproduce:

- Create an index type with a mapping consisting of a field of type `geo_shape`.
- Create an `RDD[String]` containing a polygon as GeoJSON, as the value of a field whose name matches the mapping: `"""{"rect":{"type":"Polygon","coordinates":[[[50,32],[69,32],[69,50],[50,50],[50,32]]],"crs":null}}"""`
- Write to an index type in Elasticsearch: `rdd1.saveJsonToEs(indexName+"/"+indexType, connectorConfig)`
- Read into a SparkSQL DataFrame with either `esDF` or `read`/`format`/`load`:
  - `sqlContext.esDF(indexName+"/"+indexType, connectorConfig)`
  - `sqlContext.read.format("org.elasticsearch.spark.sql").options(connectorConfig).load(indexName+"/"+indexType)`

Result:

`org.elasticsearch.hadoop.EsHadoopIllegalStateException: Field 'rect' not found; typically this occurs with arrays which are not mapped as single value`

Full stack trace in gist. Elasticsearch Hadoop v2.1.2.
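Pulling the steps together, a minimal self-contained sketch of the reproduction. Names such as `GeoShapeRepro`, the `demo`/`rects` index and type, and the `es.nodes` setting are placeholder assumptions, and the sketch presumes the `geo_shape` mapping from step 1 already exists:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.elasticsearch.spark._        // adds saveJsonToEs to RDDs
import org.elasticsearch.spark.sql._    // adds esDF to SQLContext

object GeoShapeRepro {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("geo-shape-repro")
      .set("es.nodes", "localhost")     // assumed: local Elasticsearch node
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    val indexName = "demo"              // placeholder
    val indexType = "rects"             // placeholder
    val connectorConfig = Map("es.nodes" -> "localhost")

    // One GeoJSON polygon whose field name ("rect") matches the geo_shape mapping,
    // which is assumed to have been created beforehand (step 1).
    val doc = """{"rect":{"type":"Polygon","coordinates":[[[50,32],[69,32],[69,50],[50,50],[50,32]]],"crs":null}}"""
    val rdd1 = sc.makeRDD(Seq(doc))
    rdd1.saveJsonToEs(indexName + "/" + indexType, connectorConfig)

    // On es-hadoop 2.1.2 this read fails with EsHadoopIllegalStateException.
    val df = sqlContext.esDF(indexName + "/" + indexType, connectorConfig)
    df.show()
  }
}
```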
Issue Analytics
- Created: 8 years ago
- Comments: 46 (25 by maintainers)
Top GitHub Comments
Hi @costin, through Spark Java we are also facing issues while pushing a geo-shape to an Elasticsearch index. It gives the error message "Failed to parse".
@randallwhitman Hi,
I’ve taken a closer look at this and it’s a bit more complicated. It is fixable, but not as easy as I thought. The major issue with SparkSQL is that it requires a strict schema before loading any data, so the connector can only rely on the mapping to provide it. However, the underlying data (due to the flexibility of JSON) can be quite… loose, which trips up Spark and/or the connector, as it doesn’t fit exactly into the schema.
First off, the field `crs` is null, meaning it is not mapped: there’s no type information associated with it and thus no mapping. The connector doesn’t even see the field when looking at the mapping, so when it encounters it in the `_source` it doesn’t know what to do with it. This needs to be fixed; for now I’ve added a better exception message and raised #648.
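As an interim workaround for the unmapped `crs` field, one option (untested here, and assuming the connector’s `es.read.field.exclude` setting with a dotted path reaching the nested field) would be to drop it at read time:

```scala
// Sketch: exclude the unmapped "rect.crs" field when reading, so the connector
// never meets a field that is absent from the mapping-derived schema.
val readConfig = connectorConfig ++ Map(
  "es.read.field.exclude" -> "rect.crs"  // assumption: dotted path selects the nested field
)
val df = sqlContext.read
  .format("org.elasticsearch.spark.sql")
  .options(readConfig)
  .load(indexName + "/" + indexType)
```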
Second, the mapping information is incomplete for Spark SQL’s requirements. For example, `coordinates` is a field of type `long`. Is it a primitive or an array? We don’t know beforehand. One can indicate that it is an array through the newly introduced `es.read.field.as.array.include/exclude` (ES 2.2 only). However this is not enough, as the array depth is unknown: the connector is told that the field is an array, but is it `[long]`, `[[long]]`, `[[[long]]]`, and so on? I’ve raised yet another issue for this, namely #650.
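To make that concrete, a sketch of the ES 2.2-only setting mentioned above. The `field:depth` suffix in the comments is a hypothetical illustration of what #650 asks for, not something the 2.x connector is confirmed to accept:

```scala
// Mark "rect.coordinates" as an array so Spark SQL can build an ArrayType column
// instead of assuming a scalar long (requires es-hadoop 2.2+).
val arrayAwareConfig = connectorConfig ++ Map(
  "es.read.field.as.array.include" -> "rect.coordinates"
  // The depth is still ambiguous: [long] vs [[long]] vs [[[long]]].
  // Something like "rect.coordinates:3" (field:depth) is the kind of hint
  // #650 proposes; treat it as hypothetical here.
)
val df2 = sqlContext.read
  .format("org.elasticsearch.spark.sql")
  .options(arrayAwareConfig)
  .load(indexName + "/" + indexType)
```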