
EsHadoopIllegalStateException reading Geo-Shape into DataFrame - SparkSQL

See original GitHub issue
  1. Create an index type with a mapping consisting of a field of type geo_shape.
  2. Create an RDD[String] containing a polygon as GeoJSON, as the value of a field whose name matches the mapping: """{"rect":{"type":"Polygon","coordinates":[[[50,32],[69,32],[69,50],[50,50],[50,32]]],"crs":null}}"""
  3. Write to an index type in Elasticsearch: rdd1.saveJsonToEs(indexName+"/"+indexType, connectorConfig)
  4. Read into SparkSQL DataFrame with either esDF or read-format-load:
    • sqlContext.esDF(indexName+"/"+indexType, connectorConfig)
    • sqlContext.read.format("org.elasticsearch.spark.sql").options(connectorConfig).load(indexName+"/"+indexType)
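The four steps above can be sketched roughly as follows. This is a minimal sketch, not the reporter's exact code: `indexName`, `indexType`, and the contents of `connectorConfig` are placeholder assumptions, and a running Elasticsearch node reachable from Spark is assumed.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.elasticsearch.spark._        // adds saveJsonToEs to RDDs
import org.elasticsearch.spark.sql._    // adds esDF to SQLContext

val conf = new SparkConf().setAppName("geo-shape-repro")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)

// Hypothetical values; substitute your own index/type and connector settings.
// Step 1 (creating the geo_shape mapping) is assumed to have been done
// against Elasticsearch beforehand.
val indexName = "shapes"
val indexType = "doc"
val connectorConfig = Map("es.nodes" -> "localhost:9200")

// Step 2: one document whose "rect" field is a GeoJSON polygon.
val rdd1 = sc.parallelize(Seq(
  """{"rect":{"type":"Polygon","coordinates":[[[50,32],[69,32],[69,50],[50,50],[50,32]]],"crs":null}}"""))

// Step 3: write the raw JSON into the index/type.
rdd1.saveJsonToEs(indexName + "/" + indexType, connectorConfig)

// Step 4: reading it back as a DataFrame is where the reporter hits
// EsHadoopIllegalStateException.
val df = sqlContext.esDF(indexName + "/" + indexType, connectorConfig)
df.show()
```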

Result is:

org.elasticsearch.hadoop.EsHadoopIllegalStateException: Field 'rect' not found; typically this occurs with arrays which are not mapped as single value

Full stack trace in gist. Elasticsearch Hadoop v2.1.2.

Issue Analytics

  • State:closed
  • Created 8 years ago
  • Comments:46 (25 by maintainers)

Top GitHub Comments

1 reaction
Bomb281993 commented, Oct 31, 2018

Hi @costin, through Spark (Java) we are also facing issues while pushing a geo-shape to an Elasticsearch index. It gives the error message: Failed to parse.

1 reaction
costin commented, Jan 8, 2016

@randallwhitman Hi,

I’ve taken a closer look at this and it’s a bit more complicated: fixable, but not as easy as I thought. The major issue with SparkSQL is that it requires a strict schema before loading any data, so the connector can only rely on the mapping to provide it. However, the underlying data (due to the flexibility of JSON) can be quite… loose, which trips up Spark and/or the connector because it doesn’t fit exactly into the schema.

First off, the field “crs” is null, meaning it is not mapped: there is no type information associated with it and thus no mapping. The connector therefore doesn’t see it when looking at the mapping, so when it encounters it in the _source it doesn’t know what to do with it. This needs to be fixed; for now I’ve added a better exception message and raised #648.

Second, the mapping information is incomplete for Spark SQL’s requirements. For example, coordinates is a field of type long. Is it a primitive or an array? We don’t know beforehand. One can indicate that it’s an array through the newly introduced es.read.field.as.array.include/exclude (ES 2.2 only). However, this is not enough, as the array depth is unknown: the connector is told that the field is an array, but is it [long], [[long]], [[[long]]] and so on? I’ve raised yet another issue for this, namely #650.
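The partial workaround the comment describes can be sketched as a reader configuration. This is a sketch under stated assumptions: "shapes/doc" and the node address are placeholders, ES-Hadoop 2.2+ is assumed, and (per #650) marking the field as an array may still be insufficient because the nesting depth of a polygon's coordinates is not expressed in the mapping.

```scala
// Hypothetical reader settings; substitute your own nodes and index/type.
val readConfig = Map(
  "es.nodes" -> "localhost:9200",
  // Tell the connector to treat rect.coordinates as an array rather than
  // a single long (setting introduced for ES 2.2, per the comment above).
  "es.read.field.as.array.include" -> "rect.coordinates",
  // The unmapped, null "crs" field can be excluded from the read entirely.
  "es.read.field.exclude" -> "rect.crs"
)

val df = sqlContext.read
  .format("org.elasticsearch.spark.sql")
  .options(readConfig)
  .load("shapes/doc")
```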


