
EsHadoopIllegalStateException reading Geo-Shape into DataFrame - SparkSQL

See original GitHub issue
  1. Create an index type with a mapping consisting of a field of type geo_shape.
  2. Create an RDD[String] containing a polygon as GeoJSON, as the value of a field whose name matches the mapping: """{"rect":{"type":"Polygon","coordinates":[[[50,32],[69,32],[69,50],[50,50],[50,32]]],"crs":null}}"""
  3. Write to an index type in Elasticsearch: rdd1.saveJsonToEs(indexName+"/"+indexType, connectorConfig)
  4. Read into SparkSQL DataFrame with either esDF or read-format-load:
    • sqlContext.esDF(indexName+"/"+indexType, connectorConfig)
    • sqlContext.read.format("org.elasticsearch.spark.sql").options(connectorConfig).load(indexName+"/"+indexType)
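The four steps above can be sketched roughly as follows. This is a minimal sketch, not the reporter's exact code: `indexName`, `indexType`, and the contents of `connectorConfig` are placeholder assumptions, and a running Elasticsearch node reachable from Spark is assumed.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.elasticsearch.spark._        // adds saveJsonToEs to RDDs
import org.elasticsearch.spark.sql._    // adds esDF to SQLContext

val conf = new SparkConf().setAppName("geo-shape-repro")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)

// Hypothetical values; substitute your own index/type and connector settings.
// Step 1 (creating the geo_shape mapping) is assumed to have been done
// against Elasticsearch beforehand.
val indexName = "shapes"
val indexType = "doc"
val connectorConfig = Map("es.nodes" -> "localhost:9200")

// Step 2: one document whose "rect" field is a GeoJSON polygon.
val rdd1 = sc.parallelize(Seq(
  """{"rect":{"type":"Polygon","coordinates":[[[50,32],[69,32],[69,50],[50,50],[50,32]]],"crs":null}}"""))

// Step 3: write the raw JSON into the index/type.
rdd1.saveJsonToEs(indexName + "/" + indexType, connectorConfig)

// Step 4: reading it back as a DataFrame is where the reporter hits
// EsHadoopIllegalStateException.
val df = sqlContext.esDF(indexName + "/" + indexType, connectorConfig)
df.show()
```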

Result is:

org.elasticsearch.hadoop.EsHadoopIllegalStateException: Field 'rect' not found; typically this occurs with arrays which are not mapped as single value

Full stack trace in gist. Elasticsearch Hadoop v2.1.2.

Issue Analytics

  • State:closed
  • Created 8 years ago
  • Comments:46 (25 by maintainers)

Top GitHub Comments

1 reaction
Bomb281993 commented, Oct 31, 2018

Hi @costin, through Spark (Java) we are also facing issues while pushing a geo-shape to an Elasticsearch index. It gives the error message: Failed to parse.

1 reaction
costin commented, Jan 8, 2016

@randallwhitman Hi,

I’ve taken a closer look at this and it’s a bit more complicated: fixable, but not as easy as I thought. The major issue with SparkSQL is that it requires a strict schema before loading any data, so the connector can only rely on the mapping to provide it. However, the underlying data (due to the flexibility of JSON) can be quite… loose, which trips up Spark and/or the connector because it doesn’t fit exactly into the schema.

First off, the field “crs” is null, meaning it is not mapped: there is no type information associated with it and thus no mapping. The connector therefore doesn’t see it when looking at the mapping, so when it encounters it in the _source it doesn’t know what to do with it. This needs to be fixed; for now I’ve added a better exception message and raised #648.

Second, the mapping information is incomplete for Spark SQL’s requirements. For example, coordinates is a field of type long. Is it a primitive or an array? We don’t know beforehand. One can indicate that it’s an array through the newly introduced es.read.field.as.array.include/exclude (ES 2.2 only). However, this is not enough, as the array depth is unknown: the connector is told that the field is an array, but is it [long], [[long]], [[[long]]] and so on? I’ve raised yet another issue for this, namely #650.
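The partial workaround the comment describes can be sketched as a reader configuration. This is a sketch under stated assumptions: "shapes/doc" and the node address are placeholders, ES-Hadoop 2.2+ is assumed, and (per #650) marking the field as an array may still be insufficient because the nesting depth of a polygon's coordinates is not expressed in the mapping.

```scala
// Hypothetical reader settings; substitute your own nodes and index/type.
val readConfig = Map(
  "es.nodes" -> "localhost:9200",
  // Tell the connector to treat rect.coordinates as an array rather than
  // a single long (setting introduced for ES 2.2, per the comment above).
  "es.read.field.as.array.include" -> "rect.coordinates",
  // The unmapped, null "crs" field can be excluded from the read entirely.
  "es.read.field.exclude" -> "rect.crs"
)

val df = sqlContext.read
  .format("org.elasticsearch.spark.sql")
  .options(readConfig)
  .load("shapes/doc")
```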


