Support fields that are sometimes arrays
Using spark-1.6.2-bin-hadoop2.6, ES-Hadoop 2.4.0 and pyspark.
Somewhere in my dataset there are fields that are normally arrays, but occasionally aren’t - see below for an example. In real life, I think one offender is the logstash geoip filter generating the location field which is normally an array, but in some error cases isn’t.
If I set es_read_field_as_array_include I get scala.MatchError: jim; if I don’t set it I get scala.MatchError: Buffer(tom, peter). Therefore I have no way of reading the field, and can only process the index if I blacklist the offending fields.
Also the stacktrace doesn’t say what field it’s having the issue with, which makes debugging very hard.
curl -XPOST localhost:9200/sparktest/spark -d '{"friends":"jim"}'
curl -XPOST localhost:9200/sparktest/spark -d '{"friends":["tom","peter"]}'
./bin/pyspark --driver-class-path=../elasticsearch-hadoop-2.4.0/dist/elasticsearch-hadoop-2.4.0.jar
>>> df = sqlContext.read.format("org.elasticsearch.spark.sql").options(es_read_field_as_array_include="friends").load("sparktest/spark")
>>> df.show()
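Not part of the original report: a possible stopgap, assuming you control the documents before they are indexed (or can reindex them), is to wrap scalar values so the field is always an array. The sketch below uses plain Python only; the helper name normalize_array_field is hypothetical and is not an ES-Hadoop or Spark API.

```python
import json


def normalize_array_field(doc, field):
    """Return a shallow copy of doc in which doc[field], if present, is a list.

    Hypothetical helper for illustration; not part of ES-Hadoop or Spark.
    """
    out = dict(doc)
    value = out.get(field)
    # Wrap a lone scalar in a list; leave lists, missing fields and nulls alone.
    if value is not None and not isinstance(value, list):
        out[field] = [value]
    return out


# The two documents from the reproduction above.
docs = [
    json.loads('{"friends":"jim"}'),
    json.loads('{"friends":["tom","peter"]}'),
]

normalized = [normalize_array_field(d, "friends") for d in docs]
for d in normalized:
    print(json.dumps(d))
```

With every document shaped the same way, setting es_read_field_as_array_include="friends" should at least no longer meet a mix of scalar and array values, though this does not address the underlying connector behaviour or the missing field name in the stacktrace.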
Issue Analytics
- State:
- Created 7 years ago
- Reactions:8
- Comments:8 (3 by maintainers)
Top GitHub Comments
Any updates on this?
When I print the dataframe’s schema with
df.printSchema()
array_field is not displayed, which suggests that “es.read.field.as.array.include” -> “array_field” doesn’t work.