Support fields that are sometimes arrays

See original GitHub issue

Using spark-1.6.2-bin-hadoop2.6, ES-Hadoop 2.4.0 and pyspark.

Somewhere in my dataset there are fields that are normally arrays but occasionally aren't (see below for an example). In real life, I think one offender is the Logstash geoip filter, which generates a location field that is normally an array but in some error cases isn't.

If I set es_read_field_as_array_include, I get scala.MatchError: jim; if I don't set it, I get scala.MatchError: Buffer(tom, peter). Either way I cannot read the field, and I can only process the index if I blacklist the offending fields.

Also, the stack trace doesn't say which field it is having the issue with, which makes debugging very hard.
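
Since the stack trace does not name the field, one way to narrow it down is to scan a sample of the raw documents and report every top-level field that appears both as an array and as a non-array. The following is only a rough sketch in plain Python: it assumes the documents are already available as JSON strings (for example, exported from the index outside the DataFrame reader), and find_mixed_array_fields is a hypothetical helper, not part of ES-Hadoop.

```python
import json
from collections import defaultdict


def find_mixed_array_fields(json_docs):
    """Return top-level field names that are a list in some docs and a non-list in others.

    Hypothetical helper for illustration; it only inspects top-level fields.
    """
    shapes = defaultdict(set)
    for raw in json_docs:
        doc = json.loads(raw)
        for name, value in doc.items():
            # Record whether this occurrence of the field is an array or not.
            shapes[name].add("array" if isinstance(value, list) else "scalar")
    # A field is suspicious when both shapes were observed.
    return sorted(name for name, seen in shapes.items() if len(seen) > 1)


# The two documents from this issue, as raw JSON strings.
sample = ['{"friends":"jim"}', '{"friends":["tom","peter"]}']
print(find_mixed_array_fields(sample))  # prints ['friends']
```

Nested fields such as a geoip location would need a recursive walk, which this sketch leaves out.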

curl -XPOST localhost:9200/sparktest/spark -d '{"friends":"jim"}'
curl -XPOST localhost:9200/sparktest/spark -d '{"friends":["tom","peter"]}'
./bin/pyspark --driver-class-path=../elasticsearch-hadoop-2.4.0/dist/elasticsearch-hadoop-2.4.0.jar
>>> df = sqlContext.read.format("org.elasticsearch.spark.sql").options(es_read_field_as_array_include="friends").load("sparktest/spark")
>>> df.show()
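
Until the connector handles this case, a possible workaround is to normalize the offending fields so they are always arrays before the data reaches a schema-based reader. The sketch below is plain Python and makes several assumptions: the documents are handled as raw JSON strings rather than through the DataFrame source, and as_list and normalize_doc are hypothetical helper names, not ES-Hadoop APIs.

```python
import json


def as_list(value):
    """Wrap a scalar in a list; leave lists unchanged; map None to an empty list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def normalize_doc(raw, array_fields):
    """Parse one JSON document and force the named top-level fields to be lists."""
    doc = json.loads(raw)
    for name in array_fields:
        if name in doc:
            doc[name] = as_list(doc[name])
    return doc


# The two documents indexed above.
docs = ['{"friends":"jim"}', '{"friends":["tom","peter"]}']
normalized = [normalize_doc(d, ["friends"]) for d in docs]
print(normalized)  # prints [{'friends': ['jim']}, {'friends': ['tom', 'peter']}]
```

The same idea could be applied at indexing time instead (for example, in the pipeline that produces the documents), so that the stored field has a consistent shape.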

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Reactions:8
  • Comments:8 (3 by maintainers)

Top GitHub Comments

1 reaction
aalkilani commented, May 22, 2017

Any updates on this?

0 reactions
CQUTWangHong commented, Dec 26, 2017

When I print the DataFrame's schema with df.printSchema(), it only shows:

root
 |-- string_field: string (nullable = false)

array_field is not displayed, which means that “es.read.field.as.array.include” -> “array_field” doesn't work.
