Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging third-party libraries. It collects links to all the places you might be looking while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

scala.MatchError: Buffer(ACK)

See original GitHub issue
  • Bug report.

Issue description

I’m trying to read from Elasticsearch 5.6.0 with Spark 2.1 and I get the error “scala.MatchError: Buffer(ACK) (of class scala.collection.convert.Wrappers$JListWrapper)”.

Steps to reproduce

Code:

import org.apache.spark.sql.SparkSession
import org.elasticsearch.spark._
import org.elasticsearch.hadoop.mr.EsInputFormat

val sparkSession = SparkSession.builder()
  .config("es.nodes", "myserver")
  .config("es.port", "9200")
  .appName("ES")
  .getOrCreate()

// Pull the last 30 minutes of documents from the netflow index
val query = """{ "query" : { "range" : { "@timestamp" : { "gte": "now-30m", "lte" : "now" } } } }"""
val dataES = sparkSession.read.format("org.elasticsearch.spark.sql")
  .option("es.query", query)
  .load("netflow-2017.10.09")

dataES.count() // returns the expected document count
dataES.take(5) // throws the scala.MatchError below

Schema, as printed by dataES.printSchema():

root
 |-- @timestamp: timestamp (nullable = true)
 |-- @version: string (nullable = true)
 |-- geoip: struct (nullable = true)
 |    |-- as_org: string (nullable = true)
 |    |-- asn: string (nullable = true)
 |    |-- autonomous_system: string (nullable = true)
 |    |-- city_name: string (nullable = true)
 |    |-- continent_code: string (nullable = true)
 |    |-- country_code2: string (nullable = true)
 |    |-- country_code3: string (nullable = true)
 |    |-- country_name: string (nullable = true)
 |    |-- dma_code: integer (nullable = true)
 |    |-- ip: string (nullable = true)
 |    |-- latitude: float (nullable = true)
 |    |-- location: struct (nullable = true)
 |    |    |-- lat: double (nullable = true)
 |    |    |-- lon: double (nullable = true)
 |    |-- longitude: float (nullable = true)
 |    |-- postal_code: string (nullable = true)
 |    |-- region_code: string (nullable = true)
 |    |-- region_name: string (nullable = true)
 |    |-- timezone: string (nullable = true)
 |-- geoip_dst: struct (nullable = true)
 |    |-- as_org: string (nullable = true)
 |    |-- asn: string (nullable = true)
 |    |-- autonomous_system: string (nullable = true)
 |    |-- city_name: string (nullable = true)
 |    |-- continent_code: string (nullable = true)
 |    |-- country_code2: string (nullable = true)
 |    |-- country_code3: string (nullable = true)
 |    |-- country_name: string (nullable = true)
 |    |-- dma_code: integer (nullable = true)
 |    |-- ip: string (nullable = true)
 |    |-- latitude: float (nullable = true)
 |    |-- location: struct (nullable = true)
 |    |    |-- lat: double (nullable = true)
 |    |    |-- lon: double (nullable = true)
 |    |-- longitude: float (nullable = true)
 |    |-- postal_code: string (nullable = true)
 |    |-- region_code: string (nullable = true)
 |    |-- region_name: string (nullable = true)
 |    |-- timezone: string (nullable = true)
 |-- geoip_src: struct (nullable = true)
 |    |-- as_org: string (nullable = true)
 |    |-- asn: string (nullable = true)
 |    |-- autonomous_system: string (nullable = true)
 |    |-- city_name: string (nullable = true)
 |    |-- continent_code: string (nullable = true)
 |    |-- country_code2: string (nullable = true)
 |    |-- country_code3: string (nullable = true)
 |    |-- country_name: string (nullable = true)
 |    |-- dma_code: integer (nullable = true)
 |    |-- ip: string (nullable = true)
 |    |-- latitude: float (nullable = true)
 |    |-- location: struct (nullable = true)
 |    |    |-- lat: double (nullable = true)
 |    |    |-- lon: double (nullable = true)
 |    |-- longitude: float (nullable = true)
 |    |-- postal_code: string (nullable = true)
 |    |-- region_code: string (nullable = true)
 |    |-- region_name: string (nullable = true)
 |    |-- timezone: string (nullable = true)
 |-- host: string (nullable = true)
 |-- netflow: struct (nullable = true)
 |    |-- bytes: long (nullable = true)
 |    |-- direction: string (nullable = true)
 |    |-- dst_addr: string (nullable = true)
 |    |-- dst_as: integer (nullable = true)
 |    |-- dst_locality: string (nullable = true)
 |    |-- dst_mask_len: integer (nullable = true)
 |    |-- dst_port: string (nullable = true)
 |    |-- dst_port_name: string (nullable = true)
 |    |-- engine_id: integer (nullable = true)
 |    |-- engine_type: integer (nullable = true)
 |    |-- first_switched: timestamp (nullable = true)
 |    |-- flow_locality: string (nullable = true)
 |    |-- flow_records: integer (nullable = true)
 |    |-- flow_seq_num: long (nullable = true)
 |    |-- input_snmp: string (nullable = true)
 |    |-- ip_version: string (nullable = true)
 |    |-- last_switched: timestamp (nullable = true)
 |    |-- next_hop: string (nullable = true)
 |    |-- output_snmp: string (nullable = true)
 |    |-- packets: long (nullable = true)
 |    |-- protocol: string (nullable = true)
 |    |-- protocol_name: string (nullable = true)
 |    |-- sampling_algorithm: integer (nullable = true)
 |    |-- sampling_interval: integer (nullable = true)
 |    |-- src_addr: string (nullable = true)
 |    |-- src_as: integer (nullable = true)
 |    |-- src_locality: string (nullable = true)
 |    |-- src_mask_len: integer (nullable = true)
 |    |-- src_port: string (nullable = true)
 |    |-- src_port_name: string (nullable = true)
 |    |-- tcp_flag_tags: string (nullable = true)
 |    |-- tcp_flags: integer (nullable = true)
 |    |-- tcp_flags_label: string (nullable = true)
 |    |-- tos: string (nullable = true)
 |    |-- version: string (nullable = true)
 |    |-- vlan: string (nullable = true)
 |-- tags: string (nullable = true)
 |-- type: string (nullable = true)

Stack trace:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 1, localhost, executor driver): scala.MatchError: Buffer(FIN, SYN, ACK) (of class scala.collection.convert.Wrappers$JListWrapper)
	at org.apache.spark.sql.catalyst.CatalystTypeConverters$StringConverter$.toCatalystImpl(CatalystTypeConverters.scala:276)
	at org.apache.spark.sql.catalyst.CatalystTypeConverters$StringConverter$.toCatalystImpl(CatalystTypeConverters.scala:275)
	at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:103)
	at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:241)
	at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:231)
	at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:103)
	at org.apache.spark.sql.catalyst.CatalystTypeConverters$$anonfun$createToCatalystConverter$2.apply(CatalystTypeConverters.scala:383)
	at org.apache.spark.sql.execution.RDDConversions$$anonfun$rowToRowRdd$1$$anonfun$apply$3.apply(ExistingRDD.scala:60)
	at org.apache.spark.sql.execution.RDDConversions$$anonfun$rowToRowRdd$1$$anonfun$apply$3.apply(ExistingRDD.scala:57)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:231)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:225)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
	at org.apache.spark.scheduler.Task.run(Task.scala:99)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:748)

Driver stacktrace:
  at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1435)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1423)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1422)
  at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
  at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1422)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
  at scala.Option.foreach(Option.scala:257)
  at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:802)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1650)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1605)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1594)
  at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
  at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:628)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:1925)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:1938)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:1951)
  at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:333)
  at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38)
  at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1.apply(Dataset.scala:2386)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
  at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2788)
  at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$execute$1(Dataset.scala:2385)
  at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collect(Dataset.scala:2392)
  at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2128)
  at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2127)
  at org.apache.spark.sql.Dataset.withTypedCallback(Dataset.scala:2818)
  at org.apache.spark.sql.Dataset.head(Dataset.scala:2127)
  at org.apache.spark.sql.Dataset.take(Dataset.scala:2342)
  ... 75 elided
Caused by: scala.MatchError: Buffer(FIN, SYN, ACK) (of class scala.collection.convert.Wrappers$JListWrapper)
  at org.apache.spark.sql.catalyst.CatalystTypeConverters$StringConverter$.toCatalystImpl(CatalystTypeConverters.scala:276)
  at org.apache.spark.sql.catalyst.CatalystTypeConverters$StringConverter$.toCatalystImpl(CatalystTypeConverters.scala:275)
  at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:103)
  at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:241)
  at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:231)
  at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:103)
  at org.apache.spark.sql.catalyst.CatalystTypeConverters$$anonfun$createToCatalystConverter$2.apply(CatalystTypeConverters.scala:383)
  at org.apache.spark.sql.execution.RDDConversions$$anonfun$rowToRowRdd$1$$anonfun$apply$3.apply(ExistingRDD.scala:60)
  at org.apache.spark.sql.execution.RDDConversions$$anonfun$rowToRowRdd$1$$anonfun$apply$3.apply(ExistingRDD.scala:57)
  at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
  at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
  at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
  at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:231)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:225)
  at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
  at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
  at org.apache.spark.scheduler.Task.run(Task.scala:99)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  at java.lang.Thread.run(Thread.java:748)

Version Info

OS          :  Linux CentOS
JVM         :  
Hadoop/Spark:  2.1 (Scala 2.11.8)
ES-Hadoop   :  elasticsearch-spark-20_2.11 % 5.6.0
ES          :  5.6.0

Issue Analytics

  • State: closed
  • Created: 6 years ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

1 reaction
jbaiera commented, Oct 9, 2017

@hupiv Please keep these sorts of questions on our forums. GitHub is reserved for confirmed bugs and feature tracking.

In this case you are loading a field which contains multiple values in it, but Spark is treating the field as a singular value. You’ll need to use es.read.field.as.array.include to demarcate which fields have multiple values.
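As a sketch of that fix (a guess, not a confirmed mapping: the Buffer(FIN, SYN, ACK) value in the stack trace suggests netflow.tcp_flag_tags is one of the multi-valued fields, and Logstash typically writes tags as an array; substitute whichever fields are actually multi-valued in your index):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .config("es.nodes", "myserver")
  .config("es.port", "9200")
  .appName("ES")
  .getOrCreate()

val query = """{ "query" : { "range" : { "@timestamp" : { "gte": "now-30m", "lte" : "now" } } } }"""

// es.read.field.as.array.include takes a comma-separated list of
// field names; dotted paths address nested fields. Flagged fields
// are read back as array<string> instead of string, which avoids
// the MatchError when a document holds multiple values.
val dataES = spark.read.format("org.elasticsearch.spark.sql")
  .option("es.read.field.as.array.include", "tags,netflow.tcp_flag_tags")
  .option("es.query", query)
  .load("netflow-2017.10.09")

dataES.take(5) // no longer fails on the multi-valued fields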

0 reactions
allami commented, Apr 16, 2019

Hello, I have this error while using “org.elasticsearch” % “elasticsearch-spark-20_2.11” % “6.3.0”. Could you please help me resolve it? scala.MatchError: Buffer(00205215) (of class scala.collection.convert.Wrappers$JListWrapper)
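The same option is the likely fix here as well; it can also be set once at the session level so every Elasticsearch read inherits it (the original question's builder already passes es.* settings this way). A minimal sketch, with codes as a placeholder field name since the comment doesn't say which field holds 00205215:

import org.apache.spark.sql.SparkSession

// "codes" is a hypothetical field name; replace it with whichever
// field actually contains the multiple values reported in the
// error message (Buffer(00205215)).
val spark = SparkSession.builder()
  .appName("ES")
  .config("es.read.field.as.array.include", "codes")
  .getOrCreate()

val df = spark.read.format("org.elasticsearch.spark.sql")
  .load("my-index") // my-index is a placeholder index name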

Read more comments on GitHub >

Top Results From Across the Web

How to resolve scala.MatchError when creating a Data Frame
You expected Spark to be able to parse an arbitrary Java class into a DataFrame - that is not the case, Spark can...
Read more >
MatchError - The Scala Programming Language
This class implements errors which are thrown whenever an object doesn't match any pattern of a pattern matching expression. Source: MatchError.scala.
Read more >
scala.MatchError: null - Migrate from ES 5.6.3 to 6.8.3
I am using PySpark ( Spark 2.2.0, Scala 2.11.8, AWS EMR emr-5.8.0) and trying to migrate from Elasticsearch 5.6.3 to 6.8.3.
Read more >
scala-js/scala-js - Gitter
This is about converting String to Uint16Array and back. Sébastien Doeraene ... I am getting a match error when using autowire scala.
Read more >
OutOfMemoryError exceptions for Apache Spark in Azure ...
JavaSerializerInstance.serialize(JavaSerializer.scala:101) at ... details in Zookeeper to be recovered after the Livy Server is back.
Read more >
