scala.MatchError: Buffer(ACK)
See original GitHub issue. Bug report.
Issue description
I’m trying to read from Elasticsearch 5.6.0 with Spark 2.1 and I get the error message “scala.MatchError: Buffer(ACK) (of class scala.collection.convert.Wrappers$JListWrapper)”.
Steps to reproduce
Code:
import org.apache.spark.sql.SparkSession
import org.elasticsearch.spark._
import org.elasticsearch.hadoop.mr.EsInputFormat

val sparkSession = SparkSession.builder()
  .config("es.nodes", "myserver")
  .config("es.port", "9200")
  .appName("ES")
  .getOrCreate()

val query = """{ "query" : { "range" : { "@timestamp" : { "gte": "now-30m", "lte" : "now" } } } }"""

val dataES = sparkSession.read.format("org.elasticsearch.spark.sql")
  .option("es.query", query)
  .load("netflow-2017.10.09")

dataES.count()  // returns the expected count, OK
dataES.take(5)  // fails with the error message above
Schema (dataES.printSchema):
root
|-- @timestamp: timestamp (nullable = true)
|-- @version: string (nullable = true)
|-- geoip: struct (nullable = true)
| |-- as_org: string (nullable = true)
| |-- asn: string (nullable = true)
| |-- autonomous_system: string (nullable = true)
| |-- city_name: string (nullable = true)
| |-- continent_code: string (nullable = true)
| |-- country_code2: string (nullable = true)
| |-- country_code3: string (nullable = true)
| |-- country_name: string (nullable = true)
| |-- dma_code: integer (nullable = true)
| |-- ip: string (nullable = true)
| |-- latitude: float (nullable = true)
| |-- location: struct (nullable = true)
| | |-- lat: double (nullable = true)
| | |-- lon: double (nullable = true)
| |-- longitude: float (nullable = true)
| |-- postal_code: string (nullable = true)
| |-- region_code: string (nullable = true)
| |-- region_name: string (nullable = true)
| |-- timezone: string (nullable = true)
|-- geoip_dst: struct (nullable = true)
| |-- as_org: string (nullable = true)
| |-- asn: string (nullable = true)
| |-- autonomous_system: string (nullable = true)
| |-- city_name: string (nullable = true)
| |-- continent_code: string (nullable = true)
| |-- country_code2: string (nullable = true)
| |-- country_code3: string (nullable = true)
| |-- country_name: string (nullable = true)
| |-- dma_code: integer (nullable = true)
| |-- ip: string (nullable = true)
| |-- latitude: float (nullable = true)
| |-- location: struct (nullable = true)
| | |-- lat: double (nullable = true)
| | |-- lon: double (nullable = true)
| |-- longitude: float (nullable = true)
| |-- postal_code: string (nullable = true)
| |-- region_code: string (nullable = true)
| |-- region_name: string (nullable = true)
| |-- timezone: string (nullable = true)
|-- geoip_src: struct (nullable = true)
| |-- as_org: string (nullable = true)
| |-- asn: string (nullable = true)
| |-- autonomous_system: string (nullable = true)
| |-- city_name: string (nullable = true)
| |-- continent_code: string (nullable = true)
| |-- country_code2: string (nullable = true)
| |-- country_code3: string (nullable = true)
| |-- country_name: string (nullable = true)
| |-- dma_code: integer (nullable = true)
| |-- ip: string (nullable = true)
| |-- latitude: float (nullable = true)
| |-- location: struct (nullable = true)
| | |-- lat: double (nullable = true)
| | |-- lon: double (nullable = true)
| |-- longitude: float (nullable = true)
| |-- postal_code: string (nullable = true)
| |-- region_code: string (nullable = true)
| |-- region_name: string (nullable = true)
| |-- timezone: string (nullable = true)
|-- host: string (nullable = true)
|-- netflow: struct (nullable = true)
| |-- bytes: long (nullable = true)
| |-- direction: string (nullable = true)
| |-- dst_addr: string (nullable = true)
| |-- dst_as: integer (nullable = true)
| |-- dst_locality: string (nullable = true)
| |-- dst_mask_len: integer (nullable = true)
| |-- dst_port: string (nullable = true)
| |-- dst_port_name: string (nullable = true)
| |-- engine_id: integer (nullable = true)
| |-- engine_type: integer (nullable = true)
| |-- first_switched: timestamp (nullable = true)
| |-- flow_locality: string (nullable = true)
| |-- flow_records: integer (nullable = true)
| |-- flow_seq_num: long (nullable = true)
| |-- input_snmp: string (nullable = true)
| |-- ip_version: string (nullable = true)
| |-- last_switched: timestamp (nullable = true)
| |-- next_hop: string (nullable = true)
| |-- output_snmp: string (nullable = true)
| |-- packets: long (nullable = true)
| |-- protocol: string (nullable = true)
| |-- protocol_name: string (nullable = true)
| |-- sampling_algorithm: integer (nullable = true)
| |-- sampling_interval: integer (nullable = true)
| |-- src_addr: string (nullable = true)
| |-- src_as: integer (nullable = true)
| |-- src_locality: string (nullable = true)
| |-- src_mask_len: integer (nullable = true)
| |-- src_port: string (nullable = true)
| |-- src_port_name: string (nullable = true)
| |-- tcp_flag_tags: string (nullable = true)
| |-- tcp_flags: integer (nullable = true)
| |-- tcp_flags_label: string (nullable = true)
| |-- tos: string (nullable = true)
| |-- version: string (nullable = true)
| |-- vlan: string (nullable = true)
|-- tags: string (nullable = true)
|-- type: string (nullable = true)
Stack trace:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 1, localhost, executor driver): scala.MatchError: Buffer(FIN, SYN, ACK) (of class scala.collection.convert.Wrappers$JListWrapper)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$StringConverter$.toCatalystImpl(CatalystTypeConverters.scala:276)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$StringConverter$.toCatalystImpl(CatalystTypeConverters.scala:275)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:103)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:241)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:231)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:103)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$$anonfun$createToCatalystConverter$2.apply(CatalystTypeConverters.scala:383)
at org.apache.spark.sql.execution.RDDConversions$$anonfun$rowToRowRdd$1$$anonfun$apply$3.apply(ExistingRDD.scala:60)
at org.apache.spark.sql.execution.RDDConversions$$anonfun$rowToRowRdd$1$$anonfun$apply$3.apply(ExistingRDD.scala:57)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:231)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:225)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1435)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1423)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1422)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1422)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:802)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1650)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1605)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1594)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:628)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1925)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1938)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1951)
at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:333)
at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38)
at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1.apply(Dataset.scala:2386)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2788)
at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$execute$1(Dataset.scala:2385)
at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collect(Dataset.scala:2392)
at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2128)
at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2127)
at org.apache.spark.sql.Dataset.withTypedCallback(Dataset.scala:2818)
at org.apache.spark.sql.Dataset.head(Dataset.scala:2127)
at org.apache.spark.sql.Dataset.take(Dataset.scala:2342)
... 75 elided
Caused by: scala.MatchError: Buffer(FIN, SYN, ACK) (of class scala.collection.convert.Wrappers$JListWrapper)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$StringConverter$.toCatalystImpl(CatalystTypeConverters.scala:276)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$StringConverter$.toCatalystImpl(CatalystTypeConverters.scala:275)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:103)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:241)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:231)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:103)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$$anonfun$createToCatalystConverter$2.apply(CatalystTypeConverters.scala:383)
at org.apache.spark.sql.execution.RDDConversions$$anonfun$rowToRowRdd$1$$anonfun$apply$3.apply(ExistingRDD.scala:60)
at org.apache.spark.sql.execution.RDDConversions$$anonfun$rowToRowRdd$1$$anonfun$apply$3.apply(ExistingRDD.scala:57)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:231)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:225)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
### Version Info
OS: Linux CentOS
JVM:
Hadoop/Spark: Spark 2.1 (Scala 2.11.8)
ES-Hadoop: elasticsearch-spark-20_2.11 % 5.6.0
ES: 5.6.0
Top GitHub Comments
@hupiv Please keep these sorts of questions on our forums. GitHub is reserved for confirmed bugs and feature tracking.
In this case you are loading a field which contains multiple values, but Spark is treating the field as a singular value. You’ll need to use es.read.field.as.array.include to demarcate which fields have multiple values.

Hello, I have this error while using "org.elasticsearch" % "elasticsearch-spark-20_2.11" % "6.3.0". Could you please help me resolve it: scala.MatchError: Buffer(00205215) (of class scala.collection.convert.Wrappers$JListWrapper)
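For reference, below is a minimal sketch of the suggested fix applied to the original snippet. The field names passed to es.read.field.as.array.include are assumptions: netflow.tcp_flag_tags is inferred from the Buffer(FIN, SYN, ACK) value in the stack trace, and tags from the Buffer(ACK) value in the title; substitute whichever fields actually hold multiple values in your index.

// Sketch of the suggested fix: declare multi-valued fields so the connector
// reads them as arrays instead of single values.
// "netflow.tcp_flag_tags" and "tags" are assumed field names based on the
// schema and stack trace above; adjust them to your own mapping.
val dataES = sparkSession.read
  .format("org.elasticsearch.spark.sql")
  .option("es.query", query)
  .option("es.read.field.as.array.include", "netflow.tcp_flag_tags,tags")
  .load("netflow-2017.10.09")

dataES.take(5)  // should no longer hit the MatchError once the multi-valued fields are declared

The option is needed because an Elasticsearch mapping does not distinguish between a field holding a single value and one holding an array of values, so the connector cannot infer multi-valued fields on its own and defaults to a scalar type in the Spark SQL schema.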