
Caused by: java.lang.Exception: unsupported data type ARRAY

See original GitHub issue

I have been stuck trying to figure out if I am doing something wrong. Basically, I'm trying to use Avro to write data into HBase using your library, but it's giving me the error below.

Here is what my AVRO schema looks like:

player_schema = """
{
    "type": "array",
    "items": {
        "name": "activities",
        "type": "record",
        "fields": [
            {"name": "start", "type": "double"},
            {"name": "y", "type": "double"},
            {"name": "width", "type": "double"}
        ]
    }
}
"""

opponent_schema = """
{
    "type": "array",
    "items": {
        "name": "opponent",
        "type": "record",
        "fields": [
            {"name": "x", "type": "double"},
            {"name": "y", "type": "double"}
        ]
    }
}
"""

Finally, here are my catalog and the write call:


catalog = """{
                "table":{"namespace":"default", "name":"gm_drake"},
                "rowkey":"id",
                "columns":{                
                    "id":{"cf":"rowkey", "col":"id", "type":"string"},
                    "feature":{"cf":"data", "col":"feature", "type":"string"},
                    "player":{"cf":"data", "col":"player", "avro":"player_schema"},
                    "opponent":{"cf":"data", "col":"opponent", "avro":"opponent_schema"}
                }
            }"""

df_drake_json.write \
    .options(player_schema=player_schema, opponent_schema=opponent_schema, catalog=catalog) \
    .format("org.apache.spark.sql.execution.datasources.hbase") \
    .save()
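For reference, the Scala shape of the same write, sketched from shc's documented usage (dfDrakeJson, playerSchema, and opponentSchema are illustrative stand-ins for the values above); the key detail is that the option keys must match the "avro" values named in the catalog:

    import org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog

    // The option keys "player_schema" and "opponent_schema" line up with
    // the "avro" entries in the catalog JSON.
    dfDrakeJson.write
      .options(Map(
        HBaseTableCatalog.tableCatalog -> catalog,
        "player_schema" -> playerSchema,
        "opponent_schema" -> opponentSchema))
      .format("org.apache.spark.sql.execution.datasources.hbase")
      .save()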

And here is my sample dataset:

{"feature":"ROLLOVER_ATTACK","player":[{"start":-7.281526191298157,"y":1.0,"width":0.15025269761092996}],"opponent":[{"y":1.7573350289423742E-4,"x":-7.281526191298157},{"y":1.7580108506663706E-4,"x":-7.281375788197446}]}

I'm getting this error: java.lang.Exception: unsupported data type ARRAY

Caused by: java.lang.Exception: unsupported data type ARRAY
        at org.apache.spark.sql.execution.datasources.hbase.AvroSedes$.serialize(SchemaConverters.scala:400)
        at org.apache.spark.sql.execution.datasources.hbase.Utils$.toBytes(Utils.scala:74)
        at org.apache.spark.sql.execution.datasources.hbase.HBaseRelation$$anonfun$org$apache$spark$sql$execution$datasources$hbase$HBaseRelation$$convertToPut$1$2.apply(HBaseRelation.scala:155)
        at org.apache.spark.sql.execution.datasources.hbase.HBaseRelation$$anonfun$org$apache$spark$sql$execution$datasources$hbase$HBaseRelation$$convertToPut$1$2.apply(HBaseRelation.scala:154)
        at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
        at org.apache.spark.sql.execution.datasources.hbase.HBaseRelation.org$apache$spark$sql$execution$datasources$hbase$HBaseRelation$$convertToPut$1(HBaseRelation.scala:154)
        at org.apache.spark.sql.execution.datasources.hbase.HBaseRelation$$anonfun$insert$1.apply(HBaseRelation.scala:161)
        at org.apache.spark.sql.execution.datasources.hbase.HBaseRelation$$anonfun$insert$1.apply(HBaseRelation.scala:161)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12$$anonfun$apply$4.apply$mcV$sp(PairRDDFunctions.scala:1112)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12$$anonfun$apply$4.apply(PairRDDFunctions.scala:1111)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12$$anonfun$apply$4.apply(PairRDDFunctions.scala:1111)
        at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1277)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1119)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1091)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        ... 1 more

How can I fix this exception?
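One possible workaround, not discussed in this thread: serialize the array columns to JSON strings before writing and declare those columns as "type":"string" in the catalog, avoiding the Avro array path entirely. A minimal Scala sketch, assuming Spark 2.2+ (where to_json accepts an array of structs) and a hypothetical stringCatalog in which player and opponent are declared as strings:

    import org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog
    import org.apache.spark.sql.functions.{col, to_json}

    // Collapse the nested arrays to JSON strings; the catalog entries for
    // "player" and "opponent" then use "type":"string" instead of "avro".
    val flattened = dfDrakeJson
      .withColumn("player", to_json(col("player")))
      .withColumn("opponent", to_json(col("opponent")))

    flattened.write
      .options(Map(HBaseTableCatalog.tableCatalog -> stringCatalog))
      .format("org.apache.spark.sql.execution.datasources.hbase")
      .save()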

Issue Analytics

  • State: open
  • Created: 6 years ago
  • Comments: 19 (9 by maintainers)

Top GitHub Comments

2 reactions
weiqingy commented, Apr 26, 2017

You can try, but I'm afraid you cannot use a DataFrame/RDD directly here, since you need to invoke AvroSerde.serialize(), which controls how your data is converted into binary. That means, taking AvroSerde.serialize(user, avroSchema) as an example, Avro needs to understand what user is.

You can refer here and try using SchemaConverters.createConverterToSQL(avroSchema)(data) and SchemaConverters.toSqlType(avroSchema) to convert a DataFrame/RDD to/from an Avro Record; I am not sure, though.
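A minimal sketch of that suggestion; hedged, since the visibility and exact signatures of these SchemaConverters methods vary across spark-avro/shc versions, and playerSchemaString is an illustrative stand-in for the schema string from the question:

    import org.apache.avro.Schema
    import org.apache.avro.generic.GenericRecord

    val avroSchema: Schema = new Schema.Parser().parse(playerSchemaString)

    // Avro schema -> Spark SQL type (SchemaType carries dataType + nullability).
    val sqlType = SchemaConverters.toSqlType(avroSchema).dataType

    // Per-record converter: Avro GenericRecord -> a Spark-SQL-compatible value.
    val toSql = SchemaConverters.createConverterToSQL(avroSchema)
    def recordToRow(record: GenericRecord): Any = toSql(record)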

0 reactions
mavencode01 commented, Apr 26, 2017

@weiqingy A quick follow-up on that: can I use a DataFrame/RDD instead of GenericData.Record(avroSchema)?

    import org.apache.avro.generic.GenericData
    import scala.collection.JavaConverters._

    // Build an Avro record by hand (i is a loop index; avroSchema is the parsed schema)
    val user = new GenericData.Record(avroSchema)
    user.put("name", s"name${"%03d".format(i)}")
    user.put("favorite_number", i)
    user.put("favorite_color", s"color${"%03d".format(i)}")
    // Array field: GenericData.Array needs the field's own schema
    val favoriteArray = new GenericData.Array[String](2, avroSchema.getField("favorite_array").schema())
    favoriteArray.add(s"number${i}")
    favoriteArray.add(s"number${i + 1}")
    user.put("favorite_array", favoriteArray)
    // Map field: Avro expects a java.util.Map
    val favoriteMap = Map("key1" -> i, "key2" -> (i + 1)).asJava
    user.put("favorite_map", favoriteMap)
    // Serialize the populated record to bytes
    val avroByte = AvroSerde.serialize(user, avroSchema)
Read more comments on GitHub

Top Results From Across the Web

  • Unsupported data type NullType when turning a dataframe ...
    The error you have Unsupported data type NullType indicates that one of the columns for the table you are saving has a NULL...
  • Solved: Error in Spark-HBase Connector - unsupported data
    Caused by: java.lang.Exception: unsupported data type StringType. at org.apache.spark.sql.execution.datasources.hbase.Utils$.toBytes(Utils.scala:88).
  • How to Fix the Unsupported Operation Exception in Java
    An UnsupportedOperationException is thrown when a requested operation cannot be performed because it is not supported for that particular class.
  • Spark Ignite : Unsupported data type ArrayType(StringType,true)
    IgniteException : Unsupported data type ArrayType(StringType,true) at org.apache.ignite.spark.impl.QueryUtils$.dataType(QueryUtils.scala:151) ...
  • Collection Functions · The Internals of Spark SQL
    Creates a new row for each element in the given array or map column. ... java.lang.RuntimeException: Unsupported literal type class org.apache.spark.sql.
