
Tagged types based on anything other than AnyVals produce an exception in Spark

See original GitHub issue

A good explanation of the issue, with examples, can be found here: https://stackoverflow.com/questions/66377920/how-fix-issues-with-spark-and-shapeless-tagged-type-based-on-string (no responses so far). I also posed the question in the Shapeless Gitter channel, with no responses.

Basically, any case class that uses a tagged type like `type Foo = Int @@ FooTag` works just fine with Spark Datasets.

But if I use `type Foo = String @@ FooTag`, it fails with the exception `java.lang.ClassNotFoundException: no Java class corresponding to <refinement of java.lang.String with shapeless.tag.Tagged[FooTag]> found`.
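For concreteness, a minimal sketch that reproduces the report (the `FooTag` and record names here are illustrative, not taken from the original code):

```scala
import org.apache.spark.sql.SparkSession
import shapeless.tag
import shapeless.tag.@@

object Repro {
  trait FooTag
  type IntFoo    = Int @@ FooTag
  type StringFoo = String @@ FooTag

  final case class IntRecord(foo: IntFoo)
  final case class StringRecord(foo: StringFoo)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    import spark.implicits._

    // Works: a tag on an AnyVal base is treated as the plain primitive.
    Seq(IntRecord(tag[FooTag][Int](1))).toDS().show()

    // Fails at runtime while building the encoder:
    // ClassNotFoundException: no Java class corresponding to
    // <refinement of java.lang.String with shapeless.tag.Tagged[FooTag]> found
    Seq(StringRecord(tag[FooTag][String]("a"))).toDS().show()
  }
}
```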

Not sure if this is a bug in Shapeless or a limitation of Spark. Is there any kind of workaround? Or am I limited to base types like Int, Long, Double, and Boolean?

I created a custom Spark UDT for java.util.UUID, and it works great. But when I apply tagging to the UUID, the same issue appears.
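For reference, a rough sketch of what such a UUID UDT can look like. This is an assumption about the reporter's setup, not their actual code; note also that since Spark 2.x the `UserDefinedType` and `UDTRegistration` APIs are `private[spark]`, so code like this typically has to live under an `org.apache.spark.*` package to compile:

```scala
import java.util.UUID
import org.apache.spark.sql.types._
import org.apache.spark.unsafe.types.UTF8String

// Sketch only: compiles only where the private[spark] UDT API is visible.
class UUIDType extends UserDefinedType[UUID] {
  override def sqlType: DataType = StringType
  override def serialize(obj: UUID): Any =
    UTF8String.fromString(obj.toString)
  override def deserialize(datum: Any): UUID =
    UUID.fromString(datum.toString)
  override def userClass: Class[UUID] = classOf[UUID]
}

// Registration keys on the exact user class, so a tagged UUID @@ FooTag
// (a refinement of UUID, not UUID itself) never matches this entry;
// that is consistent with tagging reintroducing the failure.
// UDTRegistration.register(classOf[UUID].getName, classOf[UUIDType].getName)
```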

Thank you! Any guidance would be greatly appreciated.

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 12

Top GitHub Comments

1 reaction
DCameronMauch commented, Mar 29, 2021

Here is a repo that demonstrates the issue: https://github.com/DCameronMauch/TaggedType

If you change the method called by main to the Int version, you can see that it works just fine. Using Long as the base type also works.

0 reactions
joroKr21 commented, Feb 11, 2022

The issue is with Spark SQL: it doesn’t work with refined types (which is what `@@` translates to). Here is the offending code: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala#L88-L112
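To make that translation concrete, a small illustrative snippet (`FooTag` is a placeholder name) showing what the tag desugars to:

```scala
import shapeless.tag.{@@, Tagged}

object TagDesugar {
  trait FooTag

  // By definition in shapeless, T @@ U = T with Tagged[U], so:
  type Foo = String @@ FooTag

  // Compiles: Foo is exactly the refinement String with Tagged[FooTag],
  // which ScalaReflection cannot map back to a runtime Java class.
  implicitly[Foo =:= (String with Tagged[FooTag])]
}
```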

It works for primitives because they are special-cased in dataTypeFor via isSubtype, so subtypes of primitives are basically treated as primitives. Unfortunately, Spark is not very good at offering extension points, and I don’t think you can define a custom DataType for `@@`. But if you consider using https://github.com/typelevel/frameless, it does let you define custom encoders: http://typelevel.org/frameless/Injection.html
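For the tagged string from this issue, a hedged sketch of such an Injection (the `FooTag` and `Record` names are illustrative; this assumes frameless’s `Injection.apply(f, g)` factory):

```scala
import frameless.Injection
import shapeless.tag
import shapeless.tag.@@

object TaggedEncoders {
  trait FooTag
  type Foo = String @@ FooTag

  final case class Record(foo: Foo)

  // The Injection round-trips Foo through plain String, which Spark
  // already understands; frameless can then derive TypedEncoder[Record]
  // without hitting ScalaReflection's refinement-type lookup.
  implicit val fooInjection: Injection[Foo, String] =
    Injection((f: Foo) => (f: String), (s: String) => tag[FooTag][String](s))

  // Usage (requires an implicit SparkSession in scope):
  //   val ds = frameless.TypedDataset.create(Seq(Record(tag[FooTag][String]("a"))))
}
```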

