Tagged types based on anything other than AnyVal produce an exception in Spark
See original GitHub issue. A good explanation of the issue, with examples, is here: https://stackoverflow.com/questions/66377920/how-fix-issues-with-spark-and-shapeless-tagged-type-based-on-string There have been no responses. I also posed the question in the Gitter Shapeless channel with no responses.
Basically, any case class that uses a tagged type like `type Foo = Int @@ FooTag` works just fine with Spark Datasets. But if I use `type Foo = String @@ FooTag`, it fails with the exception `java.lang.ClassNotFoundException: no Java class corresponding to <refinement of java.lang.String with shapeless.tag.Tagged[FooTag]> found`.
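For reference, the tagging encoding in question can be reproduced in a few lines of plain Scala. This is a minimal sketch of what `shapeless.tag` does (the real implementation lives in shapeless), which makes it visible why the static type of a tagged `String` is a refinement type:

```scala
object Tagging {
  // Minimal re-implementation of shapeless-style tagging, for illustration.
  trait Tagged[U]
  type @@[+T, U] = T with Tagged[U]

  // Tags exist only at the type level, so the cast is safe at runtime.
  def tag[U]: Tagger[U] = new Tagger[U]
  final class Tagger[U] {
    def apply[T](t: T): T @@ U = t.asInstanceOf[T @@ U]
  }
}

object TaggedDemo {
  import Tagging._

  trait FooTag
  type IntFoo    = Int @@ FooTag
  type StringFoo = String @@ FooTag

  val i: IntFoo    = tag[FooTag](42)
  val s: StringFoo = tag[FooTag]("hello")

  // At runtime both values are just their base representations, but the
  // static type of `s` is the refinement String with Tagged[FooTag] --
  // exactly the type Spark's reflection cannot map to a Java class.
}
```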
Not sure if this is a bug in Shapeless or a limitation of Spark. Is there any kind of workaround? Or am I limited to things like Int, Long, Double, and Boolean as the base type?
I created a custom Spark UDT for `java.util.UUID`, and it works great. But when I use tagging on the UUID, same issue.
Thank you! Any guidance would be greatly appreciated.
Issue Analytics
- Created 3 years ago
- Comments: 12
Here is a repo that demonstrates the issue: https://github.com/DCameronMauch/TaggedType
If you change the method called by `main` to the Int version, you can see that it works just fine. Long as the base type also works.
The issue is with Spark SQL - it doesn't work with refined types (which is what `@@` translates to). Here is the offending code: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala#L88-L112

It works for primitives because they are special-cased in `dataTypeFor` with `isSubtype` - so subtypes of primitives are basically treated as primitives. Unfortunately, Spark is not very good at offering extension points, and I don't think you can define a custom `DataType` for `@@`. But if you consider using https://github.com/typelevel/frameless it does let you define custom encoders: http://typelevel.org/frameless/Injection.html