Spark 2.3: Python example from the README gives "java.lang.IllegalArgumentException"
I use the following versions:
- Python 2.7.10
- pyddq 4.1.1
- drunken-data-quality 4.1.1
- Spark 2.3.0 (built against Scala 2.11.8)
- Java 1.8.0_162
Is anyone able to run this? I have a feeling this issue might be due to wrong or conflicting versions.
import os

# Make the DDQ assembly jar available to the driver JVM before PySpark starts.
os.environ['PYSPARK_SUBMIT_ARGS'] = '--jars /Users/vincent/Downloads/drunken-data-quality-assembly_2.11-4.1.1.jar pyspark-shell'

from pyspark.sql import SparkSession
from pyddq.core import Check

sparkSession = SparkSession.builder.appName("example-pyspark-read-and-write").getOrCreate()
df = sparkSession.createDataFrame([(1, "a"), (1, None), (3, "c")])

# Two constraints from the README example: a unique key over both columns
# and a non-null check on the first one.
check = Check(df)
check.hasUniqueKey("_1", "_2").isNeverNull("_1").run()
The above test case gives the following error:
Traceback (most recent call last):
File "/Users/vincent/Library/Preferences/PyCharmCE2017.1/scratches/scratch_3.py", line 9, in <module>
check.hasUniqueKey("_1", "_2").isNeverNull("_1").run()
File "/Users/vincent/Workspace/test_dags/venv/lib/python2.7/site-packages/pyddq/core.py", line 436, in run
self.jvmCheck.run(jvm_reporters)
File "/Users/vincent/Workspace/test_dags/venv/lib/python2.7/site-packages/py4j/java_gateway.py", line 1160, in __call__
answer, self.gateway_client, self.target_id, self.name)
File "/Users/vincent/Workspace/test_dags/venv/lib/python2.7/site-packages/pyspark/sql/utils.py", line 63, in deco
return f(*a, **kw)
File "/Users/vincent/Workspace/test_dags/venv/lib/python2.7/site-packages/py4j/protocol.py", line 320, in get_return_value
format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o49.run.
: java.lang.IllegalArgumentException
at org.apache.xbean.asm5.ClassReader.<init>(Unknown Source)
at org.apache.xbean.asm5.ClassReader.<init>(Unknown Source)
at org.apache.xbean.asm5.ClassReader.<init>(Unknown Source)
at org.apache.spark.util.ClosureCleaner$.getClassReader(ClosureCleaner.scala:46)
at org.apache.spark.util.FieldAccessFinder$$anon$3$$anonfun$visitMethodInsn$2.apply(ClosureCleaner.scala:449)
at org.apache.spark.util.FieldAccessFinder$$anon$3$$anonfun$visitMethodInsn$2.apply(ClosureCleaner.scala:432)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
at scala.collection.mutable.HashMap$$anon$1$$anonfun$foreach$2.apply(HashMap.scala:103)
at scala.collection.mutable.HashMap$$anon$1$$anonfun$foreach$2.apply(HashMap.scala:103)
at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
at scala.collection.mutable.HashMap$$anon$1.foreach(HashMap.scala:103)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
at org.apache.spark.util.FieldAccessFinder$$anon$3.visitMethodInsn(ClosureCleaner.scala:432)
at org.apache.xbean.asm5.ClassReader.a(Unknown Source)
at org.apache.xbean.asm5.ClassReader.b(Unknown Source)
at org.apache.xbean.asm5.ClassReader.accept(Unknown Source)
at org.apache.xbean.asm5.ClassReader.accept(Unknown Source)
at org.apache.spark.util.ClosureCleaner$$anonfun$org$apache$spark$util$ClosureCleaner$$clean$14.apply(ClosureCleaner.scala:262)
at org.apache.spark.util.ClosureCleaner$$anonfun$org$apache$spark$util$ClosureCleaner$$clean$14.apply(ClosureCleaner.scala:261)
at scala.collection.immutable.List.foreach(List.scala:381)
at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:261)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:159)
at org.apache.spark.SparkContext.clean(SparkContext.scala:2292)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2066)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2092)
at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:939)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
at org.apache.spark.rdd.RDD.collect(RDD.scala:938)
at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:297)
at org.apache.spark.sql.Dataset$$anonfun$count$1.apply(Dataset.scala:2770)
at org.apache.spark.sql.Dataset$$anonfun$count$1.apply(Dataset.scala:2769)
at org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3253)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3252)
at org.apache.spark.sql.Dataset.count(Dataset.scala:2769)
at de.frosner.ddq.core.Runner$$anonfun$run$1.apply(Runner.scala:22)
at de.frosner.ddq.core.Runner$$anonfun$run$1.apply(Runner.scala:19)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.immutable.List.map(List.scala:285)
at de.frosner.ddq.core.Runner$.run(Runner.scala:19)
at de.frosner.ddq.core.Check.run(Check.scala:209)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:564)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.base/java.lang.Thread.run(Thread.java:844)
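Note the java.base/jdk.internal.reflect frames near the bottom of the trace: that module prefix only appears on Java 9 or newer, while Spark 2.3 supports only Java 8, so the JVM that PySpark actually launched may not be the Java 1.8.0_162 listed above. A minimal sketch to confirm which versions the running driver JVM really reports, reusing the sparkSession from the repro (_jvm is an internal py4j handle, so this is a debugging aid rather than stable API):

# Print the versions of the live driver JVM; assumes the `sparkSession`
# from the snippet above is still available.
print(sparkSession.version)                                            # Spark version, e.g. 2.3.0
print(sparkSession._jvm.scala.util.Properties.versionString())         # Scala, e.g. "version 2.11.8"
print(sparkSession._jvm.java.lang.System.getProperty("java.version"))  # Java actually running the driver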
Top GitHub Comments
- Hi @vincentclaes! Can you check if it runs with Spark 2.2?
- It doesn't support Python 3 at the moment, and I'm not actively working on this project right now. If you want, feel free to submit a PR, though; we'll review it as soon as possible.
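To try the suggested Spark 2.2 downgrade without changing the example itself, one option is to point PySpark at a local Spark 2.2.x distribution before anything from pyspark is imported. A minimal sketch; the install path is an assumption, and the pip-installed pyspark would need to be pinned to a matching 2.2.x release so the Python and JVM sides agree:

import os

# Assumption: a Spark 2.2.x distribution is unpacked at this (hypothetical) path.
# SPARK_HOME is read when the py4j gateway launches spark-submit, so it must be
# set before the SparkSession is created.
os.environ["SPARK_HOME"] = "/path/to/spark-2.2.3-bin-hadoop2.7"
os.environ["PYSPARK_SUBMIT_ARGS"] = (
    "--jars /Users/vincent/Downloads/drunken-data-quality-assembly_2.11-4.1.1.jar "
    "pyspark-shell"
)
# ...then rerun the README example from above unchanged.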