question-mark
Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

Spark 2.3: python example from the readme gives '"java.lang.IllegalArgumentException"

See original GitHub issue

I use the following versions:

python 2.7.10 pyddq==4.1.1 drunken-data-quality==4.1.1 spark 2.3.0 spark uses scala 2.11.8 java 1.8.0_162

Are there people that are able to run this? I have a feeling that this issue might be due to wrong/conflicting versions.

import os
os.environ['PYSPARK_SUBMIT_ARGS'] = '--jars /Users/vincent/Downloads/drunken-data-quality-assembly_2.11-4.1.1.jar pyspark-shell'
from pyspark.sql import SparkSession
from pyddq.core import Check

sparkSession = SparkSession.builder.appName("example-pyspark-read-and-write").getOrCreate()
df = sparkSession.createDataFrame([(1, "a"), (1, None), (3, "c")])
check = Check(df)
check.hasUniqueKey("_1", "_2").isNeverNull("_1").run()

above test case gives the following error:

Traceback (most recent call last):
  File "/Users/vincent/Library/Preferences/PyCharmCE2017.1/scratches/scratch_3.py", line 9, in <module>
    check.hasUniqueKey("_1", "_2").isNeverNull("_1").run()
  File "/Users/vincent/Workspace/test_dags/venv/lib/python2.7/site-packages/pyddq/core.py", line 436, in run
    self.jvmCheck.run(jvm_reporters)
  File "/Users/vincent/Workspace/test_dags/venv/lib/python2.7/site-packages/py4j/java_gateway.py", line 1160, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/Users/vincent/Workspace/test_dags/venv/lib/python2.7/site-packages/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/Users/vincent/Workspace/test_dags/venv/lib/python2.7/site-packages/py4j/protocol.py", line 320, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o49.run.
: java.lang.IllegalArgumentException
	at org.apache.xbean.asm5.ClassReader.<init>(Unknown Source)
	at org.apache.xbean.asm5.ClassReader.<init>(Unknown Source)
	at org.apache.xbean.asm5.ClassReader.<init>(Unknown Source)
	at org.apache.spark.util.ClosureCleaner$.getClassReader(ClosureCleaner.scala:46)
	at org.apache.spark.util.FieldAccessFinder$$anon$3$$anonfun$visitMethodInsn$2.apply(ClosureCleaner.scala:449)
	at org.apache.spark.util.FieldAccessFinder$$anon$3$$anonfun$visitMethodInsn$2.apply(ClosureCleaner.scala:432)
	at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
	at scala.collection.mutable.HashMap$$anon$1$$anonfun$foreach$2.apply(HashMap.scala:103)
	at scala.collection.mutable.HashMap$$anon$1$$anonfun$foreach$2.apply(HashMap.scala:103)
	at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
	at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
	at scala.collection.mutable.HashMap$$anon$1.foreach(HashMap.scala:103)
	at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
	at org.apache.spark.util.FieldAccessFinder$$anon$3.visitMethodInsn(ClosureCleaner.scala:432)
	at org.apache.xbean.asm5.ClassReader.a(Unknown Source)
	at org.apache.xbean.asm5.ClassReader.b(Unknown Source)
	at org.apache.xbean.asm5.ClassReader.accept(Unknown Source)
	at org.apache.xbean.asm5.ClassReader.accept(Unknown Source)
	at org.apache.spark.util.ClosureCleaner$$anonfun$org$apache$spark$util$ClosureCleaner$$clean$14.apply(ClosureCleaner.scala:262)
	at org.apache.spark.util.ClosureCleaner$$anonfun$org$apache$spark$util$ClosureCleaner$$clean$14.apply(ClosureCleaner.scala:261)
	at scala.collection.immutable.List.foreach(List.scala:381)
	at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:261)
	at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:159)
	at org.apache.spark.SparkContext.clean(SparkContext.scala:2292)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2066)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2092)
	at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:939)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
	at org.apache.spark.rdd.RDD.collect(RDD.scala:938)
	at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:297)
	at org.apache.spark.sql.Dataset$$anonfun$count$1.apply(Dataset.scala:2770)
	at org.apache.spark.sql.Dataset$$anonfun$count$1.apply(Dataset.scala:2769)
	at org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3253)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
	at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3252)
	at org.apache.spark.sql.Dataset.count(Dataset.scala:2769)
	at de.frosner.ddq.core.Runner$$anonfun$run$1.apply(Runner.scala:22)
	at de.frosner.ddq.core.Runner$$anonfun$run$1.apply(Runner.scala:19)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.immutable.List.foreach(List.scala:381)
	at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
	at scala.collection.immutable.List.map(List.scala:285)
	at de.frosner.ddq.core.Runner$.run(Runner.scala:19)
	at de.frosner.ddq.core.Check.run(Check.scala:209)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:564)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:214)
	at java.base/java.lang.Thread.run(Thread.java:844)

Issue Analytics

  • State:closed
  • Created 5 years ago
  • Comments:5 (3 by maintainers)

github_iconTop GitHub Comments

1reaction
FRosnercommented, Mar 29, 2018

Hi @vincentclaes! Can you check if it runs with Spark 2.2?

0reactions
FRosnercommented, Nov 28, 2019

It doesn’t support Python 3 at the moment and I’m not actively working on this project right now. If you want, feel free to submit a PR though, we’ll review it asap.

Read more comments on GitHub >

github_iconTop Results From Across the Web

java.lang.IllegalArgumentException when applying a Python ...
As per https://issues.apache.org/jira/browse/SPARK-29367, you need to either: set the environment variable. ARROW_PRE_0_15_IPC_FORMAT=1.
Read more >
apache/sedona - Gitter
Hi, everyone! I run the following code for a spatial join on geometry fields: ```val coverage = DimCoverageReader.apply(spark, params) coverage.
Read more >
Quick Start - Spark 2.3.0 Documentation
This tutorial provides a quick introduction to using Spark. We will first introduce the API through Spark's interactive shell (in Python or Scala), ......
Read more >
DataFrameReader — Loading Data From External Data Sources
DataFrameReader is a fluent API to describe the input data source that will be used to "load" data from an external data source...
Read more >
Learning Spark, Second Edition - Databricks
Spark's Structured APIs in Java, Python, Scala, or R, ... MLlib provides many popular machine learning algorithms built atop ... README.md.
Read more >

github_iconTop Related Medium Post

No results found

github_iconTop Related StackOverflow Question

No results found

github_iconTroubleshoot Live Code

Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start Free

github_iconTop Related Reddit Thread

No results found

github_iconTop Related Hackernoon Post

No results found

github_iconTop Related Tweet

No results found

github_iconTop Related Dev.to Post

No results found

github_iconTop Related Hashnode Post

No results found