
TypeError: 'JavaPackage' object is not callable

See original GitHub issue

Whenever I try to use Flint here locally (no Hadoop/EMR involved), it keeps barfing at me with the error message in the subject. The setup is Python 3.7 with PySpark 2.4.4 and OpenJDK 8 on an Ubuntu 19.04 install.

Note: As I’m running locally only, I get this log message from Spark, but everything runs perfectly using vanilla PySpark:

19/10/23 09:59:50 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

It happens when I try to read a PySpark DataFrame into a ts.flint.TimeSeriesDataFrame. This example is adapted from the Flint Example.ipynb:

import pyspark
import ts.flint
from ts.flint import FlintContext

# Plain local Spark session, wrapped in a FlintContext.
sc = pyspark.SparkContext('local', 'Flint Example')
spark = pyspark.sql.SparkSession(sc)
flint_context = FlintContext(spark)

# Read the CSV with vanilla PySpark; Flint expects the time column
# to be named 'time'.
sp500 = (
    spark.read
    .option('header', True)
    .option('inferSchema', True)
    .csv('sp500.csv')
    .withColumnRenamed('Date', 'time')
)
sp500 = flint_context.read.dataframe(sp500)

The last line causes the “boom”, with this being the first part of the stack trace:

TypeError                                 Traceback (most recent call last)
~/.virtualenvs/pyspark-test/lib/python3.7/site-packages/ts/flint/java.py in new_reader(self)
     37         try:
---> 38             return utils.jvm(self.sc).com.twosigma.flint.timeseries.io.read.TSReadBuilder()
     39         except TypeError:

TypeError: 'JavaPackage' object is not callable

Any ideas what may be going wrong and how the problem could be solved?
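For background on the error itself: py4j resolves spark._jvm.some.package.SomeClass lazily, and when the class isn’t on the JVM classpath it hands back a JavaPackage placeholder; calling that placeholder raises exactly this TypeError. So the message usually means the Flint JAR never made it onto Spark’s classpath, not that the Python code is wrong. A minimal sketch of starting the session with the artifact on the classpath, assuming the Maven coordinates com.twosigma:flint:0.6.0 (an assumption; check Maven Central for the real group/artifact/version):

import pyspark

# Assumed Maven coordinates -- verify group/artifact/version on Maven Central.
FLINT_COORDS = 'com.twosigma:flint:0.6.0'

spark = (
    pyspark.sql.SparkSession.builder
    .master('local')
    .appName('Flint Example')
    # Resolve and download the JVM-side artifact at launch, so that
    # com.twosigma.flint.* becomes a real class tree instead of a stub.
    .config('spark.jars.packages', FLINT_COORDS)
    .getOrCreate()
)

# Quick diagnostic: if this prints a bare JavaPackage repr, the classes
# are still missing from the classpath.
print(spark._jvm.com.twosigma.flint)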

Issue Analytics

  • State: open
  • Created: 4 years ago
  • Comments: 7

Top GitHub Comments

3 reactions
pohutukawa commented, Oct 28, 2019

Digging deeper around here, it seems there’s an issue with PySpark 2.4 (issue #63, addressed in the unmerged pull request #64), even though the README.md clearly states that Flint is already compatible with PySpark 2.4 with Python >= 3.5.

Besides that issue, looking more closely at the README.md, I’m of the opinion that the install instructions are lacking. I only did the pip install ts-flint thing, and nothing else. The instructions on ts-flint.readthedocs.io don’t mention anything beyond that either, leading me to assume that a pip install is sufficient. The only thing the README mentions is that some Scala artifact available from Maven Central is required, though I’m not interested in building ts-flint, just using it.

Anyway, following the Maven Central link takes me somewhere, but I’ve got no idea how or what to download, or where to place it. I presume some JAR file(s) will be required. But how???

Any help would be appreciated by someone quite well informed on Python, OK on Java, but not at all at home in the Scala ecosystem. Stuff like that would also be awesome to find in both the README and the ReadTheDocs docs.
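For what it’s worth, the usual answer to the “how” is to not download anything by hand and instead let Spark resolve the artifact itself. A minimal sketch, again assuming the coordinates com.twosigma:flint:0.6.0 (to be verified on Maven Central):

import os

# Assumed coordinates -- verify them on Maven Central first.
# PYSPARK_SUBMIT_ARGS must end with 'pyspark-shell' and has to be set
# before the SparkContext is created.
os.environ['PYSPARK_SUBMIT_ARGS'] = (
    '--packages com.twosigma:flint:0.6.0 pyspark-shell'
)

import pyspark

sc = pyspark.SparkContext('local', 'Flint Example')

# Alternative, with a manually downloaded JAR (path is a placeholder):
# spark = (
#     pyspark.sql.SparkSession.builder
#     .config('spark.jars', '/path/to/flint.jar')
#     .getOrCreate()
# )

With --packages, Spark fetches the JAR and its dependencies from Maven Central into a local cache, which sidesteps the Scala build entirely.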

0 reactions
Maria-UET commented, Jul 2, 2021

I was facing the same problem with Pyarrow.

My environment:

  • Python 3.6
  • Pyspark 2.4.4
  • Pyarrow 4.0.1
  • Jupyter notebook
  • Spark cluster on GCS

When I try to enable Pyarrow optimization like this: spark.conf.set('spark.sql.execution.arrow.enabled', 'true')

I get the following warning: createDataFrame attempted Arrow optimization because 'spark.sql.execution.arrow.enabled' is set to true; however failed by the reason below: TypeError: 'JavaPackage' object is not callable

I solved this problem by:

  1. I printed the config of the Spark session:

import os
from pyspark import SparkConf

# Dump every key-value pair of the current Spark configuration.
spark_config = SparkConf().getAll()
for conf in spark_config:
    print(conf)

This prints the key-value pairs of the Spark configuration.

  2. I found the path to my jar files in this key-value pair: ('spark.yarn.jars', 'path\to\jar\files')

  3. After I found the path where my jar files are located, I printed the names of the jars for Pyarrow, like this:

# List every jar on the cluster's jar path and keep the Arrow ones.
jar_names = os.listdir('path\to\jar\files')
for jar_name in jar_names:
    if 'arrow' in jar_name:
        print(jar_name)

I found the following jars:

arrow-format-0.10.0.jar
arrow-memory-0.10.0.jar
arrow-vector-0.10.0.jar
  4. Then I added the paths of the Arrow jars to the Spark session config. For adding multiple jar file paths, I used : as the delimiter:

spark.conf.set('spark.driver.extraClassPath', 'path\to\jar\files\arrow-format-0.10.0.jar:path\to\jar\files\arrow-memory-0.10.0.jar:path\to\jar\files\arrow-vector-0.10.0.jar')

  5. Then I restarted the kernel, and the Pyarrow optimization worked.
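A caveat on steps 4 and 5: spark.driver.extraClassPath is read when the driver JVM starts, so setting it with spark.conf.set on a live session may not take effect on its own; the kernel restart in step 5 is likely what made it stick. A safer pattern is to put the same settings on the builder before the session exists. A minimal sketch, with the jar paths as placeholders for the ones found in step 3:

from pyspark.sql import SparkSession

# Placeholder paths -- substitute the Arrow jar locations found in step 3.
# ':' is the classpath delimiter on Linux/macOS; Windows uses ';'.
arrow_jars = ':'.join([
    r'path\to\jar\files\arrow-format-0.10.0.jar',
    r'path\to\jar\files\arrow-memory-0.10.0.jar',
    r'path\to\jar\files\arrow-vector-0.10.0.jar',
])

spark = (
    SparkSession.builder
    # Set before the driver JVM exists, so the classpath change
    # actually takes effect.
    .config('spark.driver.extraClassPath', arrow_jars)
    .config('spark.sql.execution.arrow.enabled', 'true')
    .getOrCreate()
)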


