java.lang.UnsatisfiedLinkError: com.microsoft.ml.lightgbm.lightgbmlibJNI.voidpp_handle [BUG]

SynapseML version

0.10.0

System information

  • Language version: Python 3.8.10
  • Spark Version: 3.2.1
  • Spark Platform Synapse - Databricks

I followed the Databricks installation instructions at https://microsoft.github.io/SynapseML/docs/getting_started/installation/#databricks

Describe the problem

Converting the predictions dataset to a NumPy array results in an unsatisfied link error: java.lang.UnsatisfiedLinkError: com.microsoft.ml.lightgbm.lightgbmlibJNI.voidpp_handle

I think this is the line that is causing the error.

I find it hard to believe that my code is particularly novel, so either I am not the only one hitting this issue or I am doing something wrong.

Code to reproduce issue

from pyspark.ml.feature import VectorAssembler, StringIndexer
from pyspark.ml import Pipeline

# df is the input Spark DataFrame; 'objective' is the binary label column.

# Get column names and definitions
string_cols = [c for c, t in df.dtypes if t == 'string']
string_index = [f"{s}_index" for s in string_cols]
numeric_cols = [c for c, t in df.dtypes if t != 'string']
numeric_cols.remove('objective')

# Index the string columns, then assemble all features into a single vector column
stringIndexer = StringIndexer(inputCols=string_cols, outputCols=string_index, handleInvalid="keep")
featurizer = VectorAssembler(inputCols=numeric_cols + string_index, outputCol="features", handleInvalid="keep")

data_pipeline = Pipeline(stages=[stringIndexer, featurizer])

data = data_pipeline.fit(df).transform(df).select("objective", "features")

# Split into train and test
train, test = data.randomSplit([0.90, 0.10], seed=1)

from synapse.ml.lightgbm import LightGBMClassifier

param = {
    'featuresCol': "features",
    'labelCol': "objective",
    'zeroAsMissing': False,
    'objective': 'binary',
    'metric': 'binary',
    'verbosity': 0,
    'isUnbalance': True,
    # Fix for a known issue, see https://github.com/microsoft/SynapseML/issues/1534
    'useBarrierExecutionMode': True,
    'learningRate': 0.019960206745150144,
    'posBaggingFraction': 0.741400512824773,
    'negBaggingFraction': 0.9592530174926162,
    'lambdaL1': 7.222372408024596e-07,
    'lambdaL2': 8.048479891726644e-08,
    'numLeaves': 231,
    'featureFraction': 0.7013476730404191,
    'baggingFraction': 0.9473274453520037,
    'baggingFreq': 7,
    'minDataInLeaf': 30,
}

lgb_class = LightGBMClassifier(**param)
model = lgb_class.fit(train)
predictions = model.transform(test)

# The error is raised here, when collect() triggers the actual scoring
import numpy as np
prediction_probability = np.array(predictions.select('probability').collect())

Other info / logs

Py4JJavaError                             Traceback (most recent call last)
<command-391679> in <module>
      1 import numpy as np
----> 2 f = np.array(predictions.select('probability').collect())

/databricks/spark/python/pyspark/sql/dataframe.py in collect(self)
    713         # Default path used in OSS Spark / for non-DF-ACL clusters:
    714         with SCCallSiteSync(self._sc) as css:
--> 715             sock_info = self._jdf.collectToPython()
    716         return list(_load_from_socket(sock_info, BatchedSerializer(PickleSerializer())))
    717

/databricks/spark/python/lib/py4j-0.10.9.1-src.zip/py4j/java_gateway.py in __call__(self, *args)
   1302
   1303         answer = self.gateway_client.send_command(command)
-> 1304         return_value = get_return_value(
   1305             answer, self.gateway_client, self.target_id, self.name)
   1306

/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
    115     def deco(*a, **kw):
    116         try:
--> 117             return f(*a, **kw)
    118         except py4j.protocol.Py4JJavaError as e:
    119             converted = convert_exception(e.java_exception)

/databricks/spark/python/lib/py4j-0.10.9.1-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    324             value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
    325             if answer[1] == REFERENCE_TYPE:
--> 326                 raise Py4JJavaError(
    327                     "An error occurred while calling {0}{1}{2}.\n".
    328                     format(target_id, ".", name), value)

Py4JJavaError: An error occurred while calling o11576.collectToPython.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 73 in stage 55.0 failed 4 times, most recent failure: Lost task 73.3 in stage 55.0 (TID 21878) (10.30.252.53 executor 32): org.apache.spark.SparkException: Failed to execute user defined function (LightGBMClassificationModel$$Lambda$7979/1876654359: (struct<type:tinyint,size:int,indices:array<int>,values:array<double>>) => struct<type:tinyint,size:int,indices:array<int>,values:array<double>>)
    at org.apache.spark.sql.errors.QueryExecutionErrors$.failedExecuteUserDefinedFunctionError(QueryExecutionErrors.scala:168)
    at org.apache.spark.sql.errors.QueryExecutionErrors.failedExecuteUserDefinedFunctionError(QueryExecutionErrors.scala)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759)
    at org.apache.spark.sql.execution.collect.UnsafeRowBatchUtils$.encodeUnsafeRows(UnsafeRowBatchUtils.scala:80)
    at org.apache.spark.sql.execution.collect.Collector.$anonfun$processFunc$1(Collector.scala:155)
    at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$3(ResultTask.scala:75)
    at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
    at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$1(ResultTask.scala:75)
    at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:55)
    at org.apache.spark.scheduler.Task.doRunTask(Task.scala:156)
    at org.apache.spark.scheduler.Task.$anonfun$run$1(Task.scala:125)
    at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
    at org.apache.spark.scheduler.Task.run(Task.scala:95)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$13(Executor.scala:825)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1655)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:828)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:683)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.UnsatisfiedLinkError: com.microsoft.ml.lightgbm.lightgbmlibJNI.voidpp_handle()J
    at com.microsoft.ml.lightgbm.lightgbmlibJNI.voidpp_handle(Native Method)
    at com.microsoft.ml.lightgbm.lightgbmlib.voidpp_handle(lightgbmlib.java:628)
    at com.microsoft.azure.synapse.ml.lightgbm.booster.BoosterHandler$.com$microsoft$azure$synapse$ml$lightgbm$booster$BoosterHandler$$createBoosterPtrFromModelString(LightGBMBooster.scala:42)
    at com.microsoft.azure.synapse.ml.lightgbm.booster.BoosterHandler.<init>(LightGBMBooster.scala:64)
    at com.microsoft.azure.synapse.ml.lightgbm.booster.LightGBMBooster.boosterHandler$lzycompute(LightGBMBooster.scala:237)
    at com.microsoft.azure.synapse.ml.lightgbm.booster.LightGBMBooster.boosterHandler(LightGBMBooster.scala:232)
    at com.microsoft.azure.synapse.ml.lightgbm.booster.LightGBMBooster.score(LightGBMBooster.scala:396)
    at com.microsoft.azure.synapse.ml.lightgbm.LightGBMClassificationModel.predictProbability(LightGBMClassifier.scala:178)
    at com.microsoft.azure.synapse.ml.lightgbm.LightGBMClassificationModel.$anonfun$transform$4(LightGBMClassifier.scala:138)
    ... 23 more

Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:3029)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2976)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2970)
    at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
    at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2970)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1390)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1390)
    at scala.Option.foreach(Option.scala:407)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1390)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:3238)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:3179)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:3167)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:1152)
    at org.apache.spark.SparkContext.runJobInternal(SparkContext.scala:2638)
    at org.apache.spark.sql.execution.collect.Collector.runSparkJobs(Collector.scala:241)
    at org.apache.spark.sql.execution.collect.Collector.collect(Collector.scala:276)
    at org.apache.spark.sql.execution.collect.Collector$.collect(Collector.scala:81)
    at org.apache.spark.sql.execution.collect.Collector$.collect(Collector.scala:87)
    at org.apache.spark.sql.execution.collect.InternalRowFormat$.collect(cachedSparkResults.scala:75)
    at org.apache.spark.sql.execution.collect.InternalRowFormat$.collect(cachedSparkResults.scala:62)
    at org.apache.spark.sql.execution.ResultCacheManager.collectResult$1(ResultCacheManager.scala:611)
    at org.apache.spark.sql.execution.ResultCacheManager.computeResult(ResultCacheManager.scala:618)
    at org.apache.spark.sql.execution.ResultCacheManager.$anonfun$getOrComputeResultInternal$1(ResultCacheManager.scala:561)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.sql.execution.ResultCacheManager.getOrComputeResultInternal(ResultCacheManager.scala:560)
    at org.apache.spark.sql.execution.ResultCacheManager.getOrComputeResult(ResultCacheManager.scala:457)
    at org.apache.spark.sql.execution.ResultCacheManager.getOrComputeResult(ResultCacheManager.scala:436)
    at org.apache.spark.sql.execution.SparkPlan.executeCollectResult(SparkPlan.scala:422)
    at org.apache.spark.sql.Dataset.$anonfun$collectToPython$1(Dataset.scala:3739)
    at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3951)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$8(SQLExecution.scala:240)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:388)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$1(SQLExecution.scala:187)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:968)
    at org.apache.spark.sql.execution.SQLExecution$.withCustomExecutionEnv(SQLExecution.scala:142)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:338)
    at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3949)
    at org.apache.spark.sql.Dataset.collectToPython(Dataset.scala:3737)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
    at py4j.Gateway.invoke(Gateway.java:295)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:251)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.SparkException: Failed to execute user defined function (LightGBMClassificationModel$$Lambda$7979/1876654359: (struct<type:tinyint,size:int,indices:array<int>,values:array<double>>) => struct<type:tinyint,size:int,indices:array<int>,values:array<double>>)
    at org.apache.spark.sql.errors.QueryExecutionErrors$.failedExecuteUserDefinedFunctionError(QueryExecutionErrors.scala:168)
    at org.apache.spark.sql.errors.QueryExecutionErrors.failedExecuteUserDefinedFunctionError(QueryExecutionErrors.scala)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759)
    at org.apache.spark.sql.execution.collect.UnsafeRowBatchUtils$.encodeUnsafeRows(UnsafeRowBatchUtils.scala:80)
    at org.apache.spark.sql.execution.collect.Collector.$anonfun$processFunc$1(Collector.scala:155)
    at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$3(ResultTask.scala:75)
    at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
    at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$1(ResultTask.scala:75)
    at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:55)
    at org.apache.spark.scheduler.Task.doRunTask(Task.scala:156)
    at org.apache.spark.scheduler.Task.$anonfun$run$1(Task.scala:125)
    at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
    at org.apache.spark.scheduler.Task.run(Task.scala:95)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$13(Executor.scala:825)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1655)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:828)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:683)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    ... 1 more
Caused by: java.lang.UnsatisfiedLinkError: com.microsoft.ml.lightgbm.lightgbmlibJNI.voidpp_handle()J
    at com.microsoft.ml.lightgbm.lightgbmlibJNI.voidpp_handle(Native Method)
    at com.microsoft.ml.lightgbm.lightgbmlib.voidpp_handle(lightgbmlib.java:628)
    at com.microsoft.azure.synapse.ml.lightgbm.booster.BoosterHandler$.com$microsoft$azure$synapse$ml$lightgbm$booster$BoosterHandler$$createBoosterPtrFromModelString(LightGBMBooster.scala:42)
    at com.microsoft.azure.synapse.ml.lightgbm.booster.BoosterHandler.<init>(LightGBMBooster.scala:64)
    at com.microsoft.azure.synapse.ml.lightgbm.booster.LightGBMBooster.boosterHandler$lzycompute(LightGBMBooster.scala:237)
    at com.microsoft.azure.synapse.ml.lightgbm.booster.LightGBMBooster.boosterHandler(LightGBMBooster.scala:232)
    at com.microsoft.azure.synapse.ml.lightgbm.booster.LightGBMBooster.score(LightGBMBooster.scala:396)
    at com.microsoft.azure.synapse.ml.lightgbm.LightGBMClassificationModel.predictProbability(LightGBMClassifier.scala:178)
    at com.microsoft.azure.synapse.ml.lightgbm.LightGBMClassificationModel.$anonfun$transform$4(LightGBMClassifier.scala:138)
    ... 23 more
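
In a trace this long, the useful part is the last "Caused by:" entry, since the outer SparkException only wraps the executor-side failure. A minimal sketch of pulling that entry out of the error text follows; the helper name `root_cause` is hypothetical and not part of PySpark or Py4J. It assumes the exception header ends where the first " at " stack frame begins, so a message that itself contains " at " would be cut short.

```python
import re

def root_cause(trace):
    """Return (exception_class, message) for the last 'Caused by:' entry, or None."""
    marker = "Caused by: "
    if marker not in trace:
        return None
    # Everything after the final marker belongs to the innermost exception.
    tail = trace.rsplit(marker, 1)[1]
    # The header ends where the first stack frame ("at pkg.Class.method(...)") begins.
    header = re.split(r"\s+at\s", tail, maxsplit=1)[0].strip()
    exc_class, _, message = header.partition(": ")
    return exc_class, message
```

Passing it `str(e)` from a caught `Py4JJavaError` should surface the `java.lang.UnsatisfiedLinkError` entry for a log like the one above without reading the Spark scheduler frames.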

What component(s) does this bug affect?

  • area/cognitive: Cognitive project
  • area/core: Core project
  • area/deep-learning: DeepLearning project
  • area/lightgbm: Lightgbm project
  • area/opencv: Opencv project
  • area/vw: VW project
  • area/website: Website
  • area/build: Project build system
  • area/notebooks: Samples under notebooks folder
  • area/docker: Docker usage
  • area/models: models related issue

What language(s) does this bug affect?

  • language/scala: Scala source code
  • language/python: Pyspark APIs
  • language/r: R APIs
  • language/csharp: .NET APIs
  • language/new: Proposals for new client languages

What integration(s) does this bug affect?

  • integrations/synapse: Azure Synapse integrations
  • integrations/azureml: Azure ML integrations
  • integrations/databricks: Databricks integrations

AB#1911075

Issue Analytics

  • State: closed
  • Created a year ago
  • Reactions: 1
  • Comments: 20 (1 by maintainers)

Top GitHub Comments

2 reactions
mhamilton723 commented, Aug 25, 2022

Hey @timpiperseek @martamaslankowska we cut a new version 0.10.1, please let us know if this solves your issues thanks for your patience!

1 reaction
martamaslankowska commented, Aug 29, 2022

"we cut a new version 0.10.1, please let us know if this solves your issues"

Hi, thanks a lot for the new version! I just checked it out and it seems to be working - so that indeed solves my problem 😊
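
Since the thread reports the problem on 0.10.0 and one commenter confirms that 0.10.1 resolved it, a quick check of the installed version can rule this bug in or out. The sketch below is illustrative only: the helper names are hypothetical, it handles plain numeric versions such as "0.10.1" (not snapshot or suffixed builds), and the Maven coordinate format is an assumption based on the SynapseML installation instructions for Scala 2.12.

```python
FIXED_IN = "0.10.1"  # version reported in this thread as resolving the error

def parse_version(version):
    """Turn '0.10.1' into (0, 10, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))

def has_fix(installed):
    """True if the installed SynapseML version is at least FIXED_IN."""
    return parse_version(installed) >= parse_version(FIXED_IN)

def maven_coordinate(version, scala="2.12"):
    """Assumed coordinate format for the Databricks library install dialog."""
    return f"com.microsoft.azure:synapseml_{scala}:{version}"
```

Comparing tuples of integers rather than raw strings matters here: as strings, "0.9.5" sorts after "0.10.1", which would wrongly report an older release as containing the fix.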
