
java.lang.UnsatisfiedLinkError: com.microsoft.ml.lightgbm.lightgbmlibJNI.voidpp_handle [BUG]

SynapseML version

0.10.0

System information

Language version: python 3.8.10
Spark Version: 3.2.1
Spark Platform Synapse - Databricks

I followed the Databricks installation instructions at https://microsoft.github.io/SynapseML/docs/getting_started/installation/#databricks

Describe the problem

Converting the predictions dataset to a NumPy array fails with an unsatisfied link error: java.lang.UnsatisfiedLinkError: com.microsoft.ml.lightgbm.lightgbmlibJNI.voidpp_handle

I think this is the line that is causing the error.

I find it hard to believe that my code is particularly novel, so either I am not the only one hitting this issue or I am doing something wrong.

Code to reproduce issue

from pyspark.ml.feature import VectorAssembler, StringIndexer
from pyspark.ml import Pipeline


# get column names and definitions
string_cols = [c for c, t in df.dtypes if t == 'string']
string_index = [f"{s}_index" for s in string_cols]
numeric_cols = [c for c, t in df.dtypes if t != 'string']
numeric_cols.remove('objective')


stringIndexer = StringIndexer(inputCols=string_cols, outputCols=string_index, handleInvalid="keep")
featurizer = VectorAssembler(inputCols=numeric_cols+string_index, outputCol="features", handleInvalid="keep")

data_pipeline = Pipeline(stages= [stringIndexer, featurizer])

data = data_pipeline.fit(df).transform(df)["objective", "features"]

# split into train and test  
train, test = data.randomSplit([0.90, 0.10], seed=1)

from synapse.ml.lightgbm import LightGBMClassifier
param = {
    'featuresCol': "features",
    'labelCol': "objective",
    'zeroAsMissing': False,
    'objective': 'binary',
    'metric': 'binary',
    'verbosity': 0,
    'isUnbalance': True,
    'useBarrierExecutionMode': True,  # fix for a known issue, see https://github.com/microsoft/SynapseML/issues/1534
    'learningRate': 0.019960206745150144,
    'posBaggingFraction': 0.741400512824773,
    'negBaggingFraction': 0.9592530174926162,
    'lambdaL1': 7.222372408024596e-07,
    'lambdaL2': 8.048479891726644e-08,
    'numLeaves': 231,
    'featureFraction': 0.7013476730404191,
    'baggingFraction': 0.9473274453520037,
    'baggingFreq': 7,
    'minDataInLeaf': 30,
}

lgb_class = LightGBMClassifier(**param)
model = lgb_class.fit(train)        
predictions = model.transform(test)

import numpy as np
prediction_probability = np.array(predictions.select('probability').collect())

Other info / logs

Py4JJavaError                             Traceback (most recent call last)
<command-391679> in <module>
      1 import numpy as np
----> 2 f = np.array(predictions.select('probability').collect())

/databricks/spark/python/pyspark/sql/dataframe.py in collect(self)
    713         # Default path used in OSS Spark / for non-DF-ACL clusters:
    714         with SCCallSiteSync(self._sc) as css:
--> 715             sock_info = self._jdf.collectToPython()
    716         return list(_load_from_socket(sock_info, BatchedSerializer(PickleSerializer())))
    717

/databricks/spark/python/lib/py4j-0.10.9.1-src.zip/py4j/java_gateway.py in __call__(self, *args)
   1302
   1303         answer = self.gateway_client.send_command(command)
-> 1304         return_value = get_return_value(
   1305             answer, self.gateway_client, self.target_id, self.name)
   1306

/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
    115     def deco(*a, **kw):
    116         try:
--> 117             return f(*a, **kw)
    118         except py4j.protocol.Py4JJavaError as e:
    119             converted = convert_exception(e.java_exception)

/databricks/spark/python/lib/py4j-0.10.9.1-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    324             value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
    325             if answer[1] == REFERENCE_TYPE:
--> 326                 raise Py4JJavaError(
    327                     "An error occurred while calling {0}{1}{2}.\n".
    328                     format(target_id, ".", name), value)

Py4JJavaError: An error occurred while calling o11576.collectToPython.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 73 in stage 55.0 failed 4 times, most recent failure: Lost task 73.3 in stage 55.0 (TID 21878) (10.30.252.53 executor 32): org.apache.spark.SparkException: Failed to execute user defined function (LightGBMClassificationModel$$Lambda$7979/1876654359: (struct<type:tinyint,size:int,indices:array<int>,values:array<double>>) => struct<type:tinyint,size:int,indices:array<int>,values:array<double>>)
    at org.apache.spark.sql.errors.QueryExecutionErrors$.failedExecuteUserDefinedFunctionError(QueryExecutionErrors.scala:168)
    at org.apache.spark.sql.errors.QueryExecutionErrors.failedExecuteUserDefinedFunctionError(QueryExecutionErrors.scala)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759)
    at org.apache.spark.sql.execution.collect.UnsafeRowBatchUtils$.encodeUnsafeRows(UnsafeRowBatchUtils.scala:80)
    at org.apache.spark.sql.execution.collect.Collector.$anonfun$processFunc$1(Collector.scala:155)
    at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$3(ResultTask.scala:75)
    at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
    at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$1(ResultTask.scala:75)
    at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:55)
    at org.apache.spark.scheduler.Task.doRunTask(Task.scala:156)
    at org.apache.spark.scheduler.Task.$anonfun$run$1(Task.scala:125)
    at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
    at org.apache.spark.scheduler.Task.run(Task.scala:95)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$13(Executor.scala:825)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1655)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:828)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:683)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.UnsatisfiedLinkError: com.microsoft.ml.lightgbm.lightgbmlibJNI.voidpp_handle()J
    at com.microsoft.ml.lightgbm.lightgbmlibJNI.voidpp_handle(Native Method)
    at com.microsoft.ml.lightgbm.lightgbmlib.voidpp_handle(lightgbmlib.java:628)
    at com.microsoft.azure.synapse.ml.lightgbm.booster.BoosterHandler$.com$microsoft$azure$synapse$ml$lightgbm$booster$BoosterHandler$$createBoosterPtrFromModelString(LightGBMBooster.scala:42)
    at com.microsoft.azure.synapse.ml.lightgbm.booster.BoosterHandler.<init>(LightGBMBooster.scala:64)
    at com.microsoft.azure.synapse.ml.lightgbm.booster.LightGBMBooster.boosterHandler$lzycompute(LightGBMBooster.scala:237)
    at com.microsoft.azure.synapse.ml.lightgbm.booster.LightGBMBooster.boosterHandler(LightGBMBooster.scala:232)
    at com.microsoft.azure.synapse.ml.lightgbm.booster.LightGBMBooster.score(LightGBMBooster.scala:396)
    at com.microsoft.azure.synapse.ml.lightgbm.LightGBMClassificationModel.predictProbability(LightGBMClassifier.scala:178)
    at com.microsoft.azure.synapse.ml.lightgbm.LightGBMClassificationModel.$anonfun$transform$4(LightGBMClassifier.scala:138)
    ... 23 more

Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:3029)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2976)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2970)
    at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
    at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2970)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1390)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1390)
    at scala.Option.foreach(Option.scala:407)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1390)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:3238)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:3179)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:3167)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:1152)
    at org.apache.spark.SparkContext.runJobInternal(SparkContext.scala:2638)
    at org.apache.spark.sql.execution.collect.Collector.runSparkJobs(Collector.scala:241)
    at org.apache.spark.sql.execution.collect.Collector.collect(Collector.scala:276)
    at org.apache.spark.sql.execution.collect.Collector$.collect(Collector.scala:81)
    at org.apache.spark.sql.execution.collect.Collector$.collect(Collector.scala:87)
    at org.apache.spark.sql.execution.collect.InternalRowFormat$.collect(cachedSparkResults.scala:75)
    at org.apache.spark.sql.execution.collect.InternalRowFormat$.collect(cachedSparkResults.scala:62)
    at org.apache.spark.sql.execution.ResultCacheManager.collectResult$1(ResultCacheManager.scala:611)
    at org.apache.spark.sql.execution.ResultCacheManager.computeResult(ResultCacheManager.scala:618)
    at org.apache.spark.sql.execution.ResultCacheManager.$anonfun$getOrComputeResultInternal$1(ResultCacheManager.scala:561)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.sql.execution.ResultCacheManager.getOrComputeResultInternal(ResultCacheManager.scala:560)
    at org.apache.spark.sql.execution.ResultCacheManager.getOrComputeResult(ResultCacheManager.scala:457)
    at org.apache.spark.sql.execution.ResultCacheManager.getOrComputeResult(ResultCacheManager.scala:436)
    at org.apache.spark.sql.execution.SparkPlan.executeCollectResult(SparkPlan.scala:422)
    at org.apache.spark.sql.Dataset.$anonfun$collectToPython$1(Dataset.scala:3739)
    at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3951)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$8(SQLExecution.scala:240)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:388)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$1(SQLExecution.scala:187)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:968)
    at org.apache.spark.sql.execution.SQLExecution$.withCustomExecutionEnv(SQLExecution.scala:142)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:338)
    at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3949)
    at org.apache.spark.sql.Dataset.collectToPython(Dataset.scala:3737)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
    at py4j.Gateway.invoke(Gateway.java:295)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:251)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.SparkException: Failed to execute user defined function (LightGBMClassificationModel$$Lambda$7979/1876654359: (struct<type:tinyint,size:int,indices:array<int>,values:array<double>>) => struct<type:tinyint,size:int,indices:array<int>,values:array<double>>)
    at org.apache.spark.sql.errors.QueryExecutionErrors$.failedExecuteUserDefinedFunctionError(QueryExecutionErrors.scala:168)
    at org.apache.spark.sql.errors.QueryExecutionErrors.failedExecuteUserDefinedFunctionError(QueryExecutionErrors.scala)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759)
    at org.apache.spark.sql.execution.collect.UnsafeRowBatchUtils$.encodeUnsafeRows(UnsafeRowBatchUtils.scala:80)
    at org.apache.spark.sql.execution.collect.Collector.$anonfun$processFunc$1(Collector.scala:155)
    at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$3(ResultTask.scala:75)
    at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
    at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$1(ResultTask.scala:75)
    at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:55)
    at org.apache.spark.scheduler.Task.doRunTask(Task.scala:156)
    at org.apache.spark.scheduler.Task.$anonfun$run$1(Task.scala:125)
    at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
    at org.apache.spark.scheduler.Task.run(Task.scala:95)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$13(Executor.scala:825)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1655)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:828)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:683)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    ... 1 more
Caused by: java.lang.UnsatisfiedLinkError: com.microsoft.ml.lightgbm.lightgbmlibJNI.voidpp_handle()J
    at com.microsoft.ml.lightgbm.lightgbmlibJNI.voidpp_handle(Native Method)
    at com.microsoft.ml.lightgbm.lightgbmlib.voidpp_handle(lightgbmlib.java:628)
    at com.microsoft.azure.synapse.ml.lightgbm.booster.BoosterHandler$.com$microsoft$azure$synapse$ml$lightgbm$booster$BoosterHandler$$createBoosterPtrFromModelString(LightGBMBooster.scala:42)
    at com.microsoft.azure.synapse.ml.lightgbm.booster.BoosterHandler.<init>(LightGBMBooster.scala:64)
    at com.microsoft.azure.synapse.ml.lightgbm.booster.LightGBMBooster.boosterHandler$lzycompute(LightGBMBooster.scala:237)
    at com.microsoft.azure.synapse.ml.lightgbm.booster.LightGBMBooster.boosterHandler(LightGBMBooster.scala:232)
    at com.microsoft.azure.synapse.ml.lightgbm.booster.LightGBMBooster.score(LightGBMBooster.scala:396)
    at com.microsoft.azure.synapse.ml.lightgbm.LightGBMClassificationModel.predictProbability(LightGBMClassifier.scala:178)
    at com.microsoft.azure.synapse.ml.lightgbm.LightGBMClassificationModel.$anonfun$transform$4(LightGBMClassifier.scala:138)
    ... 23 more
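
In a trace this long, the innermost "Caused by:" entry is usually the quickest pointer to the call that actually failed. The following is a minimal Python sketch for pulling those entries out of pasted trace text. The helper names root_causes and innermost_cause are hypothetical (they are not part of Spark, Py4J, or SynapseML), and the pattern assumes each message ends at the first " at " frame marker, so a message that itself contains " at " would be cut short.

```python
import re

# "Caused by: <exception class>[: <message>]", with the message ending at the
# first " at " stack-frame marker or at the end of the text.
CAUSED_BY = re.compile(r"Caused by: ([\w.$]+)(?::\s*(.*?))?(?=\s+at\s|$)", re.S)


def root_causes(trace):
    """Return (exception_class, message) for every 'Caused by:' entry, in order."""
    return [(m.group(1), (m.group(2) or "").strip()) for m in CAUSED_BY.finditer(trace)]


def innermost_cause(trace):
    """Return the last 'Caused by:' entry, or None if the trace has none."""
    causes = root_causes(trace)
    return causes[-1] if causes else None


# Shortened excerpt of the trace from this issue, flattened onto one line.
SAMPLE = (
    "org.apache.spark.SparkException: Job aborted due to stage failure "
    "at org.apache.spark.scheduler.Task.run(Task.scala:95) "
    "Caused by: org.apache.spark.SparkException: Failed to execute user defined function "
    "at org.apache.spark.scheduler.Task.run(Task.scala:95) "
    "Caused by: java.lang.UnsatisfiedLinkError: "
    "com.microsoft.ml.lightgbm.lightgbmlibJNI.voidpp_handle()J "
    "at com.microsoft.ml.lightgbm.lightgbmlibJNI.voidpp_handle(Native Method)"
)

print(innermost_cause(SAMPLE))
```

Applied to the full trace, this kind of filter would surface the java.lang.UnsatisfiedLinkError raised inside lightgbmlibJNI rather than the generic SparkException wrappers around it.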

What component(s) does this bug affect?

area/cognitive: Cognitive project
area/core: Core project
area/deep-learning: DeepLearning project
area/lightgbm: Lightgbm project
area/opencv: Opencv project
area/vw: VW project
area/website: Website
area/build: Project build system
area/notebooks: Samples under notebooks folder
area/docker: Docker usage
area/models: models related issue

What language(s) does this bug affect?

language/scala: Scala source code
language/python: Pyspark APIs
language/r: R APIs
language/csharp: .NET APIs
language/new: Proposals for new client languages

What integration(s) does this bug affect?

integrations/synapse: Azure Synapse integrations
integrations/azureml: Azure ML integrations
integrations/databricks: Databricks integrations

AB#1911075

Issue Analytics

State:
Created a year ago
Reactions: 1
Comments: 20 (1 by maintainers)

Top GitHub Comments

2 reactions

mhamilton723 commented, Aug 25, 2022

Hey @timpiperseek @martamaslankowska, we cut a new version, 0.10.1. Please let us know if this solves your issues. Thanks for your patience!

1 reaction

martamaslankowska commented, Aug 29, 2022

"we cut a new version 0.10.1, please let us know if this solves your issues"

Hi, thanks a lot for the new version! I just checked it out and it seems to be working - so that indeed solves my problem 😊
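
Since the fix reported here is moving from 0.10.0 to 0.10.1, a quick check of the version attached to a cluster can save some guesswork. Below is a minimal Python sketch under stated assumptions: the helper names are hypothetical, the coordinate pattern com.microsoft.azure:synapseml_2.12:<version> is assumed from the SynapseML installation docs linked in the issue, and version strings are assumed to be plain dotted integers (snapshot or suffixed builds are not handled).

```python
# Release the maintainer asked reporters to try in this thread.
FIXED_IN = "0.10.1"


def parse_version(version):
    """Turn '0.10.1' into (0, 10, 1) so versions compare numerically, not as text."""
    return tuple(int(part) for part in version.split("."))


def needs_upgrade(installed, fixed_in=FIXED_IN):
    """True if the installed SynapseML version predates the release with the fix."""
    return parse_version(installed) < parse_version(fixed_in)


def maven_coordinate(version, scala="2.12"):
    """Build the Maven coordinate for a SynapseML release (assumed pattern)."""
    return f"com.microsoft.azure:synapseml_{scala}:{version}"


installed = "0.10.0"  # version from the original report
if needs_upgrade(installed):
    print("Consider attaching", maven_coordinate(FIXED_IN))
```

Comparing parsed tuples rather than raw strings matters here because "0.9.5" sorts after "0.10.1" as text even though it is the older release.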
