
Kedro loses lightgbm model parameters due to deepcopy(lightgbm_model)

See original GitHub issue

Description

When a LightGBM model is passed between nodes via a MemoryDataSet, its parameters are lost due to deepcopy.

This is not a bug in Kedro itself, but I think Kedro should consider changing the default to prevent it from happening. The issue is that `deepcopy(lgb_model)` causes the parameters to be lost.

Context

How has this bug affected you? What were you trying to accomplish?

Steps to Reproduce

`git clone https://github.com/noklam/kedro_lightgbm_bug`

`kedro run`

In the `create_model()` node:
> {'objective': 'regression', 'verbose': -1, 'num_leaves': 3} # prints the parameters

In the `load_model()` node:
> {} # prints the parameters; they have all been lost

`python test_lightgbm.py` (without involving Kedro, this shows the differing output of `deepcopy(model)` versus `pickle.load`):

```python
import numpy as np
import pickle
import lightgbm as lgb
from copy import deepcopy

params = {
    'objective': 'regression',
    'verbose': -1,
    'num_leaves': 3
}

X = np.random.rand(100, 2)
Y = np.ravel(np.random.rand(100, 1))
lgbm = lgb.train(params, lgb.Dataset(X, label=Y), num_boost_round=1)

with open('test_pickle.pkl', 'wb') as f:
    pickle.dump(lgbm, f)

print(lgbm.params)       # original params are intact

# Deep copy loses the params
new_model = deepcopy(lgbm)
print(new_model.params)  # {}

# Loading from the pickle file is fine
with open('test_pickle.pkl', 'rb') as f:
    m2 = pickle.load(f)

print(m2.params)         # params are preserved
```

Output:

```
{'objective': 'regression', 'verbose': -1, 'num_leaves': 3}
{}
{'objective': 'regression', 'verbose': -1, 'num_leaves': 3}
```
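The mechanism can be reproduced without lightgbm at all. The sketch below uses a hypothetical `ToyBooster` class (an assumption for illustration, not lightgbm's actual implementation) whose `__deepcopy__` rebuilds the object from a serialized model string, which is reportedly what the real `Booster` does and why `deepcopy` drops `params` while `pickle` keeps them:

```python
from copy import deepcopy
import pickle

# ToyBooster is a hypothetical stand-in for lightgbm's Booster, not the
# real class. It mimics a __deepcopy__ override that rebuilds the object
# from a serialized model string which does not include the params dict.
class ToyBooster:
    def __init__(self, params):
        self.params = params
        self.model_str = "tree-dump"  # stands in for the trained trees

    def __deepcopy__(self, memo):
        new = ToyBooster({})            # rebuilt from the model string only;
        new.model_str = self.model_str  # params are not carried over
        return new

booster = ToyBooster({"objective": "regression"})

copied = deepcopy(booster)
print(copied.params)    # {} -- lost, as with the real Booster

# pickle uses __reduce__/__getstate__, not __deepcopy__, so the full
# __dict__ (including params) survives a round trip
restored = pickle.loads(pickle.dumps(booster))
print(restored.params)  # {'objective': 'regression'}
```

The asymmetry is the whole bug: `copy.deepcopy` honors `__deepcopy__`, while pickling goes through a different protocol that copies the instance dict wholesale.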

Expected Result

Parameters should be preserved after the model is passed between nodes.

Actual Result

Parameters are lost; `model.params` becomes an empty dict.

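Until the underlying behaviour changes, one manual workaround is to re-attach the params after copying. `deepcopy_with_params` below is a hypothetical helper (not part of lightgbm or Kedro); it assumes the model exposes a plain `params` dict and that the custom `__deepcopy__` drops only that attribute:

```python
from copy import deepcopy

def deepcopy_with_params(model):
    """Deep-copy a model, then restore its ``params`` dict.

    Hypothetical helper: assumes ``model.params`` is a plain dict (as on
    lightgbm's Booster) and that the model's __deepcopy__ drops only it.
    """
    new_model = deepcopy(model)
    new_model.params = deepcopy(model.params)
    return new_model
```

A node could call this instead of relying on a bare `deepcopy` when the copied model's parameters need to be inspected downstream.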

Your Environment

Include as many relevant details about the environment in which you experienced the bug:

  • Kedro version used (pip show kedro or kedro -V): 0.16.6
  • Python version used (python -V): 3.7.5
  • Operating system and version: Windows 10

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 10 (10 by maintainers)

Top GitHub Comments

1 reaction
noklam commented, Mar 19, 2021

I wrote a casual summary here in case anyone is interested in this issue: https://noklam.ml/python/pickle/deepcopy/2021/03/19/deepcopy-lightgbm-and-Pickles.html

1 reaction
noklam commented, Mar 19, 2021

It seems this is rather a workaround to make lightgbm picklable (I do not know much about this; the parameters are just a dict, so I may need to read more to understand the issue later).

It would be more desirable to keep the parameters with the model, but I agree that at this point there is nothing for Kedro to do.

Not sure if they are going to fix this issue in the next release. It's not going to affect model inference, but the parameters are still potentially useful for other downstream tasks.

Thanks for following the issue; I think it is OK to close it.
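On the Kedro side, a catalog-level mitigation is to hand the model between nodes by reference instead of deep-copying it. The sketch below assumes a Kedro version whose `MemoryDataSet` supports the `copy_mode` option; the dataset name `model` is hypothetical and should match your pipeline's output name:

```yaml
# catalog.yml -- sketch, not a drop-in config
model:
  type: MemoryDataSet
  copy_mode: assign  # pass the object by reference; no deepcopy
```

Note that `assign` trades away the isolation that deep-copying provides: downstream nodes then share one mutable object.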
