Fitting an XGBoost model results in InternalHashError: unhashable type: 'bytearray'
Summary
My app trains and predicts with an XGBoost model from the xgboost package. I fit the model inside a cached function (decorated with @st.cache) and return it. However, Streamlit now reports that it cannot hash a bytearray object inside one of my builtins.dict objects. This is strange because it worked before I updated Streamlit to 0.59. The full error reads:
If you don’t know where the object of type builtins.dict is coming from, try looking at the hash chain below for an object that you do recognize, then pass that to hash_funcs instead:
Object of type builtins.dict: {'feature_names': ['per_capita_crime_rate_by_town', 'proportion_of_residential_land_zoned_for_lots_over_25,000_sq.ft.', 'proportion_of_non-retail_business_acres_per_town.', 'Charles_River_dummy_variable_(1_if_tract_bounds river;_0_otherwise)', 'nitric_oxides_concentration_(parts_per_10_million)', 'average_number_of_rooms_per_dwelling', 'proportion_of_owner-occupied_units_built_prior_to_1940', 'weighted_distances_to_five_Boston_employment_centres', 'index_of_accessibility_to_radial_highways', 'full-value_property-tax_rate_per_$10,000', 'pupil-teacher_ratio_by_town', '1000(Bk-0.63)^2_where_Bk_is_the_proportion_of_blacks_by_town', '%_lower_status_of_the_population'], 'feature_types': ['float', 'float', 'float', 'int', 'float', 'float', 'float', 'float', 'int', 'int', 'float', 'float', 'float'], 'handle': bytearray(b'\x00\x00\x00?\r\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x.....'), 'booster': 'gbtree', 'best_iteration': 99, 'best_ntree_limit': 100} Object of type xgboost.core.Booster: <xgboost.core.Booster object at 0x1a1eb79690> Object of type builtins.tuple: ('_Booster', <xgboost.core.Booster object at 0x1a1eb79690>) Object of type builtins.dict: {'max_depth': 3, 'learning_rate': 0.1, 'n_estimators': 100, 'verbosity': 1, 'silent': None, 'objective': 'reg:linear', 'booster': 'gbtree', 'gamma': 0, 'min_child_weight': 1, 'max_delta_step': 0, 'subsample': 1, 'colsample_bytree': 1, 'colsample_bylevel': 1, 'colsample_bynode': 1, 'reg_alpha': 0, 'reg_lambda': 1, 'scale_pos_weight': 1, 'base_score': 0.5, 'missing': nan, 'kwargs': {}, '_Booster': <xgboost.core.Booster object at 0x1a1eb79690>, 'seed': None, 'random_state': 0, 'nthread': None, 'n_jobs': 1, 'importance_type': 'gain'} Object of type xgboost.sklearn.XGBRegressor: XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, gamma=0, importance_type='gain', learning_rate=0.1, max_delta_step=0, max_depth=3, min_child_weight=1, missing=None, n_estimators=100, n_jobs=1, nthread=None, objective='reg:linear', random_state=0, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None, silent=None, subsample=1, verbosity=1) Object of type builtins.tuple: (array([23.016954 , 31.42364 , 16.173046 , 23.580927 , 17.46015 , 22.1714 , 18.314796 , 14.029961 , 20.737488 , 21.180895 , 20.44529 , 18.690483 , 8.321284 , 21.453217 , 20.421919 , 24.553173 , 19.685305 , 10.205381 , 44.475704 , 15.940252 , 23.858517 , 23.737234 , 13.884621 , 20.765696 , 15.456101 , 16.24305 , 21.799377 , 13.161657 , 19.93968 , 21.674849 , 19.766438 , 23.370852 , 23.209932 , 19.655743 , 15.145709 , 16.75448 , 32.84218 , 20.021385 , 20.638344 , 23.61842 , 17.877428 , 30.510242 , 43.739815 , 20.179007 , 22.488018 , 14.906468 , 16.279074 , 23.69828 , 18.070068 , 26.881145 , 20.835695 , 35.763424 , 16.517195 , 25.812237 , 47.97466 , 21.505997 , 16.060717 , 31.166424 , 21.966013 , 18.112715 , 22.984049 , 34.817833 , 30.661045 , 19.36766 , 25.49301 , 18.369967 , 14.297357 , 23.17898 , 28.338715 , 14.903171 , 21.480898 , 28.35976 , 10.96798 , 21.158417 , 22.444817 , 7.4538136, 20.583168 , 44.752457 , 
11.438277 , 13.292285 , 21.415586 , 11.619457 , 19.344286 , 10.624224 , 19.948153 , 27.053463 , 16.849163 , 23.626413 , 25.075293 , 16.859615 , 21.492283 , 9.272746 , 19.522285 , 19.510963 , 23.251637 , 19.985184 , 37.08692 , 11.051673 , 12.942635 , 10.689882 , 20.378206 , 22.951574 , 13.608057 , 20.544075 , 19.422127 , 12.785345 , 19.35001 , 25.687586 , 20.213074 , 23.25188 , 8.632252 , 13.273444 , 22.108204 , 23.423246 , 32.223713 , 14.496059 , 42.171318 , 15.669648 , 21.15128 , 23.506771 , 19.632486 , 23.542662 , 6.652353 , 20.856237 , 24.006516 , 22.47102 , 21.927805 ], dtype=float32), XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, gamma=0, importance_type='gain', learning_rate=0.1, max_delta_step=0, max_depth=3, min_child_weight=1, missing=None, n_estimators=100, n_jobs=1, nthread=None, objective='reg:linear', random_state=0, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None, silent=None, subsample=1, verbosity=1))
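As the message itself suggests, one way out (a sketch on my part, not a fix confirmed in this issue) is to tell st.cache how to hash the first recognizable type in the chain, here xgboost.core.Booster, via hash_funcs:

import streamlit as st
import xgboost
from xgboost import XGBRegressor

# Hash Booster objects by identity so Streamlit never walks their internals
# and never reaches the unhashable bytearray handle.
@st.cache(hash_funcs={xgboost.core.Booster: id})
def train_and_predict_regression(xtrain, ytrain, xtest):
    """Trains the model on the training data and predicts on the test set."""
    model = XGBRegressor(objective='reg:squarederror')
    model.fit(xtrain, ytrain)
    return model.predict(xtest), model

Hashing by id is the simplest option here because the model only appears in the return value; if a Booster were also passed in as an argument, a content-based surrogate would make a better cache key.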
Steps to reproduce
Here is a sample code snippet to run and reproduce the error:
import streamlit as st
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

@st.cache(suppress_st_warning=True)
def train_and_predict_regression(xtrain, ytrain, xtest):
    """Trains the model on the training data and predicts on the test set."""
    try:
        model = XGBRegressor(objective='reg:squarederror')
        model.fit(xtrain, ytrain)
        ypred = model.predict(xtest)
        return ypred, model
    except ValueError as er:
        st.error(er)

x, y = load_boston(return_X_y=True)
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.3, random_state=42)
ypred, model = train_and_predict_regression(xtrain, ytrain, xtest)
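For completeness, another workaround (again my assumption, not something stated in this report) is to skip hashing of the return value entirely with allow_output_mutation=True:

import streamlit as st
from xgboost import XGBRegressor

# allow_output_mutation=True tells st.cache not to hash the returned
# (predictions, model) tuple, so the Booster's bytearray handle is never touched.
@st.cache(suppress_st_warning=True, allow_output_mutation=True)
def train_and_predict_regression(xtrain, ytrain, xtest):
    model = XGBRegressor(objective='reg:squarederror')
    model.fit(xtrain, ytrain)
    return model.predict(xtest), model

The trade-off is that Streamlit no longer warns if the cached (predictions, model) tuple is mutated between runs.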
Expected behavior:
To be able to fit the model inside the cached function and get back the predictions and the fitted model without a hashing error.
Actual behavior:
The app raises InternalHashError: unhashable type: 'bytearray' as soon as the cached function returns the fitted model (full hash chain shown above).
Is this a regression?
Yes. This worked as expected before I updated to Streamlit 0.59.
Debug info
- Streamlit version: 0.59
- Python version:
- Using Conda? PipEnv? PyEnv? Pex?
- OS version:
- Browser version:
Top GitHub Comments
We’re failing to hash bytearray.
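For context (an addition of mine, not part of the original comment): bytearray is a mutable built-in, so Python refuses to hash it at all, which is what Streamlit's hasher ultimately trips over when it recurses into the Booster's handle:

# bytearray is mutable and therefore unhashable by design in Python.
hash(bytearray(b'\x00'))  # TypeError: unhashable type: 'bytearray'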
Awesome thank you!