TypeError when saving a model with `numpy.bool_` types

Values of type numpy.bool_ are not serialized to JSON correctly.

What is the current behavior? The ComplexEncoder class in pytorch_tabnet/utils.py does not handle numpy.bool_, which is not JSON serializable. This raises a TypeError when saving certain models.

If the current behavior is a bug, please provide the steps to reproduce.

model = TabNetClassifier(...)
model.fit(...)  # training data and model parameters contain values of type numpy.bool_
model.save_model('path/to/model')
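
The same failure can be sketched without training a model. The snippet below is a minimal illustration, assuming an encoder that converts only np.int64; the names Int64OnlyEncoder and try_dump are hypothetical and not part of pytorch_tabnet.

```python
import json

import numpy as np


class Int64OnlyEncoder(json.JSONEncoder):
    """Hypothetical stand-in for an encoder that only converts np.int64."""

    def default(self, obj):
        if isinstance(obj, np.int64):
            return int(obj)
        # Anything else falls through to the base class, which raises TypeError.
        return super().default(obj)


def try_dump(params):
    """Return the JSON string, or the TypeError raised while encoding."""
    try:
        return json.dumps(params, cls=Int64OnlyEncoder)
    except TypeError as exc:
        return exc


ok = try_dump({"n_steps": np.int64(3)})
failed = try_dump({"classes": [np.bool_(False), np.bool_(True)]})

print(ok)                     # the np.int64 value is encoded as a plain integer
print(type(failed).__name__)  # the np.bool_ values trigger a TypeError
```

The np.bool_ values fail because numpy.bool_ is not a subclass of Python's bool, so the standard encoder does not recognize it.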

Expected behavior: numpy.bool_ should be cast to Python’s bool before being serialized to JSON. Here is my suggested fix. Please let me know if this is acceptable for a PR:

import json

import numpy as np


class ComplexEncoder(json.JSONEncoder):
    def default(self, obj):
        # Convert NumPy scalars that json cannot serialize natively
        if isinstance(obj, np.int64):
            return int(obj)
        if isinstance(obj, np.bool_):
            return bool(obj)
        # Let the base class default method raise the TypeError
        return json.JSONEncoder.default(self, obj)
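
As a quick check of the suggested fix, the sketch below round-trips a small dictionary through json.dumps and json.loads. The dictionary keys are made up for illustration and are not claimed to match what save_model actually writes.

```python
import json

import numpy as np


class ComplexEncoder(json.JSONEncoder):
    """The encoder proposed above, repeated so the example is self-contained."""

    def default(self, obj):
        if isinstance(obj, np.int64):
            return int(obj)
        if isinstance(obj, np.bool_):
            return bool(obj)
        return json.JSONEncoder.default(self, obj)


# Hypothetical parameters mixing np.int64 and np.bool_ values.
example_params = {
    "settings": {"width": np.int64(8)},
    "labels": [np.bool_(False), np.bool_(True)],
}

encoded = json.dumps(example_params, cls=ComplexEncoder)
restored = json.loads(encoded)

print(encoded)
print(restored)
```

After the round trip, the values come back as plain Python int and bool objects, so a loader would need to cast them back if the NumPy types matter.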

Other relevant information:
poetry version: “poetry-core>=1.0.0”
python version: “^3.9”
Operating System: “Linux Kernel 5.18.14-arch1-1”
Additional tools: CUDA Version: 11.7, Driver Version: 515.57

Additional context

Here’s a stacktrace:

  File ".venv/lib/python3.10/site-packages/pytorch_tabnet/abstract_model.py", line 375, in save_model
    json.dump(saved_params, f, cls=ComplexEncoder)
  File "/usr/lib/python3.10/json/__init__.py", line 179, in dump
    for chunk in iterable:
  File "/usr/lib/python3.10/json/encoder.py", line 431, in _iterencode
    yield from _iterencode_dict(o, _current_indent_level)
  File "/usr/lib/python3.10/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/usr/lib/python3.10/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/usr/lib/python3.10/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/usr/lib/python3.10/json/encoder.py", line 438, in _iterencode
    o = _default(o)
  File ".venv/lib/python3.10/site-packages/pytorch_tabnet/utils.py", line 339, in default
    return json.JSONEncoder.default(self, obj)
  File "/usr/lib/python3.10/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type bool_ is not JSON serializable
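
The traceback shows the offending value sitting several dictionaries deep, but not where. One way to locate it is to walk the structure before dumping it. The helper below is a debugging sketch under that assumption; find_numpy_scalars is a hypothetical name, not a pytorch_tabnet function.

```python
import numpy as np


def find_numpy_scalars(obj, path="root"):
    """Yield (path, type name) for every NumPy scalar in nested dicts/lists."""
    if isinstance(obj, np.generic):
        yield path, type(obj).__name__
    elif isinstance(obj, dict):
        for key, value in obj.items():
            yield from find_numpy_scalars(value, f"{path}[{key!r}]")
    elif isinstance(obj, (list, tuple)):
        for index, value in enumerate(obj):
            yield from find_numpy_scalars(value, f"{path}[{index}]")


# Hypothetical nested parameters resembling a dict passed to json.dump.
params = {"a": {"b": [1, np.bool_(True)]}, "c": np.int8(2)}

for location, type_name in find_numpy_scalars(params):
    print(location, type_name)

paths = [location for location, _ in find_numpy_scalars(params)]
```

Note that this reports every NumPy scalar, including np.float64, which json can already serialize because it subclasses Python's float.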

I ran into this while trying TabNet in a Kaggle competition. If you need to, you can look at my code where the error happens.

Issue Analytics

State:
Created a year ago
Reactions: 2
Comments: 14

Top GitHub Comments

2 reactions

andreas-wolf commented, Aug 23, 2022

@Optimox Hi. I don’t know if that happens in the AMEX competition, but I guess so, since the JSON encoding does not work for dtypes other than np.int64.

Sorry for not being clear enough in my description of the problem. I’ve therefore attached a minimal working example that triggers the bug.

As mentioned, the problem is that y_train (the target variable) is of type bool (or np.int8 in my case), and ComplexEncoder only handles np.int64: https://github.com/dreamquark-ai/tabnet/blob/5ac55834b32693abc4b22028a74475ee0440c2a5/pytorch_tabnet/utils.py#L338

https://github.com/dreamquark-ai/tabnet/blob/5ac55834b32693abc4b22028a74475ee0440c2a5/pytorch_tabnet/utils.py#L336-L341

  import os
  import wget
  import pandas as pd
  import numpy as np
  from pathlib import Path
  from pytorch_tabnet.tab_model import TabNetClassifier
  url = "https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data"
  dataset_name = 'census-income'
  out = Path(os.getcwd()+'/data/'+dataset_name+'.csv')
  out.parent.mkdir(parents=True, exist_ok=True)
  if out.exists():
      print("File already exists.")
  else:
      print("Downloading file...")
      wget.download(url, out.as_posix())
  features = ['39', ' 77516', ' 13']
  train = pd.read_csv(out)
  train = train[features + [' <=50K']]
  # Values in this file keep a leading space, so strip before comparing
  train['target'] = train[' <=50K'].str.strip() == '<=50K'
  train = train.drop(columns=[' <=50K'])
  if "Set" not in train.columns:
      train["Set"] = np.random.choice(["train", "valid", "test"], p =[.8, .1, .1], size=(train.shape[0],))
  
  train_indices = train[train.Set=="train"].index
  valid_indices = train[train.Set=="valid"].index
  test_indices = train[train.Set=="test"].index
  
  X_train = train[features].values[train_indices]
  y_train = train['target'].values[train_indices]
  
  X_valid = train[features].values[valid_indices]
  y_valid = train['target'].values[valid_indices]
  
  X_test = train[features].values[test_indices]
  y_test = train['target'].values[test_indices]
  
  clf = TabNetClassifier()
  clf.fit(X_train=X_train, y_train=y_train,max_epochs=2)
  
  saving_path_name = "./tabnet_model_test_1"
  saved_filepath = clf.save_model(saving_path_name)
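
Since the target here may be bool or np.int8, an encoder that checks NumPy's abstract scalar types would cover both cases at once. The following is only a sketch of that idea, not the fix adopted by the library; NumpyScalarEncoder is a hypothetical name.

```python
import json

import numpy as np


class NumpyScalarEncoder(json.JSONEncoder):
    """Hypothetical encoder that converts common NumPy types to Python types."""

    def default(self, obj):
        if isinstance(obj, np.bool_):
            return bool(obj)
        if isinstance(obj, np.integer):   # np.int8, np.int32, np.int64, ...
            return int(obj)
        if isinstance(obj, np.floating):  # np.float16, np.float32, ...
            return float(obj)
        if isinstance(obj, np.ndarray):
            return obj.tolist()
        # Let the base class raise TypeError for anything else.
        return super().default(obj)


sample = {
    "a": np.int8(1),
    "b": np.float32(0.5),
    "c": np.bool_(True),
    "d": np.array([1, 2]),
}
encoded = json.dumps(sample, cls=NumpyScalarEncoder)
print(encoded)
```

Checking np.integer and np.floating rather than individual dtypes means that new integer or float widths do not need their own branch.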

1 reaction

Optimox commented, Dec 17, 2022

Thanks, I’ll fix this soon.
