TypeError when saving a model with `numpy.bool_` types


numpy.bool_ values are not correctly serialized to JSON.

What is the current behavior?

The ComplexEncoder class in pytorch_tabnet/utils.py does not handle numpy.bool_, which is not JSON serializable. This raises a TypeError when saving certain models.
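The failure can be reproduced without TabNet at all. The following is a minimal sketch using only the standard json module and NumPy; the variable names are illustrative and not taken from the library.

```python
import json

import numpy as np

flag = np.bool_(True)  # a NumPy boolean scalar, not a Python bool

# The default encoder does not know NumPy scalar types, so this raises TypeError.
try:
    json.dumps({"flag": flag})
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)

# Casting to Python's built-in bool first makes the value serializable.
serialized = json.dumps({"flag": bool(flag)})
print(serialized)
```

The exact wording of the error message can differ between NumPy versions, so the sketch only relies on a TypeError being raised.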

If the current behavior is a bug, please provide the steps to reproduce.

model = TabNetClassifier(...)
model.fit(...)  # training data and model parameters contain values of type numpy.bool_
model.save_model('path/to/model')

Expected behavior

numpy.bool_ should be cast to Python’s bool before being serialized to JSON. Here is my suggested fix. Please let me know if this is acceptable for a PR:

import json

import numpy as np


class ComplexEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, np.int64):
            return int(obj)
        # Cast NumPy booleans to Python's bool so they can be serialized
        if isinstance(obj, np.bool_):
            return bool(obj)
        # Let the base class default method raise the TypeError
        return json.JSONEncoder.default(self, obj)
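A broader variant of the same idea is sketched below. It is not the library's code and not the fix that was proposed above: the class name NumpyEncoder is hypothetical, and it assumes that matching on NumPy's abstract scalar types (np.integer, np.floating) is acceptable, so that np.int8, np.int32 and similar are covered as well as np.int64.

```python
import json

import numpy as np


class NumpyEncoder(json.JSONEncoder):
    """Hypothetical encoder that converts common NumPy types to Python types."""

    def default(self, obj):
        if isinstance(obj, np.bool_):
            return bool(obj)
        if isinstance(obj, np.integer):  # np.int8, np.int32, np.int64, ...
            return int(obj)
        if isinstance(obj, np.floating):  # np.float16, np.float32, ...
            return float(obj)
        if isinstance(obj, np.ndarray):
            return obj.tolist()
        # Anything else is left to the base class, which raises TypeError.
        return super().default(obj)


params = {
    "flag": np.bool_(True),
    "small": np.int8(3),
    "big": np.int64(7),
    "ratio": np.float32(0.5),
    "mask": np.array([True, False]),
}
encoded = json.dumps(params, cls=NumpyEncoder)
decoded = json.loads(encoded)
print(decoded)
```

Checking the abstract types avoids adding one isinstance branch per concrete dtype, at the cost of silently converting every NumPy integer or float width to the corresponding Python type.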

Other relevant information:

  • poetry version: “poetry-core>=1.0.0”
  • python version: “^3.9”
  • Operating System: “Linux Kernel 5.18.14-arch1-1”
  • Additional tools: CUDA Version: 11.7, Driver Version: 515.57

Additional context

Here’s a stacktrace:

  File ".venv/lib/python3.10/site-packages/pytorch_tabnet/abstract_model.py", line 375, in save_model
    json.dump(saved_params, f, cls=ComplexEncoder)
  File "/usr/lib/python3.10/json/__init__.py", line 179, in dump
    for chunk in iterable:
  File "/usr/lib/python3.10/json/encoder.py", line 431, in _iterencode
    yield from _iterencode_dict(o, _current_indent_level)
  File "/usr/lib/python3.10/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/usr/lib/python3.10/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/usr/lib/python3.10/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/usr/lib/python3.10/json/encoder.py", line 438, in _iterencode
    o = _default(o)
  File ".venv/lib/python3.10/site-packages/pytorch_tabnet/utils.py", line 339, in default
    return json.JSONEncoder.default(self, obj)
  File "/usr/lib/python3.10/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type bool_ is not JSON serializable

I ran into this while trying TabNet in a Kaggle competition. If you need to, you can look at my code to see where the error happens.

Issue Analytics

  • State: open
  • Created a year ago
  • Reactions: 2
  • Comments: 14

Top GitHub Comments

2 reactions
andreas-wolf commented, Aug 23, 2022

@Optimox Hi. I don’t know if that happens in the AMEX competition, but I guess so, since the JSON encoding does not work for dtypes other than np.int64.

Sorry for not being clear enough in my description of the problem. I’ve therefore attached a minimal working example that triggers the bug.

As said, the problem is that y_train, i.e. the target variable, is of type bool (or np.int8 in my case), and you’re only handling np.int64 in ComplexEncoder: https://github.com/dreamquark-ai/tabnet/blob/5ac55834b32693abc4b22028a74475ee0440c2a5/pytorch_tabnet/utils.py#L338

https://github.com/dreamquark-ai/tabnet/blob/5ac55834b32693abc4b22028a74475ee0440c2a5/pytorch_tabnet/utils.py#L336-L341
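To illustrate the point about dtypes, here is a small sketch with a stand-in encoder that checks only for np.int64, as described in the comment above. Int64OnlyEncoder and can_encode are hypothetical names used for illustration; they are not part of pytorch_tabnet.

```python
import json

import numpy as np


class Int64OnlyEncoder(json.JSONEncoder):
    """Stand-in that mirrors an encoder handling np.int64 and nothing else."""

    def default(self, obj):
        if isinstance(obj, np.int64):
            return int(obj)
        return super().default(obj)


def can_encode(value):
    """Return True if the value serializes with Int64OnlyEncoder."""
    try:
        json.dumps({"value": value}, cls=Int64OnlyEncoder)
        return True
    except TypeError:
        return False


results = {
    "np.int64": can_encode(np.int64(1)),
    "np.int8": can_encode(np.int8(1)),
    "np.bool_": can_encode(np.bool_(True)),
}
print(results)
```

Only the np.int64 value gets through; np.int8 and np.bool_ both fall back to the base class and raise TypeError, which matches the behavior reported in this thread.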

import os
import wget
import pandas as pd
import numpy as np
from pathlib import Path
from pytorch_tabnet.tab_model import TabNetClassifier

# Download the UCI Adult (census income) data set unless it is already cached.
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data"
dataset_name = 'census-income'
out = Path(os.getcwd()+'/data/'+dataset_name+'.csv')
out.parent.mkdir(parents=True, exist_ok=True)
if out.exists():
    print("File already exists.")
else:
    print("Downloading file...")
    wget.download(url, out.as_posix())

# The file has no header row, so pandas uses the first record as column names.
features = ['39', ' 77516', ' 13']
train = pd.read_csv(out)
train = train[features + [' <=50K']]
# Boolean target; the raw values carry a leading space, so strip before comparing.
train['target'] = train[' <=50K'].str.strip() == '<=50K'
train = train.drop(columns=[' <=50K'])
if "Set" not in train.columns:
    train["Set"] = np.random.choice(["train", "valid", "test"], p=[.8, .1, .1], size=(train.shape[0],))

train_indices = train[train.Set=="train"].index
valid_indices = train[train.Set=="valid"].index
test_indices = train[train.Set=="test"].index

X_train = train[features].values[train_indices]
y_train = train['target'].values[train_indices]

X_valid = train[features].values[valid_indices]
y_valid = train['target'].values[valid_indices]

X_test = train[features].values[test_indices]
y_test = train['target'].values[test_indices]

clf = TabNetClassifier()
clf.fit(X_train=X_train, y_train=y_train, max_epochs=2)

# Saving raises the TypeError because y_train has dtype bool.
saving_path_name = "./tabnet_model_test_1"
saved_filepath = clf.save_model(saving_path_name)
1 reaction
Optimox commented, Dec 17, 2022

Thanks, I’ll fix this soon.

