Setting 'user_splits_fixed' in categorical binning

Hi @guillermo-navas-palencia,

There is a problem when I try to set a value for user_splits_fixed. Suppose I have a column with the following event rate for each value:

value   event rate
-1      0.011665
2       0
3       0.0133333
4       0.166667
7       0
8       0.0246041
9       0
10      0.025641

Then, when I set user_splits=[[2., 7., 9., 3., 10., 4.], [8], [-1]], user_splits_fixed=[True, True, True], monotonic_trend=None, and dtype='categorical', the program raises this error:

ValueError: Fixed user_splits [list([2.0, 7.0, 9.0, 3.0, 10.0, 4.0])] are removed because produce pure prebins. Provide different splits to be fixed.

What is wrong here?
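As a quick sanity check on the error message, here is a minimal sketch that tests whether any of the requested groups is "pure". It assumes that a pure bin means one containing only events or only non-events, and that every category has the same number of records (the question gives rates, not counts). The helper names `group_counts` and `is_pure` are hypothetical and not part of optbinning.

```python
# Event rates per category, copied from the table in the question.
event_rate = {-1: 0.011665, 2: 0.0, 3: 0.0133333, 4: 0.166667,
              7: 0.0, 8: 0.0246041, 9: 0.0, 10: 0.025641}

# Assumption: equal category sizes; the question does not state the counts.
n_per_category = 100_000

user_splits = [[2, 7, 9, 3, 10, 4], [8], [-1]]


def group_counts(group):
    """Return (events, non_events) expected for a group of categories."""
    events = sum(round(event_rate[c] * n_per_category) for c in group)
    total = n_per_category * len(group)
    return events, total - events


def is_pure(group):
    """A group is pure when it has no events or no non-events."""
    events, non_events = group_counts(group)
    return events == 0 or non_events == 0


# One flag per requested group.
purity = [is_pure(group) for group in user_splits]
print(purity)
```

Under these assumptions none of the three requested groups is pure, although categories 2, 7 and 9 would each be pure on their own. That suggests the grouping itself is reasonable and the error comes from how the fixed splits are processed.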

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 7 (4 by maintainers)

Top GitHub Comments

1 reaction
guillermo-navas-palencia commented, Mar 27, 2020

I created this example to reproduce your problem:

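import numpy as np
from pytest import approx

from optbinning import OptimalBinning

# Fix the seed so the simulated targets are reproducible.
np.random.seed(0)
n = 100000

# One block of n records per category value.
x = sum([[i] * n for i in [-1, 2, 3, 4, 7, 8, 9, 10]], [])

# Binary target drawn with each category's event rate, in the same order as x.
# Categories 2, 7 and 9 have no events at all.
y = list(np.random.binomial(1, 0.011665, n))
y += list(np.zeros(n))
y += list(np.random.binomial(1, 0.0133333, n))
y += list(np.random.binomial(1, 0.166667, n))
y += list(np.zeros(n))
y += list(np.random.binomial(1, 0.0246041, n))
y += list(np.zeros(n))
y += list(np.random.binomial(1, 0.025641, n))

# Group the categories into three user-defined bins and fix all of them.
user_splits = [[2., 7., 9., 3., 10., 4.], [8], [-1]]
user_splits_fixed = [True, True, True]

optb1 = OptimalBinning(dtype="categorical", user_splits=user_splits)
optb2 = OptimalBinning(dtype="categorical", user_splits=user_splits,
                       user_splits_fixed=user_splits_fixed)

# Both configurations should yield the same information value (IV).
for optb in (optb1, optb2):
    optb.fit(x, y)
    optb.binning_table.build()
    assert optb.binning_table.iv == approx(0.09345086993827473, rel=1e-6)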

After commit a6d015b2e9365ecc05cba48421972c562f7960c7, it should work as expected.
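The assertion in the reproduction compares the information value (IV) of the binning table. The following is a rough sketch of the textbook IV calculation for the three user-defined bins, using expected counts instead of random draws. Whether optbinning computes IV in exactly this way is an assumption here, and the helpers `counts` and `information_value` are hypothetical names, not library functions.

```python
import math

# Event rates per category, copied from the table in the question.
event_rate = {-1: 0.011665, 2: 0.0, 3: 0.0133333, 4: 0.166667,
              7: 0.0, 8: 0.0246041, 9: 0.0, 10: 0.025641}

# Same block size as the reproduction; counts are expected values, not samples.
n_per_category = 100_000


def counts(group):
    """Return (events, non_events) expected for a group of categories."""
    events = sum(round(event_rate[c] * n_per_category) for c in group)
    return events, n_per_category * len(group) - events


def information_value(groups):
    """Textbook IV: sum over bins of (dist_ne - dist_e) * ln(dist_ne / dist_e)."""
    per_bin = [counts(group) for group in groups]
    total_events = sum(e for e, _ in per_bin)
    total_non_events = sum(ne for _, ne in per_bin)
    iv = 0.0
    for events, non_events in per_bin:
        if events == 0 or non_events == 0:
            # The logarithm is undefined for a pure bin.
            raise ValueError("pure bin: weight of evidence is undefined")
        dist_e = events / total_events
        dist_ne = non_events / total_non_events
        iv += (dist_ne - dist_e) * math.log(dist_ne / dist_e)
    return iv


iv_user_splits = information_value([[2, 7, 9, 3, 10, 4], [8], [-1]])
iv_single_bin = information_value([[-1, 2, 3, 4, 7, 8, 9, 10]])
print(iv_user_splits, iv_single_bin)
```

Because this uses expected counts, the result will not match the asserted value exactly. The sketch also shows why pure bins are a problem: a bin with zero events or zero non-events makes the logarithm undefined.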

1 reaction
nic9lif3 commented, Mar 27, 2020

Hi @guillermo-navas-palencia, the splits produced by OptimalBinning with its default options are [[2., 7., 9., 3., 10., 4., 8.], [-1]]. When I pass these splits to the user_splits parameter, it returns the same error 😃.
