qcut() should make sure the bin boundaries are unique before passing them to _bins_to_cuts
xref #8309
For example:
pd.qcut([1,1,1,1,1,1,1,1,1,1,1,1,1,5,5,5], [0.00001, 0.5])
will raise a “ValueError: Bin edges must be unique: array([ 1., 1.])” exception.
Fix suggestion: add one new line:
def qcut(x, q, labels=None, retbins=False, precision=3):
    if com.is_integer(q):
        quantiles = np.linspace(0, 1, q + 1)
    else:
        quantiles = q
    bins = algos.quantile(x, quantiles)
    bins = np.unique(bins)  # <--- suggested new line
    return _bins_to_cuts(x, bins, labels=labels, retbins=retbins,
                         precision=precision, include_lowest=True)
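The effect of the suggested line can be sketched outside of pandas internals. The snippet below is only an illustration under stated assumptions: it uses the public np.quantile and pd.cut in place of the internal algos.quantile and _bins_to_cuts, and the sample data and variable names are made up for the example. Note that with the quantiles from the report above ([0.00001, 0.5]) both edges collapse to a single value, and a single edge cannot define a bin, so the sketch uses quartiles instead.

```python
import numpy as np
import pandas as pd

# Heavily tied data, similar in spirit to the example in the report.
values = [1] * 13 + [5] * 3

# Quartile positions, as qcut would build them for q=4.
quantiles = np.linspace(0, 1, 5)

# Stand-in for the internal algos.quantile call (assumption: linear
# interpolation, which is the np.quantile default).
edges = np.quantile(values, quantiles)

# The suggested fix: drop repeated edges before binning.
unique_edges = np.unique(edges)

# Stand-in for _bins_to_cuts: bin on the de-duplicated edges.
binned = pd.cut(values, bins=unique_edges, include_lowest=True)

print(edges)
print(unique_edges)
print(binned.categories)
```

Here four of the five quartile edges coincide, so de-duplication leaves a single bin that holds every value: the call no longer fails, but the result has fewer bins than were requested.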
Issue Analytics
- State:
- Created 9 years ago
- Comments:21 (11 by maintainers)
I think we could add a “duplicate_edges” parameter, with the following options:
bins = np.unique(bins)
line as in https://github.com/pydata/pandas/issues/7751#issue-37814702. This would result in fewer bins than specified, and some bins would be larger (with more elements) than others. Does it look reasonable? What do you think?
dukebody’s suggestion looks good to me.
For my use case, I would prefer for qcut to just return the (non-unique) bin list, and let me handle it.
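For readers on a recent pandas release: the choice discussed in this thread is, as far as I know, exposed through a `duplicates` argument on qcut (named `duplicates`, not “duplicate_edges”), which accepts 'raise' (the default) and 'drop'. The sketch below assumes a pandas version that has this argument; the sample data is made up for illustration.

```python
import pandas as pd

# Heavily tied data: most quartile edges fall on the same value.
values = [1] * 13 + [5] * 3

# Default behaviour: duplicate edges are reported as an error.
try:
    pd.qcut(values, 4)
    raised = False
except ValueError as exc:
    raised = True
    print("qcut failed:", exc)

# Opt in to dropping the repeated edges instead.
binned = pd.qcut(values, 4, duplicates="drop")
n_bins = len(binned.categories)
print(binned.categories)
```

As noted in the comments above, dropping edges yields fewer bins than requested, so code that relies on exactly q categories should check the result.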