CI: `test-fuzzydata` fails when `sample` uses a small `frac`
`test-fuzzydata` has been failing occasionally, as can be seen here: https://github.com/modin-project/modin/actions/runs/3062497807/jobs/4943541816
Stack trace
/usr/share/miniconda3/envs/modin/lib/python3.8/site-packages/fuzzydata/core/generator.py:379: in generate_workflow
raise e
/usr/share/miniconda3/envs/modin/lib/python3.8/site-packages/fuzzydata/core/generator.py:367: in generate_workflow
wf.execute_current_operation(next_label)
/usr/share/miniconda3/envs/modin/lib/python3.8/site-packages/fuzzydata/core/workflow.py:197: in execute_current_operation
raise e
/usr/share/miniconda3/envs/modin/lib/python3.8/site-packages/fuzzydata/core/workflow.py:168: in execute_current_operation
new_artifact = self.current_operation.execute(new_label)
/usr/share/miniconda3/envs/modin/lib/python3.8/site-packages/fuzzydata/core/operation.py:172: in execute
result = self.materialize(new_label)
/usr/share/miniconda3/envs/modin/lib/python3.8/site-packages/fuzzydata/clients/pandas.py:114: in materialize
new_df = eval(self.code)
<string>:1: in <module>
???
modin/logging/logger_decorator.py:128: in run_and_log
return obj(*args, **kwargs)
modin/_compat/pandas_api/latest/base.py:255: in sample
return self._sample(
modin/logging/logger_decorator.py:128: in run_and_log
return obj(*args, **kwargs)
modin/pandas/base.py:2451: in _sample
return self._default_to_pandas(
modin/logging/logger_decorator.py:128: in run_and_log
return obj(*args, **kwargs)
modin/pandas/base.py:431: in _default_to_pandas
result = getattr(self._pandas_class, op)(pandas_obj, *args, **kwargs)
/usr/share/miniconda3/envs/modin/lib/python3.8/site-packages/pandas/core/generic.py:5438: in sample
size = sample.process_sampling_size(n, frac, replace)
Issue Analytics
- State:
- Created a year ago
- Comments:5 (5 by maintainers)
@pyrito These are precisely the kinds of errors I wanted this project to surface! Thanks for digging in!
I haven’t been monitoring this modin integration heavily, but I have some free time this week, and I think I can make some changes to improve the testing/debugging experience.
@mvashishtha @suhailrehman I spent some time digging into the fuzzydata code base and I don’t think the issue is there. Rather, I think we are hitting a very specific edge case in how Modin handles `sample`. If you look here: https://github.com/modin-project/modin/blob/eca9a936846faa31b116a1e58b1114d90cfa44d8/modin/pandas/base.py#L2439, it’s actually possible to end up getting `n` to be 0, so it’ll end up executing the wrong code path that has both `n` and `frac` set.