CI: `test-fuzzydata` fails when `sample` uses a small `frac`

`test-fuzzydata` has been failing intermittently, as can be seen here: https://github.com/modin-project/modin/actions/runs/3062497807/jobs/4943541816

Stack trace

/usr/share/miniconda3/envs/modin/lib/python3.8/site-packages/fuzzydata/core/generator.py:379: in generate_workflow
    raise e
/usr/share/miniconda3/envs/modin/lib/python3.8/site-packages/fuzzydata/core/generator.py:367: in generate_workflow
    wf.execute_current_operation(next_label)
/usr/share/miniconda3/envs/modin/lib/python3.8/site-packages/fuzzydata/core/workflow.py:197: in execute_current_operation
    raise e
/usr/share/miniconda3/envs/modin/lib/python3.8/site-packages/fuzzydata/core/workflow.py:168: in execute_current_operation
    new_artifact = self.current_operation.execute(new_label)
/usr/share/miniconda3/envs/modin/lib/python3.8/site-packages/fuzzydata/core/operation.py:172: in execute
    result = self.materialize(new_label)
/usr/share/miniconda3/envs/modin/lib/python3.8/site-packages/fuzzydata/clients/pandas.py:114: in materialize
    new_df = eval(self.code)
<string>:1: in <module>
    ???
modin/logging/logger_decorator.py:128: in run_and_log
    return obj(*args, **kwargs)
modin/_compat/pandas_api/latest/base.py:255: in sample
    return self._sample(
modin/logging/logger_decorator.py:128: in run_and_log
    return obj(*args, **kwargs)
modin/pandas/base.py:2451: in _sample
    return self._default_to_pandas(
modin/logging/logger_decorator.py:128: in run_and_log
    return obj(*args, **kwargs)
modin/pandas/base.py:431: in _default_to_pandas
    result = getattr(self._pandas_class, op)(pandas_obj, *args, **kwargs)
/usr/share/miniconda3/envs/modin/lib/python3.8/site-packages/pandas/core/generic.py:5438: in sample
    size = sample.process_sampling_size(n, frac, replace)

Top GitHub Comments

suhailrehman commented, Sep 15, 2022

@pyrito These are precisely the kinds of errors I wanted this project to surface! Thanks for digging in!

I haven’t been monitoring this modin integration closely, but I have some free time this week and I think I can make a few changes to improve the testing/debugging experience:

The same workflow should run on both dask and ray (right now, it generates two separate random workflows, one running on ray and one on dask).
Artifacts are not being written out on failure, which makes it a significant pain to rerun and verify whether the bug has been fixed.
fuzzydata should also run the workflow against pandas to measure performance regressions, perhaps only when the scale factor is large enough to warrant it. The primary focus right now is correctness, not performance.

pyrito commented, Sep 15, 2022

@mvashishtha @suhailrehman I spent some time digging into the fuzzydata code base and I don’t think the issue is there. Rather, I think we are hitting a very specific edge case in how Modin handles `sample`. As you can see here: https://github.com/modin-project/modin/blob/eca9a936846faa31b116a1e58b1114d90cfa44d8/modin/pandas/base.py#L2439, it’s actually possible for `n` to end up being 0, in which case we execute the wrong code path, one that has both `n` and `frac` set.
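
To make the edge case concrete, here is a minimal, stdlib-only sketch of the kind of logic described above. It is not Modin's or pandas' actual code: `rows_from_frac`, `fallback_kwargs`, and `check_sampling_args` are hypothetical names, and the assumption that `n` is derived by rounding `frac * axis_length` is an illustration of how a small `frac` could produce `n == 0`.

```python
from typing import Optional


def check_sampling_args(n: Optional[int], frac: Optional[float]) -> None:
    """Hypothetical stand-in for the callee's validation: n and frac are mutually exclusive."""
    if n is not None and frac is not None:
        raise ValueError("n and frac must not both be set")


def rows_from_frac(frac: float, axis_length: int) -> int:
    """Assumed conversion from a fraction to a row count (rounds to the nearest integer)."""
    return int(round(frac * axis_length))


def fallback_kwargs(frac: float, axis_length: int, drop_frac: bool) -> dict:
    """Build the keyword arguments forwarded to the fallback implementation.

    With drop_frac=False, the n == 0 branch forwards both n and frac (the
    suspected bug). With drop_frac=True, only n is forwarded.
    """
    n = rows_from_frac(frac, axis_length)
    if n == 0:
        return {"n": 0, "frac": None if drop_frac else frac}
    return {"n": n, "frac": None}


# A small frac on a short frame rounds down to zero rows.
buggy = fallback_kwargs(frac=0.01, axis_length=10, drop_frac=False)
try:
    check_sampling_args(**buggy)
    buggy_error = None
except ValueError as exc:
    buggy_error = str(exc)

# Forwarding only n avoids the conflict.
fixed = fallback_kwargs(frac=0.01, axis_length=10, drop_frac=True)
check_sampling_args(**fixed)
```

Under these assumptions, the failure only shows up when `frac * axis_length` rounds to zero, which would explain why a randomly generated workflow hits it only occasionally.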
