Getting "ValueError: range() arg 3 must not be zero" error for multi iteration checks
We are evaluating data-diff for our use case. We hit an issue when a multi-step iteration is performed, i.e. when we reduce bisection-threshold. Everything works fine when bisection-threshold is high enough that the whole diff is done in one iteration:
data-diff trino://gaurav.singh@razorpay.com@trino-dev-coordinator-service.trino-dev.svc.cluster.local:8080/hive/sqoop_api sqoop_api.merchants trino://gaurav.singh@razorpay.com@trino-dev-coordinator-service.trino-dev.svc.cluster.local:8080/hive/realtime_hudi_api realtime_hudi_api.merchants -k id -v --json --bisection-factor 9 --bisection-threshold 100000 --max-age=7d -t created_date -c name -c email -c second_factor_auth -c restricted -c parent_id -c fee_model --min-age=1d -s -w "updated_at<1659724200 and created_date<'2022-08-08'"
In the second case, we reduced bisection-threshold enough that not all diffs can be performed in one iteration:
data-diff trino://gaurav.singh@razorpay.com@trino-dev-coordinator-service.trino-dev.svc.cluster.local:8080/hive/sqoop_api sqoop_api.merchants trino://gaurav.singh@razorpay.com@trino-dev-coordinator-service.trino-dev.svc.cluster.local:8080/hive/realtime_hudi_api realtime_hudi_api.merchants -k id -v --json --bisection-factor 9 --bisection-threshold 1000 --max-age=7d -t created_date -c name -c email -c second_factor_auth -c restricted -c parent_id -c fee_model --min-age=1d -s -w "updated_at<1659724200 and created_date<'2022-08-08'"
We get the following error:
ValueError: range() arg 3 must not be zero
File "/usr/lib/python3.9/concurrent/futures/_base.py", line 600, in result_iterator
yield fs.pop().result()
File "/usr/lib/python3.9/concurrent/futures/_base.py", line 433, in result
return self.__get_result()
File "/usr/lib/python3.9/concurrent/futures/_base.py", line 389, in __get_result
raise self._exception
File "/usr/lib/python3.9/concurrent/futures/thread.py", line 52, in run
result = self.fn(*self.args, **self.kwargs)
File "/usr/local/lib/python3.9/dist-packages/data_diff/diff_tables.py", line 493, in _diff_tables
yield from self._bisect_and_diff_tables(table1, table2, level=level, max_rows=max(count1, count2))
File "/usr/local/lib/python3.9/dist-packages/data_diff/diff_tables.py", line 446, in _bisect_and_diff_tables
checkpoints = table1.choose_checkpoints(self.bisection_factor - 1)
File "/usr/local/lib/python3.9/dist-packages/data_diff/diff_tables.py", line 180, in choose_checkpoints
checkpoints = split_space(self.min_key.int, self.max_key.int, count)
File "/usr/local/lib/python3.9/dist-packages/data_diff/utils.py", line 19, in split_space
return list(range(start, end, (size + 1) // (count + 1)))[1 : count + 1]
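The last frame of the traceback hints at one way this line can fail. The step passed to range() is (size + 1) // (count + 1), which floors to zero whenever the key range is narrower than the number of segments requested. The sketch below reproduces only that arithmetic. It assumes size = end - start, which the traceback does not show, and safe_split_space is a hypothetical variant for illustration, not the fix that was applied in data-diff.

```python
def split_space(start, end, count):
    # Assumption: size is the width of the key range. Only the return
    # line below is visible in the traceback.
    size = end - start
    return list(range(start, end, (size + 1) // (count + 1)))[1 : count + 1]


def safe_split_space(start, end, count):
    # Hypothetical guard: never let the step drop below 1, so a narrow
    # range yields fewer checkpoints instead of raising.
    size = end - start
    step = max(1, (size + 1) // (count + 1))
    return list(range(start, end, step))[1 : count + 1]


if __name__ == "__main__":
    # --bisection-factor 9 means choose_checkpoints asks for 8 checkpoints.
    print(split_space(0, 100, 8))  # [11, 22, 33, 44, 55, 66, 77, 88]

    try:
        split_space(10, 15, 8)  # (5 + 1) // (8 + 1) == 0
    except ValueError as exc:
        print(exc)  # range() arg 3 must not be zero

    print(safe_split_space(10, 15, 8))  # [11, 12, 13, 14]
```

If this reading is right, it would explain why the error appears only with a low bisection-threshold: further bisection levels operate on ever smaller key ranges, and a segment holding fewer distinct key values than bisection-factor makes the step zero.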
Top GitHub Comments
Looks like this was fixed
We have a new implementation for alphanumerics in master that I believe should fix this issue. Sorry it took so long, but please try now and see if it helps.