Logic bug in arrow_writer?
See original GitHub issue.

I got an error, and I found it's caused by `batch_examples` being `{}`. I wonder if the code should be as follows:

```diff
- if batch_examples and len(next(iter(batch_examples.values()))) == 0:
+ if not batch_examples or len(next(iter(batch_examples.values()))) == 0:
      return
```
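To see the difference, here is a runnable sketch (standalone code, not the library's) of how each guard treats an empty dict, an empty column, and a normal batch:

```python
# Standalone sketch of the two guard variants; `batch_examples` mirrors
# the dict that reaches ArrowWriter.write_batch.

def skips_batch_current(batch_examples):
    # Current guard: only skips a non-empty dict whose columns have 0 rows;
    # {} is falsy, so it falls through to the writer.
    return bool(batch_examples) and len(next(iter(batch_examples.values()))) == 0

def skips_batch_proposed(batch_examples):
    # Proposed guard: also skips the empty dict {}.
    return not batch_examples or len(next(iter(batch_examples.values()))) == 0

for batch in ({}, {"col": []}, {"col": [1, 2]}):
    print(batch, skips_batch_current(batch), skips_batch_proposed(batch))
# {}              -> current: False (falls through), proposed: True (skipped)
# {'col': []}     -> both True
# {'col': [1, 2]} -> both False
```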
Issue Analytics

- State:
- Created: a year ago
- Comments: 10 (10 by maintainers)
Top GitHub Comments
I think it would make things confusing because it doesn't follow our definition of a batch: "the columns of a batch = the keys of the dict". It would probably break certain behaviors as well. For example, if you remove all the columns of a dataset (using `.remove_columns(...)` or `.map(..., remove_columns=...)`), the writer has to write 0 columns, and currently the only way to tell the writer to do so using `write_batch` is to pass `{}`.
Yea, the message can actually be improved indeed, it's definitely not clear. Maybe we can add a line right before the call to `pa.Table.from_arrays` to make sure the keys of the batch match the field names of the schema.
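A sketch of the kind of check being suggested; the names `batch_examples` and `schema` are assumptions for illustration, not the actual `ArrowWriter` internals:

```python
import pyarrow as pa

schema = pa.schema([("col", pa.int64())])
batch_examples = {"col": [1, 2, 3]}

# Hypothetical pre-check: fail with a clear message instead of letting
# pa.Table.from_arrays raise an opaque error on mismatched columns.
if set(batch_examples) != set(schema.names):
    raise ValueError(
        f"Batch columns {sorted(batch_examples)} don't match "
        f"schema fields {schema.names}"
    )

arrays = [pa.array(batch_examples[name]) for name in schema.names]
table = pa.Table.from_arrays(arrays, schema=schema)
```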
Thanks, I added an if-print and I found it does return an empty examples dict in the chunking function that is passed to `.map()`.
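For context, a hypothetical chunking function (not the one from the issue) showing how a batched `.map()` callable can end up handing `{}` to the writer, and the small change that avoids it:

```python
# Hypothetical chunker passed to ds.map(..., batched=True,
# remove_columns=ds.column_names); not the issue author's code.
def chunk_batch(batch, chunk_size=128):
    chunks = []
    for text in batch["text"]:
        chunks.extend(text[i:i + chunk_size]
                      for i in range(0, len(text), chunk_size))
    if not chunks:
        # Returning {} here would mean "0 columns", which is what tripped
        # the writer. Returning {"text": []} means "0 rows" instead and
        # keeps the batch definition intact (columns = keys of the dict).
        return {"text": []}
    return {"text": chunks}
```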