DIS: Keywords for multi-threading capabilities
With the addition of the new pyarrow engine, we now have the option to use multiple threads to read a CSV file. (This is also controllable through pyarrow.set_cpu_count.)
Should we expose a keyword (such as num_threads, maybe) to the user, or just add an example to the docs (in this case, pointing to pyarrow.set_cpu_count)? In the case of read_csv, this keyword would probably only apply to the pyarrow engine. However, it is worth noting that we have had multiple feature requests for parallel CSV reading (e.g. #37955), and it is probably worth being able to configure the number of threads used if we offer multithreading.
Personally, I would prefer having a keyword, as if we decide to add more I/O engines with multithreading capabilities, it would be more convenient to be able to control this option through a keyword.
Issue Analytics
- Created 2 years ago
- Comments:7 (6 by maintainers)
I think we should prefer usage of engine_kwargs when available. This makes it clear to the user that it depends on what engine they are using. It also lessens our technical debt as engines come and go, and change argument names.
You can do multi-threaded reading with engine='pyarrow' (in 1.4).