DIS: Keywords for multi-threading capabilities

With the addition of the new pyarrow engine, we now have the option to use multiple threads to read a CSV file. (The thread count can also be controlled through the pyarrow.set_cpu_count function.)

Should we expose a keyword (such as num_threads) to the user, or just add an example to the docs (in this case, pointing to pyarrow.set_cpu_count)? For read_csv, this keyword would probably only apply to the pyarrow engine. However, we have had multiple feature requests for parallel CSV reading (e.g. #37955), so if we offer multithreading, it is probably worth letting users configure the number of threads used.

Personally, I would prefer a keyword: if we later add more I/O engines with multithreading capabilities, it would be more convenient to control this option through a single keyword.

cc @pandas-dev/pandas-core

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 7 (6 by maintainers)

Top GitHub Comments

2 reactions
rhshadrach commented, Sep 2, 2021

I think we should prefer usage of engine_kwargs when available. This makes it clear to the user that it depends on what engine they are using. It also lessens our technical debt as engines come and go, and change argument names.
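
To illustrate the engine_kwargs pattern described above, a minimal, hypothetical sketch: read_file, the _ENGINES registry, and the "fast"/"simple" engines are invented for illustration and are not pandas API. Engine-specific options are forwarded verbatim, so an option like use_threads exists only for the engine that understands it, and passing it to another engine fails with a TypeError instead of being silently ignored.

```python
from typing import Any, Callable, Dict, Optional


# Hypothetical engines; each one accepts its own option names.
def _read_with_fast_engine(path: str, *, use_threads: bool = True) -> str:
    return f"fast engine read {path} (use_threads={use_threads})"


def _read_with_simple_engine(path: str) -> str:
    return f"simple engine read {path}"


_ENGINES: Dict[str, Callable[..., str]] = {
    "fast": _read_with_fast_engine,
    "simple": _read_with_simple_engine,
}


def read_file(
    path: str,
    engine: str = "simple",
    engine_kwargs: Optional[Dict[str, Any]] = None,
) -> str:
    """Dispatch to an engine, passing its options through unchanged.

    No top-level keyword is added per option, so engines can be added,
    removed, or rename their arguments without changing this signature.
    """
    if engine not in _ENGINES:
        raise ValueError(f"unknown engine: {engine!r}")
    return _ENGINES[engine](path, **(engine_kwargs or {}))
```

The design choice is that the signature of read_file stays stable while each engine owns its own options.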

0 reactions
jreback commented, Jan 5, 2022

You can do multithreaded reading with engine='pyarrow' (in 1.4).
