Improve performance of sequential part of text IO operations


Currently, the sequential part of text IO operations (splitting the file into portions that workers then read in parallel) is implemented in pure Python, mostly in the offset and _read_rows functions. In some cases this overhead can significantly affect the overall execution time of an IO operation, for example when every line in the file has to be iterated over, as is done in https://github.com/modin-project/modin/pull/2607 when skiprows is callable. We need to investigate ways to speed up the sequential part and implement the best one.
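To make the "sequential part" concrete, here is a minimal sketch of splitting a file into newline-aligned byte ranges by seeking, instead of iterating over every line. It is not Modin's actual offset or _read_rows code: the function name split_into_byte_ranges and its parameters are hypothetical, and it assumes no quoted field contains a newline.

```python
import io


def split_into_byte_ranges(f, num_splits):
    """Return (start, end) byte ranges that each end on a line boundary.

    Hypothetical helper for illustration only. `f` must be a seekable
    binary file object. Assumption: no quoted field contains a newline.
    """
    f.seek(0, io.SEEK_END)
    file_size = f.tell()
    if file_size == 0:
        return []

    # Target size of each portion; at least one byte so the loop advances.
    chunk_size = max(1, file_size // num_splits)

    ranges = []
    start = 0
    while start < file_size:
        # Jump ahead by roughly one chunk, then read to the end of that line
        # so the portion never cuts a row in half.
        f.seek(min(start + chunk_size, file_size))
        f.readline()
        end = f.tell()
        ranges.append((start, end))
        start = end
    return ranges
```

Each resulting range could then be handed to a worker that parses only its own bytes. Because of the line alignment, the function may return fewer ranges than num_splits. The sequential cost here is one seek and one readline per portion, whereas a callable skiprows forces a pass over every line, which is the slow case the issue describes.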

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 5 (5 by maintainers)

Top GitHub Comments

mvashishtha commented, Aug 23, 2022 (1 reaction)

@pyrito yes let’s dedupe.

mvashishtha commented, Aug 23, 2022 (0 reactions)

Duplicate of #4770


