PERF: slow `concat`

Measured on master (https://github.com/modin-project/modin/commit/cfafbb254c221dd4f739a9cf5af17c9e8cdf13c3), Ray, 8 cores.

Problem: concat takes too long and could be much faster. Pandas vs. Modin: 0.89 sec vs. 4.5 sec.

Possible solution: compute new_widths via _column_widths_cache where possible.
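A minimal sketch of the idea, not Modin's actual implementation: the helper name `concat_column_widths` and its plain-list inputs are hypothetical. It assumes an axis=1 concat places the inputs' column partitions side by side without repartitioning, so the result's widths can be read straight from each frame's cached widths instead of being recomputed.

```python
from typing import List, Optional, Sequence


def concat_column_widths(
    cached_widths: Sequence[Optional[List[int]]],
) -> Optional[List[int]]:
    """Return partition widths for an axis=1 concat, or None if unknown.

    Each entry is one frame's cached list of column-partition widths,
    or None when that frame's cache has not been populated.
    (Hypothetical helper for illustration only.)
    """
    if any(widths is None for widths in cached_widths):
        # At least one input has no cached widths, so the caller would
        # have to fall back to computing them from the partitions.
        return None
    # Assumption: partitions are kept intact and placed side by side,
    # so the new widths are the inputs' widths one after another.
    return [w for widths in cached_widths for w in widths]
```

For example, two frames whose caches both hold `[32, 3]` would give `[32, 3, 32, 3]`, and any frame with an empty cache makes the helper return `None` so the slower path can take over.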

Script:

import modin.pandas as pd
import numpy as np
from time import time

random_state = np.random.RandomState(seed=42)
array = random_state.rand(10**6, 35)

df1 = pd.DataFrame(array)
df2 = pd.DataFrame(array)

df1 = df1 - 1
df2 = df2 - 2

start = time()
df = pd.concat([df1, df2], axis=1, copy=False)
print(f"concat time: {time()-start}")  # ~4.5 sec

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 8 (8 by maintainers)

Top GitHub Comments

1 reaction
pyrito commented, Jul 27, 2022

@anmyachev yeah, of course! I was a bit puzzled to see Modin finish the concat quickly in BenchmarkMode, but then I recalled that benchmark mode resolves computations immediately, so the concat didn't have to wait on the binary operations for df1 and df2.

1 reaction
pyrito commented, Jul 27, 2022

Nevertheless, I think your PR addresses this, @anmyachev.
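The timing effect described in the comments above can be sketched with the standard library alone. This is an illustration under stated assumptions, not Modin code: `slow_binary_op` and `timed_concat` are hypothetical stand-ins, with a thread pool playing the role of deferred execution. When earlier work is still pending, the timed step absorbs its cost; when it is resolved first (roughly what a benchmark-style mode does), the timed step looks fast.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_binary_op(x):
    # Stand-in for a deferred operation such as `df - 1`.
    time.sleep(0.2)
    return x - 1


def timed_concat(wait_first):
    """Time a stand-in 'concat' with or without resolving earlier work first."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Submitting returns immediately; the work runs in the background.
        futures = [pool.submit(slow_binary_op, x) for x in (1, 2)]
        if wait_first:
            # Finish the earlier operations before the timer starts.
            for f in futures:
                f.result()
        start = time.perf_counter()
        combined = [f.result() for f in futures]  # stand-in for concat
        return combined, time.perf_counter() - start
```

Calling `timed_concat(False)` charges the pending operations to the timed step, while `timed_concat(True)` measures only the combining step itself.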
