PERF: slow `concat`
Measured on master (https://github.com/modin-project/modin/commit/cfafbb254c221dd4f739a9cf5af17c9e8cdf13c3), Ray, 8 cores.
Problem: too much time is spent in `concat`; it can be much faster. Pandas vs Modin: 0.89 sec vs 4.5 sec.
Possible solution: compute `new_widths` via `_column_widths_cache` where possible.
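To illustrate the idea, here is a minimal sketch in plain Python, not Modin's actual implementation. It assumes that for an `axis=1` concat the partitions of each frame are placed side by side unchanged, so the new widths are just the inputs' cached widths joined in order. The helper name `concat_column_widths` is hypothetical; only `new_widths` and `_column_widths_cache` come from the issue.

```python
from typing import List, Optional, Sequence


def concat_column_widths(
    cached_widths: Sequence[Optional[List[int]]],
) -> Optional[List[int]]:
    """Derive column widths for an axis=1 concat from cached widths.

    Each element stands in for one frame's `_column_widths_cache`:
    a list of partition widths, or None if nothing is cached yet.
    Returns None when any cache is missing, so the caller knows it
    must fall back to computing the widths from the partitions.
    """
    if any(widths is None for widths in cached_widths):
        return None

    new_widths: List[int] = []
    for widths in cached_widths:
        # Assumption: partitions are laid side by side without being
        # re-split, so the widths are simply joined in order.
        new_widths.extend(widths)
    return new_widths


# Two 35-column frames, each split into partitions of 32 and 3 columns.
both_cached = concat_column_widths([[32, 3], [32, 3]])
one_missing = concat_column_widths([[32, 3], None])
```

When every input already has its widths cached, no partition has to be inspected, so the result is available without waiting on pending computations.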
Script:
import modin.pandas as pd
import numpy as np
from time import time
random_state = np.random.RandomState(seed=42)
array = random_state.rand(10**6, 35)
df1 = pd.DataFrame(array)
df2 = pd.DataFrame(array)
df1 = df1 - 1
df2 = df2 - 2
start = time()
df = pd.concat([df1, df2], axis=1, copy=False)
print(f"concat time: {time()-start}")  # <-- 4.5 sec
@anmyachev yeah, of course! I was a bit puzzled to see Modin finish the `concat` quickly in BenchmarkMode, but then I recalled that benchmark mode resolves computations immediately, so it didn’t wait on the binary operation for `df1` and `df2`. Nevertheless, I think your PR addresses this @anmyachev
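The timing effect described in this comment can be reproduced with a small analogy. The sketch below uses only the standard library: `concurrent.futures` stands in for an asynchronous engine, and `slow_binary_op` and `timed_concat` are hypothetical stand-ins, not Modin functions. It shows that a timed step absorbs the cost of earlier pending work unless that work is resolved before the timer starts.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import List, Tuple


def slow_binary_op(values: List[int]) -> List[int]:
    """Stand-in for an expensive element-wise operation such as `df - 1`."""
    time.sleep(0.2)
    return [v - 1 for v in values]


def timed_concat(left: Future, right: Future) -> Tuple[List[int], float]:
    """Join two results and report how long the join appeared to take."""
    start = time.perf_counter()
    combined = left.result() + right.result()  # blocks if work is pending
    return combined, time.perf_counter() - start


with ThreadPoolExecutor(max_workers=2) as pool:
    # Asynchronous style: the operations are still running when the
    # timer starts, so the "concat" time includes waiting for them.
    left = pool.submit(slow_binary_op, [1, 2])
    right = pool.submit(slow_binary_op, [3, 4])
    combined, pending_elapsed = timed_concat(left, right)

    # Benchmark-mode style: resolve each operation before the timer
    # starts, so only the join itself is measured.
    left = pool.submit(slow_binary_op, [1, 2])
    right = pool.submit(slow_binary_op, [3, 4])
    left.result()
    right.result()
    _, resolved_elapsed = timed_concat(left, right)

print(f"timed with pending work:  {pending_elapsed:.3f} s")
print(f"timed after resolving it: {resolved_elapsed:.3f} s")
```

In this analogy the second measurement is much smaller because the earlier work was already finished, which matches the observation that the `concat` looked fast once the preceding operations had been resolved.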