toCSV after joinOuter running very slow

Ashley, thanks for your awesome work on everything. I’m new to JavaScript and I’m not sure if my issue is related to something I’m doing or if there’s an issue.

I’m having an issue with writing to CSV after performing an outer join. I’ve been able to verify that my data frames are being created. When I display head as pictured below the process is a little slow, but writing to CSV takes minutes to complete. I originally thought it wasn’t writing, but it does seem to write after some period of time. Additionally, the script seems to hang and I don’t return to the command line.

`const exceptDF = cleanRev.joinOuter(cleanSF,
    cleanRev => cleanRev.sfKey,
    cleanSF => cleanSF.sfKey,
    (cleanRev, cleanSF) => {
        return {
            index: cleanRev ? cleanRev.sfKey : cleanSF.sfKey,
            swanKey: cleanRev ? cleanRev.sfKey : undefined,
            sfKey: cleanSF ? cleanSF.sfKey : undefined
        };
    }
);

console.log(exceptDF.head(3).toString());  // this works, but it's slow

exceptDF.asCSV().writeFile('exceptDF.csv');  // this writes the file, but it takes several minutes and the terminal never returns to the prompt`

For reference, I’m loading two csv files to different data frames, doing some manipulation, and then performing the outer join. Each file has around 4,000 rows and 20 columns.

Thanks for any input.

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 11 (8 by maintainers)

Top GitHub Comments

1 reaction
ashleydavis commented, May 21, 2019

Fantastic! Thanks again for logging the feedback.

Please make sure you star this repo!

0 reactions
dneagoy commented, May 21, 2019

Ashley,

Sorry for the delay here and thanks very much for your help. I’m new to JavaScript, but the calls to .bake() make a phenomenal difference in speed. It seems like that was the main reason for the slowdown. I’ll need to read through the documentation a bit more to get a better understanding of when to use them.

I see a similar time to what you saw. Thank you for all the help!
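
For readers wondering why `.bake()` helps: the thread attributes the slowdown to work being redone until the data frame is baked. The following is a minimal, library-free sketch of that idea. `LazySeq` and `expensive` are hypothetical names made up for illustration and are not part of the library's API or implementation; the sketch only assumes that an un-baked pipeline re-runs its steps each time it is read, while a baked one stores the results once.

```javascript
// Hypothetical lazy sequence, for illustration only (not the library's code).
class LazySeq {
    constructor(source) {
        this.source = source; // function that returns a fresh iterable each time
    }

    static from(array) {
        return new LazySeq(() => array);
    }

    // Defers the work: fn runs only when the sequence is actually read.
    map(fn) {
        const source = this.source;
        return new LazySeq(function* () {
            for (const value of source()) {
                yield fn(value);
            }
        });
    }

    // Runs the pipeline once and keeps the results in memory.
    bake() {
        const values = Array.from(this.source());
        return new LazySeq(() => values);
    }

    toArray() {
        return Array.from(this.source());
    }
}

let calls = 0;
const expensive = x => {
    calls += 1; // count how often the costly step runs
    return x * 2;
};

// Without baking, every read re-runs the mapping step.
const lazy = LazySeq.from([1, 2, 3]).map(expensive);
lazy.toArray();
lazy.toArray();
const lazyCalls = calls;

// With baking, the mapping step runs once; later reads reuse the results.
calls = 0;
const baked = LazySeq.from([1, 2, 3]).map(expensive).bake();
baked.toArray();
baked.toArray();
const bakedCalls = calls;

console.log({ lazyCalls, bakedCalls });
```

In the code from the question, the analogous change would presumably be to call `.bake()` on the result of `joinOuter` before calling `head` and `asCSV`, so that the join is evaluated once rather than once per read.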
