[Performance] Data viewer can't handle large DFs

Environment data

  • VS Code version: 1.33.1
  • Extension version (available under the Extensions sidebar): 2019.4.1
  • OS and version: Windows 7
  • Python version (& distribution if applicable, e.g. Anaconda): Anaconda distribution, Python 3.6.2
  • Type of virtual environment used (N/A | venv | virtualenv | conda | …): conda
  • Relevant/affected Python packages and their versions: None

Expected behaviour

View large DataFrames (>1000 columns, >1000 rows) in under 1 minute

Actual behaviour

When opening large DFs (the current one is 709x3201), the Data Viewer stops after showing the structure, with all values stuck at ‘loading …’ (current runtime: 20 minutes).

Steps to reproduce:

  1. Create a synthetic data frame: 3000 series of 700 floats each
  2. In the Variable Explorer, click ‘View in Data Viewer’
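
A minimal sketch of step 1, assuming NumPy and pandas are installed. The helper name `make_wide_frame`, the column labels, and the seed are illustrative and not from the original report; only the dimensions (3000 series of 700 floats) come from the steps above.

```python
import numpy as np
import pandas as pd


def make_wide_frame(n_rows=700, n_cols=3000, seed=0):
    """Build a DataFrame of random floats with one column per series.

    Hypothetical helper: name, labels, and seed are assumptions.
    """
    rng = np.random.default_rng(seed)
    data = rng.standard_normal((n_rows, n_cols))
    columns = [f"series_{i}" for i in range(n_cols)]
    return pd.DataFrame(data, columns=columns)


df = make_wide_frame()
print(df.shape)  # (700, 3000)
```

With `df` in scope in an interactive session, step 2 can then be tried on it.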

Logs

Output for Python in the Output panel (View → Output, change the drop-down in the upper-right of the Output panel to Python)

None

Output from Console under the Developer Tools panel (toggle Developer Tools on under Help; turn on source maps to make any tracebacks be useful by running Enable source map support for extension debugging)

Can't find relevant logs. Is 'View in Data Viewer' supposed to show up in the logs at some point?

I was really looking forward to these features, so thanks for getting them in there! However, when dealing with quantitative finance problems we often have very large dataframes, and it would be nice to be able to use the data viewer to explore them.

Best regards,

Francisco

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Reactions: 1
  • Comments: 14 (7 by maintainers)

Top GitHub Comments

3 reactions
rchiodo commented, May 31, 2019

I just submitted a fix for the column virtualization. Please feel free to try it out in our next insiders build (should be ready in about half an hour).

It should support any number of columns and rows, but it will ask if you’re sure you want to open the view if there are more than 1000 columns. More than 1000 columns causes the initial bring-up to take a while, and fetching the data can take longer too (it has to turn the rows into a JSON string in order to send it to our UI, which is a function of how VS Code works).

A 1000 x 10000 DF takes me about 5 minutes to load.
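
To get a feel for why wide frames are slow, the two ideas in this comment (a confirmation check above 1000 columns, and sending rows as a JSON string) can be sketched in pandas. This is not the extension's code: `needs_confirmation`, `rows_as_json`, and the chunk size of 100 rows are assumptions made for illustration; only the 1000-column threshold comes from the comment.

```python
import json

import numpy as np
import pandas as pd

# Threshold taken from the comment; the check itself is a hypothetical sketch.
COLUMN_WARNING_THRESHOLD = 1000


def needs_confirmation(df, threshold=COLUMN_WARNING_THRESHOLD):
    """Return True when the frame is wide enough to warrant a prompt."""
    return df.shape[1] > threshold


def rows_as_json(df, start, stop):
    """Serialize rows [start, stop) to a JSON string, one object per row."""
    return df.iloc[start:stop].to_json(orient="records")


rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((700, 3000)))

chunk = rows_as_json(df, 0, 100)
rows = json.loads(chunk)

print(needs_confirmation(df))
print(len(rows), "rows,", len(rows[0]), "values per row")
print(f"{len(chunk) / 1e6:.1f} MB of JSON for 100 rows")
```

Every float becomes text plus a repeated column key, so the payload grows with rows times columns, which is consistent with wide frames taking longer to fetch.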

However, it now also supports filtering with expressions on numeric columns. Example:

[Screenshot: Filter]
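
The viewer's own filter syntax is only shown in the screenshot, so it is not reproduced here. As a rough analogue, a frame can also be narrowed with numeric expressions in pandas before it is opened in the viewer. The column names `a`, `b`, `c` and the thresholds below are made up for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical column names; any numeric columns work the same way.
df = pd.DataFrame(rng.standard_normal((700, 3)), columns=["a", "b", "c"])

# Boolean-mask filtering on numeric columns.
subset = df[(df["a"] > 0) & (df["b"] <= 1.0)]

# The same filter written as an expression string.
same = df.query("a > 0 and b <= 1.0")

print(len(df), "rows before,", len(subset), "rows after")
```

Both forms select the same rows; the expression string is closer in spirit to typing a filter into a column header.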

0 reactions
FranciscoRZ commented, Jun 3, 2019

That’s really impressive, thanks a lot!

Take care,

Francisco
