[Performance] Data viewer can't handle large DFs
Environment data
- VS Code version: 1.33.1
- Extension version (available under the Extensions sidebar): 2019.4.1
- OS and version: Windows 7
- Python version (& distribution if applicable, e.g. Anaconda): Anaconda distribution, Python 3.6.2
- Type of virtual environment used (N/A | venv | virtualenv | conda | …): conda
- Relevant/affected Python packages and their versions: None
Expected behaviour
View large DataFrames (>1000 columns, >1000 rows) in under 1 minute
Actual behaviour
When opening large DFs (the current one is 709x3201), the Data Viewer gets stuck after showing the structure, with all values at ‘loading …’ (current runtime: 20 minutes).
Steps to reproduce:
- Create synthetic data frame: 3000 series of 700 floats each
- In the variable explorer, click 'View in Data Viewer'
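The first reproduction step could be sketched as follows. This is a minimal sketch rather than the reporter's actual script: the helper name `make_synthetic_frame`, the column names, and the use of normally distributed values are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def make_synthetic_frame(n_rows=700, n_cols=3000, seed=0):
    """Build a DataFrame of n_cols float series, each with n_rows values.

    Hypothetical helper: the name and the random distribution are
    illustrative assumptions, not taken from the issue.
    """
    rng = np.random.default_rng(seed)
    values = rng.standard_normal((n_rows, n_cols))
    columns = [f"series_{i}" for i in range(n_cols)]
    return pd.DataFrame(values, columns=columns)


# 3000 series of 700 floats each, as described in the steps above.
df = make_synthetic_frame()
print(df.shape)
```

Running this in the interactive window should leave `df` in the variable explorer, ready for the second step.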
Logs
Output for Python in the Output panel (View → Output, change the drop-down in the upper-right of the Output panel to Python)
None
Output from Console under the Developer Tools panel (toggle Developer Tools on under Help; turn on source maps to make any tracebacks be useful by running Enable source map support for extension debugging)
Can't find relevant logs. Is 'View in Data Viewer' supposed to show up in the logs at some point?
I was really looking forward to these features, so thanks for getting them in there! However, when dealing with quantitative finance problems we often have very large dataframes, and it would be nice to be able to use the data viewer to explore them.
Best regards,
Francisco
Issue Analytics
- Created 4 years ago
- Reactions:1
- Comments:14 (7 by maintainers)
Top GitHub Comments
I just submitted a fix for the column virtualization. Please feel free to try it out in our next insiders build (should be ready in about half an hour).
It should support any number of columns and rows, but it will ask if you’re sure you want to open the view if there are more than 1000 columns. More than 1000 columns causes the initial bring-up to take a while, and fetching the data can take longer too (it has to turn the rows into a JSON string in order to send it to our UI, which is a function of how VS Code works).
1000 x 10000 DF takes me about 5 minutes to load.
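To get a feel for the serialization cost mentioned above, here is a rough sketch that turns rows into JSON strings in chunks and counts the characters produced. It is not the extension's implementation: the function `rows_to_json_chunks`, the chunk size, and the table dimensions are all assumptions chosen for illustration.

```python
import json
import random


def rows_to_json_chunks(rows, chunk_size=100):
    """Yield JSON strings, each holding at most chunk_size rows.

    Hypothetical helper: the extension's real chunking strategy is
    not described in this thread.
    """
    for start in range(0, len(rows), chunk_size):
        yield json.dumps(rows[start:start + chunk_size])


# Assumed sizes: a small stand-in for a wide table of floats.
random.seed(0)
n_rows, n_cols = 200, 1000
rows = [[random.random() for _ in range(n_cols)] for _ in range(n_rows)]

chunks = list(rows_to_json_chunks(rows, chunk_size=50))
total_chars = sum(len(chunk) for chunk in chunks)
print(f"{len(chunks)} chunks, {total_chars} characters of JSON")
```

Each float becomes a string of many characters once serialized, so the payload grows with rows times columns, which fits the observation that wide frames are slow to fetch.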
However it also now supports filtering with expressions on numeric columns. Example:
That’s really impressive, thanks a lot!
Take care,
Francisco