DualListTreeSelect performance hit with large data sets

Scope:

pf4-component-mapper dual-list-tree-select

Description

Cost Management users are encountering latency on our settings page when we have large data sets. For our cost-demo user on stage-beta, we have 40,000 GCP items under one folder.

I tested the page load time for 1k, 5k, 10k, 20k, and 40k items. With 40k selected items, the page takes about 55 seconds to load.

  • 40k: about 55s
  • 20k: 12s
  • 10k: 6s
  • 5k: 3s
  • 1k: less than 2s
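To make the scaling in these numbers easier to see, the short helper below (a hypothetical `growthRatios` function, not part of DDF) compares each measurement with the previous one. The 1k entry uses 2s as a stand-in for "less than 2s". From 5k to 20k, doubling the item count roughly doubles the load time, while the last doubling (20k to 40k) multiplies it by about 4.6.

```typescript
// One measurement: number of items and the page-load time in seconds.
type Timing = [items: number, seconds: number];

interface GrowthRatio {
  from: number;      // item count of the earlier measurement
  to: number;        // item count of the later measurement
  itemRatio: number; // how many times more items
  timeRatio: number; // how many times longer the load took
}

// Timings reported in the issue. "Less than 2s" for 1k is recorded as 2.
const timings: Timing[] = [
  [1_000, 2],
  [5_000, 3],
  [10_000, 6],
  [20_000, 12],
  [40_000, 55],
];

// Compares each consecutive pair of measurements.
function growthRatios(data: Timing[]): GrowthRatio[] {
  const result: GrowthRatio[] = [];
  for (let i = 1; i < data.length; i++) {
    const [prevItems, prevSeconds] = data[i - 1];
    const [items, seconds] = data[i];
    result.push({
      from: prevItems,
      to: items,
      itemRatio: items / prevItems,
      timeRatio: seconds / prevSeconds,
    });
  }
  return result;
}

for (const r of growthRatios(timings)) {
  console.log(`${r.from} -> ${r.to}: items x${r.itemRatio.toFixed(1)}, time x${r.timeRatio.toFixed(1)}`);
}
```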

Looking at the code, the DDF component appears to convert the options provided via our schema and apply a default sort. There is also a lot of recursion in the selectedOptions function.

See https://github.com/data-driven-forms/react-forms/blob/master/packages/pf4-component-mapper/src/dual-list-tree-select/dual-list-tree-select.js#L10

Note that the non-tree version of dual-list has an isSortable prop, but the tree version does not. If we can omit the default sort (e.g., using isSortable={false}), we may see a significant performance increase.
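To illustrate what skipping the default sort would mean, here is a minimal sketch of a recursive tree sort guarded by an `isSortable` flag. The `TreeOption` shape and the `prepareTreeOptions` name are hypothetical; this is not DDF's actual convertOptions implementation. It only shows that, with the flag off, the options pass through with no copying and no comparisons.

```typescript
// Hypothetical option shape for a tree-based dual list (not the actual DDF type).
interface TreeOption {
  value: string;
  label: string;
  children?: TreeOption[];
}

// Recursively sorts every level of the tree by label, without mutating the input.
// When isSortable is false, the input array is returned as-is, so no copying
// or comparison work is done at all.
function prepareTreeOptions(options: TreeOption[], isSortable = true): TreeOption[] {
  if (!isSortable) {
    return options;
  }
  return [...options]
    .sort((a, b) => a.label.localeCompare(b.label))
    .map((option) =>
      option.children
        ? { ...option, children: prepareTreeOptions(option.children, true) }
        : option,
    );
}

// Usage: one folder with two unsorted children, plus a sibling leaf.
const exampleTree: TreeOption[] = [
  {
    value: "b",
    label: "Folder B",
    children: [
      { value: "b2", label: "Item 2" },
      { value: "b1", label: "Item 1" },
    ],
  },
  { value: "a", label: "Folder A" },
];
console.log(prepareTreeOptions(exampleTree).map((o) => o.label));
```

With sorting enabled, every node is copied and every level is sorted on each call; passing `false` skips all of that, which is the effect an `isSortable={false}` prop on the tree variant would aim for.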

Ideally, we would pre-sort server-side via the Cost Management settings API.
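If the settings API returned pre-sorted data, a client could confirm the order in a single linear pass and skip sorting. The sketch below assumes ascending order by `label`, compared with `localeCompare`; `isTreeSortedByLabel` and `LabeledOption` are hypothetical names, not part of DDF or the Cost Management API.

```typescript
// Hypothetical minimal option shape: only the label matters for the check.
interface LabeledOption {
  label: string;
  children?: LabeledOption[];
}

// Returns true when every level of the tree is already in ascending label order.
// Each node is visited once, so the cost is linear in the number of nodes.
function isTreeSortedByLabel(options: LabeledOption[]): boolean {
  for (let i = 0; i < options.length; i++) {
    if (i > 0 && options[i - 1].label.localeCompare(options[i].label) > 0) {
      return false;
    }
    const children = options[i].children;
    if (children && !isTreeSortedByLabel(children)) {
      return false;
    }
  }
  return true;
}
```

This only pays off if the server sorts with the same collation the client would use; otherwise the check fails and the client falls back to sorting anyway.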

Schema

See https://console.stage.redhat.com/api/cost-management/v1/settings/

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments:7 (4 by maintainers)

Top GitHub Comments

1 reaction
dlabrecq commented, Oct 20, 2021

We’ve mainly been testing this page: https://console.stage.redhat.com/beta/settings/applications/cost-management

With the DDF schema from: https://console.stage.redhat.com/api/cost-management/v1/settings/

I can forward the username and password offline.

0 reactions
Hyperkid123 commented, Oct 20, 2021

@dlabrecq I did some research on the issue. Here are my findings.

Testing env:

  • Dual list select component with the sortable and tree props set to true (required to reach the convertOptions function pointed out in the issue)
  • 10,000 input options prepared for the worst-case scenario: items are in reverse order
  • Chrome profiling tools used to get the timings
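The worst-case input described above can be reproduced with a small generator. This is a hypothetical sketch (`TreeOption`, `buildReverseOrderedTree`, and `timeIt` are illustrative names): it puts `count` leaf options under a single folder in descending label order, so an ascending sort has to reorder every item, and wraps a call in `performance.now()` for a rough timing. It does not replace the Chrome profiler used here, which also captures rendering and layout cost.

```typescript
// Hypothetical option shape (not the actual DDF type).
interface TreeOption {
  value: string;
  label: string;
  children?: TreeOption[];
}

// Builds one folder holding `count` leaves whose labels are in descending order.
function buildReverseOrderedTree(count: number): TreeOption[] {
  const width = String(count).length;
  const children: TreeOption[] = [];
  for (let i = count; i >= 1; i--) {
    // Zero-pad so string order matches numeric order (e.g. "Item 00042").
    const id = String(i).padStart(width, "0");
    children.push({ value: `item-${id}`, label: `Item ${id}` });
  }
  return [{ value: "folder", label: "Folder", children }];
}

// Times a synchronous call with the high-resolution clock
// (available in browsers and as a global in Node 16+).
function timeIt<T>(fn: () => T): { result: T; ms: number } {
  const start = performance.now();
  const result = fn();
  return { result, ms: performance.now() - start };
}

const { result: worstCase, ms } = timeIt(() => buildReverseOrderedTree(10_000));
console.log(`built ${worstCase[0].children?.length} options in ${ms.toFixed(1)} ms`);
```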

Initial render profile summary

[Screenshot from 2021-10-20 11-57-18]

The initial load takes around 32s with this input data.

[Screenshot from 2021-10-20 11-58-26]

  • Over 50% of the time (15.4s) is taken by layout activities (rendering)
  • The data preparation phase takes 26% (7.6s)

Convert options performance

The convertOptions function pointed out in the issue description is not the problem here.

[Screenshot from 2021-10-20 12-01-18]

  • Total execution time is 1118.9 ms (1.1s), which is roughly 3.9% of the whole execution time

Here is a comparison of total execution time. The highlighted segment is the portion spent in the convert function.

[Screenshot from 2021-10-20 12-30-34]

Most expensive function calls

The most resource-heavy function calls are React/ReactDOM routines.

[Screenshot from 2021-10-20 12-04-15]

These are DOM manipulation functions tied directly to the component implementation.

Profiling with sort and tree props disabled and input data in sorted order

To rule out any profiling mistakes, I also did a run with the sort option disabled. The convertOptions function is not called in this scenario. The results are a bit better, but the relative resource allocation is basically the same.

[Screenshot from 2021-10-20 12-08-01]

The sorted list of activities references the same functions:

[Screenshot from 2021-10-20 12-08-52]

And convertOptions is not present in the list:

[Screenshot from 2021-10-20 12-09-05]

Conclusion

The convertOptions function is not responsible for the performance degradation. In fact, its impact is negligible compared to the other factors.

In my opinion, changing the dual list selector component implementation to use a virtual list is the only way to fix the issue.
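The idea behind a virtual list is to create DOM nodes only for the rows inside the scrolled viewport, so rendering cost stops growing with the total number of options. Below is a minimal sketch of the core windowing arithmetic, assuming a fixed row height and a tree flattened into its currently expanded rows. `flattenVisible` and `visibleRange` are hypothetical helpers, not PatternFly or DDF APIs; a real component would also need scroll handling and spacer elements sized to the full list height.

```typescript
// Hypothetical option shape (not the actual DDF type).
interface TreeOption {
  value: string;
  label: string;
  children?: TreeOption[];
}

interface Row {
  option: TreeOption;
  depth: number; // nesting level, used for indentation
}

// Flattens the tree into visible rows, descending only into expanded folders.
function flattenVisible(options: TreeOption[], expanded: Set<string>, depth = 0, out: Row[] = []): Row[] {
  for (const option of options) {
    out.push({ option, depth });
    if (option.children && expanded.has(option.value)) {
      flattenVisible(option.children, expanded, depth + 1, out);
    }
  }
  return out;
}

// Returns the [start, end) row indices that need DOM nodes for a fixed-height list.
// `overscan` extra rows above and below reduce blank flashes while scrolling.
function visibleRange(
  rowCount: number,
  rowHeight: number,
  viewportHeight: number,
  scrollTop: number,
  overscan = 5,
): { start: number; end: number } {
  const first = Math.floor(scrollTop / rowHeight);
  const visibleCount = Math.ceil(viewportHeight / rowHeight);
  return {
    start: Math.max(0, first - overscan),
    end: Math.min(rowCount, first + visibleCount + overscan),
  };
}

// Usage: one expanded folder with 10,000 items, 36px rows, 400px viewport.
const folder: TreeOption = {
  value: "folder",
  label: "Folder",
  children: Array.from({ length: 10_000 }, (_, i) => ({ value: `item-${i}`, label: `Item ${i}` })),
};
const rows = flattenVisible([folder], new Set(["folder"]));
const { start, end } = visibleRange(rows.length, 36, 400, 3_600);
const rendered = rows.slice(start, end); // only these rows get DOM nodes
console.log(`${rows.length} rows, ${rendered.length} rendered`); // 10001 rows, 22 rendered
```

At that scroll position only 22 of 10,001 rows would be mounted, and the number stays roughly constant as the data set grows.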

@dlabrecq In case I have missed something, could you provide the exact dataset you were testing so I can run the profiling again? You can share it in a gist or send me a private message if you don't want the data to be public.

EDIT: If you want, I can provide the profiling data so you can open and inspect it in your browser.
