bug: Meltano is slow to start

There appears to be a bug causing slow startup for some users. In the logs provided here, about 45 seconds pass before the first log line appears, and the whole command takes about 79 seconds.

Pre 2.x:

(.venv) ➜  dutchie-dagster-meltano git:(NoJIRA-upgrade-dagster-meltano) ✗ meltano --version
meltano, version 1.100.0
(.venv) ➜  dutchie-dagster-meltano git:(NoJIRA-upgrade-dagster-meltano) ✗ time meltano --log-level debug config tap-exchange-rates-api list
2022-08-08T22:33:35.566057Z [debug    ] Creating engine <meltano.core.project.Project object at 0x10e248970>@sqlite:////Users/dwall/repos/dagster/dutchie-dagster-meltano/.meltano/meltano.db
2022-08-08T22:33:35.815951Z [debug    ] Using selector: KqueueSelector
2022-08-08T22:33:35.848522Z [debug    ] Created configuration at /Users/dwall/repos/dagster/dutchie-dagster-meltano/.meltano/run/tap-exchange-rates-api/tap.e8b67d7e-f096-4197-b805-e51bcf5971a8.config.json
2022-08-08T22:33:35.848773Z [debug    ] Could not find tap.properties.json in /Users/dwall/repos/dagster/dutchie-dagster-meltano/.meltano/extractors/tap-exchange-rates-api/tap.properties.json, skipping.
2022-08-08T22:33:35.848938Z [debug    ] Could not find tap.properties.cache_key in /Users/dwall/repos/dagster/dutchie-dagster-meltano/.meltano/extractors/tap-exchange-rates-api/tap.properties.cache_key, skipping.
2022-08-08T22:33:35.849094Z [debug    ] Could not find state.json in /Users/dwall/repos/dagster/dutchie-dagster-meltano/.meltano/extractors/tap-exchange-rates-api/state.json, skipping.

Custom, possibly unsupported by the plugin:
base [env: TAP_EXCHANGE_RATES_API_BASE] current value: 'USD' (from `meltano.yml`)
start_date [env: TAP_EXCHANGE_RATES_API_START_DATE] current value: '2018-01-01' (from `meltano.yml`)
symbols [env: TAP_EXCHANGE_RATES_API_SYMBOLS] current value: ['CAD', 'MXN', 'AUD', 'EUR', 'USD'] (from `meltano.yml`)
meltano --log-level debug config tap-exchange-rates-api list  1.64s user 0.20s system 99% cpu 1.849 total

Compared with 2.4.0:

(.venv) ➜  dutchie-dagster-meltano git:(NoJIRA-upgrade-dagster-meltano) ✗ meltano --version
meltano, version 2.4.0
(.venv) ➜  dutchie-dagster-meltano git:(NoJIRA-upgrade-dagster-meltano) ✗ date && time meltano --log-level debug config tap-exchange-rates-api list && date
Mon Aug  8 15:46:01 PDT 2022
2022-08-08T22:46:45.104438Z [debug    ] Creating engine <meltano.core.project.Project object at 0x10c521760>@sqlite:////Users/dwall/repos/dagster/dutchie-dagster-meltano/.meltano/meltano.db
2022-08-08T22:46:45.406143Z [debug    ] loaded lazy attr 'SafeConfigParser': <class 'configparser.ConfigParser'>
2022-08-08T22:46:45.406251Z [debug    ] loaded lazy attr 'NativeStringIO': <class '_io.StringIO'>
2022-08-08T22:46:45.406330Z [debug    ] loaded lazy attr 'BytesIO': <class '_io.BytesIO'>

Custom, possibly unsupported by the plugin:
base [env: TAP_EXCHANGE_RATES_API_BASE] current value: 'USD' (from `meltano.yml`)
start_date [env: TAP_EXCHANGE_RATES_API_START_DATE] current value: '2018-01-01' (from `meltano.yml`)
symbols [env: TAP_EXCHANGE_RATES_API_SYMBOLS] current value: ['CAD', 'MXN', 'AUD', 'EUR', 'USD'] (from `meltano.yml`)
meltano --log-level debug config tap-exchange-rates-api list  76.80s user 0.50s system 97% cpu 1:18.94 total
Mon Aug  8 15:47:20 PDT 2022
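
As a sanity check on the 2.4.0 run, the short Python sketch below works out the wall-clock figures from the timestamps shown in the log. It assumes PDT is UTC-7 and that the first `date` output marks the moment the command started; the variable names are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Assumption: PDT is UTC-7.
PDT = timezone(timedelta(hours=-7))

# Timestamps copied from the 2.4.0 run above.
started = datetime(2022, 8, 8, 15, 46, 1, tzinfo=PDT)    # first `date` output
first_log = datetime.fromisoformat("2022-08-08T22:46:45.104438+00:00")
finished = datetime(2022, 8, 8, 15, 47, 20, tzinfo=PDT)  # second `date` output

# Time spent before Meltano emitted its first debug line.
delay_before_first_log = (first_log - started).total_seconds()
# Total wall-clock time between the two `date` calls.
wall_clock = (finished - started).total_seconds()

print(f"delay before first log line: {delay_before_first_log:.1f}s")
print(f"total wall-clock time: {wall_clock:.0f}s")
```

The total agrees with the `1:18.94 total` reported by `time`, and most of the delay occurs before any debug output is produced.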

More detail in Slack thread: https://meltano.slack.com/archives/C01TCRBBJD7/p1659998566724249?thread_ts=1659985256.778899&cid=C01TCRBBJD7

Issue Analytics

  • State: closed
  • Created: a year ago
  • Reactions: 2
  • Comments: 11 (4 by maintainers)

Top GitHub Comments

WillDaSilva commented, Aug 17, 2022 (3 reactions)

Thanks to @dehume, I’ve been able to review some fairly detailed profiling information taken from the slow run. I’ve attached a call graph below.

Almost the entire execution is spent running the standard library function copy.deepcopy. The two sites which are making the slow calls to deepcopy are:

Much of the time spent in the deepcopy calls is spent in ruamel.yaml.CommentedMap.__deepcopy__, which in turn calls deepcopy recursively approximately 111,400,000 times.

I am able to reproduce the performance issue locally by using a large meltano.yml file, most of which is a list of arbitrary numbers. As part of addressing this performance issue, a performance test should be added that ensures that having a large meltano.yml file (e.g. 1000 lines long) does not take significantly longer to process than a short meltano.yml file.
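
A minimal sketch of the kind of performance check described above, using only the standard library. It assumes a plain nested dict as a stand-in for the parsed meltano.yml (the real project works with ruamel.yaml objects, which are not reproduced here), and `make_config` and `best_time` are hypothetical helpers rather than Meltano functions.

```python
import copy
import time


def make_config(n_items):
    """Build a plain-dict stand-in for a parsed meltano.yml holding a long list."""
    return {
        "version": 1,
        "plugins": {
            "extractors": [
                {"name": "tap-example", "config": {"values": list(range(n_items))}}
            ]
        },
    }


def best_time(func, arg, repeats=5):
    """Return the fastest of `repeats` wall-clock timings of func(arg)."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(arg)
        timings.append(time.perf_counter() - start)
    return min(timings)


small = make_config(10)
large = make_config(1000)

# Guard against a zero denominator on very coarse clocks.
ratio = best_time(copy.deepcopy, large) / max(best_time(copy.deepcopy, small), 1e-9)
print(f"large/small deepcopy time ratio: {ratio:.1f}")
```

A real test would time the project's own loading code instead of `copy.deepcopy` and fail when the ratio exceeds a chosen threshold. The threshold is left out here because a sensible value depends on the machine and on the code under test.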

[Image: call graph from the profiling run]

Possible approaches to resolving this:

  • Use a custom subclass of ruamel.yaml.CommentedMap that (somehow) avoids the problem.
  • Sufficiently avoid/reduce calls to copy.deepcopy.
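
To illustrate why reducing the number of deepcopy calls matters, here is a small sketch that counts `__deepcopy__` invocations. `CountingMap` is a hypothetical stand-in for a mapping type with a custom `__deepcopy__`; it is not ruamel.yaml's CommentedMap, and the tree sizes are arbitrary.

```python
import copy


class CountingMap(dict):
    """Hypothetical mapping whose __deepcopy__ counts how often it is called."""

    calls = 0

    def __deepcopy__(self, memo):
        CountingMap.calls += 1
        new = CountingMap()
        memo[id(self)] = new
        for key, value in self.items():
            new[key] = copy.deepcopy(value, memo)
        return new


def nested(depth, width):
    """Build a tree of CountingMap nodes with `width` children per level."""
    if depth == 0:
        return CountingMap(value=0)
    return CountingMap({f"k{i}": nested(depth - 1, width) for i in range(width)})


def calls_for(tree, accesses):
    """Count __deepcopy__ calls when the whole tree is copied on every access."""
    CountingMap.calls = 0
    for _ in range(accesses):
        copy.deepcopy(tree)
    return CountingMap.calls


tree = nested(depth=2, width=3)                # 1 + 3 + 9 = 13 nodes
per_access = calls_for(tree, accesses=10)      # copy on each of 10 accesses
copy_once = calls_for(tree, accesses=1)        # copy once, then reuse the copy
print(per_access, copy_once)                   # 130 13
```

The call count grows as (number of nodes) × (number of copies), so copying a large parsed file on every access multiplies quickly. Copying once and reusing the result, or not copying at all where mutation is not a concern, keeps the count proportional to the size of the file.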

edgarrmondragon commented, Aug 17, 2022 (2 reactions)

@WillDaSilva thanks for digging! I think it’s fairly safe to use copy.copy instead in

https://github.com/meltano/meltano/blob/3bf8d54b69b2758fb2f3fa7fd6bd0abde156ff3c/src/meltano/core/project_files.py#L47

That might improve performance somewhat.
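
For context on the trade-off, the following sketch shows how `copy.copy` and `copy.deepcopy` differ on a nested structure. The `config` dict is an invented example, not Meltano's actual data; whether a shallow copy is safe at the linked call site depends on whether callers mutate nested values, which this sketch does not establish.

```python
import copy

# Invented example data standing in for a parsed project file.
config = {"version": 1, "plugins": {"extractors": [{"name": "tap-a"}]}}

shallow = copy.copy(config)      # new top-level dict, nested objects shared
deep = copy.deepcopy(config)     # fully independent copy

# Rebinding a top-level key on the shallow copy leaves the original alone.
shallow["version"] = 2

# Mutating a nested object through the shallow copy changes the original too.
shallow["plugins"]["extractors"].append({"name": "tap-b"})

print(config["version"])                      # 1
print(len(config["plugins"]["extractors"]))   # 2
print(len(deep["plugins"]["extractors"]))     # 1
```

A shallow copy avoids the recursive traversal entirely, which is where the speedup would come from, at the cost of sharing nested objects with the original.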
