bug: Meltano is slow to start
There appears to be a bug causing some users to experience a slow startup, up to 45 seconds in the logs below.
Pre 2.x:
(.venv) ➜ dutchie-dagster-meltano git:(NoJIRA-upgrade-dagster-meltano) ✗ meltano --version
meltano, version 1.100.0
(.venv) ➜ dutchie-dagster-meltano git:(NoJIRA-upgrade-dagster-meltano) ✗ time meltano --log-level debug config tap-exchange-rates-api list
2022-08-08T22:33:35.566057Z [debug ] Creating engine <meltano.core.project.Project object at 0x10e248970>@sqlite:////Users/dwall/repos/dagster/dutchie-dagster-meltano/.meltano/meltano.db
2022-08-08T22:33:35.815951Z [debug ] Using selector: KqueueSelector
2022-08-08T22:33:35.848522Z [debug ] Created configuration at /Users/dwall/repos/dagster/dutchie-dagster-meltano/.meltano/run/tap-exchange-rates-api/tap.e8b67d7e-f096-4197-b805-e51bcf5971a8.config.json
2022-08-08T22:33:35.848773Z [debug ] Could not find tap.properties.json in /Users/dwall/repos/dagster/dutchie-dagster-meltano/.meltano/extractors/tap-exchange-rates-api/tap.properties.json, skipping.
2022-08-08T22:33:35.848938Z [debug ] Could not find tap.properties.cache_key in /Users/dwall/repos/dagster/dutchie-dagster-meltano/.meltano/extractors/tap-exchange-rates-api/tap.properties.cache_key, skipping.
2022-08-08T22:33:35.849094Z [debug ] Could not find state.json in /Users/dwall/repos/dagster/dutchie-dagster-meltano/.meltano/extractors/tap-exchange-rates-api/state.json, skipping.
Custom, possibly unsupported by the plugin:
base [env: TAP_EXCHANGE_RATES_API_BASE] current value: 'USD' (from `meltano.yml`)
start_date [env: TAP_EXCHANGE_RATES_API_START_DATE] current value: '2018-01-01' (from `meltano.yml`)
symbols [env: TAP_EXCHANGE_RATES_API_SYMBOLS] current value: ['CAD', 'MXN', 'AUD', 'EUR', 'USD'] (from `meltano.yml`)
meltano --log-level debug config tap-exchange-rates-api list 1.64s user 0.20s system 99% cpu 1.849 total
Vs in 2.4.0:
(.venv) ➜ dutchie-dagster-meltano git:(NoJIRA-upgrade-dagster-meltano) ✗ meltano --version
meltano, version 2.4.0
(.venv) ➜ dutchie-dagster-meltano git:(NoJIRA-upgrade-dagster-meltano) ✗ date && time meltano --log-level debug config tap-exchange-rates-api list && date
Mon Aug 8 15:46:01 PDT 2022
2022-08-08T22:46:45.104438Z [debug ] Creating engine <meltano.core.project.Project object at 0x10c521760>@sqlite:////Users/dwall/repos/dagster/dutchie-dagster-meltano/.meltano/meltano.db
2022-08-08T22:46:45.406143Z [debug ] loaded lazy attr 'SafeConfigParser': <class 'configparser.ConfigParser'>
2022-08-08T22:46:45.406251Z [debug ] loaded lazy attr 'NativeStringIO': <class '_io.StringIO'>
2022-08-08T22:46:45.406330Z [debug ] loaded lazy attr 'BytesIO': <class '_io.BytesIO'>
Custom, possibly unsupported by the plugin:
base [env: TAP_EXCHANGE_RATES_API_BASE] current value: 'USD' (from `meltano.yml`)
start_date [env: TAP_EXCHANGE_RATES_API_START_DATE] current value: '2018-01-01' (from `meltano.yml`)
symbols [env: TAP_EXCHANGE_RATES_API_SYMBOLS] current value: ['CAD', 'MXN', 'AUD', 'EUR', 'USD'] (from `meltano.yml`)
meltano --log-level debug config tap-exchange-rates-api list 76.80s user 0.50s system 97% cpu 1:18.94 total
Mon Aug 8 15:47:20 PDT 2022
More detail in Slack thread: https://meltano.slack.com/archives/C01TCRBBJD7/p1659998566724249?thread_ts=1659985256.778899&cid=C01TCRBBJD7
Top GitHub Comments
Thanks to @dehume, I’ve been able to review some fairly detailed profiling information taken from the slow run. I’ve attached a call graph below.
Almost the entire execution is spent running the standard library function `copy.deepcopy`. The two sites which are making the slow calls to `deepcopy` are:
Much of the time spent in the `deepcopy` calls is spent in `ruamel.yaml.CommentedMap.__deepcopy__`, which in turn calls `deepcopy` recursively approximately 111,400,000 times.
I am able to reproduce the performance issue locally by using a large `meltano.yml` file, most of which is a list of arbitrary numbers. As part of addressing this performance issue, a performance test should be added that ensures that having a large `meltano.yml` file (e.g. 1000 lines long) does not take significantly longer to process than a short `meltano.yml` file.
Possible approaches to resolving this:
- `ruamel.yaml.CommentedMap` that (somehow) avoids the problem.
- `copy.deepcopy`.
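A rough way to see how the cost of `copy.deepcopy` grows with the size of a config-like structure is sketched below. It is only an illustration under stated assumptions: it uses plain `dict` and `list` objects rather than `ruamel.yaml.CommentedMap`, and `build_config`, `time_call`, and the plugin name `tap-example` are hypothetical names, not part of Meltano. Absolute timings depend on the machine, so none are shown.

```python
import copy
import time


def build_config(n_values: int) -> dict:
    """Build a nested, config-like structure dominated by a long list of numbers.

    Hypothetical stand-in for a parsed meltano.yml; plain dicts and lists only.
    """
    return {
        "plugins": {
            "extractors": [
                {"name": "tap-example", "config": {"values": list(range(n_values))}}
            ]
        }
    }


def time_call(func, *args) -> float:
    """Return the wall-clock seconds taken by a single call of func(*args)."""
    start = time.perf_counter()
    func(*args)
    return time.perf_counter() - start


if __name__ == "__main__":
    for n_values in (1_000, 100_000):
        config = build_config(n_values)
        # deepcopy visits every nested element; copy only duplicates the top-level dict.
        deep_seconds = time_call(copy.deepcopy, config)
        shallow_seconds = time_call(copy.copy, config)
        print(
            f"{n_values:>7} values: "
            f"deepcopy {deep_seconds:.6f}s, copy {shallow_seconds:.6f}s"
        )
```

A real performance test for this issue would presumably load an actual large `meltano.yml` through Meltano's own code path. This sketch only shows the general shape of such a comparison.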
@WillDaSilva thanks for digging! I think it’s fairly safe to use `copy.copy` instead in https://github.com/meltano/meltano/blob/3bf8d54b69b2758fb2f3fa7fd6bd0abde156ff3c/src/meltano/core/project_files.py#L47
That might improve performance somewhat.
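Whether swapping in `copy.copy` is safe depends on whether the caller later mutates nested values, which the thread does not spell out. The sketch below shows the general difference using a plain dictionary. The structure and the name `tap-hypothetical` are made up for illustration and are not taken from Meltano's code.

```python
import copy

# A small config-like structure; the nesting is illustrative only.
original = {"plugins": {"extractors": ["tap-exchange-rates-api"]}}

shallow = copy.copy(original)
deep = copy.deepcopy(original)

# Adding a top-level key to the shallow copy does not touch the original.
shallow["version"] = 1
assert "version" not in original

# Nested objects are shared between the original and the shallow copy,
# so mutating them through the copy is visible through the original.
shallow["plugins"]["extractors"].append("tap-hypothetical")
assert original["plugins"]["extractors"] == [
    "tap-exchange-rates-api",
    "tap-hypothetical",
]

# The deep copy owns its own nested objects and is unaffected.
assert deep["plugins"]["extractors"] == ["tap-exchange-rates-api"]
```

If the copied object is only read, or only has top-level keys reassigned, a shallow copy behaves the same as a deep one while avoiding the recursive traversal.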