Feature proposal: meltano-manifest.json
Related to:
- https://github.com/meltano/meltano/discussions/6944
- https://github.com/meltano/meltano/discussions/6766
As discussed in office hours on 2022-10-12.
Feature goals/requirements:
- Stability. Since this would serve as the base for many interop/integration scenarios, we want to make the format as stable as possible.
- Self-contained. There should be no need to reference external files or services when using this format. So, for example, lock file content would be embedded so no external references are needed.
- Pre-calculated inheritance, precedence, and override effects. The reader shouldn’t need to know how inheritance works, or understand any other internal Meltano business logic.
- No secrets. This artifact essentially needs to be treated as ‘code’ and may be passed around in less-trusted contexts. As such, it should not contain: env vars from the local OS or terminal context, env vars from `.dotenv`, settings values from `systemdb` or other external settings/secrets providers.
- Simple as possible. Bonus points if the new JSON manifest can be validated against existing JSON Schema rulesets, without creating or maintaining net new data structures.
So, given these guidelines, here’s a possible path forward:
- Start with contents and structure of the `meltano.yml` file itself.
- Merge in all data from `include_paths` declarations.
- Super-populate each plugin definition in the global context:
  - Inject all properties from the lock file, except of course, those properties overridden.
  - Calculate the value of each entry of the plugin’s `config`.
  - Optionally, we can store ‘extra’ info about how the evaluation was performed under a `meta` or `vendor` key that is explicitly non-stable or at least explicitly free-form.
  - Environment variable declarations in plugin config should be left unresolved until runtime. If we are confident that the value can be resolved without leaking any sensitive info, the predicted evaluation could optionally be rendered under a `meta` or `vendor` key, without losing fidelity of the config’s env var reference.
- Repeat the above process for each declared Meltano Environment.
- Inject other top-level entities (`jobs`, `env`, `schedules`, etc.) into the environment declarations if applicable, so that each environment definition is standalone.
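The steps above can be sketched roughly as follows. This is a minimal illustration, not Meltano's implementation: the function names (`deep_merge`, `compile_plugin`, `compile_manifest`) and the simplified dictionary shapes for the project and lock files are assumptions made for the example, and real `meltano.yml` files are structured differently.

```python
import copy


def deep_merge(base, override):
    """Return a new dict in which values from `override` win over `base`."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def compile_plugin(lock_def, project_def, env_override=None):
    """Layer lock file properties, then project values, then environment overrides."""
    plugin = deep_merge(lock_def, project_def)
    if env_override:
        plugin = deep_merge(plugin, env_override)
    # Env var references such as "$TAP_X_PASSWORD" are never expanded here,
    # so they stay in the manifest as references rather than resolved values.
    return plugin


def compile_manifest(project, lock_files):
    """Build one standalone definition per declared environment (hypothetical shape)."""
    shared = {k: project[k] for k in ("jobs", "schedules") if k in project}
    environments = []
    for env in project.get("environments", []):
        overrides = env.get("plugins", {})
        plugins = {
            name: compile_plugin(lock_files.get(name, {}), definition, overrides.get(name))
            for name, definition in project.get("plugins", {}).items()
        }
        environments.append({"name": env["name"], "plugins": plugins, **copy.deepcopy(shared)})
    return {"version": 1, "environments": environments}


# Hypothetical, heavily simplified inputs.
project = {
    "plugins": {
        "tap-x": {"pip_url": "tap-x==1.2", "config": {"user": "me", "password": "$TAP_X_PASSWORD"}}
    },
    "environments": [{"name": "dev", "plugins": {"tap-x": {"config": {"user": "dev"}}}}],
    "jobs": [{"name": "daily", "tasks": ["tap-x target-y"]}],
}
lock_files = {"tap-x": {"pip_url": "tap-x", "executable": "tap-x"}}
manifest = compile_manifest(project, lock_files)
```

In this sketch a reader of `manifest` never needs to know the precedence rules: each environment entry already carries the merged plugin definitions and a copy of the top-level `jobs`.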
Variations:
- We optionally could give an environment name to the ‘global’ or `--no-environment` behavior, so that the top-level file is just the environments declaration: `version: 1 \n environments: [...] \n <EOF>`. This reduces the clutter in the file, and readers would only deserialize the environment definition they need in the given context.
- We could optionally create separate manifest files per environment so that the manifest is specific to what is needed exactly in a given context.
Top GitHub Comments
@JulesHuisman - From the above, it looks like we’d ideally just capture the whole selected part of the JSON schema per stream. Assuming we just use `schema` as the property name for the JSON schema definition, and assuming we drop from the schema everything that has been deselected by the user, the spec starts to take shape here pretty organically…

`meltano-manifest.json` (yaml-converted for readability)

We already have another function using the key `catalog` at this level (used for catalog overrides today), but perhaps a `streams` key could work instead of hiding under an interim `meta`/`vendor` key. The JSON Schema is pretty non-controversial in terms of compatibility with our own Meltano/Singer paradigms and interop with other platforms - especially if we pre-filter it, since that removes the need for `metadata` and other Singer-specific catalog details. The primary keys and incremental replication key values would probably be worth including as well.

Still worth calling out that the inclusion of a `streams` entry presupposes that either we’ve already run discovery on the extractors, or that we have built the future `meltano catalog` feature which would store those in a long-term git artifact other than the local `.meltano` cache files.

Assuming that discovery has run and we already have a catalog cache, this is not a lot more surface area to add, spec wise. And the fact that we would be excluding deselected streams makes me less worried about file sizes overall. Still significant, but not a ‘show stopper’, per se.
That said, including each stream’s JSON Schema could very likely lean us more towards a separate manifest file per Meltano Environment, so that the size cost of the schemas is paid only once per file instead of being multiplied by the number of Meltano environments. While we don’t generally expect schemas to vary widely across environments, there’s no reason they have to be identical, and in ‘real-world’ scenarios it would not be uncommon for different environments to have slightly different schema definitions for the same streams.
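The pre-filtering idea in this comment (dropping everything the user has deselected from each stream's JSON Schema) could look something like the sketch below. The helper names `prefilter_schema` and `stream_entry`, the entry layout, and the restriction to top-level properties are all assumptions for illustration; the actual `streams` format was not settled in this thread.

```python
import copy


def prefilter_schema(schema, selected):
    """Return a copy of a stream's JSON Schema keeping only selected top-level properties."""
    filtered = copy.deepcopy(schema)
    properties = filtered.get("properties", {})
    filtered["properties"] = {k: v for k, v in properties.items() if k in selected}
    if "required" in filtered:
        # Keep `required` consistent with the properties that remain.
        filtered["required"] = [k for k in filtered["required"] if k in selected]
    return filtered


def stream_entry(schema, selected, key_properties, replication_key=None):
    """Build a hypothetical per-stream manifest entry."""
    entry = {
        "schema": prefilter_schema(schema, selected),
        "key_properties": list(key_properties),
    }
    if replication_key is not None:
        entry["replication_key"] = replication_key
    return entry


# Hypothetical stream schema and selection.
users_schema = {
    "type": "object",
    "properties": {
        "id": {"type": "integer"},
        "email": {"type": "string"},
        "updated_at": {"type": "string", "format": "date-time"},
    },
    "required": ["id", "email"],
}
entry = stream_entry(users_schema, {"id", "updated_at"}, ["id"], "updated_at")
```

Because the deselected `email` property is removed outright, a consumer of the entry would not need Singer `metadata` to work out which fields are in play.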
@aaronsteers @tayloramurphy In order to have a relatively flat manifest wherein you don’t have to perform computations on the config at runtime, while still making it compatible with the Meltano project file schema, we’ll need to have more than just one manifest file per environment as suggested above.
This is because schedules currently have `env` blocks, and jobs likely will in the future too. We may also want to support plugin config at these levels in the future.

To have a fully pre-computed manifest file we have to make each leaf node in the project files (i.e. contexts in which Meltano can actually do work) static. It can’t change based on whether you’re calling from one environment or another, or running under a schedule or not, or running a job or not. To accomplish this for environments we agreed that we’d create a separate manifest file for each environment, but at the time I didn’t realize that we actually need one for each combination of these 3 contexts in which Meltano can be running. If I’m missing any other such execution contexts, we’ll need to incorporate them in the same way.
To handle this, I propose we decide on an arbitrary order for these contexts, such as `(environment, schedule, job)`, then create each manifest file for a particular combination of them in that order. These can determine the file name of each manifest file: `meltano-manifest.<environment name>.<schedule name>.<job name>.json`. This may be an issue if `.` is a legal character within any of these names, so we may have to hash each of the names to avoid that problem.

This is just the default name/location for a manifest file - when one is generated manually via `meltano compile` the `--output` parameter can be used to save the file to a different location, e.g. `--output ./manifest.json`. Specifying `--output` should only work if a single manifest file is being generated. If some context (e.g. the schedule) is left unspecified, then an `--output-dir` must be specified instead, in which manifests for every value of the unspecified context will be generated.

Each context has a global/none option, so you end up with $(e+1)(s+1)(j+1)$ manifest files, where $e$ is the number of environments, $s$ is the number of schedules, and $j$ is the number of jobs.
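As a rough sketch of the naming scheme and the resulting file count, assuming each name is hashed with SHA-256 and truncated, and that the global/none option is written as the literal token `none` (both are choices made for this example, not something decided in the thread):

```python
import hashlib
from itertools import product


def name_token(name):
    """Hash a context name so a '.' inside it cannot break the file name."""
    if name is None:
        # None stands in for the global/none option of a context.
        return "none"
    return hashlib.sha256(name.encode("utf-8")).hexdigest()[:12]


def manifest_filename(environment, schedule, job):
    """File name for one (environment, schedule, job) triple, in that fixed order."""
    tokens = (name_token(environment), name_token(schedule), name_token(job))
    return "meltano-manifest.{}.{}.{}.json".format(*tokens)


def all_triples(environments, schedules, jobs):
    """Every combination of contexts, including the global/none option for each."""
    return list(product([None, *environments], [None, *schedules], [None, *jobs]))


# Hypothetical project with e=2 environments, s=1 schedule, j=3 jobs.
triples = all_triples(["dev", "prod"], ["daily"], ["load", "transform", "export"])
filenames = {manifest_filename(*triple) for triple in triples}
```

With these inputs the count is $(2+1)(1+1)(3+1) = 24$ triples. Truncating the hash keeps file names short at the cost of a small collision risk, so a real implementation might keep the full digest or escape the names instead.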
If we compile them only as-needed and selectively, that’s fine.
As-needed applies when you’re some part of `meltano.core` and want config for whatever you’re doing now, so you request a manifest for the current environment-schedule-job triple. If it already exists for the current hash of the project files (maybe that hash can be stored within annotations?) then it’s simply read. Otherwise a new manifest is generated and saved to disk.

Selectively applies when you’re operating outside of a Meltano process and need data about a project. You run `meltano compile --environment <environment name> --schedule <schedule name> --job <job name>` instead of `meltano compile`, and thereby only generate the manifest for the desired triple.

If it really is the case that all combinations are needed… oof. Hopefully we can generate them quickly, and the project files don’t have too many environments/schedules/jobs.