Nesting @graphs does not properly namespace op names


Summary

When composing graphs, the names of the ops contained in those graphs are not properly namespaced. This is particularly noticeable when using the factory function pattern which creates ops with the same names.

Consider a factory function make_table_loader(table_name, op_name="default_name") that one would like to reuse in many graphs. Calling this factory multiple times within the same graph without passing a new op_name should fail (and it does), because the names conflict at the same level.

However, when the factory is called in different graphs, one should be able to reuse the name, even if the created op is different. There shouldn’t be any naming conflict, but on master this fails.
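
To make the expected behavior concrete, here is a minimal sketch of per-graph namespacing in plain Python. It is a toy model, not Dagster's implementation: the Graph class, NameConflictError, and the handles() method are hypothetical names invented for illustration. Names are checked for conflicts only among siblings, so two different graphs can each contain an op called "simple".

```python
# Toy model only: this is NOT Dagster's implementation. The class and
# method names below are hypothetical and exist purely for illustration.
class NameConflictError(Exception):
    pass


class Graph:
    """A named container whose direct children must have unique names."""

    def __init__(self, name):
        self.name = name
        self.children = {}

    def add(self, child_name, child=None):
        # Conflicts are checked only among siblings (the same level).
        if child_name in self.children:
            raise NameConflictError(
                f"{child_name!r} is already defined in graph {self.name!r}"
            )
        self.children[child_name] = child
        return child

    def handles(self, prefix=""):
        """Return the fully qualified path of every leaf op."""
        paths = []
        for child_name, child in self.children.items():
            path = f"{prefix}{child_name}"
            if isinstance(child, Graph):
                paths.extend(child.handles(prefix=path + "."))
            else:
                paths.append(path)
        return paths


# Two separate graphs each contain an op named "simple".
wrapped_simple = Graph("wrapped_simple")
wrapped_simple.add("simple")

wrapped_simple_extra = Graph("wrapped_simple_extra")
wrapped_simple_extra.add("simple")

# Nesting both graphs in an outer graph keeps the op names distinct,
# because each op is addressed by its full path.
outer = Graph("wrap_all_simples")
outer.add("wrapped_simple", wrapped_simple)
outer.add("wrapped_simple_extra", wrapped_simple_extra)

all_handles = outer.handles()
```

In this model, adding a second "simple" to wrapped_simple raises NameConflictError, while the two nested ops remain distinguishable by their paths. This is the behavior the issue argues nested graphs should have.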

This behavior is inconsistent with the documentation (see https://github.com/dagster-io/dagster/issues/8013) but I also believe it to be a bug left over from the conversion from pipelines to graphs. It looks like this is also inconsistent with how get_output_for_handle is supposed to work here https://docs.dagster.io/guides/dagster/graph_job_op#a-simple-composite-solid .

Reproduction

from dagster import op, graph, GraphOut, Permissive

def test_factory_composition_bug():
    # make sure that namespaced conflicts DO work
    def make_simple(name):
        @op(config_schema={"payload": Permissive()}, name=name)
        def fn(context):
            return context.op_config["payload"]

        return fn

    @graph()
    def wrapped_simple():
        return make_simple("simple")()

    expected_payload = {"key": ["hello", "there"]}
    result = wrapped_simple.execute_in_process(
        run_config={"ops": {"simple": {"config": {"payload": expected_payload}}}}
    )

    # default execution works
    assert result.success
    assert result.output_value() == expected_payload

    # what if we use a factory function twice, but with the same name?
    @graph()
    def wrapped_simple_extra():
        return make_simple("simple")()

    @graph(out={"val1": GraphOut(), "val2": GraphOut()})
    def wrap_all_simples():
        return {"val1": wrapped_simple(), "val2": wrapped_simple_extra()}

    # how about nested?
    result2 = wrap_all_simples.execute_in_process(
        run_config={
            "ops": {
                "wrapped_simple": {"ops": {"simple": {"config": {"payload": expected_payload}}}},
                "wrapped_simple_extra": {
                    "ops": {"simple": {"config": {"payload": expected_payload}}}
                },
            }
        }
    )
    assert result2.success
    assert result2.output_value("val1") == expected_payload
    assert result2.output_value("val2") == expected_payload



Message from the maintainers:

Impacted by this bug? Give it a 👍. We factor engagement into prioritization.

Issue Analytics

  • State: open
  • Created a year ago
  • Reactions: 7
  • Comments: 8 (2 by maintainers)

Top GitHub Comments

4 reactions
alangenfeld commented, May 26, 2022

Appreciate the follow-up discussion, definitely think this is something we should revisit and improve.

0 reactions
simonvanderveldt commented, Nov 18, 2022

I initially flagged the documentation bug since it didn’t match what I saw in Dagster, but for normal usage I’d expect namespacing within the DAG/Graph over namespacing within the complete Repository.

Use cases and ways of working will differ. Some users who share a Repository will work together and try to share things, but others might not, and then someone’s DAG/Graph with an Op called “abc” will collide with someone else’s DAG/Graph containing an Op with the same name. Depending on who was first, the other user will get a failure, which I’d expect will confuse at least some users. Because of this, these users are also hindered in operating autonomously: they now need to be aware of what other people are doing to avoid collisions. And I guess you could argue that naming things is already hard enough; having to also consider the scope outside of the DAG/Graph just makes it harder.

[edit] Wanted to add that I do see that Ops (definitions) are something different from what I’m used to coming from Airflow Tasks, and there are some advantages to them being an actual entity that exists and is stored within Dagster, such as being able to easily see in which Graphs/Assets an Op is used. I’m not sure it’s worth it, though, compared to the issues I mentioned above.
