Graphs.transitiveClosure adding self-loops is unintuitive
I’ve been working on a project that centres around DAGs. I was very surprised to find cycles cropping up, so I tracked the issue down to Graphs.transitiveClosure.
I did a bit of reading (Wikipedia mostly) to see if this was expected behaviour. However, everything I saw pointed towards self-cycles not being a requirement of a transitive closure. I found two different images on Wikipedia that show a closure with no self-cycle. The description on the Transitive Closure wiki page itself uses an example in which nodes are airports and edges are flights, and the transitive closure is the graph of everything reachable from a node in one or more hops.
Then I thought to actually check the documentation more thoroughly. It looks like this is intended behaviour, but should it maybe be changed (perhaps with a flag parameter that can be passed to disable self-cycles)? If not, maybe the documentation could make this a little clearer, since the Guava notion of the transitive closure is slightly different from the usual definition (unless I’ve really misunderstood things): reachability elsewhere is defined as within 1 or more hops, but in Guava it is 0 or more.
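To make the difference between the two reachability definitions concrete, here is a minimal sketch in plain Java. It does not use Guava; the class `HopCountDemo`, its method names, and the adjacency-map representation are all made up for illustration, and it assumes every node appears as a key in the map.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a graph: a map from each node to its successors.
final class HopCountDemo {

    /** Nodes reachable from start using at least one edge. */
    static Set<String> reachableInOneOrMoreHops(Map<String, List<String>> successors, String start) {
        Set<String> seen = new LinkedHashSet<>();
        // Seed the search with the direct successors, not with start itself,
        // so start only ends up in the result if some cycle leads back to it.
        Deque<String> queue = new ArrayDeque<>(successors.getOrDefault(start, List.of()));
        while (!queue.isEmpty()) {
            String node = queue.poll();
            if (seen.add(node)) {
                queue.addAll(successors.getOrDefault(node, List.of()));
            }
        }
        return seen;
    }

    /** Same as above, but start always counts as reachable from itself. */
    static Set<String> reachableInZeroOrMoreHops(Map<String, List<String>> successors, String start) {
        Set<String> result = new LinkedHashSet<>();
        result.add(start);
        result.addAll(reachableInOneOrMoreHops(successors, start));
        return result;
    }

    public static void main(String[] args) {
        // A small DAG: a -> b -> c
        Map<String, List<String>> dag =
                Map.of("a", List.of("b"), "b", List.of("c"), "c", List.of());
        System.out.println(reachableInOneOrMoreHops(dag, "a"));  // [b, c]
        System.out.println(reachableInZeroOrMoreHops(dag, "a")); // [a, b, c]
    }
}
```

Under the "1 or more hops" reading, a closure built from these sets keeps a DAG acyclic. Under the "0 or more hops" reading, every node gains an edge to itself, which is where the self-loops come from.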
There’s a one-line workaround at least (graph.nodes().forEach(n -> graph.removeEdge(n, n));), hopefully that doesn’t fall afoul of any concurrent modification rules.
Issue Analytics
- Created 6 years ago
- Reactions: 3
- Comments: 11 (7 by maintainers)
Top GitHub Comments
Why does the right answer not turn on the standard definition of transitive closure, where a closure is the smallest relation that i. includes the original relation and ii. is transitive? This means that for a directed graph, self loops should be included iff a node is reachable from itself along a directed path of positive length. (See issue 3187.)
OK, I see what you are talking about. But this is a matter of very odd choices of vocabulary. The transitive closure of a binary relation R on a set X is by definition the smallest transitive relation on X that includes R. It is common enough to create another closure, often called the reflexive-transitive closure, which is the transitive closure of the reflexive closure. Some authors explicitly say they will call this the transitive closure for short (e.g., Computer Algorithms by Baase and Van Gelder), but they should not. In addition to these, some authors introduce a notion of an irreflexive transitive closure, but this is very bad terminology, since the idea of an irreflexive closure (i.e., superset) is incoherent.
Thank you for clarifying. In my view, if one wants all self loops in a relation that is not reflexive, one should produce the transitive closure of the reflexive closure. (However, providing an option for this seems fine to me.) For the transitive closure itself, a self loop should always indicate a positive-length cycle (possibly of length 1). To be a proper closure, any self loops in the original relation must be retained. In terms of the discussion above, therefore, the question really cannot be whether to “allow” self loops, since they may be required by a transitive closure. It is instead whether to add self loops (i.e., to create the reflexive transitive closure). Of course, one might offer an option to remove self loops, with the understanding that the result will not be a closure.
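The distinctions drawn in this comment can be sketched in plain Java. This is an illustration under stated assumptions, not Guava's implementation: the class `ClosureDemo` and all of its methods are hypothetical, the relation is modelled as a map from each node to its set of successors, and every node is assumed to appear as a key. Warshall's algorithm is used here only because it is short.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical model of a binary relation: node -> set of successors.
final class ClosureDemo {

    /** Deep copy, so each operation returns a fresh relation. */
    static Map<String, Set<String>> copy(Map<String, Set<String>> r) {
        Map<String, Set<String>> out = new TreeMap<>();
        r.forEach((node, succ) -> out.put(node, new TreeSet<>(succ)));
        return out;
    }

    /** Smallest transitive relation that includes r (Warshall's algorithm). */
    static Map<String, Set<String>> transitiveClosure(Map<String, Set<String>> r) {
        Map<String, Set<String>> out = copy(r);
        for (String k : out.keySet()) {
            for (String i : out.keySet()) {
                // If i -> k, then i also reaches everything k reaches.
                // The i == k case would add a set to itself, so it is skipped.
                if (!i.equals(k) && out.get(i).contains(k)) {
                    out.get(i).addAll(out.get(k));
                }
            }
        }
        return out;
    }

    /** Adds a self loop on every node. */
    static Map<String, Set<String>> reflexiveClosure(Map<String, Set<String>> r) {
        Map<String, Set<String>> out = copy(r);
        out.forEach((node, succ) -> succ.add(node));
        return out;
    }

    /** Removes every self loop; the result need not be a closure of anything. */
    static Map<String, Set<String>> withoutSelfLoops(Map<String, Set<String>> r) {
        Map<String, Set<String>> out = copy(r);
        out.forEach((node, succ) -> succ.remove(node));
        return out;
    }

    /** A relation is transitive iff taking its transitive closure adds nothing. */
    static boolean isTransitive(Map<String, Set<String>> r) {
        return transitiveClosure(r).equals(r);
    }

    public static void main(String[] args) {
        Map<String, Set<String>> dag =
                Map.of("a", Set.of("b"), "b", Set.of("c"), "c", Set.of());
        Map<String, Set<String>> cycle = Map.of("a", Set.of("b"), "b", Set.of("a"));

        // Transitive closure of a DAG: no self loops.
        System.out.println(transitiveClosure(dag)); // {a=[b, c], b=[c], c=[]}
        // Reflexive-transitive closure: transitive closure of the reflexive closure.
        System.out.println(transitiveClosure(reflexiveClosure(dag))); // {a=[a, b, c], b=[b, c], c=[c]}
        // A positive-length cycle forces self loops in the transitive closure.
        System.out.println(transitiveClosure(cycle)); // {a=[a, b], b=[a, b]}
        // Stripping those self loops leaves a relation that is no longer transitive.
        System.out.println(isTransitive(withoutSelfLoops(transitiveClosure(cycle)))); // false
    }
}
```

The last line shows why removing self loops cannot in general be described as a closure: for the two-node cycle, a -> b and b -> a remain, but a -> a is gone, so transitivity fails.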