Exception on parsing dot file

With structure.dot as

digraph qnet {
    rankdir="TB";
    graph [pad="0", ranksep="0.25", nodesep="0.25"];
    node [penwidth=0.5, height=0.25, color=black, shape=box, fontsize=10, fontname="Vera Sans, DejaVu Sans, Liberation Sans, Arial, Helvetica, sans"];
    edge [penwidth=0.5, arrowsize=0.5];
    compound=true;

    subgraph cluster_qnet {
        fontname="Vera Sans, DejaVu Sans, Liberation Sans, Arial, Helvetica, sans";
        fontsize=12;
        label="qnet";
        tooltip="qnet";

        subgraph cluster_algebra {
            toolbox          [target="_top"; tooltip="qnet.algebra.toolbox",          href="../API/qnet.algebra.toolbox.html"];
            library          [target="_top"; tooltip="qnet.algebra.library",          href="../API/qnet.algebra.library.html"];
            core             [target="_top"; tooltip="qnet.algebra.core",             href="../API/qnet.algebra.core.html";             width=1.3];
            pattern_matching [target="_top"; tooltip="qnet.algebra.pattern_matching", href="../API/qnet.algebra.pattern_matching.html"; width=1.3];
            toolbox -> library;
            toolbox -> core;
            toolbox -> pattern_matching;
            library -> core;
            core    -> pattern_matching [weight=0, minlen=0];
            library -> pattern_matching;
            href = "../API/qnet.algebra.html"; target="_top";
            label="algebra";
            tooltip="qnet.algebra";
            graph[style=filled; fillcolor="#EEEEEE"];
        }

        convert          [target="_top"; tooltip="qnet.convert",       href="../API/qnet.convert.html"];
        visualization    [target="_top"; tooltip="qnet.visualization", href="../API/qnet.visualization.html"];
        printing         [target="_top"; tooltip="qnet.printing",      href="../API/qnet.printing.html"];
        utils            [target="_top"; tooltip="qnet.utils",         href="../API/qnet.utils.html"; width=1];

        { rank=same; convert[width=0.8]; visualization[width=0.8]; printing[width=0.8]; }
        convert       -> visualization [minlen=3, style=invis];
        visualization -> printing      [minlen=3];
        visualization -> toolbox  [minlen=2, lhead=cluster_algebra];
        printing      -> toolbox  [lhead=cluster_algebra];
        convert       -> toolbox  [lhead=cluster_algebra];

        core             -> utils [ltail=cluster_algebra];
        pattern_matching -> utils [ltail=cluster_algebra];
        convert          -> utils [minlen=6];
        printing         -> utils [minlen=6];
    }

}

I get the following exception:

>>> import pydot
>>> pydot.graph_from_dot_file("structure.dot", encoding='utf-8')
Traceback (most recent call last):
  File "<ipython-input-5-d070a81b3bde>", line 1, in <module>
    pydot.graph_from_dot_file("structure.dot", encoding='utf-8')
  File "/Users/goerz/anaconda3/lib/python3.5/site-packages/pydot.py", line 238, in graph_from_dot_file
    graphs = graph_from_dot_data(s)
  File "/Users/goerz/anaconda3/lib/python3.5/site-packages/pydot.py", line 221, in graph_from_dot_data
    return dot_parser.parse_dot_data(s)
  File "/Users/goerz/anaconda3/lib/python3.5/site-packages/dot_parser.py", line 554, in parse_dot_data
    err)
TypeError: Can't convert 'ParseException' object to str implicitly

That exception is actually raised during the handling of another Exception:

ParseException: Expected "}" (at char 351), (line:8, col:31)

To me, the file does not look like an invalid DOT file, and the Graphviz command-line utilities convert it fine.

Top GitHub Comments

peternowee commented, May 1, 2020

@ankostis: I have read up on logging and exception handling a bit and I have seen ‘some’ light. Thank you for your pointers.

I now agree with most of what you said:

- Raise an exception.
- Log a DEBUG-level log line.
- Don’t bother configuring handlers or the root logger.
- Yes, DEBUG log lines will get lost by default and someone has to make an effort to see them.
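A minimal sketch of those four points, using only the standard library. The logger name `demo_lib.parser`, the `parse` function and the `ListHandler` class are hypothetical stand-ins for illustration, not pydot code.

```python
import logging

# Library side: a module-level logger, with no handler and no level configured.
logger = logging.getLogger("demo_lib.parser")  # hypothetical logger name


def parse(text):
    """Hypothetical stand-in for a library parsing function."""
    try:
        if not text.startswith(("graph", "digraph")):
            raise ValueError("expected 'graph' or 'digraph'")
        return text.split()
    except ValueError:
        # Leave a DEBUG line for whoever opts in, then let the error bubble up.
        logger.debug("Could not parse input: %r", text)
        raise


class ListHandler(logging.Handler):
    """Collects log messages in a list so the sketch can inspect them."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


# Application side: the user, not the library, makes the effort to see DEBUG lines.
handler = ListHandler()
logger.addHandler(handler)
logger.setLevel(logging.DEBUG)

try:
    parse("g blah blah")
except ValueError as exc:
    caught = exc  # the original exception arrives unchanged
```

Without the three application-side lines, the DEBUG record would simply be dropped, which is the default behavior accepted in the last point.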

However, I am still not sure about what exception to raise exactly. I still want to help that unlucky end-user that gets confronted with it. For example, you wrote:

> In a library, there is a simple rule: when you don’t know what to do with an error, let it bubble!

and

> [Lib-devs] should refrain from interpreting the errors the lib generates - these are not their business.

Is there room for an exception to those rules here? Because I do have some idea on what to do with the error and I do think we can actually add some valuable interpretation, like:

- We know that we were parsing a DOT-string. PyParsing does not.
- We can point the user to the DOT Language definition and the pydot issue tracker.

I have drafted a basic troubleshooting guide to be included in the pydot documentation that:

- Explains to the user how to get the DOT-string and pyparsing’s explanation:
  - From the log.
  - From the exception.
- Points the user to:
  - The DOT Language definition to check the DOT syntax.
  - The maintainer of any intermediate software they use that may have auto-generated the DOT string.
  - The pydot issue list in case they feel their DOT string was valid, and some general advice on reporting bugs.
  - The PyParsing documentation in case they want to investigate in more detail.

I drafted it now as a new paragraph in the README.md, but it could also be split off to a new TROUBLESHOOTING.md of course.

The only thing I still need now is a way to point the user to that documentation.

If we just allow the original exception to bubble up without any additional information, users who have little experience reading tracebacks or are otherwise unaware of pydot’s role will not immediately look at our troubleshooting guide. They will just start searching for the term ParseException, end up in PyParsing’s documentation and possibly even ask for support there.

Some of the ways I considered to point the user to our documentation:

- Write the DEBUG log out by default: No. I chased this for a long time, but finally concluded that this approach is all wrong. I kept running into potential logging configuration conflicts with other modules. And printing by default just goes against the whole nature of the DEBUG level. (Note: I will still log a DEBUG line, but I now accept that the user may have to do some work to see it.)
- Logging a log line of level ERROR before raising an exception: No. The problem with this is that the exception may get handled at a higher level and then the log line is still there (“You’re giving me two errors.”, Mario Corchero, Pycon 2019, Logging HOWTO: When to use logging).
- Add an in-line comment in our code at the end of the line that may indirectly cause the exception, so that the comment will be shown as part of the traceback:
  ```
  File "dot_parser.py", line 575, in parse_dot_data
    tokens = graphparser.parseString(s)   # ParseException? See pydot documentation.
  ```
  No. Drawbacks: Still not very visible in the middle of a traceback. Plus the comment would probably get deleted by another developer during the next code cleanup.
- Drop the original ParseException and raise a new exception: No(?) It feels wrong to erase history like that. And copying over all the data is ugly and a lot of work.
- Try to change the original ParseException’s message: No(?) It feels so hacky. Example: https://stackoverflow.com/questions/9157210/how-do-i-raise-the-same-exception-with-a-custom-message-in-python/9157277#9157277
- Raising a new chained exception, chained to the original ParseException: Yes? In your previous comment, under “Are chained-exceptions over-rated?”, you showed that chained exceptions were not always necessary and pointed out that not using them avoided interrupting the stack-trace. I see your point, especially in the clean-up example you gave, but here I also want to weigh it against the benefit of being able to give the user some additional information. Can the pain of experienced users having to read two tracebacks be offset by the gain of inexperienced users saving troubleshooting time?

In case we decide to use chained exceptions, the question still remains what kind of exception we should use:
- Another one of the same exception (e.g. our ParseException chained to the original ParseException): No. Drawbacks: Confusing to have two of the same exceptions with two different messages. Copying all that data over is ugly, a lot of work and unnecessary duplication. Not copying the data makes the new exception non-conformant with its documented attributes.
- A built-in exception, such as ValueError, with just the message that we want to get out (chained to the original exception):
  - For pydot/pydot#218 (Graphviz/CalledProcessError): No. Because the original exception does not carry the supplied DOT-string.
  - For pydot/pydot#171 and pydot/pydot#219 (PyParsing/ParseException): No(?) I am not sure whether the original exception will always carry the supplied DOT-string. For example, the pyparsing code here suggests that it is possible that the attribute that normally holds the string is left empty.
- A custom exception (chained to the original): Yes? I hope we are not going to copy over any data from the original exception, now that it is chained to ours. Also, if we use unique names for our custom exceptions, we might not even need to literally point to our documentation, as a web search for that unique name would already point to pydot. An added benefit of using custom exceptions for both PR 218 and PR 219 is that we can offer a unified way of accessing the DOT-string. The question that then naturally comes up is whether we should create:
  - Two separate, independent exception classes (for starters), or
  - One single custom exception class for both cases (maybe later), or
  - Two custom exceptions derived from a single base class (maybe later).
  With those last two options, I run into even bigger questions, like: Why are pydot.py (from where Graphviz is called) and dot_parser.py (from where PyParsing is called) separate? Does sharing an exception or exception hierarchy between the two fit in with that? Should we define the exception hierarchy in a separate file? What to do with the existing exceptions, which have some problems already? Naming issues, etc. etc. My time is limited, so I hope to prevent such scope creep. Maybe we can avoid that discussion for a while now and start with the first option, i.e. create two custom exceptions for PR 218 and PR 219, completely independent from each other. We can use identical names for identical attributes, such as the DOT-string, so that won’t stand in the way of any later integration in an exception hierarchy. We could add a comment near each of the exception class definitions to make future developers aware of this. Further integration can be discussed later then, possibly in a separate PR.

To conclude the above list: I am currently thinking of a custom exception class with our own message and carrying only our additional attributes, chained to the original exception which has its own attributes. And for now no integration between the custom exception classes needed for PR 218 and PR 219.

About the contents of our message, I think it can contain a short interpretation from our side (e.g. "Supplied string cannot be parsed as DOT" or, more presumptuously, "Invalid DOT-string" or "DOT-syntax error") and a pointer to our documentation (e.g. "See pydot documentation for help.").

Example using ValueError:

Traceback (most recent call last):
  File "dot_parser.py", line 559, in parse_dot_data
    tokens = graphparser.parseString(s)
  File "pyparsing.py", line 1955, in parseString
    raise exc
  File "pyparsing.py", line 3003, in parseImpl
    raise ParseException(instring, loc, self.errmsg, self)
pyparsing.ParseException: Expected {'graph' | 'digraph'}, found 'g'  (at char 0), (line:1, col:1)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "mre-with-logging.py", line 25, in <module>
    G = pydot.graph_from_dot_data("g blah blah")
  File "pydot.py", line 287, in graph_from_dot_data
    return dot_parser.parse_dot_data(s)
  File "dot_parser.py", line 575, in parse_dot_data
    raise ValueError(hint) from err
ValueError: Supplied string cannot be parsed as DOT. See pydot documentation for help.

Now the error ends in a clear message pointing to the pydot documentation.

Alternatively, using a custom exception with a unique name that should help users find pydot documentation by themselves:

[...]
pyparsing.ParseException: Expected {'graph' | 'digraph'}, found 'g'  (at char 0), (line:1, col:1)

The above exception was the direct cause of the following exception:
[...]
  File "dot_parser.py", line 583, in parse_dot_data
    raise new_err from err
dot_parser.InvalidDotError: Supplied string cannot be parsed as DOT

This last one is my current preference. It inherits from ValueError, but adds a custom attribute dot_string containing the string that our function received and passed on to pyparsing.
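A rough sketch of what such a chained custom exception could look like. Only the name `InvalidDotError`, its `ValueError` base, the `dot_string` attribute and the message come from the description above; the constructor signature, `FakeParseException` and `fake_parse` are assumptions made so the example runs without pyparsing.

```python
class FakeParseException(Exception):
    """Stand-in for pyparsing.ParseException, to keep the sketch dependency-free."""


class InvalidDotError(ValueError):
    """Raised when a string cannot be parsed as DOT (constructor is an assumption)."""

    def __init__(self, message, dot_string):
        super().__init__(message)
        self.dot_string = dot_string  # the string the parsing function received


def fake_parse(s):
    """Hypothetical parser: accepts only strings that start with a graph keyword."""
    if not s.lstrip().startswith(("graph", "digraph", "strict")):
        raise FakeParseException("Expected {'graph' | 'digraph'}")
    return s


def parse_dot_data(s):
    try:
        return fake_parse(s)
    except FakeParseException as err:
        # "from err" keeps the original exception reachable as __cause__,
        # so none of its data has to be copied into the new exception.
        raise InvalidDotError("Supplied string cannot be parsed as DOT", s) from err


try:
    parse_dot_data("g blah blah")
except InvalidDotError as exc:
    error = exc
```

Because the class derives from ValueError, callers that already catch ValueError keep working, while `error.dot_string` and `error.__cause__` give access to the input and to the parser's own explanation.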

I hope you can let me know what you think.

Replying to your other comments:

>> Maybe we should just follow the Python support schedule and only support Python 3.5 and higher?
>
> Absolutely 👍👍

Ok, I will assume Python 3.5+ in pydot/pydot#218 and pydot/pydot#219 from now on (next force-push). For Python 2, I created a very targeted bug fix for inclusion in a bug fix/point release in pydot/pydot#227.

> I suggest then to take a look at my oscmd and the monkeypatching of the standard exception to also contain the cwd and stderr. Check also the TCs.

This is relevant only to PR pydot/pydot#218, I think. I had a quick look and I will look at it in more detail later and see what I can re-use, but I can already tell you that I don’t really feel comfortable monkey-patching the standard library, for the usual risks involved. I would rather use a custom exception, I think. Also, I have come to accept that not all of our additional data needs to be part of the exception string, so maybe there is no need to override __str__.

> When you develop a library, you still need to configure the logging subsystem for the test cases and for reviewing your log messages (whether they make sense, have errors, etc.). It’s the amazing pytest that comes to the rescue: it provides the --log-level=DEBUG option, saving the lib developer from having to configure the logging subsystem.

Ok, I still need to look at testing. I will come back to this if I have any questions later.

> Which, btw, doesn’t need much effort:
> logging.basicConfig(level=logging.DEBUG)

Yes, that should do the trick in many cases. I have put this in the proposed documentation as the first method to try when someone wants to see the DEBUG lines. But there are two possible downsides:

- If the root logger already has a handler attached to it (for example because an imported module set it up), the user’s later call to basicConfig will not have any effect. It will not add a handler, nor will it change the level of the root logger.
- The level= argument here sets the level of the root logger to DEBUG (not its handler, btw, which basicConfig sets up to handle everything anyway). A root logger of level DEBUG causes all NOTSET loggers from all modules to inherit DEBUG as well. This could make the log quite noisy.
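Both downsides can be reproduced with the standard library alone. The sketch below simulates "another module" by attaching a NullHandler to the root logger; the logger name `some_other_module` is a made-up example, and the previous root configuration is restored at the end.

```python
import logging

root = logging.getLogger()
saved_handlers, saved_level = root.handlers[:], root.level

# Simulate an imported module that has already configured the root logger.
root.handlers[:] = [logging.NullHandler()]
root.setLevel(logging.WARNING)

# Downside 1: basicConfig is now a no-op. No handler is added and the
# root level stays at WARNING.
logging.basicConfig(level=logging.DEBUG)
level_after_basic_config = root.level
handler_count = len(root.handlers)

# Downside 2: once the root level is DEBUG, every logger left at NOTSET
# inherits DEBUG as its effective level.
root.setLevel(logging.DEBUG)
other = logging.getLogger("some_other_module")  # hypothetical third-party logger
inherited_level = other.getEffectiveLevel()

# Restore the previous root configuration.
root.handlers[:] = saved_handlers
root.setLevel(saved_level)
```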

Therefore, I plan to document the following alternative as well:

logging.getLogger('pydot').addHandler(logging.StreamHandler())
logging.getLogger('pydot').setLevel(logging.DEBUG)
logging.getLogger('pydot.pydot').setLevel(logging.DEBUG)
logging.getLogger('pydot.dot_parser').setLevel(logging.DEBUG)

Notes:

- Less prone to conflicts with logging configuration by other modules, because it does not touch the root logger/handler.
- Attaching a handler to a library logger is normally not advised, but this is not going to be part of the code, only of the troubleshooting guide, where I think it is acceptable.
- I explicitly set the levels of both the parent and the child loggers, because if another module has already set the child loggers to a specific level, they would not inherit from the parent anymore.

The above code results in the following logging tree (visualized using the logging_tree package):

<--""
   Level WARNING
   |
   o<--"pydot"
       Level DEBUG
       Handler Stream <_io.TextIOWrapper name='<stderr>' mode='w' encoding='UTF-8'>
       |
       o<--"pydot.dot_parser"
       |   Level DEBUG
       |
       o<--"pydot.pydot"
           Level DEBUG
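The third note above (setting the levels of both parent and child loggers) can be checked with a small sketch. The logger names `demo_pkg` and `demo_pkg.dot_parser` are hypothetical stand-ins for 'pydot' and 'pydot.dot_parser', chosen so the example does not touch any real library logger.

```python
import logging

parent = logging.getLogger("demo_pkg")             # stands in for 'pydot'
child = logging.getLogger("demo_pkg.dot_parser")   # stands in for 'pydot.dot_parser'

# Simulate another module having pinned the child logger to a specific level.
child.setLevel(logging.ERROR)

# Setting only the parent no longer reaches the child ...
parent.setLevel(logging.DEBUG)
before = child.getEffectiveLevel()

# ... so the child has to be set explicitly as well.
child.setLevel(logging.DEBUG)
after = child.getEffectiveLevel()
```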

Finally, two other issues that were brought up earlier in the context of PR 218 and/or PR 219 and on which my thoughts have started shifting:

- Less immediate need for flags, options or environment variables to influence logging, error raising or verbosity. Partly because I have started to see the light with regard to logging and exception handling, but also because I am running into time constraints and I want to limit the scope of these two PRs. Others can make such suggestions in separate PRs and we can always rebase as things get moving.
- Less need for a transition/backwards compatibility period for the API changes. This follows on earlier discussion here in https://github.com/pydot/pydot/issues/171#issuecomment-612582791 and in https://github.com/pydot/pydot/pull/219#discussion_r407183873. For pydot/pydot#218, I think backwards compatibility should be out of the question anyway, because the current behavior is so wrong (see the mess in pydot/pydot#203 for example). Both PRs now target a future major/minor release in which more API changes will occur, so I will make do with clear changelog entries and docstring changes. Here again, though, time constraints do play a role. Not only do I worry about the first implementation of backwards compatibility, but also about the long-term maintainability of the complex transition code and management of the transition schedule. If anyone has the time for it, feel free to send in a detailed proposal and I will reconsider.

peternowee commented, Apr 12, 2020

@ankostis In my PR https://github.com/pydot/pydot/pull/219 I also worked on this part of the code. I will reply to your suggestions here:

First, replying to https://github.com/pydot/pydot/issues/171#issuecomment-612397349:

> There is still a bug lurking here, in the code handling the original exception, where it attempts to concatenate the exception with strings:
>
> https://github.com/pydot/pydot/blob/48ba231b36012c5e13611807c9aac7d8ae8c15c4/dot_parser.py#L551-L554
>
> It should become:
>
>     print("%s%s^%s" % (err.line, " "*(err.column-1), err))

This is an alternative solution for the TypeError. In my solution, I changed err to str(err). I tested your solution in both Python 2.7.16 and 3.7.3, and it also works. No more TypeError:

>>> import pydot; pydot.graph_from_dot_data("graph { n [label=<some problematic DOT string>>]; }")
graph { n [label=<some problematic DOT string>>]; }          ^Expected "}", found '['  (at char 10), (line:1, col:11)

I am still missing the newlines though. Aren’t you? Perhaps some Linux/Windows difference? See https://github.com/pydot/pydot/pull/219/files#diff-baef597193866f900a2726a8a4667b12R556 for my solution which also adds newlines.
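The two fixes discussed here can be compared side by side in a small sketch. `FakeParseException` is a hypothetical stand-in that only mimics the `line` and `column` attributes used above, and the newline placement is one possible layout, not necessarily the one in the PR.

```python
class FakeParseException(Exception):
    """Stand-in for pyparsing.ParseException with the attributes used here."""

    def __init__(self, msg, line, column):
        super().__init__(msg)
        self.line = line
        self.column = column


err = FakeParseException('Expected "}"', "graph { n [x] }", 11)

# The failing pattern: concatenating a str with an exception object
# raises TypeError on Python 3.
try:
    message = err.line + " " * (err.column - 1) + "^" + err
except TypeError:
    concat_failed = True
else:
    concat_failed = False

# Fix 1: convert the exception explicitly with str().
fixed_str = err.line + "\n" + " " * (err.column - 1) + "^\n" + str(err)

# Fix 2: let %-formatting do the conversion.
fixed_fmt = "%s\n%s^\n%s" % (err.line, " " * (err.column - 1), err)
```

With the newlines in place, the caret ends up on its own line under the offending column instead of trailing the DOT string.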

Second, replying to https://github.com/pydot/pydot/issues/171#issuecomment-612397742:

> But it is better to let the exception bubble up, or else the user will receive a None dot.

and to https://github.com/pydot/pydot/issues/171#ref-commit-3efd534 which points to your monkey patch in the pydot-using software https://github.com/pygraphkit/graphtik/commit/3efd534f77c77044d75b59609ff1a7b335200f1c which patches out the pydot try... except block altogether:

Letting the exception bubble-up sounds reasonable, but perhaps this can also be accomplished by adding a raise statement at the end of the except-block, perhaps even doing exception chaining? I guess this means that we cannot return None anymore then. Perhaps some users are currently expecting and handling None, so that may be too big of a change for a point release, maybe even for a minor release.

Your patch also shows that you prefer pydot not to print details of the ParseException anymore. This touches on the general principle held by many that a library should not print anything at all. I ran into this issue elsewhere: pydot currently also prints a detailed error message when Graphviz dot returns an error code. I commented on that in https://github.com/pydot/pydot/pull/218#issuecomment-555360553. As I argued there, although I can understand the general notion that a library should not print, the practical problem is that not every user might immediately know what to do with an exception. The printing of error details by pydot allows end users to immediately start troubleshooting their DOT graph syntax rather than first having to learn how to dig into a ParseException or having to ask the maintainer of their pydot-using main software to add such exception-handling code.

It may be possible to help the end users with this by other means, for example through good exception chaining with a pydot custom exception type that would lead the end user to some good pydot documentation on how to troubleshoot. But without such an alternative in place, I don’t know if it is a good idea to remove the printing already. Perhaps use a “debug” or “verbose” flag as suggested by @prmtl in https://github.com/pydot/pydot/pull/218#issuecomment-544122160. We could make it a pydot-wide flag, so that it controls both the printing of ParseException and Graphviz dot error details, or two separate flags, so that users may implement their own handling step by step, one pydot component at a time.

Maybe at first, we can let the flags default to True, so that default behavior remains unchanged (=printing) and it can be included in a minor release already without much risk. Anyone who already wants to handle these exceptions themselves can then set the flags to False manually. Pydot documentation can suggest that downstream developers start looking into this. Then at a later stage, once we get some idea on how well the change has been received and how else we can help end users troubleshoot exceptions, we can announce the default will change to False (=no more printing) in the next major release for example.
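To make the flag idea concrete, here is a rough sketch. The `config` dictionary, the key `print_parse_errors`, and the `parse` and `run` helpers are all hypothetical; pydot has no such flag. The sketch also combines the flag with the re-raise suggested earlier, which changes the return-None behavior and is a separate decision.

```python
import io
from contextlib import redirect_stdout

# Hypothetical library-wide setting; True keeps today's printing behavior.
config = {"print_parse_errors": True}


def parse(text):
    """Hypothetical stand-in for the library's parse function."""
    try:
        if not text.startswith(("graph", "digraph")):
            raise ValueError("Expected 'graph' or 'digraph'")
        return text
    except ValueError as err:
        if config["print_parse_errors"]:
            print("Parse error: %s" % err)
        raise  # re-raise instead of returning None


def run(text):
    """Call parse() and report what was printed and whether it raised."""
    buffer = io.StringIO()
    raised = False
    with redirect_stdout(buffer):
        try:
            parse(text)
        except ValueError:
            raised = True
    return buffer.getvalue(), raised


printed_default, raised_default = run("g blah blah")

# A downstream developer who handles the exception themselves opts out.
config["print_parse_errors"] = False
printed_quiet, raised_quiet = run("g blah blah")
```

Flipping the default later would then be a one-line change in the library, announced ahead of a major release.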

Hope you can let me know what you think. Thanks.
