PyYAML 4.1 changes "safe" in more ways than immediately obvious


It’s clear from the changelog that PyYAML 4.x now defaults to safe_load-style behaviour when loading. This is probably good. However, the definition of “safe” also appears to have changed somewhat.

In https://github.com/cdent/gabbi/pull/252 the “safe”-related tests behave differently depending on whether PyYAML 4.x or 3.x is used. As currently written, they pass with 4 and fail with 3. The difference seems to be that “safe” means different things in 3 and 4:

  • in 4, safe loads custom tags that are defined in the same process but not python/object, while unsafe loads python/object but not custom tags
  • in 3, custom tags and python/object load only in unsafe; safe loads neither
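
As a rough illustration of the two kinds of tag being compared, here is a minimal sketch written against a recent PyYAML release rather than 4.1. It registers a custom tag on an explicit SafeLoader subclass, so the result does not depend on which loader the module-level helpers target, and then shows that same loader rejecting a python/object tag. The `Connector` class, the `!connector` tag, `ProjectLoader`, and the sample documents are all hypothetical and not taken from gabbi or the pull request.

```python
import yaml


class Connector:
    """Hypothetical stand-in for an object built from a custom tag."""

    def __init__(self, host, user):
        self.host = host
        self.user = user


def construct_connector(loader, node):
    # Build the plain mapping with the safe constructors, then wrap it.
    fields = loader.construct_mapping(node, deep=True)
    return Connector(**fields)


class ProjectLoader(yaml.SafeLoader):
    """SafeLoader subclass, so the registration stays off yaml.SafeLoader."""


# Register the custom tag on this loader class only.
ProjectLoader.add_constructor("!connector", construct_connector)

CUSTOM_DOC = """
snapshot: !connector
    host: db.example.net
    user: readonly
"""

PY_OBJECT_DOC = """
value: !!python/object:argparse.Namespace {flag: true}
"""

# The custom tag is constructed because it was registered explicitly.
loaded = yaml.load(CUSTOM_DOC, Loader=ProjectLoader)
print(type(loaded["snapshot"]).__name__, loaded["snapshot"].host)

# The python/object tag has no constructor on a safe loader.
python_object_error = None
try:
    yaml.load(PY_OBJECT_DOC, Loader=ProjectLoader)
except yaml.constructor.ConstructorError as exc:
    python_object_error = exc
    print("rejected:", exc.problem)
```

Passing `Loader=` explicitly sidesteps the question of what the bare `yaml.load` defaults to in a given version, which is the part that appears to have changed.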

The branch on that pull request demonstrates the problem with different PyYAML versions, as does master in the same repo.

However, I have no confidence that I’m parsing what’s going on properly, so I need to come up with a minimal test case, which I’ll try to do soon. I first wanted to get this written down in case there is something obviously wrong in either my code or in PyYAML.

I will follow up with the minimal test case as soon as possible. Sorry for the noise, but I needed to dump state.
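
A test case along these lines could probe each loader class directly and record which tag styles it accepts. The sketch below is not the author's test case: the `!thing` tag, the `probe` helper, and the sample documents are hypothetical, and the printed matrix is expected to differ between PyYAML versions, which is the point of running it under each one.

```python
import yaml


def construct_thing(loader, node):
    # Hypothetical custom tag: just return the underlying mapping.
    return loader.construct_mapping(node)


# Module-level registration. Which loader classes this touches is the
# version-dependent behaviour under investigation.
yaml.add_constructor("!thing", construct_thing)

DOCUMENTS = {
    "custom tag": "item: !thing {name: x}",
    "python/object": "item: !!python/object:argparse.Namespace {flag: true}",
}

LOADERS = {"SafeLoader": yaml.SafeLoader, "Loader": yaml.Loader}


def probe(document, loader_cls):
    """Return True if loader_cls can construct the document."""
    try:
        yaml.load(document, Loader=loader_cls)
    except yaml.YAMLError:
        return False
    return True


# Try every document against every loader class.
results = {
    (doc_name, loader_name): probe(document, loader_cls)
    for doc_name, document in DOCUMENTS.items()
    for loader_name, loader_cls in LOADERS.items()
}

for (doc_name, loader_name), ok in sorted(results.items()):
    print(f"{doc_name:15} {loader_name:12} {'loads' if ok else 'rejected'}")
```

Running the same script in one virtualenv per PyYAML version and diffing the output would show whether the meaning of "safe" really moved between releases.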

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

prisaparoy commented, Jun 28, 2018

This is a problem for me as well. Safe load works for custom tags, but not for Python objects, e.g.:

>>> s="""
... redshift-snapshot: !cl_pyspark.connector.redshift
...     host:     redshift-prod.myapp.net
...     user:     readonly
...     password: password
...     database: snowplow
...     s3_temp:  s3a://temp-space
... """
>>> yaml.load(s)
{'redshift-snapshot': RedshiftConnector[readonly@redshift-prod.myapp.net:5439/snowplow, s3_temp=s3a://temp-space/]}
>>>

>>> s="""
... schema: !!python/object:pyspark.sql.types.StructType
...         fields:
...           - !!python/object:pyspark.sql.types.StructField { name: from_id, metadata: {}, dataType: !!python/object:pyspark.sql.types.StringType {}, nullable: False }
...           - !!python/object:pyspark.sql.types.StructField { name: to_id, metadata: {}, dataType: !!python/object:pyspark.sql.types.StringType {}, nullable: False }
... """
>>> yaml.load(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/john/projects/myapp/.venv/lib/python2.7/site-packages/yaml/__init__.py", line 74, in load
    return loader.get_single_data()
  File "/Users/john/projects/myapp/.venv/lib/python2.7/site-packages/yaml/constructor.py", line 39, in get_single_data
    return self.construct_document(node)
  File "/Users/john/projects/myapp/.venv/lib/python2.7/site-packages/yaml/constructor.py", line 48, in construct_document
    for dummy in generator:
  File "/Users/john/projects/myapp/.venv/lib/python2.7/site-packages/yaml/constructor.py", line 398, in construct_yaml_map
    value = self.construct_mapping(node)
  File "/Users/john/projects/myapp/.venv/lib/python2.7/site-packages/yaml/constructor.py", line 208, in construct_mapping
    return BaseConstructor.construct_mapping(self, node, deep=deep)
  File "/Users/john/projects/myapp/.venv/lib/python2.7/site-packages/yaml/constructor.py", line 133, in construct_mapping
    value = self.construct_object(value_node, deep=deep)
  File "/Users/john/projects/myapp/.venv/lib/python2.7/site-packages/yaml/constructor.py", line 88, in construct_object
    data = constructor(self, node)
  File "/Users/john/projects/myapp/.venv/lib/python2.7/site-packages/yaml/constructor.py", line 414, in construct_undefined
    node.start_mark)
yaml.constructor.ConstructorError: could not determine a constructor for the tag 'tag:yaml.org,2002:python/object:pyspark.sql.types.StructType'
  in "<string>", line 2, column 9:
    schema: !!python/object:pyspark.sql.type ...
            ^


danger_load works for Python objects, but not for custom tags:

>>> s="""
... schema: !!python/object:pyspark.sql.types.StructType
...         fields:
...           - !!python/object:pyspark.sql.types.StructField { name: from_id, metadata: {}, dataType: !!python/object:pyspark.sql.types.StringType {}, nullable: False }
...           - !!python/object:pyspark.sql.types.StructField { name: to_id, metadata: {}, dataType: !!python/object:pyspark.sql.types.StringType {}, nullable: False }
... """
>>> yaml.danger_load(s)
{'schema': StructType(List(StructField(from_id,StringType,false),StructField(to_id,StringType,false)))}
>>>


>>> s="""
... redshift-snapshot: !cl_pyspark.connector.redshift
...     host:     redshift-prod.myapp.net
...     user:     readonly
...     password: password
...     database: snowplow
...     s3_temp:  s3a://temp-space/
... """
>>> yaml.danger_load(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/john/projects/myapp/.venv/lib/python2.7/site-packages/yaml/__init__.py", line 101, in danger_load
    return load(stream, DangerLoader)
  File "/Users/john/projects/myapp/.venv/lib/python2.7/site-packages/yaml/__init__.py", line 74, in load
    return loader.get_single_data()
  File "/Users/john/projects/myapp/.venv/lib/python2.7/site-packages/yaml/constructor.py", line 39, in get_single_data
    return self.construct_document(node)
  File "/Users/john/projects/myapp/.venv/lib/python2.7/site-packages/yaml/constructor.py", line 48, in construct_document
    for dummy in generator:
  File "/Users/john/projects/myapp/.venv/lib/python2.7/site-packages/yaml/constructor.py", line 398, in construct_yaml_map
    value = self.construct_mapping(node)
  File "/Users/john/projects/myapp/.venv/lib/python2.7/site-packages/yaml/constructor.py", line 208, in construct_mapping
    return BaseConstructor.construct_mapping(self, node, deep=deep)
  File "/Users/john/projects/myapp/.venv/lib/python2.7/site-packages/yaml/constructor.py", line 133, in construct_mapping
    value = self.construct_object(value_node, deep=deep)
  File "/Users/john/projects/myapp/.venv/lib/python2.7/site-packages/yaml/constructor.py", line 88, in construct_object
    data = constructor(self, node)
  File "/Users/john/projects/myapp/.venv/lib/python2.7/site-packages/yaml/constructor.py", line 414, in construct_undefined
    node.start_mark)
yaml.constructor.ConstructorError: could not determine a constructor for the tag '!cl_pyspark.connector.redshift'
  in "<string>", line 2, column 20:
    redshift-snapshot: !cl_pyspark.connector.redshift
                       ^

ingydotnet commented, Mar 13, 2019
