Dumping objects with fields unknown to schema in versions >=3.0.0


I am trying to serialize an object that has fields beyond those defined in the schema. While dumping, I also specify the fields to include using only. In version 2.19.5 I could name fields unknown to the schema in the only parameter, but this breaks after upgrading to 3.5.1. For clarity, consider the following code:

from marshmallow import Schema, fields


class MySchema(Schema):
    a = fields.Str()
    b = fields.Str()
    c = fields.Str()

class MyClass:
    def __init__(self, a, b, c, d, e):
        self.a = a
        self.b = b
        self.c = c
        self.d = d
        self.e = e


obj = MyClass("one", "two", "three", "four", "five")

data_out = MySchema(only=('a', 'b', 'c', 'd')).dump(obj)

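# In 2.x, dump() returns a MarshalResult with a .data attribute; in 3.x it returns the dict directly.
print(data_out.data)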

# Output with 2.19.5 (expected behavior) => {'a': 'one', 'c': 'three', 'b': 'two', 'd': 'four'}
# With 3.5.1 => ValueError: Invalid fields for <MySchema(many=False)>: {'d'}

One solution is to add those unknown fields to the Schema, but I was wondering whether there is a proper workaround for this issue when I am using the newer versions (specifically >=3.0.0).

I feel this is similar to #1198 but I was not able to find a solution for this in the thread.

Issue Analytics

  • State: open
  • Created 4 years ago
  • Comments: 8 (3 by maintainers)

Top GitHub Comments

4 reactions
greggmi commented, Oct 20, 2021

I feel it needs to support dumping unknown fields. I want to use this to validate and populate fields before inserting into a DB, but I will have unknown fields. While the load includes unknowns, I can’t insert objects into the DB. There already exists implicit loading, so why not do the same for dumping? If an object cannot be inferred, throw an error.

For now I’ve worked around it with a post dump:

    @post_dump(pass_original=True)
    def keep_unknowns(self, output, orig, **kwargs):
        for key in orig:
            if key not in output:
                output[key] = orig[key]
        return output
1 reaction
deckar01 commented, Mar 20, 2020

That error occurs during schema construction, not the dump operation. The move to be more strict about field names came about to prevent typos from running without warning and silently losing data.

https://marshmallow.readthedocs.io/en/stable/upgrading.html#schemas-raise-validationerror-when-deserializing-data-with-unknown-keys

Marshmallow provides implicit field creation to ease the burden of declaring fields. This may work for your use case depending on the number of additional fields you need.

from marshmallow import Schema, fields


class MyClass:
    def __init__(self, a, b):
        self.a = a
        self.b = b

class MySchema(Schema):
    a = fields.Str()

class MySchemaB(MySchema):
    class Meta:
        additional = ['b']

obj = MyClass("one", "two")
MySchemaB().dump(obj)
# {'a': 'one', 'b': 'two'}

https://marshmallow.readthedocs.io/en/stable/quickstart.html#implicit-field-creation

Currently the only way to ingest truly unknown data is during load. If you are planning on dumping that data, it has to be declared explicitly, because there has never been a mechanism for dumping unknown data AFAIK.
