RFC: Refactoring transformers and Zod 3

TLDR

Transformers are poorly designed. I want to reimplement them but there will be breaking changes. Under this proposal, Zod 2 will never come out of beta and we’ll jump straight to Zod 3 with a better implementation of transformers. Skip to “New Implementation” for details.

The reason Zod 2 has been in beta for so long (well, one of the reasons anyway!) is that I’ve been increasingly unsatisfied with Zod’s implementation of transformers. Multiple issues have cropped up that have led me to re-evaluate the current approach. At best, transformers are a huge footgun and at worst they’re fundamentally flawed.

Context

Previously (in v1), the ZodType base class tracked the inferred type in a generic parameter called Type. But transformers accept an Input of one type and return an Output of a different type. To account for this, Type was renamed to Output and a third generic parameter (Input) was added. For non-transformers, Input and Output are the same. For transformers only, these types are different.

Let’s look at stringToNumber as an example:

const stringToNumber = z.transformer(z.string(), z.number(), val => parseFloat(val));

For stringToNumber, Input is string and Output is number. Makes sense.

What happens when you pass a value into stringToNumber.parse?

  1. The user passes a value into stringToNumber.parse
  2. The transformer passes this value through the parse function of its input schema (z.string()). If there are parsing errors, it throws a ZodError
  3. The transformer takes the output from that and passes it into the transformation function (val => parseFloat(val))
  4. The transformer takes the output of the transformation and validates it against the output schema (z.number())
  5. The result of that call is returned
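The five steps above can be sketched without Zod itself. This is a minimal model under stated assumptions: `Schema`, `str`, `num`, and `transformer` are hypothetical stand-ins written for illustration, not Zod's actual implementation.

```typescript
// Hypothetical stand-in for a Zod 2 style schema. None of these names are Zod's API.
interface Schema<In, Out> {
  parse(value: unknown): Out;
  readonly _input?: In; // phantom field: only records the input type
}

const str: Schema<string, string> = {
  parse(value) {
    if (typeof value !== "string") throw new Error("Expected string");
    return value;
  },
};

const num: Schema<number, number> = {
  parse(value) {
    if (typeof value !== "number" || Number.isNaN(value)) {
      throw new Error("Expected number");
    }
    return value;
  },
};

// transformer(A, B, func): func maps the Output of A to the Input of B.
function transformer<AIn, AOut, BIn, BOut>(
  input: Schema<AIn, AOut>,
  output: Schema<BIn, BOut>,
  func: (val: AOut) => BIn,
): Schema<AIn, BOut> {
  return {
    parse(value) {
      const validated = input.parse(value); // step 2: validate against the input schema
      const transformed = func(validated); // step 3: run the transformation function
      return output.parse(transformed); // steps 4 and 5: validate the result and return it
    },
  };
}

const stringToNumber = transformer(str, num, (val) => parseFloat(val));
const parsed = stringToNumber.parse("3.5");
console.log(parsed); // 3.5
```

Note that the value is validated twice, once on each side of the transformation function.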

Here’s the takeaway: for a generic transformer z.transformer(A, B, func), where A and B are Zod schemas, the argument of func should be the Output type of A and the return type is the Input type of B. This lets you do things like this:

const stringToNumber = z.transformer(z.string(), z.number(), val => parseFloat(val));
const numberToBoolean = z.transformer(z.number(), z.boolean(), val => val > 25);
const stringToNumberToBoolean = z.transformer(stringToNumber, numberToBoolean, val => 5 * val);

The problems with transformers

After implementing transformers, I realized transformers could be used to implement another much-requested feature: default values. Consider this:

const stringWithDefault = z.transformer(z.string().optional(), z.string(), val => val || "trout");
stringWithDefault.parse("marlin") // => "marlin"
stringWithDefault.parse(undefined) // => "trout"

Voila, a schema that accepts string | undefined and returns string, substituting the default “trout” if the input is ever undefined.

So I implemented the .default(val: T) method in the ZodType base class as shown below (partially simplified):

default(def: Input) {
  return ZodTransformer.create(this.optional(), this, (x: any) => {
    return x === undefined ? def : x;
  });
}

Do you see the problem with that? I didn’t. Neither did anyone who read through the Transformers RFC which I left open for comment for a couple months before starting on implementation.

Basically this implementation doesn’t work at all when you use it on transformers.

Side note: should the .default method for stringToNumber accept a number or a string? As implemented above it should accept a string (the Input). But already this is unintuitive to many people.

stringToNumber.default("3.14")

// EQUIVALENT TO
const defaultedStringToNumber = z.transformer(
  stringToNumber.optional(),
  stringToNumber,
  val => val !== undefined ? val : "3.14"
)

defaultedStringToNumber.parse("5")
/* ZodError: [
  {
    "code": "invalid_type",
    "expected": "string",
    "received": "number",
    "path": [],
    "message": "Expected string, received number"
  }
] */

Let’s walk through why this fails. The input (“5”) is first passed into the transformer input (stringToNumber.optional()). This converts the string "5" to the number 5. This is then passed into the transformation function. But wait: val is now number | undefined, but the transformer function needs to return a string. Otherwise, if we pass 5 into stringToNumber.parse it’ll throw. So we need to convert 5 back to "5". That may seem easy in this toy example but it’s not possible in the general case. Zod can’t know how to magically undo the transformation function.
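The failure can be reproduced in a small model that does not depend on Zod. Everything here (`Schema`, `optional`, `transformer`, `attempt`) is a hypothetical stand-in. The loosely typed return of `func` is an assumption that mirrors the `any` in the `.default` implementation shown earlier.

```typescript
// Hypothetical minimal model of Zod 2 style transformers; not Zod's real API.
type Schema<Out> = { parse(value: unknown): Out };

const str: Schema<string> = {
  parse(value) {
    if (typeof value !== "string") {
      throw new Error(`Expected string, received ${typeof value}`);
    }
    return value;
  },
};

const num: Schema<number> = {
  parse(value) {
    if (typeof value !== "number") {
      throw new Error(`Expected number, received ${typeof value}`);
    }
    return value;
  },
};

function optional<T>(inner: Schema<T>): Schema<T | undefined> {
  return { parse: (value) => (value === undefined ? undefined : inner.parse(value)) };
}

// The return type of func is deliberately loose, so the type checker cannot catch the bug.
function transformer<A, B>(
  input: Schema<A>,
  output: Schema<B>,
  func: (val: A) => unknown,
): Schema<B> {
  return { parse: (value) => output.parse(func(input.parse(value))) };
}

const stringToNumber = transformer(str, num, (val) => parseFloat(val));

const defaultedStringToNumber = transformer(
  optional(stringToNumber),
  stringToNumber,
  (val) => (val !== undefined ? val : "3.14"),
);

// Returns the parsed value, or the error message if parsing throws.
function attempt<T>(schema: Schema<T>, value: unknown): T | string {
  try {
    return schema.parse(value);
  } catch (err) {
    return err instanceof Error ? err.message : String(err);
  }
}

const fromUndefined = attempt(defaultedStringToNumber, undefined);
const fromString = attempt(defaultedStringToNumber, "5");
console.log(fromUndefined); // 3.14
console.log(fromString); // Expected string, received number
```

The default path works only because the default happens to be a string. The ordinary path hands the already-transformed number back to a schema that expects a string.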

In practice, the current definition of default in ZodType shouldn’t have even been possible. The only reason the type checker didn’t catch this bug is because there are a regrettable number of anys floating around in Zod. It’s not a simple matter to switch them all to unknowns either; I’ve had to use any in several instances to get type inference and certain generic methods to work properly. I’ve tried multiple times to reduce the number of anys but I’ve never managed to crack it.

It’s possible this is a one-off issue. I could find some other way to implement .default() that doesn’t involve transformers. Unfortunately this isn’t even the only problem in Zod’s implementation.

The .transform method

Initially the only way to define transformers was with z.transformer(A, B, func). Eventually I implemented a utility function you can use like this:

z.string().transform(z.number(), val=>parseFloat(val));

// equivalent to
z.transformer(z.string(), z.number(), val=>parseFloat(val));

Some users were executing multiple transforms in sequence without changing the actual data type:

z.string()
  .transform(z.string(), (val) => val.toLowerCase())
  .transform(z.string(), (val) => val.trim())
  .transform(z.string(), (val) => val.replace(" ", "_"));

To reduce the boilerplate here, it was recommended that I overload the method definition to support this syntax:

z.string()
  .transform((val) => val.toLowerCase())
  .transform((val) => val.trim())
  .transform((val) => val.replace(" ", "_"));

If the first argument is a function instead of a Zod schema, Zod should assume that the transformation doesn’t transform the type. In other words, z.string().transform((val) => val.trim()) should be equivalent to z.string().transform(z.string(), (val) => val.trim()). Makes sense.

Consider using this method on a transformer:

stringToNumber.transform(/* transformation_func */);

What type signature do you expect for transformation_func?

Most would probably expect (arg: number)=>number. Some would expect (arg: string)=>string. Neither of those is right; it’s (arg: number)=>string. The transformation function expects an input of number (the Output of stringToNumber) and a return type of string (the Input of stringToNumber). This type signature is a natural consequence of a series of logical design decisions, but the end result is dumb. Intuitively, you should be able to append .transform(val => val) to any schema. Unfortunately, due to how transformers are implemented, that’s not always possible.
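To make the signature concrete, here is a small type-level sketch. It assumes the single-argument overload behaves like z.transformer(A, A, func). `Schema` and `transformSame` are hypothetical names used only for this illustration.

```typescript
// Hypothetical model of a Zod 2 style schema that tracks Input and Output separately.
interface Schema<In, Out> {
  parse(value: unknown): Out;
  readonly _input?: In; // phantom field: only records the input type
}

const stringToNumber: Schema<string, number> = {
  parse(value) {
    if (typeof value !== "string") throw new Error("Expected string");
    return parseFloat(value);
  },
};

// Models A.transform(func) as transformer(A, A, func):
// func receives the Output of A and must return the Input of A.
function transformSame<In, Out>(
  schema: Schema<In, Out>,
  func: (arg: Out) => In,
): Schema<In, Out> {
  return { parse: (value) => schema.parse(func(schema.parse(value))) };
}

// Accepted only because func has the shape (arg: number) => string.
const doubled = transformSame(stringToNumber, (arg) => String(arg * 2));
const doubledResult = doubled.parse("5");
console.log(doubledResult); // 10

// By contrast, transformSame(stringToNumber, (val) => val) does not type-check
// in this model, because a number is not assignable to a string.
```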

More complicated examples

The fact that I incorrectly implemented both .transform and .default isn’t even the problem. The problem is that transformers make it difficult to write any generic functions on top of Zod (of which .transform and .default are two examples). Others have encountered similar issues: #199 and #213 are more complicated examples of the same difficulty. Nested transformers in particular are a minefield.

A path forward

When I set out to implement transformers, I felt strongly that each transformer should have a well-defined input schema and output schema. This led me to implement transformers as a separate subclass of ZodType (ZodTransformer) in an attempt to make transformers compose nicely with other schemas. This is the root of the issues I’ve laid out above.

Instead I think Zod should adopt a new approach. For the sake of differentiation I’ll use a new term “mods” instead of “transformations”. Each Zod schema has a list of post-parse modification functions (analogous to Yup’s transform chain). When a value is passed into .parse, Zod will type check the value, then pass it through the mod chain.

const schema = z.string()
  .mod(val => val.length)
  .mod(val => val > 100);

type In = z.input<typeof schema> // string
type Out = z.output<typeof schema> // boolean

Unlike before, Zod doesn’t validate the data type between each modification. We’re relying on the TypeScript engine to infer the correct type based on the function definitions. In this sense, Zod is behaving just like I intended; it’s acting as a reliable source of type safety that lets you confidently implement the rest of your application logic — including mods. Re-validating the type between each modification is overkill; TypeScript’s type checker already does that.

Each schema will still have an Input (the inferred type of the schema) and an Output (the output type of the last mod in the mod chain). But because we’re avoiding the weird hierarchy of ZodTransformers everything behaves in a much more intuitive way.
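One way such a mod chain could be modeled is sketched below. This is not the proposed Zod implementation: `ModSchema`, `stringSchema`, `InputOf`, and `OutputOf` are hypothetical names, and the internal `any` on the stored functions is an assumption made to keep the sketch short.

```typescript
// Hypothetical sketch of a "mod chain"; none of these names are Zod's API.
class ModSchema<In, Out> {
  private readonly check: (value: unknown) => In;
  private readonly mods: ReadonlyArray<(value: any) => any>;

  constructor(
    check: (value: unknown) => In,
    mods: ReadonlyArray<(value: any) => any> = [],
  ) {
    this.check = check;
    this.mods = mods;
  }

  // Appends a mod. Only the Output type changes; nothing is re-validated.
  mod<Next>(fn: (value: Out) => Next): ModSchema<In, Next> {
    return new ModSchema<In, Next>(this.check, [...this.mods, fn]);
  }

  // Type checks the raw value once, then runs each mod in insertion order.
  parse(value: unknown): Out {
    let current: unknown = this.check(value);
    for (const fn of this.mods) {
      current = fn(current);
    }
    return current as Out;
  }
}

type InputOf<S> = S extends ModSchema<infer I, any> ? I : never;
type OutputOf<S> = S extends ModSchema<any, infer O> ? O : never;

const stringSchema = new ModSchema<string, string>((value) => {
  if (typeof value !== "string") throw new Error("Expected string");
  return value;
});

const schema = stringSchema
  .mod((val) => val.length) // val is inferred as string
  .mod((val) => val > 100); // val is inferred as number

const input: InputOf<typeof schema> = "hello"; // string
const output: OutputOf<typeof schema> = schema.parse(input); // boolean
console.log(output); // false
```

Because each call to `mod` returns a new schema of the same class, there is no nesting of schemas to reason about. The Input stays fixed and only the Output advances.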

One interesting ramification is that you could interleave mods and refinements. Zod could keep track of the order each mod/refinement was added and execute them all in sequence:

const schema = z.string()
  .mod(val => parseFloat(val))
  .refine(val => val > 25, { message: "Too small" })
  .mod(val => `${val}`)
  .refine(val => /^\d+$/.test(val), { message: "No floats allowed" });
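A rough sketch of how ordered interleaving could work: mods and refinements are stored in one list and executed in the order they were added. `Chain` and `Step` are hypothetical names, and the error messages are illustrative.

```typescript
// Hypothetical sketch: mods and refinements share one ordered list of steps.
type Step =
  | { kind: "mod"; fn: (value: any) => any }
  | { kind: "refine"; check: (value: any) => boolean; message: string };

class Chain<Out> {
  private readonly base: (value: unknown) => unknown;
  private readonly steps: ReadonlyArray<Step>;

  constructor(base: (value: unknown) => unknown, steps: ReadonlyArray<Step> = []) {
    this.base = base;
    this.steps = steps;
  }

  mod<Next>(fn: (value: Out) => Next): Chain<Next> {
    const step: Step = { kind: "mod", fn };
    return new Chain<Next>(this.base, [...this.steps, step]);
  }

  refine(check: (value: Out) => boolean, options: { message: string }): Chain<Out> {
    const step: Step = { kind: "refine", check, message: options.message };
    return new Chain<Out>(this.base, [...this.steps, step]);
  }

  parse(value: unknown): Out {
    let current: unknown = this.base(value);
    for (const step of this.steps) {
      if (step.kind === "mod") {
        current = step.fn(current); // replace the value
      } else if (!step.check(current)) {
        throw new Error(step.message); // stop at the first failed refinement
      }
    }
    return current as Out;
  }
}

const wholeNumberString = new Chain<string>((value) => {
  if (typeof value !== "string") throw new Error("Expected string");
  return value;
})
  .mod((val) => parseFloat(val))
  .refine((val) => val > 25, { message: "Must be greater than 25" })
  .mod((val) => `${val}`)
  .refine((val) => /^\d+$/.test(val), { message: "No floats allowed" });

const accepted = wholeNumberString.parse("30");
console.log(accepted); // "30"
```

Each refinement sees the value as it exists at that point in the chain, so the first one checks a number and the second checks a string.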

Compatibility

I was using the “mod” terminology above to avoid confusion with the earlier discussion. In reality I would implement the “mod chain” concept using the existing syntax/methods: .default, .transform, etc. In fact I think I could switch over to the “mod” approach without any major API changes.

  • A.transform(func): instead of returning a ZodTransformer, this method would simply append func to the “mod chain”
  • A.transform(B, func): this would return A.transform(func).refine(val => B.safeParse(val).success)
  • z.transformer(A, B, func): this could just return A.transform(func).refine(val => B.safeParse(val).success)
  • A.default(defaultValue): this is trickier but still possible. This function would instantiate A.optional().mod(val => typeof val !== "undefined" ? val : defaultValue). Then all the mods of A would be transferred over to the newly created schema
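The trickier .default case from the list above might look like the sketch below. It is an illustration under assumptions, not the planned implementation. `Schema` is a hypothetical class, and the default is taken to be of the Input type, so it is substituted before the existing mods run.

```typescript
// Hypothetical sketch; this Schema class is illustrative and is not Zod's.
class Schema<In, Out> {
  private readonly check: (value: unknown) => In;
  private readonly mods: ReadonlyArray<(value: any) => any>;

  constructor(
    check: (value: unknown) => In,
    mods: ReadonlyArray<(value: any) => any> = [],
  ) {
    this.check = check;
    this.mods = mods;
  }

  // Appends func to the mod chain instead of wrapping the schema.
  transform<Next>(fn: (value: Out) => Next): Schema<In, Next> {
    return new Schema<In, Next>(this.check, [...this.mods, fn]);
  }

  // Accepts undefined, fills in the default first, then replays the existing mods.
  default(defaultValue: In): Schema<In | undefined, Out> {
    const check = this.check;
    const optionalCheck = (value: unknown): In | undefined =>
      value === undefined ? undefined : check(value);
    const fill = (value: In | undefined): In =>
      value === undefined ? defaultValue : value;
    return new Schema<In | undefined, Out>(optionalCheck, [fill, ...this.mods]);
  }

  parse(value: unknown): Out {
    let current: unknown = this.check(value);
    for (const fn of this.mods) {
      current = fn(current);
    }
    return current as Out;
  }
}

const stringToNumber = new Schema<string, string>((value) => {
  if (typeof value !== "string") throw new Error("Expected string");
  return value;
}).transform((val) => parseFloat(val));

const withDefault = stringToNumber.default("3.14");
const defaulted = withDefault.parse(undefined);
const provided = withDefault.parse("5");
console.log(defaulted, provided); // 3.14 5
```

In this model the "5" case succeeds, because the value never has to pass back through a schema that expects the pre-transformation type.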

Under the hood things would be working very differently but most projects could upgrade painlessly unless they explicitly use the ZodTransformer class in some way (which isn’t common).

I would still consider this to be a breaking change of course. If/when I make these changes, I plan to publish them as Zod v3. In this scenario Zod 2 would never leave beta; we’d jump straight to v3.

This transformers issue has caused me a lot of grief and headaches but I’m confident in the new direction; in fact I already have most of it working. I want to put this out for feedback from the community. The issues I’m describing are pretty subtle and not many people have run into them, but I believe the current implementation is untenable.

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Reactions: 24
  • Comments: 35 (13 by maintainers)

Top GitHub Comments

carlpaten commented, Dec 1, 2021 (5 reactions)

[Content status: strong opinions held weakly; not passing value judgments, just offering suggestions]

I have been reasoning about a schema as something that takes an input and returns a result of some type iff the input meets all the validation criteria. Knowing the type of the input isn’t necessarily useful, since it often comes from deserializing an input.

To concur: IMO, a parsing library is (morally) a toolbox for building functions of type A => B | ZodError, with the contents of ZodError reflecting A’s structure. In a perfect world you’d want the library to permit composition of A => B | ZodError with B => C | ZodError (I believe this is called a “functorial” interface in functional programming). The hard part is keeping ZodError legible under a variety of potential combinations, and indeed that severely restricts the space of practical solutions.
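The composition described here can be sketched with a plain result type. All names (`ParseError`, `Result`, `Parser`, `compose`) are hypothetical, and `ParseError` only stands in for ZodError.

```typescript
// Hypothetical names throughout; ParseError stands in for ZodError.
type ParseError = { path: string[]; message: string };
type Result<T> = { ok: true; value: T } | { ok: false; errors: ParseError[] };
type Parser<A, B> = (input: A) => Result<B>;

// Runs `second` only when `first` succeeds; otherwise forwards the errors from `first`.
function compose<A, B, C>(first: Parser<A, B>, second: Parser<B, C>): Parser<A, C> {
  return (input) => {
    const result = first(input);
    return result.ok ? second(result.value) : result;
  };
}

const toNumber: Parser<string, number> = (input) => {
  const value = Number(input);
  return Number.isNaN(value)
    ? { ok: false, errors: [{ path: [], message: `Not a number: ${input}` }] }
    : { ok: true, value };
};

const toPositive: Parser<number, number> = (input) =>
  input > 0
    ? { ok: true, value: input }
    : { ok: false, errors: [{ path: [], message: "Expected a positive number" }] };

const parsePositive = compose(toNumber, toPositive);
console.log(parsePositive("5")); // { ok: true, value: 5 }
```

An error produced by `toPositive` describes the intermediate number, not the original string, which is the legibility concern raised in this comment.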

If I understand correctly, @colinhacks proposes factoring A => B | ZodError into A => A | ZodError and A => B. @jstewmon would like to be able to compose this with a B => B | ZodError and possibly a B => C | ZodError. The thing is that unless you can somehow provide a reverse mapping from downstream validation errors to the original value, this will net you utterly incomprehensible error messages!

The question of how to reverse-map downstream errors into a form that reflects the original input is an interesting one, but I don’t think it should be a blocker for work on transformers. Just surfacing “validate, then transform” solves a ton of real-world use cases. We can talk about performing validation on the transformed values in a later proposal - IMO in the vast majority of cases it’s going to be an anti-pattern because it will completely wreck your error messages and perform likely unnecessary work on invalid inputs.

o-alexandrov commented, Dec 9, 2020 (5 reactions)

The only regret I have about zod is:

  • type-related methods in classes, for example z.number()'s min, max, etc.

The problem with them is:

  • if unused, they cannot be removed as part of the dead code removal (for example w/ terser) because it’s possible class methods are dynamically requested

To shake off a lot of dead code in production, zod could:

  • remove all type-related methods like min, max, etc.
  • export them as ready-to-use functions for either transform or refine
    • you could still protect the user from unwanted use of min for example for z.object(), by assigning proper typings, for example:
      • z.ZodTransformer<z.ZodNumber, z.ZodNumber>