RFC: Refactoring transformers and Zod 3
TLDR
Transformers are poorly designed. I want to reimplement them but there will be breaking changes. Under this proposal, Zod 2 will never come out of beta and we’ll jump straight to Zod 3 with a better implementation of transformers. Skip to “New Implementation” for details.
The reason Zod 2 has been in beta for so long (well, one of the reasons anyway!) is that I’ve been increasingly unsatisfied with Zod’s implementation of transformers. Multiple issues have cropped up that have led me to re-evaluate the current approach. At best, transformers are a huge footgun and at worst they’re fundamentally flawed.
Context
Previously (in v1) the ZodType base class tracked the inferred type in a generic parameter called Type. But transformers accept an Input of one type and return an Output of a different type. To account for this, Type was renamed to Output and a third generic property was added (Input). For non-transformers, Input and Output are the same. For transformers only, these types are different.
Let’s look at stringToNumber as an example:
const stringToNumber = z.transformer(z.string(), z.number(), val => parseFloat(val));
For stringToNumber, Input is string and Output is number. Makes sense.
What happens when you pass a value into stringToNumber.parse?
- The user passes a value into stringToNumber.parse
- The transformer passes this value through the parse function of its input schema (z.string()). If there are parsing errors, it throws a ZodError
- The transformer takes the output from that and passes it into the transformation function (val => parseFloat(val))
- The transformer takes the output of the transformation and validates it against the output schema (z.number())
- The result of that call is returned
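The steps above can be sketched as a toy model. This is not Zod's actual code: Schema, str, num, and transformer are hypothetical stand-ins written only for this illustration, and errors are plain Error objects rather than ZodError.

```typescript
// Toy model only: these names are hypothetical, not Zod's real classes.
interface Schema<Out> {
  parse(value: unknown): Out;
}

const str: Schema<string> = {
  parse(value) {
    if (typeof value !== "string") {
      throw new Error(`Expected string, received ${typeof value}`);
    }
    return value;
  },
};

const num: Schema<number> = {
  parse(value) {
    if (typeof value !== "number") {
      throw new Error(`Expected number, received ${typeof value}`);
    }
    return value;
  },
};

// transformer(A, B, func): validate with A, apply func, validate with B.
function transformer<A, B>(
  input: Schema<A>,
  output: Schema<B>,
  func: (val: A) => unknown,
): Schema<B> {
  return {
    parse(value) {
      const validated = input.parse(value); // input schema check
      const transformed = func(validated); // transformation function
      return output.parse(transformed); // output schema check
    },
  };
}

const stringToNumber = transformer(str, num, (val) => parseFloat(val));
console.log(stringToNumber.parse("3.5")); // 3.5
```

Passing a non-string such as 5 fails at the first step, before the transformation function ever runs.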
Here’s the takeaway: for a generic transformer z.transformer(A, B, func), where A and B are Zod schemas, the argument of func should be the Output type of A and the return type is the Input type of B. This lets you do things like this:
const stringToNumber = z.transformer(z.string(), z.number(), val => parseFloat(val));
const numberToBoolean = z.transformer(z.number(), z.boolean(), val => val > 25);
const stringToNumberToBoolean = z.transformer(stringToNumber, numberToBoolean, val => 5 * val);
The problems with transformers
After implementing transformers, I realized transformers could be used to implement another much-requested feature: default values. Consider this:
const stringWithDefault = z.transformer(z.string().optional(), z.string(), val => val || "trout");
stringWithDefault.parse("marlin") // => "marlin"
stringWithDefault.parse(undefined) // => "trout"
Voila, a schema that accepts string | undefined and returns string, substituting the default “trout” if the input is ever undefined.
So I implemented the .default(val:T) method in the ZodType base class as below (partially simplified):
default(def: Input) {
  return ZodTransformer.create(this.optional(), this, (x: any) => {
    return x === undefined ? def : x;
  });
}
Do you see the problem with that? I didn’t. Neither did anyone who read through the Transformers RFC which I left open for comment for a couple months before starting on implementation.
Basically this implementation doesn’t work at all when you use it on transformers.
Side note: should the .default method for stringToNumber accept a number or a string? As implemented above it should accept a string (the Input). But already this is unintuitive to many people.
stringToNumber.default("3.14")
// EQUIVALENT TO
const defaultedStringToNumber = z.transformer(
  stringToNumber.optional(),
  stringToNumber,
  val => val !== undefined ? val : "3.14"
)
defaultedStringToNumber.parse("5")
/* { ZodError: [
{
"code": "invalid_type",
"expected": "string",
"received": "number",
"path": [],
"message": "Expected string, received number"
}
] */
Let’s walk through why this fails. The input ("5") is first passed into the transformer input (stringToNumber.optional()). This converts the string "5" to the number 5. This is then passed into the transformation function. But wait: val is now number | undefined, but the transformer function needs to return a string. Otherwise, if we pass 5 into stringToNumber.parse it’ll throw. So we need to convert 5 back to "5". That may seem easy in this toy example but it’s not possible in the general case. Zod can’t know how to magically undo the transformation function.
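The failure can be reproduced in a small toy model. Everything here (Schema, str, num, transformer, optional, withDefault) is a hypothetical stand-in, not Zod's implementation; withDefault mirrors the simplified .default shown earlier, with def left loosely typed as unknown in the same spirit as the any in the original.

```typescript
// Toy model only: these names are hypothetical, not Zod's real classes.
interface Schema<Out> {
  parse(value: unknown): Out;
}

const str: Schema<string> = {
  parse(value) {
    if (typeof value !== "string") {
      throw new Error(`Expected string, received ${typeof value}`);
    }
    return value;
  },
};

const num: Schema<number> = {
  parse(value) {
    if (typeof value !== "number") {
      throw new Error(`Expected number, received ${typeof value}`);
    }
    return value;
  },
};

function transformer<A, B>(
  input: Schema<A>,
  output: Schema<B>,
  func: (val: A) => unknown,
): Schema<B> {
  return { parse: (value) => output.parse(func(input.parse(value))) };
}

function optional<T>(schema: Schema<T>): Schema<T | undefined> {
  return {
    parse: (value) => (value === undefined ? undefined : schema.parse(value)),
  };
}

// Same shape as the flawed .default: transformer(this.optional(), this, ...)
function withDefault<T>(schema: Schema<T>, def: unknown): Schema<T> {
  return transformer(optional(schema), schema, (x) =>
    x === undefined ? def : x,
  );
}

const stringToNumber = transformer(str, num, (val) => parseFloat(val));
const defaulted = withDefault(stringToNumber, "3.14");

console.log(defaulted.parse(undefined)); // 3.14
try {
  // "5" becomes 5 in the input schema, then 5 is fed back into stringToNumber.
  defaulted.parse("5");
} catch (err) {
  console.log((err as Error).message); // Expected string, received number
}
```

The default path works only because the default value never went through the input schema's own transformation; any real input does, and then cannot be parsed a second time.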
In practice, the current definition of default in ZodType shouldn’t have even been possible. The only reason the type checker didn’t catch this bug is because there are a regrettable number of anys floating around in Zod. It’s not a simple matter to switch them all to unknowns either; I’ve had to use any in several instances to get type inference and certain generic methods to work properly. I’ve tried multiple times to reduce the number of anys but I’ve never managed to crack it.
It’s possible this is a one-off issue. I could find some other way to implement .default() that doesn’t involve transformers. Unfortunately this isn’t even the only problem in Zod’s implementation.
The .transform method
Initially the only way to define transformers was with z.transformer(A, B, func). Eventually I implemented a utility function you can use like this:
z.string().transform(z.number(), val=>parseFloat(val));
// equivalent to
z.transformer(z.string(), z.number(), val=>parseFloat(val));
Some users were executing multiple transforms in sequence without changing the actual data type:
z.string()
  .transform(z.string(), (val) => val.toLowerCase())
  .transform(z.string(), (val) => val.trim())
  .transform(z.string(), (val) => val.replace(" ", "_"));
To reduce the boilerplate here, it was recommended that I overload the method definition to support this syntax:
z.string()
  .transform((val) => val.toLowerCase())
  .transform((val) => val.trim())
  .transform((val) => val.replace(" ", "_"));
If the first argument is a function instead of a Zod schema, Zod should assume that the transformation doesn’t transform the type. In other words, z.string().transform((val) => val.trim()) should be equivalent to z.string().transform(z.string(), (val) => val.trim()). Makes sense.
Consider using this method on a transformer:
stringToNumber.transform(/* transformation_func */);
What type signature do you expect for transformation_func?
Most would probably expect (arg: number)=>number. Some would expect (arg: string)=>string. Neither of those is right; it’s (arg: number)=>string. The transformation function expects an input of number (the Output of stringToNumber) and a return type of string (the Input of stringToNumber). This type signature is a natural consequence of a series of logical design decisions, but the end result is dumb. Intuitively, you should be able to append .transform(val => val) to any schema. Unfortunately due to how transformers are implemented, that’s not always possible.
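To see where that signature comes from: A.transform(func) is treated as z.transformer(A, A, func), so func receives the Output of A and must return the Input of A. A type-level sketch, assuming a hypothetical Typed interface that records only a schema's Input and Output (it is not part of Zod):

```typescript
// Hypothetical description of a schema by its Input and Output types only.
interface Typed<In, Out> {
  readonly _input: In;
  readonly _output: Out;
}

// Signature of func in A.transform(func) under the design described above:
// it takes A's Output and has to produce A's Input.
type SchemalessTransform<S extends Typed<unknown, unknown>> = (
  arg: S["_output"],
) => S["_input"];

type StringToNumber = Typed<string, number>;

// Compiles: (arg: number) => string
const doubled: SchemalessTransform<StringToNumber> = (arg) => String(arg * 2);

// Would not compile, although most people expect it to:
// const identity: SchemalessTransform<StringToNumber> = (arg) => arg;

console.log(doubled(5)); // "10"
```

For any schema whose Input and Output are the same type, the two sides coincide and val => val type-checks, which is why the problem only shows up on transformers.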
More complicated examples
The fact that I incorrectly implemented both .transform and .default isn’t even the problem. The problem is that transformers make it difficult to write any generic functions on top of Zod (of which .transform and .default are two examples). Others have encountered similar issues: #199 and #213 are more complicated examples of the same difficulty. Nested transformers in particular are a minefield.
A path forward
When I set out to implement transformers I felt strongly that each transformer should have a strongly defined input and output schema. This led to me implementing transformers as a separate subclass of ZodType (ZodTransformer) in an attempt to make transformers compose nicely with other schemas. This is the root of the issues I’ve laid out above.
Instead I think Zod should adopt a new approach. For the sake of differentiation I’ll use a new term “mods” instead of “transformations”. Each Zod schema has a list of post-parse modification functions (analogous to Yup’s transform chain). When a value is passed into .parse, Zod will type check the value, then pass it through the mod chain.
const schema = z.string()
  .mod(val => val.length)
  .mod(val => val > 100);
type In = z.input<typeof schema> // string
type Out = z.output<typeof schema> // boolean
Unlike before, Zod doesn’t validate the data type between each modification. We’re relying on the TypeScript engine to infer the correct type based on the function definitions. In this sense, Zod is behaving just like I intended; it’s acting as a reliable source of type safety that lets you confidently implement the rest of your application logic — including mods. Re-validating the type between each modification is overkill; TypeScript’s type checker already does that.
Each schema will still have an Input (the inferred type of the schema) and an Output (the output type of the last mod in the mod chain). But because we’re avoiding the weird hierarchy of ZodTransformers everything behaves in a much more intuitive way.
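A minimal sketch of that idea, assuming hypothetical names (ModSchema, build, stringSchema) that are not part of Zod. The base type check runs once, and each .mod only changes the static Output type; nothing is re-validated between mods. The chain is modelled here as a composed function rather than an explicit list.

```typescript
// Toy model only: ModSchema, build and stringSchema are hypothetical.
interface ModSchema<In, Out> {
  accepts(value: unknown): value is In; // base type check only
  parse(value: unknown): Out; // type check, then run every mod
  mod<Next>(fn: (value: Out) => Next): ModSchema<In, Next>;
}

function build<In, Out>(
  guard: (value: unknown) => value is In,
  run: (value: In) => Out,
): ModSchema<In, Out> {
  return {
    accepts: guard,
    parse(value: unknown): Out {
      if (!guard(value)) throw new Error("Invalid input type");
      return run(value);
    },
    mod<Next>(fn: (value: Out) => Next): ModSchema<In, Next> {
      // Input type and guard stay the same; only the Output type moves on.
      return build(guard, (value: In) => fn(run(value)));
    },
  };
}

const isString = (value: unknown): value is string =>
  typeof value === "string";

const stringSchema = (): ModSchema<string, string> =>
  build(isString, (value: string) => value);

const schema = stringSchema()
  .mod((val) => val.length) // Output is now number
  .mod((val) => val > 100); // Output is now boolean

console.log(schema.parse("hello")); // false
```

Here schema has type ModSchema<string, boolean>, matching the Input and Output described above, and TypeScript alone checks that each mod accepts what the previous one returned.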
One interesting ramification is that you could interleave mods and refinements. Zod could keep track of the order each mod/refinement was added and execute them all in sequence:
const schema = z.string()
  .mod(val => parseFloat(val))
  .refine(val => val > 25, { message: "Too short" })
  .mod(val => `${val}`)
  .refine(val => /^\d+$/.test(val), { message: "No floats allowed" });
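One way the ordered execution could work, as a sketch under assumptions: Step, Chain, and stringChain are hypothetical names, the base check is hard-coded to string, and refine takes a plain message string instead of an options object.

```typescript
// Toy model only: Step, Chain and stringChain are hypothetical names.
type Step =
  | { kind: "mod"; fn: (value: any) => unknown }
  | { kind: "refine"; check: (value: any) => boolean; message: string };

interface Chain<Out> {
  mod<Next>(fn: (value: Out) => Next): Chain<Next>;
  refine(check: (value: Out) => boolean, message: string): Chain<Out>;
  parse(value: unknown): Out;
}

function stringChain<Out>(steps: readonly Step[]): Chain<Out> {
  return {
    mod<Next>(fn: (value: Out) => Next): Chain<Next> {
      return stringChain<Next>([...steps, { kind: "mod", fn }]);
    },
    refine(check: (value: Out) => boolean, message: string): Chain<Out> {
      return stringChain<Out>([...steps, { kind: "refine", check, message }]);
    },
    parse(value: unknown): Out {
      if (typeof value !== "string") throw new Error("Expected string");
      let current: unknown = value;
      // Mods and refinements run in exactly the order they were added.
      for (const step of steps) {
        if (step.kind === "mod") {
          current = step.fn(current);
        } else if (!step.check(current)) {
          throw new Error(step.message);
        }
      }
      return current as Out;
    },
  };
}

const schema = stringChain<string>([])
  .mod((val) => parseFloat(val))
  .refine((val) => val > 25, "Too short")
  .mod((val) => `${val}`)
  .refine((val) => /^\d+$/.test(val), "No floats allowed");

console.log(schema.parse("30")); // "30"
```

With this ordering, "10" is rejected by the first refinement and "30.5" by the second, because each refinement sees the value as it stands at that point in the chain.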
Compatibility
I was using the “mod” terminology above to avoid confusion with the earlier discussion. In reality I would implement the “mod chain” concept using the existing syntax/methods: .default, .transform, etc. In fact I think I could switch over to the “mod” approach without any major API changes.
- A.transform(func): instead of returning a ZodTransformer, this method would simply append func to the “mod chain”
- A.transform(B, func): this would return A.transform(func).refine(val => B.safeParse(val).success)
- z.transformer(A, B, func): this could just return A.transform(func).refine(val => B.safeParse(val).success)
- A.default(defaultValue): this is trickier but still possible. This function would instantiate A.optional().mod(val => typeof val !== "undefined" ? val : defaultValue). Then all the mods of A would be transferred over to the newly created schema
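The .default case can be sketched in the same toy style. This is one reading of the description above, not a confirmed implementation: Toy, parse, and withDefault are hypothetical, the default is assumed to be of the schema's Input type, and the default mod is placed ahead of the mods copied over from A.

```typescript
// Toy model only: a schema is a base check plus an ordered list of mods.
interface Toy<In> {
  check(value: unknown): In;
  mods: ReadonlyArray<(value: any) => unknown>;
}

function parse<In>(schema: Toy<In>, value: unknown): unknown {
  let current: unknown = schema.check(value);
  for (const mod of schema.mods) current = mod(current);
  return current;
}

// A.default(def): optional base check, then the default mod, then A's mods.
function withDefault<In>(schema: Toy<In>, def: In): Toy<In | undefined> {
  return {
    check: (value) => (value === undefined ? undefined : schema.check(value)),
    mods: [
      (val: In | undefined) => (val !== undefined ? val : def),
      ...schema.mods, // mods of A transferred to the new schema
    ],
  };
}

const stringToNumber: Toy<string> = {
  check(value) {
    if (typeof value !== "string") throw new Error("Expected string");
    return value;
  },
  mods: [(val: string) => parseFloat(val)],
};

const defaulted = withDefault(stringToNumber, "3.14");
console.log(parse(defaulted, "5")); // 5
console.log(parse(defaulted, undefined)); // 3.14
```

Because the value is only type-checked once, before any mod runs, the earlier problem of feeding an already-transformed value back into the input schema does not arise.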
Under the hood things would be working very differently but most projects could upgrade painlessly unless they explicitly use the ZodTransformer class in some way (which isn’t common).
I would still consider this to be a breaking change of course. If/when I make these changes, I plan to publish them as Zod v3. In this scenario Zod 2 would never leave beta; we’d jump straight to v3.
This transformers issue has caused me a lot of grief and headaches but I’m confident in the new direction; in fact I already have most of it working. I want to put this out for feedback from the community. The issues I’m describing are pretty subtle and not many people have run into them, but I believe the current implementation is untenable.
Issue Analytics
- Created 3 years ago
- Reactions:24
- Comments:35 (13 by maintainers)
Top GitHub Comments
[Content status: strong opinions held weakly; not passing value judgments, just offering suggestions]
To concur: IMO, a parsing library is (morally) a toolbox for building functions of type A => B | ZodError, with the contents of ZodError reflecting A’s structure. In a perfect world you’d want the library to permit composition of A => B | ZodError with B => C | ZodError (I believe this is called a “functorial” interface in functional programming). The hard part is keeping ZodError legible under a variety of potential combinations, and indeed that severely restricts the space of practical solutions.
If I understand correctly, @colinhacks proposes factoring A => B | ZodError into A => A | ZodError and A => B. @jstewmon would like to be able to compose this with a B => B | ZodError and possibly a B => C | ZodError. The thing is that unless you can somehow provide a reverse mapping from downstream validation errors to the original value, this will net you utterly incomprehensible error messages!
The question of how to reverse-map downstream errors into a form that reflects the original input is an interesting one, but I don’t think it should be a blocker for work on transformers. Just surfacing “validate, then transform” solves a ton of real-world use cases. We can talk about performing validation on the transformed values in a later proposal - IMO in the vast majority of cases it’s going to be an anti-pattern because it will completely wreck your error messages and perform likely unnecessary work on invalid inputs.
The only regret I have about zod is z.number()'s min, max, etc.
The problem with them is:
- […] terser) because it’s possible class methods are dynamically requested
- react’s team claimed classes are complicated for minifiers
To shake off a lot of dead code in production, zod could:
- […] min, max etc
- […] min for example for z.object(), by assigning proper typings, for example: z.ZodTransformer<z.ZodNumber, z.ZodNumber>