RFC: Arbitrary derivation and registration
There has been a fair amount of discussion about this in various issues: #109, #198, #334, #390. Since in v3 there is an opportunity to redesign and improve, I’d like to open a broader discussion here to get a good overview of the ins and outs.
Why is it worth having a `'T -> Arbitrary<'T>` mapping?
It is possible to make a working FsCheck without any mapping at all. It would mean that:
- the user would always have to specify `Arbitrary` instances explicitly, instead of or in addition to the types,
- reflective derivation of `Arbitrary` instances is removed or significantly limited. (The latter obviously depends on some map of `T -> Arbitrary<T>` for at least the primitive types.)
One could argue the difference between `fun (a:string) -> ...` and `Prop.forAll(Arb.Default.String(), fun a -> ...)` is not big.
Where it does become annoying is if you are generating a more elaborate type, e.g. a tuple-like type:
Prop.forAll(Arb.Tuple(Arb.Int,Arb.String,myArbForMyType, Arb.Option(Arb.char)), fun (i,s,mytype,charopt) -> ....)
For a record type the API could look like:
Prop.forAll(Arb.Tuple(Arb.Int,Arb.String,myArbForMyType, Arb.Option(Arb.char)) |> Arb.convert tupleToMyRecordType tupleFromMyRecordType,
fun myRecord -> ....)
For a union type, especially a recursive one, it’s unclear to me whether there is a short way of describing an `Arbitrary` instance. Presumably we need an `Arb.fix`, `Arb.choice` and `Arb.apply` to deal with recursion, choice between different union types and applying the union case to its fields, respectively.
It all seems a bit tedious.
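To make the explicit style concrete, here is a minimal sketch that does not use FsCheck at all. `Gen`, `tuple2`, `map` and the `Person` record are hypothetical names invented for illustration, and a generator is modelled as a plain function from `System.Random` to a value. It only covers generation: the `Arb.convert` mentioned above takes conversions in both directions, presumably because an `Arbitrary` also carries a shrinker.

```fsharp
open System

// Hypothetical stand-in for a generator: a function from a random source to a value.
type Gen<'T> = Random -> 'T

// Combine two generators into a generator of pairs.
let tuple2 (a: Gen<'A>) (b: Gen<'B>) : Gen<'A * 'B> =
    fun rnd -> (a rnd, b rnd)

// Convert the output of a generator with a plain function.
let map (f: 'A -> 'B) (g: Gen<'A>) : Gen<'B> =
    fun rnd -> f (g rnd)

type Person = { Age: int; Name: string }

let age : Gen<int> = fun rnd -> rnd.Next(0, 120)

// Lower-case ASCII names of 1 to 7 characters.
let name : Gen<string> =
    fun rnd -> String.init (rnd.Next(1, 8)) (fun _ -> string (char (rnd.Next(97, 123))))

// Every component has to be spelled out by hand, which is the tedium described above.
let person : Gen<Person> =
    tuple2 age name |> map (fun (a, n) -> { Age = a; Name = n })

printfn "%A" (person (Random(3)))
```

Even for a two-field record the caller has to name each component generator and write the conversion, so the cost grows with every field and every nested type.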
Furthermore, having a mapping is appealing to newcomers because really all you have to know is how to write a function checking some property of a random instance of your type, and there is a very good chance FsCheck will already be able to run some tests for you.
The mapping is also a place where users can easily extend and tweak the whole set of automatically generated types:
- Extend: say you have a custom type that FsCheck doesn’t handle out of the box. Then you can simply register a generator for that type, and presto, FsCheck can now not only generate that type, but also lists, arrays, record types, union types, etc of that type.
- Tweak: say you don’t want to generate NaN for floats, because it’s not important for your domain. But the default list, array, record type generators will all generate NaN if they contain floats. Then you can simply override the generator for floats, and all the other generators will pick up the custom generator and so you won’t get NaNs anywhere.
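The "tweak" case can be sketched without FsCheck as follows. `Gen`, `Registry`, `register`, `lookup` and `listOf` are hypothetical names, not FsCheck's API; the registry is an immutable map from a type's full name to a boxed generator. The point is only that a list generator which looks up its element generator picks up an override automatically.

```fsharp
open System

// Hypothetical stand-in for a generator: a function from a random source to a value.
type Gen<'T> = Random -> 'T

// Hypothetical registry: an immutable map from a type's full name to a boxed generator.
type Registry = Map<string, obj>

// Adding an entry for a type that is already present replaces the old entry.
let register<'T> (g: Gen<'T>) (reg: Registry) : Registry =
    reg.Add(typeof<'T>.FullName, box g)

let lookup<'T> (reg: Registry) : Gen<'T> =
    unbox<Gen<'T>> (reg.[typeof<'T>.FullName])

// The list generator consults the registry for its element generator.
let listOf<'T> (reg: Registry) : Gen<'T list> =
    fun rnd ->
        let elem = lookup<'T> reg
        List.init (rnd.Next(0, 5)) (fun _ -> elem rnd)

// Default float generator: occasionally produces NaN.
let defaultFloat : Gen<float> =
    fun rnd -> if rnd.Next(4) = 0 then nan else rnd.NextDouble()

// Tweaked float generator: never produces NaN.
let noNaN : Gen<float> = fun rnd -> rnd.NextDouble()

let defaults : Registry = Map.empty |> register defaultFloat
let tweaked : Registry = defaults |> register noNaN

let sample (reg: Registry) : float list =
    let rnd = Random(42)
    List.init 200 (fun _ -> listOf<float> reg rnd) |> List.concat

let hasNaN (xs: float list) = xs |> List.exists (fun x -> Double.IsNaN x)

printfn "tweaked registry yields NaN: %b" (hasNaN (sample tweaked))
```

Only the float entry was replaced; `listOf` was not touched, yet lists drawn from `tweaked` cannot contain NaN because the override is found at lookup time.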
What is bad about having a `'T -> Arbitrary<'T>` mapping?
It is currently implemented as a piece of global, mutable state and so carries associated costs.
Should it be global? ThreadLocal? AsyncLocal? SomethingElseLocal?
Also it’s not clear which arbitrary instances are in effect at any one time in the code, because it depends on which `Arb.register` calls have been executed beforehand. This is particularly problematic in multi-threaded scenarios.
Some specific problems with the current implementation
- The mapping is kept in a single, immutable dictionary, but there is one `ThreadLocal` value where that dictionary is set. This seems to make it work with xunit’s parallelization. It almost certainly does not work with `Task`.
- There is a single source of default `Arbitrary` instances in `Arbitrary.fs` which is kept in a format so that the registration mechanism (in `TypeClass.fs`) can read it. Each instance is defined as a parameter-less static method. This has several disadvantages:
  - If a generator is dependent on another generator, e.g. `FsList() : Arbitrary<list<T>>` needs a generator for `T`, that other generator is accessed via `Arb.from<T>` in the implementation of `FsList`, which recursively consults the Arbitrary mapping to look up the Arbitrary instance for `T`. This in effect allows the tweaking of the mapping I described above: if the user overrides the mapping for `int`, then `int list` will effectively be changed too, viz. it will use the overridden `int` Arbitrary. But it causes confusion because it’s pretty easy to accidentally call the generator for the type you’re defining a mapping for, which leads to an infinite loop.
  - There is no method like `FsList(Arbitrary<'a>)` to parametrize the list generator in a more explicit way. So even if users would want to use the explicit style explained above to define `Arbitrary` instances, if they want to change any of the Arbitrary instances globally, they’ll need to redefine that `Arbitrary` from scratch.
  - It’s harder (but not impossible) to split the file into multiple more coherent pieces.
Some ways to take this forward
- Option 1: do away with the mapping altogether. Users have to write `Prop.forAll` explicitly and pass in the `Arbitrary` instances to use. The problem with this is that it makes reflective `Arbitrary` instance derivation based on the type impossible - some map somewhere from type to `Arbitrary` is clearly necessary to do that. One can use a method like the current `Arb.Default.Derive<'T>()` which may take some configuration parameters - e.g. it can take the map to use explicitly, and throw like it does now when it can’t derive an instance reflectively and it’s not in the given map.
- Option 2: we have a mapping but it’s immutable. I.e. basically like we have now but `Arb.register` does not exist. This makes tweaking or extending the map impossible, but this can be remedied by basically separating the mapping and the API in `Arb.Default`. I.e. if we parameterize all the methods on `Arb.Default` to take the Arbitrary instances they need, instead of relying on `Arb.from<'T>` and such, including `Arb.Default.Derive`, it’s still possible to tweak and extend the reflective Arbitrary generation in `Arb.Default.Derive`. But the mapping can’t be extended outside of FsCheck itself. I.e. it becomes harder to support external libraries that have `Arbitrary` instances for new types out of the box - these would always have to be used explicitly. Not that this is actually happening now, but it’s worth mentioning.
- Option 3: we have a mapping but it’s append-only. I.e. you can only add new types. Trying to override an existing type in the mapping throws. Similar limitation re: tweaking Arbitrary instances as option 2, but this does allow extending the mapping. And I don’t think it has any of the downsides mentioned above, because you’d see clear error messages when anything untoward happens.
- Option 4: we have a mutable mapping but it’s not global, but passed around while building the `Property` (so likely via the `Gen` computation expression somehow). So it’s like a state monad really, the mapping is passed around functionally but implicitly. The default mapping could be set in `Config`.
- Option 5: we have a global, one-time mutable mapping that is defined and fixed at static initialization time, e.g. when the first test in an assembly is run. We could implement this by scanning the test assembly for places where mappings are defined (somehow, e.g. types attributed with an FsCheck-defined attribute) and then building the map before the first test is run. It is impossible to change or affect the mapping in any other way.
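As an illustration of option 3, an append-only mapping might behave like the sketch below. This is an assumption-laden model, not a proposal for the actual API: `Gen`, `ArbMap`, `ArbMap.add` and `ArbMap.tryFind` are hypothetical, and entries are keyed by the type's full name.

```fsharp
open System

// Hypothetical stand-in for a generator: a function from a random source to a value.
type Gen<'T> = Random -> 'T

// Hypothetical append-only mapping from a type's full name to a boxed generator.
type ArbMap = { Entries: Map<string, obj> }

module ArbMap =
    let empty : ArbMap = { Entries = Map.empty }

    // Adding a new type succeeds; overriding an existing type throws.
    let add<'T> (g: Gen<'T>) (m: ArbMap) : ArbMap =
        let key = typeof<'T>.FullName
        if m.Entries.ContainsKey key then
            invalidOp (sprintf "An instance for %s is already registered" key)
        { Entries = m.Entries.Add(key, box g) }

    let tryFind<'T> (m: ArbMap) : Gen<'T> option =
        m.Entries.TryFind(typeof<'T>.FullName) |> Option.map unbox<Gen<'T>>

// Extending the mapping with a new type is allowed.
let withInt = ArbMap.empty |> ArbMap.add (fun (rnd: Random) -> rnd.Next(100))

// Overriding is rejected with a clear error instead of silently changing behaviour.
let overrideAttempt =
    try
        withInt |> ArbMap.add (fun (_: Random) -> 0) |> ignore
        "accepted"
    with :? InvalidOperationException as e -> "rejected: " + e.Message

printfn "%s" overrideAttempt
```

Because `add` returns a new value, this sketch also happens to be immutable; a global append-only variant would keep the same "throw on override" rule but store the map in one shared place.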
API for defining the map
Given the above, it seems unavoidable to have a map somewhere (unless you want to try and argue to throw out reflective `Arbitrary` derivation; that is a very hard sell), so the question comes up of how to define the mapping from type T to Arbitrary of T. Note that it must be some place that allows bringing new type parameters into scope, to allow for the definition of Arbitrary instances for generic types. Again, a few options:
- static methods (as now) - can get the mapping by calling something like `ArbMapping.Derive<ArbType>()`
- instance methods (has the slight advantage that the Arbitrary instance for a set of instance methods can be parametrized by values), i.e. mapping is derived by `ArbMapping.Derive(new ArbType(...parameters...))`
- types that derive from `Arbitrary<T>`, mapping derived by `ArbMapping.Derive(new Int32Arb(), new CharArb(),...)`, and to be anywhere near convenient we’d need to have some way to scan an assembly for these types as well.
Thanks for reading this far…
Top GitHub Comments
Different unit tests should not interfere with each other, so my vote is against any static global state.
I would like to have the ability for a scoped Arb registration.
So my preference would be
So from a user perspective, I would like to write either
which just uses the default context, or
where I create a custom isolated context and pass that through.
With the context being just a normal variable, they can be shared, created in helper methods, … It should also be possible to replace Arbs, so I should be able to write `ArbContext.Core.Add(myIntArb)`.
It may also make sense to differentiate between an immutable `Core` context and a mutable `Default` context (well, a mutable binding; the context itself would still be immutable).
The Core context is everything included by default in FsCheck.
The Default context defaults to the Core context, but if all my tests use the same Arbs anyway, then I could overwrite the Default context once, and all calls that do not explicitly pass a different context will use that.
Your assembly-scanning strategy also looks useful, so there could be a helper method which creates a Context from an Assembly:
This is basically your option 4), but with an immutable context instead of a mutable one.
I think it’d be a good idea to do this. This will have an impact on usability, so I don’t suggest that this should stand alone, but creating a ‘core’ of FsCheck that works like that would, AFAICT, enable us (and third parties) to evolve a ‘usability layer’ on top of the core engine.
FsCheck could still ship with a default Arbitrary mapping, but with enough of the core exposed to enable other people to do things differently.
I’m not saying that I want to have things radically different; I only have an intuition that separating concerns like this would lead to a better architecture overall.
It should be immutable, and the context should be explicit. It’s just a map. If you want to change it, pass it as an argument to a function.
As the options go, I’d prefer a combination where option 1 is the core of FsCheck, but with option 4 on top of it. I’m not sure why you write ‘mutable mapping’, though… Is that a typo? Shouldn’t it be immutable mapping?
Option 5 seems debilitating, because you’d only be able to change FsCheck once for an entire test run. This implies to me that it would be impossible to run two different test cases with different configurations.
Regarding an API for mappings, I’d suggest to do away with the type class emulation. The way .NET works, it adds overhead, but no type safety.
The map itself could be defined purely in terms of functions. The map’d be something like `Type -> Arbitrary option`. The problem is that at the bottom level of Reflection, things aren’t generic. In order to create a generic value (like, say, `Arbitrary<Foo>`), you need some sort of run-time conversion. This, again, implies to me some sort of `option`-based API.
If we want to be able to scan a type (or assembly) for custom definitions, I think that we should define a proper .NET interface, so that users at least get a bit of type-safety: Implement this interface, and you can register your custom Arbitrary. If you do it wrong, then your code doesn’t compile.
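One possible reading of this suggestion, sketched without FsCheck: the map is a plain function `Type -> obj option`, user instances implement a generic interface, and a typed lookup performs the run-time conversion and returns an `option`. All names here (`IArbitrary`, `ArbLookup`, `ofInstance`, `orElse`, `tryResolve`, `DigitArb`, `LetterArb`) are hypothetical and only serve the illustration.

```fsharp
open System

// Hypothetical interface a user implements to supply an instance for 'T.
// A mismatch between 'T and the generated value is a compile-time error.
type IArbitrary<'T> =
    abstract Generate : Random -> 'T

// The untyped map: reflection-level code only sees Type and obj.
type ArbLookup = Type -> obj option

// Lift one typed instance into an untyped lookup.
let ofInstance<'T> (arb: IArbitrary<'T>) : ArbLookup =
    fun t -> if t = typeof<'T> then Some (box arb) else None

// Compose two lookups: try the first, fall back to the second.
let orElse (fallback: ArbLookup) (first: ArbLookup) : ArbLookup =
    fun t ->
        match first t with
        | Some a -> Some a
        | None -> fallback t

// The run-time conversion back to a typed instance; failure is a None, not an exception.
let tryResolve<'T> (lookup: ArbLookup) : IArbitrary<'T> option =
    match lookup typeof<'T> with
    | Some (:? IArbitrary<'T> as arb) -> Some arb
    | _ -> None

type DigitArb() =
    interface IArbitrary<int> with
        member _.Generate(rnd: Random) = rnd.Next(0, 10)

type LetterArb() =
    interface IArbitrary<char> with
        member _.Generate(rnd: Random) = char (rnd.Next(97, 123))

let lookup : ArbLookup =
    ofInstance (DigitArb() :> IArbitrary<int>)
    |> orElse (ofInstance (LetterArb() :> IArbitrary<char>))

let rnd = Random(7)

match tryResolve<int> lookup with
| Some arb -> printfn "int: %d" (arb.Generate rnd)
| None -> printfn "int: no instance"

match tryResolve<string> lookup with
| Some _ -> printfn "string: found"
| None -> printfn "string: no instance"
```

The lookup is an immutable value, so changing it means building a new one and passing that along, which matches the "it’s just a map" position taken in this comment.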