RFC: Arbitrary derivation and registration

There has been a fair amount of discussion about this in various issues: #109, #198, #334, #390. Since in v3 there is an opportunity to redesign and improve, I’d like to open a broader discussion here to get a good overview of the ins and outs.

Why is it worth having a `'T -> Arbitrary<'T>` mapping?

It is possible to make a working FsCheck without any mapping at all. It would mean that:

the user would always have to specify Arbitrary instances explicitly, instead of or in addition to the types,
reflective derivation of Arbitrary instances is removed or significantly limited. (The latter obviously depends on some map of T -> Arbitrary<T> for at least the primitive types.)

One could argue the difference between fun (a:string) -> ... and Prop.forAll(Arb.Default.String(), fun a -> ...) is not big.

Where it does become annoying is if you are generating a more elaborate type, e.g. a tuple-like type:

Prop.forAll(Arb.Tuple(Arb.Int,Arb.String,myArbForMyType, Arb.Option(Arb.char)), fun (i,s,mytype,charopt) -> ....)

For a record type the API could look like:

Prop.forAll(Arb.Tuple(Arb.Int,Arb.String,myArbForMyType, Arb.Option(Arb.char)) |> Arb.convert tupleToMyRecordType tupleFromMyRecordType, 
fun myRecord -> ....)
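
To make the explicit style concrete, here is a minimal, self-contained sketch of tuple and conversion combinators. It does not use FsCheck: `Gen<'T>`, `Gen.tuple2`, `Gen.map` and the `Point` record are toy names assumed for illustration, and shrinking (the reason the `Arb.convert` above takes two functions) is left out.

```fsharp
open System

// Toy stand-in for a generator: a function from a random source to a value.
// This is an assumption for illustration, not FsCheck's Gen type.
type Gen<'T> = Random -> 'T

module Gen =
    let smallInt : Gen<int> = fun rnd -> rnd.Next(-100, 101)
    let boolean : Gen<bool> = fun rnd -> rnd.Next(2) = 0

    // Combine two generators into a generator of pairs.
    let tuple2 (ga: Gen<'A>) (gb: Gen<'B>) : Gen<'A * 'B> =
        fun rnd -> (ga rnd, gb rnd)

    // Convert generated values with a function (generation only, no shrinking).
    let map (f: 'A -> 'B) (g: Gen<'A>) : Gen<'B> =
        fun rnd -> f (g rnd)

// Hypothetical record type standing in for "myRecord".
type Point = { X: int; Visible: bool }

// Build the record generator from a tuple generator plus a conversion.
let pointGen : Gen<Point> =
    Gen.tuple2 Gen.smallInt Gen.boolean
    |> Gen.map (fun (x, v) -> { X = x; Visible = v })

let sample = pointGen (Random(42))
```

Even in this reduced form, every field has to be spelled out by hand, which is the tedium a type-directed mapping avoids.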

For a union type, especially a recursive one, it’s unclear to me whether there is a short way of describing an Arbitrary instance. Presumably we need an Arb.fix, Arb.choice and Arb.apply to deal with recursion, choice between different union cases, and applying the union case to its fields, respectively.

It all seems a bit tedious.

Furthermore, having a mapping is appealing to newcomers because really all you have to know is how to write a function checking some property of a random instance of your type, and there is a very good chance FsCheck will already be able to run some tests for you.

The mapping is also a place where users can easily extend and tweak the whole set of automatically generated types:

Extend: say you have a custom type that FsCheck doesn’t handle out of the box. Then you can simply register a generator for that type, and presto, FsCheck can now not only generate that type, but also lists, arrays, record types, union types, etc. of that type.
Tweak: say you don’t want to generate NaN for floats, because it’s not important for your domain. But the default list, array, record type generators will all generate NaN if they contain floats. Then you can simply override the generator for floats, and all the other generators will pick up the custom generator and so you won’t get NaNs anywhere.
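
The "tweak" case can be sketched with a toy mapping, assuming nothing from FsCheck itself: `ArbMap`, `register`, `lookup` and `listOf` are hypothetical names, and the map is keyed by the type's full name only to keep the example short. The point is that the list generator asks the mapping for its element generator, so overriding `float` changes `float list` as well.

```fsharp
open System

// Toy model (an assumption, not FsCheck's API).
type Gen<'T> = Random -> 'T
type ArbMap = Map<string, obj>

// Add or replace the generator registered for 'T.
let register<'T> (gen: Gen<'T>) (m: ArbMap) : ArbMap =
    m.Add(typeof<'T>.FullName, box gen)

// Look up the generator for 'T; throws KeyNotFoundException if absent.
let lookup<'T> (m: ArbMap) : Gen<'T> =
    unbox<Gen<'T>> m.[typeof<'T>.FullName]

// The list generator does not hard-code its element generator:
// it consults the mapping, so overrides propagate to lists.
let listOf<'T> (m: ArbMap) : Gen<'T list> =
    fun rnd ->
        let elem = lookup<'T> m
        List.init (rnd.Next(0, 10)) (fun _ -> elem rnd)

// Default float generator that sometimes yields NaN.
let defaultFloat : Gen<float> =
    fun rnd -> if rnd.Next(4) = 0 then nan else rnd.NextDouble()

// Domain-specific override without NaN.
let noNaNFloat : Gen<float> = fun rnd -> rnd.NextDouble()

let defaults : ArbMap = Map.empty |> register defaultFloat
let tweaked : ArbMap = defaults |> register noNaNFloat

// Lists generated from the tweaked map never contain NaN.
let xs = listOf<float> tweaked (Random(1))
```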

What is bad about having a `'T -> Arbitrary<'T>` mapping?

It is currently implemented as a piece of global, mutable state and so carries associated costs.

Should it be global? ThreadLocal? AsyncLocal? SomethingElseLocal?

Also it’s not clear which arbitrary instances are in effect at any one time in the code, because it depends on which Arb.register calls have been executed beforehand. This is particularly problematic in multi-threaded scenarios.

Some specific problems with the current implementation

The mapping is kept in a single, immutable dictionary, but there is one ThreadLocal value where that dictionary is set. This seems to make it work with xunit’s parallelization. It almost certainly does not work with Task.
There is a single source of default Arbitrary instances in Arbitrary.fs which is kept in a format so that the registration mechanism (in TypeClass.fs) can read it. Each instance is defined as a parameter-less static method. This has several disadvantages:
If a generator depends on another generator, e.g. FsList() : Arbitrary<list<T>> needs a generator for T, that other generator is accessed via Arb.from<T> in the implementation of FsList, which recursively consults the Arbitrary mapping to look up the Arbitrary instance for T. This in effect allows the tweaking of the mapping I described above: if the user overrides the mapping for int, then int list will effectively be changed too, viz. it will use the overridden int Arbitrary. But it causes confusion, because it’s pretty easy to accidentally call the generator for the type you’re defining a mapping for, which leads to an infinite loop.
There is no method like FsList(Arbitrary<'a>) to parametrize the list generator in a more explicit way. So even if users want to use the explicit style explained above to define Arbitrary instances, if they want to change any of the Arbitrary instances globally, they’ll need to redefine that Arbitrary from scratch.
It’s harder (but not impossible) to split the file into multiple, more coherent pieces.

Some ways to take this forward

Option 1: do away with the mapping altogether. Users have to write Prop.forAll explicitly and pass in the Arbitrary instances to use. The problem with this is that it makes reflective Arbitrary instance derivation based on the type impossible - some map somewhere from type to Arbitrary is clearly necessary to do that. One can use a method like the current Arb.Default.Derive<'T>() which may take some configuration parameters - e.g. it can take the map to use explicitly, and throw like it does now when it can’t derive an instance reflectively and it’s not in the given map.
Option 2: we have a mapping but it’s immutable. I.e. basically like we have now but Arb.register does not exist. This makes tweaking or extending the map impossible, but this can be remediated by basically separating the mapping and the API in Arb.Default. I.e. if we parameterize all the methods on Arb.Default to take the Arbitrary instances they need, instead of relying on Arb.from<'T> and such, including Arb.Default.Derive, it’s still possible to tweak and extend the reflective Arbitrary generation in Arb.Default.Derive. But the mapping can’t be extended outside of FsCheck itself. I.e. it becomes harder to support external libraries that have Arbitrary instances for new types out of the box - these would always have to be used explicitly. Not that this is actually happening now, but it’s worth mentioning.
Option 3: we have a mapping but it’s append-only. I.e. you can only add new types. Trying to override an existing type in the mapping throws. Similar limitation re: tweaking Arbitrary instances as option 2, but this does allow extending the mapping. And I don’t think it has any of the downsides mentioned above, because you’d see clear error messages when anything untoward happens.
Option 4: we have a mutable mapping but it’s not global, but passed around while building the Property (so likely via the Gen computation expression somehow). So it’s like a state monad really, the mapping is passed around functionally but implicitly. The default mapping could be set in Config.
Option 5: we have a global, one-time mutable mapping that is defined and fixed at static initialization time, e.g. when the first test in an assembly is run. We could implement this by scanning the test assembly for places where mappings are defined (somehow, e.g. types attributed with a FsCheck defined attribute) and then building the map before the first test is run. It is impossible to change or affect it in any other way.
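
As a rough illustration of option 3, an append-only mapping could behave like the sketch below. `ArbMap` and `addNew` are hypothetical names, generators are modelled as plain `Random -> 'T` functions, and none of this is existing FsCheck API.

```fsharp
open System

// Toy mapping keyed by the type's full name (an assumption for brevity).
type ArbMap = Map<string, obj>

// Append-only registration: new types are accepted, overrides throw.
let addNew (gen: Random -> 'T) (m: ArbMap) : ArbMap =
    let key = typeof<'T>.FullName
    if m.ContainsKey key then
        invalidOp (sprintf "A generator for %s is already registered" key)
    m.Add(key, box gen)

let core : ArbMap =
    Map.empty
    |> addNew (fun (rnd: Random) -> rnd.Next())
    |> addNew (fun (rnd: Random) -> rnd.NextDouble())

// Trying to override the int generator fails loudly instead of silently
// changing the behaviour of other tests.
let overrideAttempt =
    try
        core |> addNew (fun (_: Random) -> 0) |> ignore
        "accepted"
    with
    | :? InvalidOperationException as e -> "rejected: " + e.Message
```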

API for defining the map

Given the above, it seems unavoidable to have a map somewhere (unless you want to try to argue for throwing out reflective Arbitrary derivation, which is a very hard sell). So the question is how to define the mapping from type T to Arbitrary of T. Note that it must be defined in some place that allows bringing new type parameters into scope, to allow for the definition of Arbitrary instances for generic types. Again, a few options:

static methods (as now) - can get the mapping by calling something like ArbMapping.Derive<ArbType>()
instance methods (has the slight advantage that the Arbitrary instance for a set of instance methods can be parametrized by values, i.e. mapping is derived by ArbMapping.Derive(new ArbType(...parameters...)))
types that derive from Arbitrary<T>, with the mapping derived by ArbMapping.Derive(new Int32Arb(), new CharArb(),...); to be anywhere near convenient, we’d need to have some way to scan an assembly for these types as well.
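
For the first option, deriving a mapping from parameter-less static methods might look roughly like this. `Arb<'T>`, `MyArbs` and `deriveMapping` are names invented for the sketch, and the real registration code in TypeClass.fs may work quite differently.

```fsharp
open System
open System.Reflection

// Toy wrapper so that reflection can recognise "methods returning an Arb".
type Arb<'T> = Arb of (Random -> 'T)

// A container type whose static methods each define one instance.
type MyArbs =
    static member SmallInt() : Arb<int> = Arb (fun rnd -> rnd.Next(0, 10))
    static member Flag() : Arb<bool> = Arb (fun rnd -> rnd.Next(2) = 0)

// Collect every public, parameter-less static method returning Arb<_>
// and key the result by the generated type's full name.
let deriveMapping (source: Type) : Map<string, obj> =
    source.GetMethods(BindingFlags.Public ||| BindingFlags.Static)
    |> Array.filter (fun m ->
        m.GetParameters().Length = 0
        && m.ReturnType.IsGenericType
        && m.ReturnType.GetGenericTypeDefinition() = typedefof<Arb<_>>)
    |> Array.map (fun m ->
        let target = m.ReturnType.GetGenericArguments().[0]
        (target.FullName, m.Invoke(null, [||])))
    |> Map.ofArray

let mapping = deriveMapping typeof<MyArbs>
```

Nothing here is checked by the compiler: a method with the wrong return type is simply skipped by the filter, which is one argument for the instance-based or type-based alternatives.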

Thanks for reading this far…

Issue Analytics

State:
Created 6 years ago
Reactions: 3
Comments: 16 (11 by maintainers)

Top GitHub Comments

6 reactions

0x53A commented, Sep 24, 2017

Different unit tests should not interfere with each other, so my vote is against any static global state.

I would like to have the ability for a scoped Arb registration.

So my preference would be

a core context which contains all the base Arbs (int, strings, …) and autogenerated Arbs (tuples, lists, …)
a way to either create a new context, or update a context.
contexts should be immutable, so any update creates a new context (basically ImmutableDictionary)

So from a user perspective, I would like to write either

Prop.forAll(fun a -> ...)

which just uses the default context, or

let myArbContext = ArbContext.Core.Add(myCustomArb)
Prop.forAll(myArbContext, fun a -> ...)

where I create a custom isolated context and pass that through.

With the context being just a normal variable, they can be shared, created in helper methods, … It should also be possible to replace Arbs, so I should be able to write ArbContext.Core.Add(myIntArb).

It may also make sense to differentiate between an immutable Core context, and a mutable Default context (well, a mutable binding, the context itself would still be immutable).

The Core context is everything included by default in FsCheck.

The Default context defaults to the Core context, but if all my tests use the same Arbs anyway, then I could overwrite the Default context once, and all calls that do not explicitly pass a different context will use that.

Your assembly-scanning strategy also looks useful, so there could be a helper method which creates a Context from an Assembly:

ArbContext.Default <- ArbContext.FromAssembly(Assembly.GetExecutingAssembly())
// or
ArbContext.Default <- ArbContext.FromAllCurrentlyLoadedAssemblies()

This is basically your option 4), but with an immutable context instead of a mutable one.
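
A small, self-contained model of this idea is sketched below. All names (`ArbContext.add`, `ArbContext.core`, `ArbContext.current`, `forAllWith`, `forAll`) are assumptions made for the example, with `current` standing in for the Default binding; generators are plain `Random -> 'T` functions rather than FsCheck types.

```fsharp
open System

// Toy generator type, assumed for illustration; not FsCheck's Gen.
type Gen<'T> = Random -> 'T

// An immutable context: every update returns a new value.
type ArbContext = { Entries: Map<string, obj> }

module ArbContext =
    // Add or replace the generator for 'T, leaving the input untouched.
    let add (gen: Gen<'T>) (ctx: ArbContext) : ArbContext =
        { Entries = ctx.Entries.Add(typeof<'T>.FullName, box gen) }

    let tryFind<'T> (ctx: ArbContext) : Gen<'T> option =
        Map.tryFind (typeof<'T>.FullName) ctx.Entries
        |> Option.map (fun g -> unbox<Gen<'T>> g)

    // Everything shipped by default (here: only int).
    let core : ArbContext =
        { Entries = Map.empty }
        |> add (fun (rnd: Random) -> rnd.Next(-100, 101))

    // A mutable binding to an immutable context.
    let mutable current : ArbContext = core

// Check a property against 100 values drawn from an explicit context.
let forAllWith (ctx: ArbContext) (prop: 'T -> bool) : bool =
    match ArbContext.tryFind<'T> ctx with
    | Some gen ->
        let rnd = Random(0)
        Seq.init 100 (fun _ -> gen rnd) |> Seq.forall prop
    | None -> failwithf "No generator registered for %s" (typeof<'T>.FullName)

// Same check, using whatever context the default binding points at.
let forAll (prop: 'T -> bool) : bool = forAllWith ArbContext.current prop

// An isolated context that replaces the int generator.
let positive =
    ArbContext.core |> ArbContext.add (fun (rnd: Random) -> rnd.Next(1, 101))

let inRange = forAllWith ArbContext.core (fun (n: int) -> n >= -100 && n <= 100)
let allPositive = forAllWith positive (fun (n: int) -> n > 0)

// Rebinding the default affects only calls that pass no explicit context.
ArbContext.current <- positive
let viaDefault = forAll (fun (n: int) -> n > 0)
```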

1 reaction

ploeh commented, Oct 9, 2017

> It is possible to make a working FsCheck without any mapping at all.

I think it’d be a good idea to do this. This will have an impact on usability, so I don’t suggest that this should stand alone, but creating a ‘core’ of FsCheck that works like that would, AFAICT, enable us (and third parties) to evolve a ‘usability layer’ on top of the core engine.

FsCheck could still ship with a default Arbitrary mapping, but with enough of the core exposed to enable other people to do things differently.

I’m not saying that I want to have things radically different; I only have an intuition that separating concerns like this would lead to a better architecture overall.

> Should it be global? ThreadLocal? AsyncLocal? SomethingElseLocal?

It should be immutable, and the context should be explicit. It’s just a map. If you want to change it, pass it as an argument to a function.

As the options go, I’d prefer a combination where option 1 is the core of FsCheck, but with option 4 on top of it. I’m not sure why you write ‘mutable mapping’, though… Is that a typo? Shouldn’t it be immutable mapping?

Option 5 seems debilitating, because you’d only be able to change FsCheck once for an entire test run. This implies to me that it would be impossible to run two different test cases with different configurations.

Regarding an API for mappings, I’d suggest to do away with the type class emulation. The way .NET works, it adds overhead, but no type safety.

The map itself could be defined purely in terms of functions. The map’d be something like Type -> Arbitrary option. The problem is that at the bottom level of Reflection, things aren’t generic. In order to create a generic value (like, say, Arbitrary<Foo>), you need some sort of run-time conversion. This, again, implies to me some sort of option-based API.

If we want to be able to scan a type (or assembly) for custom definitions, I think that we should define a proper .NET interface, so that users at least get a bit of type-safety: Implement this interface, and you can register your custom Arbitrary. If you do it wrong, then your code doesn’t compile.
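
A minimal sketch of what such an interface could look like, purely as an assumption (`IArbitrary<'T>`, `DigitArb`, `register` and `tryFind` are invented for this example and are not FsCheck types):

```fsharp
open System

// Hypothetical interface that custom definitions must implement.
type IArbitrary<'T> =
    abstract Generate : Random -> 'T

// A user-defined instance; a wrong signature here is a compile error.
type DigitArb() =
    interface IArbitrary<int> with
        member this.Generate(rnd) = rnd.Next(0, 10)

// Registration only accepts values that implement the interface.
let register (arb: IArbitrary<'T>) (m: Map<string, obj>) : Map<string, obj> =
    m.Add(typeof<'T>.FullName, box arb)

// Typed lookup with an option result for missing or mismatched entries.
let tryFind<'T> (m: Map<string, obj>) : IArbitrary<'T> option =
    match Map.tryFind (typeof<'T>.FullName) m with
    | Some (:? IArbitrary<'T> as arb) -> Some arb
    | _ -> None

let registry = Map.empty |> register (DigitArb())

let digit =
    match tryFind<int> registry with
    | Some arb -> arb.Generate(Random(7))
    | None -> -1
```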