High rate of duplicates when using fc.tuple

🐛 Bug Report

A high rate of duplicate values is generated when using fc.tuple.

To Reproduce

const fc = require('fast-check');


const cases = [
	[
		fc.emailAddress(),
		'using just `fc.emailAddress()` results in no duplicates'
	],

	[
		fc.tuple(fc.integer(), fc.emailAddress()).map((arr) => arr[1]),
		'using tuple with a different type (e.g. `fc.integer()`) results in no duplicates'
	],

	// This is the case where behavior seems incorrect
	[
		fc.tuple(fc.emailAddress(), fc.emailAddress()).map((arr) => arr[1]),
		'using the same type in a tuple results in duplicates at a much higher rate than one would expect'
	]
];

cases.forEach(([arb, description]) => {
	console.log(`${description}:`);

	const seenEmails = {};

	fc.check(fc.property(arb, (emailAddress) => {
		if (seenEmails[emailAddress]) {
			console.log(`\t${emailAddress}`);
		}
		seenEmails[emailAddress] = true;
	}), {
		numRuns: 1000,
		seed: -1653245226 // should also be able to replicate without providing seed
	});
	console.log();
});

Expected behavior

There should be no duplicates (or at least an extremely small chance of duplicates) in the 3rd case as demonstrated above.

Your environment

Packages / Softwares	Version(s)
fast-check	1.20.1
node	8.11.1

Top GitHub Comments

dubzzz commented, Jan 24, 2020

The issue is now fixed on master. I will certainly release a new minor of fast-check soon to make those changes available.

The fix was to add a jump capability to the most important random number generators provided by pure-rand (whenever possible). Calling jump is equivalent to asking the random number generator to skip a very large number of generated values. It’s a classical way to run independent computations based on random values. In the case of xorshift128+ - the random number generator used by fast-check by default - the jump skips 2^64 values.
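To illustrate the jump idea, here is a minimal sketch. It is not the pure-rand implementation: it swaps xorshift128+ for a 64-bit linear congruential generator, because jump-ahead for that generator fits in a few lines. The names step, compose and jump are hypothetical, and the jump distance of 2^32 is chosen only for the example.

```javascript
// Toy 64-bit LCG: state' = (A * state + C) mod 2^64.
// Assumption: this generator stands in for xorshift128+ purely for illustration.
const M = 1n << 64n;
const A = 6364136223846793005n;
const C = 1442695040888963407n;

// Advance the generator by exactly one value.
function step(state) {
	return (A * state + C) % M;
}

// Compose two affine maps x -> a*x + c (apply f first, then g).
function compose(f, g) {
	return { a: (g.a * f.a) % M, c: (g.a * f.c + g.c) % M };
}

// Skip n values in O(log n) by repeated squaring of the one-step map.
function jump(state, n) {
	let result = { a: 1n, c: 0n }; // identity map
	let base = { a: A, c: C };     // one-step map
	let k = n;
	while (k > 0n) {
		if ((k & 1n) === 1n) {
			result = compose(result, base);
		}
		base = compose(base, base);
		k >>= 1n;
	}
	return (result.a * state + result.c) % M;
}

// Two sub-streams from one seed: the second starts 2^32 values further along,
// so the streams do not overlap as long as each draws fewer than 2^32 values.
const seed = 42n;
const streamA = seed;
const streamB = jump(seed, 1n << 32n);

// Sanity check: jumping by 1000 matches stepping 1000 times.
let stepped = seed;
for (let i = 0; i < 1000; i++) {
	stepped = step(stepped);
}
console.log(jump(seed, 1000n) === stepped);
console.log(streamA !== streamB);
```

In this sketch, each element of a tuple would draw from its own sub-stream instead of from generators whose states sit close together, which is the property the fix relies on.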

One of the consequences of this change is that the next minor will not generate the same values as the minor right before.

dubzzz commented, Jan 17, 2020

From my recent analysis (following your bug report), the recommended way to generate multiple random subsequences using a prng is to implement a jump method. It consists of offsetting the prng as if it had been asked to generate a huge number of random values. In the case of xorshift128+ (default prng used by fast-check), the recommendation is to offset it by 2^64 😳 Fortunately, there is a quick algorithm to do that efficiently. I’m working on adding the jump capability in pure-rand (the lib used by fast-check to generate its random values). To be continued…
