High rate of duplicates when using fc.tuple
🐛 Bug Report
A high rate of duplicate values is generated when using fc.tuple.
To Reproduce
    const fc = require('fast-check');

    // Each case pairs an arbitrary with a description of the observed behavior.
    const cases = [
      [
        fc.emailAddress(),
        'using just `fc.emailAddress()` results in no duplicates'
      ],
      [
        fc.tuple(fc.integer(), fc.emailAddress()).map((arr) => arr[1]),
        'using tuple with a different type (e.g. `fc.integer()`) results in no duplicates'
      ],
      // This is the case where behavior seems incorrect
      [
        fc.tuple(fc.emailAddress(), fc.emailAddress()).map((arr) => arr[1]),
        'using the same type in a tuple results in duplicates at a much higher rate than one would expect'
      ]
    ];

    cases.forEach(([arb, description]) => {
      console.log(`${description}:`);
      const seenEmails = {};
      fc.check(fc.property(arb, (emailAddress) => {
        // Print every email address that has already been generated in this run
        if (seenEmails[emailAddress]) {
          console.log(`\t${emailAddress}`);
        }
        seenEmails[emailAddress] = true;
      }), {
        numRuns: 1000,
        seed: -1653245226 // should also be able to replicate without providing seed
      });
      console.log();
    });
Expected behavior
There should be no duplicates (or at least an extremely small chance of duplicates) in the 3rd case as demonstrated above.
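To put a number on "high rate", one option is to collect the generated values and count how many repeat an earlier one. The following is only a minimal, dependency-free sketch: `countDuplicates` is a hypothetical helper name, not part of fast-check, and the commented-out `fc.sample` call assumes that function accepts the same `numRuns` and `seed` parameters as `fc.check`.

```javascript
// Hypothetical helper: counts how many values repeat an earlier value.
function countDuplicates(values) {
  const seen = new Set();
  let duplicates = 0;
  for (const value of values) {
    if (seen.has(value)) {
      duplicates += 1; // this value was already produced before
    } else {
      seen.add(value);
    }
  }
  return duplicates;
}

// Possible usage with fast-check (assumption, not verified against 1.20.1):
// const emails = fc.sample(arb, { numRuns: 1000, seed: -1653245226 });
// console.log(countDuplicates(emails));
console.log(countDuplicates(['a@b.c', 'd@e.f', 'a@b.c']));
```

A count close to zero for the first two cases and a much larger count for the third would match the behavior described in this report.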
Your environment
Packages / Softwares | Version(s) |
---|---|
fast-check | 1.20.1 |
node | 8.11.1 |
Issue Analytics
- State:
- Created 4 years ago
- Comments:6 (5 by maintainers)
Top GitHub Comments
The issue is now fixed on master. I will release a new minor version of fast-check soon to make these changes available.
The fix was to add a jump capability to the most important random number generators provided by pure-rand (whenever possible). Calling jump is equivalent to asking the random number generator to skip a very large number of generated values. It's a classical way to run independent computations based on random values. In the case of xorshift128+, the random number generator used by fast-check by default, the jump skips 2^64 values.
One of the consequences of this change is that the next minor will not generate the same values as the minor right before it.
From my recent analysis (following your bug report), the recommended way to generate multiple random subsequences using a PRNG is to implement a jump method. It consists of offsetting the PRNG as if it had been asked to generate a huge amount of random numbers. In the case of xorshift128+ (the default PRNG used by fast-check), the recommendation is to offset it by 2^64 😳 Fortunately, there is a quick algorithm to do that efficiently. I'm working on adding the jump capability to pure-rand (the lib used by fast-check to generate its random values). To be continued…
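The jump idea from the comments can be illustrated on a much simpler generator. The sketch below is not xorshift128+ and not pure-rand's implementation: it uses a toy 32-bit linear congruential generator (the constants, `next`, `jump`, and `stepN` are all chosen for illustration) to show how a generator can be moved k steps ahead in O(log k) work instead of producing k values one by one.

```javascript
// Toy 32-bit LCG, for illustration only (NOT xorshift128+, NOT pure-rand code).
const M = 2n ** 32n;
const A = 1664525n;
const C = 1013904223n;

// Advance the state by a single step: x -> A*x + C (mod M).
function next(state) {
  return (A * state + C) % M;
}

// Advance the state by k steps in O(log k) by repeatedly squaring
// the affine map x -> a*x + c (mod M).
function jump(state, k) {
  let accA = 1n, accC = 0n; // accumulated map, starts as the identity
  let curA = A, curC = C;   // map equivalent to 2^i single steps
  while (k > 0n) {
    if (k & 1n) {
      // acc = cur applied after acc
      accC = (curA * accC + curC) % M;
      accA = (curA * accA) % M;
    }
    // cur = cur applied after cur (doubles the number of steps)
    curC = (curA * curC + curC) % M;
    curA = (curA * curA) % M;
    k >>= 1n;
  }
  return (accA * state + accC) % M;
}

// Slow reference: advance one step at a time.
function stepN(state, n) {
  for (let i = 0; i < n; i += 1) state = next(state);
  return state;
}

// Two streams derived from one seed; the second starts 2^16 steps ahead,
// so the first stream can draw 2^16 values before the two overlap.
const seed = 42n;
const streamA = seed;
const streamB = jump(seed, 2n ** 16n);
console.log(jump(seed, 1000n) === stepN(seed, 1000)); // fast and slow paths agree
```

Giving each element of a tuple its own jumped-ahead stream, rather than a nearby or identical state, is the property the comments rely on to avoid correlated values. The real fix uses the jump defined for xorshift128+, which the comments describe as skipping 2^64 values.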