Thread safety concerns due to "memory barrier" pattern in value classes
Many of the Value classes contain instances of a pattern that I think is an incorrect attempt at creating a “memory barrier”. It resembles a well-known attempt to fix double-checked locking, and it is known to be incorrect under the Java Memory Model.
The essence of the incorrect memory barrier pattern is:
- Writers acquire a lock, initialize an object, release the lock, re-acquire the lock, then write the initialized object to a place where readers can see it. Finally, they release the lock again.
- Readers do not use any synchronization.
This pattern is seductive because it does constrain the order in which the writer's writes reach main memory. There is also no issue if all reads and writes happen in the same thread. However, it is incorrect when there are concurrent writers and readers, because readers do not have to observe those writes in order. Since readers acquire no lock, nothing establishes a happens-before relationship between the writer's initializing writes and the reader's reads, so a reader can see the reference to the initialized object before it sees the writes that initialized the object's fields.
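A minimal sketch of the pattern in isolation may help. The class UnsafeLazyTable and its contents are hypothetical, not TLC code; only the writer/reader structure mirrors the steps listed above. Run single-threaded it always behaves correctly, which is exactly why the bug is easy to miss.

```java
// Hypothetical illustration of the pattern above (not TLC code).
// NOT thread-safe: get() reads `table` without a lock, so there is no
// happens-before edge between the writer's initializing writes and the reader.
final class UnsafeLazyTable {
    private int[] table; // plain field, read without synchronization

    private void init() {
        int[] tmp = new int[4];
        synchronized (this) {
            // Writer, step 1: initialize the object while holding the lock.
            for (int i = 0; i < tmp.length; i++) {
                tmp[i] = i * 10;
            }
        }
        synchronized (this) {
            // Writer, step 2: re-acquire the lock and publish the reference.
            this.table = tmp;
        }
    }

    int get(int i) {
        // Reader: no lock. Another thread may see table != null here and
        // still read stale zeros from the cells written in step 1.
        if (this.table == null) {
            init();
        }
        return this.table[i];
    }

    public static void main(String[] args) {
        // Single-threaded use is always correct, which is why the bug hides.
        System.out.println(new UnsafeLazyTable().get(3)); // prints 30
    }
}
```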
Here is an example from FcnRcdValue. The “writer” pathway is in createIndex():
```java
int[] tbl = new int[len];
Arrays.fill(tbl, -1);
synchronized(this) {
    for (int i = 0; i < this.domain.length; i++) {
        int loc = (this.domain[i].hashCode() & 0x7FFFFFFF) % len;
        while (tbl[loc] != -1) {
            loc = (loc + 1) % len;
        }
        tbl[loc] = i;
    }
}
synchronized(this) { this.indexTbl = tbl; }
```
The first synchronized block initializes the cells of the local array tbl, and the second one writes tbl to this.indexTbl, where it can be picked up by readers.
The reader pathway is in select(arg):
```java
if (this.indexTbl != null) {
    return this.values[this.lookupIndex(arg)];
}
if (this.isNorm) this.createIndex();
if (this.indexTbl != null) {
    return this.values[this.lookupIndex(arg)];
}
```
In this example, the reader’s first check (this.indexTbl != null) can be true even when the cells of the indexTbl array are still uninitialized (i.e., all zero). Even if the cells are initialized in main memory, those writes may not yet be visible to the reader. This could lead to readers crashing or returning incorrect values for function applications.
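To make the consequence concrete, here is a standalone sketch of the index that createIndex() builds: an open-addressing hash table mapping slots to positions in domain, with -1 marking empty slots and linear probing on collisions. The issue does not show lookupIndex(), so the bounded lookup below and the class name ProbeIndex are assumptions for illustration only. With an all-zero table every slot claims to hold position 0 and no slot looks empty, so this particular lookup returns -1 for any key other than domain[0]; a caller that then evaluates values[-1] would throw ArrayIndexOutOfBoundsException.

```java
import java.util.Arrays;

// Hypothetical sketch of the open-addressing index built by createIndex():
// slot -> position in `domain`, -1 marks an empty slot, linear probing on
// collisions. lookup() is an ASSUMED counterpart; TLC's lookupIndex() is not shown.
final class ProbeIndex {
    // Requires len > domain.length, otherwise the probing loop never ends.
    static int[] build(Object[] domain, int len) {
        int[] tbl = new int[len];
        Arrays.fill(tbl, -1);
        for (int i = 0; i < domain.length; i++) {
            int loc = (domain[i].hashCode() & 0x7FFFFFFF) % len;
            while (tbl[loc] != -1) {
                loc = (loc + 1) % len; // linear probing
            }
            tbl[loc] = i;
        }
        return tbl;
    }

    // Returns the position of `key` in `domain`, or -1 if it is not found
    // after probing every slot at most once.
    static int lookup(int[] tbl, Object[] domain, Object key) {
        int len = tbl.length;
        int loc = (key.hashCode() & 0x7FFFFFFF) % len;
        for (int probes = 0; probes < len; probes++) {
            int idx = tbl[loc];
            if (idx == -1) {
                return -1; // empty slot: key absent
            }
            if (domain[idx].equals(key)) {
                return idx;
            }
            loc = (loc + 1) % len;
        }
        return -1;
    }

    public static void main(String[] args) {
        Object[] domain = {"a", "b", "c"};
        int[] tbl = build(domain, 7);
        System.out.println(lookup(tbl, domain, "c")); // prints 2

        // What a racing reader could see: the reference is visible, but none
        // of the initializing writes are, so every cell is still 0.
        int[] stale = new int[7];
        System.out.println(lookup(stale, domain, "c")); // prints -1
    }
}
```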
These classes contain instances of this pattern:
- FcnRcdValue
- SetCapValue
- SetCupValue
- SetDiffValue
- SetOfFcnsValue
- SetOfRcdsValue
- SetOfTuplesValue
- SubsetValue
- UnionValue
Either this is a very subtle concurrency bug, or the synchronization is actually unnecessary and it can be removed to improve TLC’s performance by one iota.
If this is a concurrency bug, I believe it can be fixed with careful use of the volatile keyword and removal (or merging) of some synchronized blocks.
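One possible shape of a volatile-based fix, sketched under assumptions: the class SafeIndex, its fields, the table size 2 * n + 1, and the bounded lookup are hypothetical stand-ins, not TLC's FcnRcdValue. The table is fully built in a local array and then published with a single write to a volatile field, and readers read that field once into a local. A volatile write happens-before every subsequent read that sees it, so a reader that sees the reference also sees the initialized cells. The synchronized block here only prevents duplicate construction; it could be dropped if building the index more than once under contention is acceptable.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch (not TLC code): a lazily built index published safely
// through a volatile field, i.e. double-checked locking done correctly.
final class SafeIndex {
    private final Object[] domain;
    private final Object[] values;
    // volatile: a write to this field happens-before any read that observes it,
    // so a reader that sees a non-null table also sees its initialized cells.
    private volatile int[] indexTbl;

    SafeIndex(Object[] domain, Object[] values) {
        this.domain = domain;
        this.values = values;
    }

    private int[] index() {
        int[] tbl = this.indexTbl;           // one volatile read
        if (tbl == null) {
            synchronized (this) {            // only avoids duplicate builds
                tbl = this.indexTbl;         // re-check under the lock
                if (tbl == null) {
                    tbl = build();           // fully initialize a local array...
                    this.indexTbl = tbl;     // ...then publish it with one volatile write
                }
            }
        }
        return tbl;
    }

    private int[] build() {
        int len = 2 * domain.length + 1;     // assumed size; must exceed domain.length
        int[] tbl = new int[len];
        Arrays.fill(tbl, -1);
        for (int i = 0; i < domain.length; i++) {
            int loc = (domain[i].hashCode() & 0x7FFFFFFF) % len;
            while (tbl[loc] != -1) {
                loc = (loc + 1) % len;       // linear probing
            }
            tbl[loc] = i;
        }
        return tbl;
    }

    // Returns the value mapped to arg, or null if arg is not in the domain.
    Object select(Object arg) {
        int[] tbl = index();                 // use the local copy from here on
        int len = tbl.length;
        int loc = (arg.hashCode() & 0x7FFFFFFF) % len;
        for (int probes = 0; probes < len; probes++) {
            int idx = tbl[loc];
            if (idx == -1) {
                return null;
            }
            if (domain[idx].equals(arg)) {
                return values[idx];
            }
            loc = (loc + 1) % len;
        }
        return null;
    }

    public static void main(String[] args) throws InterruptedException {
        SafeIndex f = new SafeIndex(new Object[] {"a", "b", "c"}, new Object[] {10, 20, 30});
        AtomicBoolean allCorrect = new AtomicBoolean(true);
        Thread[] readers = new Thread[4];
        for (int t = 0; t < readers.length; t++) {
            // Several threads race to build and read the index.
            readers[t] = new Thread(() -> {
                if (!Integer.valueOf(30).equals(f.select("c"))) {
                    allCorrect.set(false);
                }
            });
            readers[t].start();
        }
        for (Thread r : readers) {
            r.join();
        }
        // A passing run does not prove the absence of races; the guarantee
        // comes from the volatile publication, not from this check.
        System.out.println(allCorrect.get()); // prints true
    }
}
```

Other standard options would be to make the reader acquire the same lock (for example, a synchronized select) or to hold the table in a java.util.concurrent.atomic.AtomicReference; both also establish the missing happens-before edge, at different costs.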
I have never seen problems due to this code. However, I believe it is worth reporting and fixing.
Thanks, @craft095 and @Calvin-L for the reviews!
Sure, will do.