"Column family ID mismatch" during tests and devmode in some hardware/OS
Lagom Version (1.2.x / 1.3.x / etc)
1.4.x
API (Scala / Java / Neither / Both)
Lagom Cassandra persistence (probably both programming APIs)
Expected Behavior
Running a `ServiceTest` with Cassandra persistence enabled (using `withCassandra`) will:
- Start Cassandra
- Start the service
- run the tests
- Shutdown everything
Actual Behavior
- Start Cassandra
- Start the service
- (often) the startup fails with a `Column family ID mismatch` error
Reproducible Test Case
This is really hard to reproduce. Even with a reproducer, the failure only happens in certain environments and only sometimes.
After some investigation, I think the problem is that Lagom has three different threads trying to alter the Cassandra schema:
- write-side setup (creates tables for the Journal, the Snapshots and the queries)
- a `ClusterStartupTask` to create the `OffsetStore` for the read-side
- a `ClusterStartupTask` to run the user code in the `globalPrepare`
We ran the same test suite in different environments, and one particular environment (I don't have details on its hardware/OS/JVM version) fails 50% of the time. After inspecting the logs, I noticed the failure seems to happen (at least) when the `ClusterStartupTask` running the user's `globalPrepare` starts before the `ClusterStartupTask` creating the `OffsetStore` has completed. When both `ClusterStartupTask`s run concurrently, they produce separate schema changes, which trigger the infamous `Column family ID mismatch`.
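To make the hypothesis concrete, here is a toy simulation in Python. It is not Lagom code: `asyncio` coroutines stand in for the two `ClusterStartupTask`s, `asyncio.sleep` stands in for the time a schema change takes, and the task names are borrowed from the Lagom log messages. The helper names (`startup_task`, `run_concurrently`, `run_sequentially`) are hypothetical. Scheduling both tasks with `asyncio.gather` lets the second one start before the first is done, while awaiting them one after the other guarantees each finishes before the next begins.

```python
import asyncio


async def startup_task(name: str, duration: float, events: list) -> None:
    """Hypothetical stand-in for a ClusterStartupTask that alters the schema."""
    events.append(f"start {name}")
    await asyncio.sleep(duration)  # simulates the time the schema change takes
    events.append(f"done {name}")


async def run_concurrently(events: list) -> None:
    # Both tasks are scheduled at once, so their "schema changes" overlap.
    await asyncio.gather(
        startup_task("cassandraOffsetStorePrepare", 0.02, events),
        startup_task("readSideGlobalPrepare-UserLandProcessor", 0.01, events),
    )


async def run_sequentially(events: list) -> None:
    # The second task starts only after the first one has completed.
    await startup_task("cassandraOffsetStorePrepare", 0.02, events)
    await startup_task("readSideGlobalPrepare-UserLandProcessor", 0.01, events)


concurrent_events: list = []
asyncio.run(run_concurrently(concurrent_events))
sequential_events: list = []
asyncio.run(run_sequentially(sequential_events))

# In the concurrent run, the second "start" appears before the first "done".
print(concurrent_events)
# In the sequential run, each "start" is followed by its own "done".
print(sequential_events)
```

This only illustrates the ordering the hypothesis calls for; it says nothing about how Lagom actually schedules its startup tasks.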
Note: the above is a hypothesis based on the following lines from the logs of a failed execution (cleaned up):
2019-04-26T04:26:16.7834873Z 04:26:16.771 Executing cluster start task cassandraOffsetStorePrepare.
...
2019-04-26T04:26:17.1181499Z 04:26:17.062 Executing cluster start task readSideGlobalPrepare-UserLandProcessor.
...
2019-04-26T04:26:32.5744257Z 04:26:32.569 Cluster start task cassandraOffsetStorePrepare done
...
Successful executions, in contrast, would display (made-up logs):
Executing cluster start task cassandraOffsetStorePrepare.
Cluster start task cassandraOffsetStorePrepare done
Executing cluster start task readSideGlobalPrepare-UserLandProcessor.
Cluster start task readSideGlobalPrepare-UserLandProcessor done
Here the tasks don't run concurrently.
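A small script can check a log for this pattern. The sketch below assumes only the two message formats seen in these logs (`Executing cluster start task <name>.` and `Cluster start task <name> done`); the function name `find_overlaps` is hypothetical. It reports every pair of tasks where one started while the other had not yet logged `done`, and the sample inputs are the failed and successful log lines quoted in this issue.

```python
import re

START = re.compile(r"Executing cluster start task (\S+)\.")
DONE = re.compile(r"Cluster start task (\S+) done")


def find_overlaps(lines):
    """Return (still_running_task, newly_started_task) pairs for every task
    that started before another task had logged its 'done' line."""
    running = []
    overlaps = []
    for line in lines:
        if m := START.search(line):
            name = m.group(1)
            # Every task still running overlaps with the one just started.
            overlaps.extend((other, name) for other in running)
            running.append(name)
        elif m := DONE.search(line):
            name = m.group(1)
            if name in running:
                running.remove(name)
    return overlaps


failed = [
    "2019-04-26T04:26:16.7834873Z 04:26:16.771 Executing cluster start task cassandraOffsetStorePrepare.",
    "2019-04-26T04:26:17.1181499Z 04:26:17.062 Executing cluster start task readSideGlobalPrepare-UserLandProcessor.",
    "2019-04-26T04:26:32.5744257Z 04:26:32.569 Cluster start task cassandraOffsetStorePrepare done",
]
successful = [
    "Executing cluster start task cassandraOffsetStorePrepare.",
    "Cluster start task cassandraOffsetStorePrepare done",
    "Executing cluster start task readSideGlobalPrepare-UserLandProcessor.",
    "Cluster start task readSideGlobalPrepare-UserLandProcessor done",
]

print(find_overlaps(failed))
# → [('cassandraOffsetStorePrepare', 'readSideGlobalPrepare-UserLandProcessor')]
print(find_overlaps(successful))
# → []
```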
Top GitHub Comments
`akka-persistence-cassandra` has all its statements parameterized: here are the statements for the journal and the statements for the snapshots. Lagom's statements are then split into the offset store and the keyspace.
@lejoow they’re in the same place, but you’ll need to look at the version of the file corresponding to the version of Akka Persistence Cassandra you have, such as https://github.com/akka/akka-persistence-cassandra/blob/v0.61/core/src/main/scala/akka/persistence/cassandra/journal/CassandraStatements.scala