
"Column family ID mismatch" during tests and dev mode on some hardware/OS


Lagom Version (1.2.x / 1.3.x / etc)

1.4.x

API (Scala / Java / Neither / Both)

Lagom Cassandra persistence (probably both programming APIs)

Expected Behavior

Running a ServiceTest with Cassandra persistence enabled (using `withCassandra`) should:

  1. Start Cassandra
  2. Start the service
  3. Run the tests
  4. Shut down everything

Actual Behavior

  1. Start Cassandra
  2. Start the service
  3. Startup often fails with a "Column family ID mismatch" error

Reproducible Test Case

This is really hard to reproduce. Even with a reproducer, the failure only happens in certain environments and only sometimes.


After some investigation, I think the problem is that Lagom has three different threads that may alter the Cassandra schema at the same time:

  1. write-side setup (creates tables for the Journal, the Snapshots and the queries)
  2. a ClusterStartupTask to create the OffsetStore for the read-side
  3. a ClusterStartupTask to run the user code in `globalPrepare`
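If these schema-altering steps really are racing, one possible mitigation is to start each step only after the previous one has completed. The Java sketch below shows that ordering using plain `CompletableFuture` chaining. It does not use Lagom's API. `schemaTask`, `runSequentially`, and the simulated delay are hypothetical stand-ins for the real DDL work, and the task names are borrowed from the log lines in this issue.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.function.Supplier;

public class SerializedSchemaTasks {

    // Hypothetical shared record of task events, used only to show ordering.
    static final List<String> events = Collections.synchronizedList(new ArrayList<>());

    // Hypothetical stand-in for an asynchronous schema change (e.g. CREATE TABLE).
    static CompletionStage<Void> schemaTask(String name) {
        return CompletableFuture.runAsync(() -> {
            events.add("start " + name);
            try {
                Thread.sleep(50); // simulated DDL round trip
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            events.add("done " + name);
        });
    }

    // Starts each task only after the previous one has completed,
    // so no two schema changes are in flight at the same time.
    static CompletionStage<Void> runSequentially(List<Supplier<CompletionStage<Void>>> tasks) {
        CompletionStage<Void> chain = CompletableFuture.completedFuture(null);
        for (Supplier<CompletionStage<Void>> task : tasks) {
            chain = chain.thenCompose(ignored -> task.get());
        }
        return chain;
    }

    public static void main(String[] args) {
        List<Supplier<CompletionStage<Void>>> tasks = List.of(
            () -> schemaTask("cassandraOffsetStorePrepare"),
            () -> schemaTask("readSideGlobalPrepare-UserLandProcessor"));
        runSequentially(tasks).toCompletableFuture().join();
        events.forEach(System.out::println);
    }
}
```

Each task is wrapped in a `Supplier` so that it is created lazily inside `thenCompose`. Calling `schemaTask` eagerly would start every task at once and bring back the overlap.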

We ran the same test suite in different environments, and one particular environment fails about 50% of the time (I don't have details about its hardware, OS, or JVM version). After inspecting the logs, I noticed that the failure seems to happen (at least) when the ClusterStartupTask running the user's globalPrepare starts before the ClusterStartupTask creating the OffsetStore has completed. When both ClusterStartupTasks run concurrently, they produce separate schema changes, which trigger the infamous "Column family ID mismatch".

Note: the above is a hypothesis based on the following entries in the logs of a failed execution:

(cleaned up logs)
2019-04-26T04:26:16.7834873Z 04:26:16.771 Executing cluster start task cassandraOffsetStorePrepare.
...
2019-04-26T04:26:17.1181499Z 04:26:17.062 Executing cluster start task readSideGlobalPrepare-UserLandProcessor.
...
2019-04-26T04:26:32.5744257Z 04:26:32.569 Cluster start task cassandraOffsetStorePrepare done
...

Successful executions, by contrast, would display:

(made up logs)
Executing cluster start task cassandraOffsetStorePrepare.
Cluster start task cassandraOffsetStorePrepare done
Executing cluster start task readSideGlobalPrepare-UserLandProcessor.
Cluster start task readSideGlobalPrepare-UserLandProcessor done

Here the tasks don't run concurrently.
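To check whether a given run shows the overlap described above, you can scan the log for the two messages quoted in this issue. The Java sketch below is a hypothetical helper, not part of Lagom. It assumes that each task logs `Executing cluster start task <name>.` when it starts and `Cluster start task <name> done` when it finishes. It then reports every task that started while another one was still running.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StartupTaskOverlap {

    // Assumed message formats, taken from the log excerpts in the issue.
    private static final Pattern START =
        Pattern.compile("Executing cluster start task ([\\w-]+)");
    private static final Pattern DONE =
        Pattern.compile("Cluster start task ([\\w-]+) done");

    // Returns one entry per task that started before another task finished.
    // An empty list means the tasks ran one at a time.
    static List<String> findOverlaps(List<String> logLines) {
        Set<String> running = new LinkedHashSet<>();
        List<String> overlaps = new ArrayList<>();
        for (String line : logLines) {
            Matcher start = START.matcher(line);
            Matcher done = DONE.matcher(line);
            if (start.find()) {
                String task = start.group(1);
                if (!running.isEmpty()) {
                    overlaps.add(task + " started before " + running + " completed");
                }
                running.add(task);
            } else if (done.find()) {
                running.remove(done.group(1));
            }
        }
        return overlaps;
    }

    public static void main(String[] args) {
        // Lines adapted from the failed execution quoted above.
        List<String> failedRun = List.of(
            "04:26:16.771 Executing cluster start task cassandraOffsetStorePrepare.",
            "04:26:17.062 Executing cluster start task readSideGlobalPrepare-UserLandProcessor.",
            "04:26:32.569 Cluster start task cassandraOffsetStorePrepare done");
        findOverlaps(failedRun).forEach(System.out::println);
    }
}
```

On the failed excerpt, the helper reports that readSideGlobalPrepare-UserLandProcessor started before cassandraOffsetStorePrepare completed. On the successful sequence, it reports nothing.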

Issue Analytics

  • State: open
  • Created 4 years ago
  • Comments: 13 (9 by maintainers)

Top GitHub Comments

ignasi35 commented, May 9, 2019 (1 reaction)

"@ignasi35 thanks for sharing this - where can we get the CQL templates to use to create all the required tables and materialized views for Lagom to work?"

akka-persistence-cassandra has all statements parameterized: here are the statements for the journal and the statements for the snapshots.

The Lagom statements, in turn, are split into the offset store and the keyspace.

TimMoore commented, May 9, 2019 (0 reactions)

@lejoow they’re in the same place, but you’ll need to look at the version of the file corresponding to the version of Akka Persistence Cassandra you have, such as https://github.com/akka/akka-persistence-cassandra/blob/v0.61/core/src/main/scala/akka/persistence/cassandra/journal/CassandraStatements.scala
