Importing a lot of versions of a schema from KafkaSQL causes crash loop
Hello,
we’ve run into an issue that is partly caused by a user’s mistake but might affect someone else, so I would like to describe it and ask for your help or advice. We run Apicurio v2.2.4 as a Docker image on a k8s cluster, with KafkaSQL storage underneath.
The cause
One of the users of our Apicurio instance uses the schema registry in such a way that they send a ‘PUT’ request with the same schema content on every request involving their schema:
PUT /api/artifacts/com.example.MySchema1
We are working on improving their process; this is of course not a standard use case. So far, however, it has left us with 10000+ versions of this schema.
It also means that every version is a Kafka message that has to be processed.
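One possible client-side guard, sketched under assumptions: fingerprint the schema content and skip the PUT when it matches what was last uploaded. The function names and the use of SHA-256 over the raw text are illustrative assumptions, not part of any Apicurio client API, and the registry may compute its own content hash differently.

```python
import hashlib
from typing import Optional


def content_hash(schema_text: str) -> str:
    """SHA-256 hex digest of the schema text (hypothetical client-side fingerprint)."""
    return hashlib.sha256(schema_text.encode("utf-8")).hexdigest()


def needs_upload(schema_text: str, last_uploaded_hash: Optional[str]) -> bool:
    """Return True only when the content differs from what was last uploaded."""
    return content_hash(schema_text) != last_uploaded_hash


schema = '{"type": "record", "name": "MySchema1", "fields": []}'
last_hash = None

uploads = 0
for _ in range(3):  # three identical requests coming from the application
    if needs_upload(schema, last_hash):
        uploads += 1  # this is where the real PUT would be sent
        last_hash = content_hash(schema)
```

With a guard like this, repeated requests carrying unchanged content would produce one upload instead of one new version (and one Kafka message) per request.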
The issue
When our Apicurio pods are restarted, Apicurio loads and processes the messages from the Kafka topic. When it gets to the (many) messages with new versions of the problematic schema, they are processed in such a way that all database sessions disconnect before processing finishes, and the Apicurio pod crashes.
This is the trace from h2:mem database processing one of the versions:
2022-07-11 15:01:12 DEBUG <> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Processing Kafka message with UUID: e0fb998a-e294-4cc2-9997-0f4e73150296
2022-07-11 15:01:12 jdbc[3]:
/*SQL l:59 #:1*/SELECT value FROM sequences WHERE name = ? AND tenantId = ? {1: 'globalId', 2: '_'};
2022-07-11 15:01:12 jdbc[3]:
/*SQL l:81 #:1*/MERGE INTO sequences (tenantId, name, value) KEY (tenantId, name) VALUES(?, ?, ?) {1: '_', 2: 'globalId', 3: 52609075};
2022-07-11 15:01:12 jdbc[3]:
/*SQL */COMMIT;
2022-07-11 15:01:12 jdbc[3]:
/*SQL */COMMIT;
2022-07-11 15:01:12 DEBUG <_> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Clearing tenant id after message processed
2022-07-11 15:01:12 DEBUG <> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Kafka message successfully processed. Notifying listeners of response.
2022-07-11 15:01:12 DEBUG <> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Processing Kafka message with UUID: 36f5c6bf-7df1-4e28-a1df-46cf36d89aef
2022-07-11 15:01:12 jdbc[3]:
/*SQL l:76 #:1*/SELECT c.contentId FROM content c WHERE c.contentHash = ? AND c.tenantId = ? {1: 'fa5d4b21ae39e9e978a7f1d06b7ed4d6dfdc1840c2e27eefd33181579ee8260f', 2: '_'};
2022-07-11 15:01:12 DEBUG <_> [io.apicurio.registry.storage.impl.sql.AbstractSqlRegistryStorage] (KSQL Kafka Consumer Thread) Updating artifact null com.example.MySchema1 with a new version (content).
2022-07-11 15:01:12 jdbc[3]:
/*SQL l:315 #:1*/SELECT a.*, v.contentId, v.globalId, v.version, v.versionId, v.state, v.name, v.description, v.labels, v.properties, v.createdBy AS modifiedBy, v.createdOn AS modifiedOn FROM artifacts a JOIN versions v ON a.tenantId = v.tenantId AND a.latest = v.globalId WHERE a.tenantId = ? AND a.groupId = ? AND a.artifactId = ? {1: '_', 2: '__$GROUPID$__', 3: 'com.example.MySchema1'};
2022-07-11 15:01:12 jdbc[3]:
/*SQL l:305 #:1 t:13*/INSERT INTO versions (globalId, tenantId, groupId, artifactId, version, versionId, state, name, description, createdBy, createdOn, labels, properties, contentId) VALUES (?, ?, ?, ?, ?, (SELECT MAX(versionId) + 1 FROM versions WHERE tenantId = ? AND groupId = ? AND artifactId = ?), ?, ?, ?, ?, ?, ?, ?, ?) {1: 52609075, 2: '_', 3: '__$GROUPID$__', 4: 'com.example.MySchema1', 5: NULL, 6: '_', 7: '__$GROUPID$__', 8: 'com.example.MySchema1', 9: 'ENABLED', 10: 'MySchema1', 11: NULL, 12: '', 13: TIMESTAMP '2022-07-11 11:50:25.212', 14: NULL, 15: NULL, 16: 605};
2022-07-11 15:01:12 jdbc[3]:
/*SQL l:134 #:1*/UPDATE versions SET version = (SELECT versionId FROM versions WHERE tenantId = ? AND globalId = ?) WHERE tenantId = ? AND globalId = ? {1: '_', 2: 52609075, 3: '_', 4: 52609075};
2022-07-11 15:01:12 jdbc[3]:
/*SQL l:85 #:1*/UPDATE artifacts SET latest = ? WHERE tenantId = ? AND groupId = ? AND artifactId = ? {1: 52609075, 2: '_', 3: '__$GROUPID$__', 4: 'com.example.MySchema1'};
2022-07-11 15:01:12 jdbc[3]:
/*SQL l:176 #:1*/SELECT v.*, a.type FROM versions v JOIN artifacts a ON v.tenantId = a.tenantId AND v.groupId = a.groupId AND v.artifactId = a.artifactId WHERE v.tenantId = ? AND v.globalId = ? {1: '_', 2: 52609075};
2022-07-11 15:01:12 jdbc[3]:
/*SQL */COMMIT;
2022-07-11 15:01:12 jdbc[3]:
/*SQL */COMMIT;
2022-07-11 15:01:12 DEBUG <_> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Clearing tenant id after message processed
2022-07-11 15:01:12 DEBUG <> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Kafka message successfully processed. Notifying listeners of response.
This is how it disconnects during processing:
2022-07-11 15:01:14 DEBUG <> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Processing Kafka message with UUID: fd634a51-ba81-4064-81a6-3ffe3772eb43
2022-07-11 15:01:14 database: disconnecting session #3
2022-07-11 15:01:14 database: disconnected session #3
2022-07-11 15:01:14 database: disconnecting session #4
2022-07-11 15:01:14 database: disconnected session #4
2022-07-11 15:01:14 database: disconnecting session #5
2022-07-11 15:01:14 database: disconnected session #5
2022-07-11 15:01:14 database: disconnecting session #6
2022-07-11 15:01:14 database: disconnected session #6
2022-07-11 15:01:14 database: disconnecting session #7
2022-07-11 15:01:14 database: disconnected session #7
2022-07-11 15:01:14 database: disconnecting session #8
2022-07-11 15:01:14 database: disconnected session #8
2022-07-11 15:01:14 database: disconnecting session #9
2022-07-11 15:01:14 database: disconnected session #9
2022-07-11 15:01:14 database: disconnecting session #10
2022-07-11 15:01:14 database: disconnected session #10
2022-07-11 15:01:14 database: disconnecting session #11
2022-07-11 15:01:14 INFO <_> [io.apicurio.registry.storage.impl.sql.AbstractSqlRegistryStorage] (KSQL Kafka Consumer Thread) SqlRegistryStorage constructed successfully. JDBC URL: jdbc:h2:mem:registry_db;DB_CLOSE_ON_EXIT=FALSE;TRACE_LEVEL_SYSTEM_OUT=2
2022-07-11 15:01:14 database: disconnected session #11
2022-07-11 15:01:14 database: disconnecting session #12
2022-07-11 15:01:14 database: disconnected session #12
2022-07-11 15:01:14 database: disconnecting session #13
2022-07-11 15:01:14 DEBUG <_> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Clearing tenant id after message processed
2022-07-11 15:01:14 database: disconnected session #13
2022-07-11 15:01:14 DEBUG <> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Registry exception detected: java.lang.RuntimeException: java.sql.SQLException: This pool is closed and does not handle any more connections!
2022-07-11 15:01:14 DEBUG <> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Processing Kafka message with UUID: 5b551d28-dc44-4f44-84b0-b88f37c98840
2022-07-11 15:01:14 database: disconnecting session #14
2022-07-11 15:01:14 database: disconnected session #14
2022-07-11 15:01:14 database: disconnecting session #15
2022-07-11 15:01:14 database: disconnected session #15
2022-07-11 15:01:14 database: disconnecting session #16
2022-07-11 15:01:14 database: disconnected session #16
2022-07-11 15:01:14 INFO <_> [io.apicurio.registry.storage.impl.sql.AbstractSqlRegistryStorage] (KSQL Kafka Consumer Thread) SqlRegistryStorage constructed successfully. JDBC URL: jdbc:h2:mem:registry_db;DB_CLOSE_ON_EXIT=FALSE;TRACE_LEVEL_SYSTEM_OUT=2
2022-07-11 15:01:14 DEBUG <_> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Clearing tenant id after message processed
2022-07-11 15:01:14 database: disconnecting session #17
2022-07-11 15:01:14 DEBUG <> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Registry exception detected: java.lang.RuntimeException: java.sql.SQLException: This pool is closed and does not handle any more connections!
And this is how the container crashes:
2022-07-11 15:01:14 DEBUG <> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Processing Kafka message with UUID: 477a0bde-97df-4b6f-acf5-c06287bd3944
2022-07-11 15:01:14 DEBUG <_> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Clearing tenant id after message processed
2022-07-11 15:01:14 DEBUG <> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Unexpected exception detected: Error injecting javax.transaction.TransactionManager io.quarkus.narayana.jta.runtime.interceptor.TransactionalInterceptorBase.transactionManager
2022-07-11 15:01:14 DEBUG <> [io.quarkus.runtime.ExecutorRecorder$2] (Shutdown thread) loop: 1, remaining: 60000000000, intervalRemaining: 5000000000, interruptRemaining: 10000000000
2022-07-11 15:01:14 DEBUG <> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Processing Kafka message with UUID: 349dd267-6841-4327-99f7-8189d3b33c2b
2022-07-11 15:01:14 DEBUG <_> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Clearing tenant id after message processed
2022-07-11 15:01:14 DEBUG <> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Unexpected exception detected: Error injecting javax.transaction.TransactionManager io.quarkus.narayana.jta.runtime.interceptor.TransactionalInterceptorBase.transactionManager
2022-07-11 15:01:14 DEBUG <> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Processing Kafka message with UUID: b56f0d56-c151-4db0-a396-4abae8222033
2022-07-11 15:01:14 DEBUG <_> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Clearing tenant id after message processed
2022-07-11 15:01:14 DEBUG <> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Unexpected exception detected: Error injecting javax.transaction.TransactionManager io.quarkus.narayana.jta.runtime.interceptor.TransactionalInterceptorBase.transactionManager
2022-07-11 15:01:14 DEBUG <> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Processing Kafka message with UUID: 5b24233f-8159-4218-a399-420ba19b26cd
2022-07-11 15:01:14 DEBUG <_> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Clearing tenant id after message processed
2022-07-11 15:01:14 DEBUG <> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Unexpected exception detected: Error injecting javax.transaction.TransactionManager io.quarkus.narayana.jta.runtime.interceptor.TransactionalInterceptorBase.transactionManager
2022-07-11 15:01:14 DEBUG <> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Processing Kafka message with UUID: 7fba2dbb-b75f-4aac-a69e-ef58956ba62e
2022-07-11 15:01:14 DEBUG <_> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Clearing tenant id after message processed
2022-07-11 15:01:14 DEBUG <> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Unexpected exception detected: Error injecting javax.transaction.TransactionManager io.quarkus.narayana.jta.runtime.interceptor.TransactionalInterceptorBase.transactionManager
2022-07-11 15:01:14 DEBUG <> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Processing Kafka message with UUID: cac5e98f-0a9c-4041-8bca-707b9aaa18f3
2022-07-11 15:01:14 DEBUG <_> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Clearing tenant id after message processed
2022-07-11 15:01:14 DEBUG <> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Unexpected exception detected: Error injecting javax.transaction.TransactionManager io.quarkus.narayana.jta.runtime.interceptor.TransactionalInterceptorBase.transactionManager
2022-07-11 15:01:14 DEBUG <> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Processing Kafka message with UUID: cfe0f411-1ebb-44de-b1cb-9a1316cb6ec7
2022-07-11 15:01:14 DEBUG <_> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Clearing tenant id after message processed
2022-07-11 15:01:14 DEBUG <> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Unexpected exception detected: Error injecting javax.transaction.TransactionManager io.quarkus.narayana.jta.runtime.interceptor.TransactionalInterceptorBase.transactionManager
2022-07-11 15:01:14 DEBUG <> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Processing Kafka message with UUID: 2f0d0e2e-374c-4727-9eee-87c4fe62a477
2022-07-11 15:01:14 DEBUG <_> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Clearing tenant id after message processed
2022-07-11 15:01:14 DEBUG <> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Unexpected exception detected: Error injecting javax.transaction.TransactionManager io.quarkus.narayana.jta.runtime.interceptor.TransactionalInterceptorBase.transactionManager
2022-07-11 15:01:14 DEBUG <> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Processing Kafka message with UUID: a403b190-68d7-48ef-a394-46a583fdb1f2
2022-07-11 15:01:14 DEBUG <_> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Clearing tenant id after message processed
2022-07-11 15:01:14 DEBUG <> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Unexpected exception detected: Error injecting javax.transaction.TransactionManager io.quarkus.narayana.jta.runtime.interceptor.TransactionalInterceptorBase.transactionManager
2022-07-11 15:01:14 DEBUG <> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Processing Kafka message with UUID: 5bb6bb57-01c3-4dfd-b4ce-2b0005470bdb
2022-07-11 15:01:14 INFO <> [io.quarkus.bootstrap.runner.Timing] (Shutdown thread) apicurio-registry-storage-kafkasql stopped in 0.115s
2022-07-11 15:01:14 DEBUG <_> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Clearing tenant id after message processed
2022-07-11 15:01:14 DEBUG <> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Unexpected exception detected: Error injecting javax.transaction.TransactionManager io.quarkus.narayana.jta.runtime.interceptor.TransactionalInterceptorBase.transactionManager
What I’ve noticed is that it only crashes when processing Kafka messages with newly added versions. There is no problem processing a large number of versions when I import schemas into a clean topic via the export/import APIs. Imported versions are processed slightly differently, and that causes no issues:
2022-07-11 15:00:29 DEBUG <> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Processing Kafka message with UUID: 6d71e375-24cd-4a2f-844b-d97d520f9b48
2022-07-11 15:00:29 jdbc[3]:
/*SQL l:103 #:1*/SELECT COUNT(a.artifactId) FROM artifacts a WHERE a.tenantId = ? AND a.groupId = ? AND a.artifactId = ? {1: '_', 2: '__$GROUPID$__', 3: 'com.example.MySchema1'};
2022-07-11 15:00:29 jdbc[3]:
/*SQL l:72 #:1*/SELECT COUNT(globalId) FROM versions WHERE globalId = ? AND tenantId = ? {1: 52604860, 2: '_'};
2022-07-11 15:00:29 jdbc[3]:
/*SQL l:211 #:1*/INSERT INTO versions (globalId, tenantId, groupId, artifactId, version, versionId, state, name, description, createdBy, createdOn, labels, properties, contentId) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?) {1: 52604860, 2: '_', 3: '__$GROUPID$__', 4: 'com.example.MySchema1', 5: '6585', 6: 6585, 7: 'ENABLED', 8: 'MySchema1', 9: NULL, 10: '', 11: TIMESTAMP '2022-06-28 12:11:50.741', 12: NULL, 13: NULL, 14: 605};
2022-07-11 15:00:29 INFO <_> [io.apicurio.registry.storage.impl.sql.AbstractSqlRegistryStorage] (KSQL Kafka Consumer Thread) Artifact version entity imported successfully.
2022-07-11 15:00:29 jdbc[3]:
/*SQL */COMMIT;
2022-07-11 15:00:29 jdbc[3]:
/*SQL */COMMIT;
2022-07-11 15:00:29 DEBUG <_> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Clearing tenant id after message processed
2022-07-11 15:00:29 DEBUG <> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Kafka message successfully processed. Notifying listeners of response.
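The difference between the two traces can be reproduced in miniature. This is only a sketch: it uses SQLite instead of H2 and a stripped-down, hypothetical `versions` table rather than the real Apicurio schema. It shows that the import path supplies `versionId` directly, while the new-version path derives it with a `MAX(versionId) + 1` subquery over the existing rows of the artifact.

```python
import sqlite3

ARTIFACT = "com.example.MySchema1"

conn = sqlite3.connect(":memory:")
# Simplified stand-in for the versions table seen in the traces.
conn.execute(
    "CREATE TABLE versions ("
    "globalId INTEGER PRIMARY KEY, artifactId TEXT, versionId INTEGER)"
)

# Import path: the message already carries the versionId.
conn.execute(
    "INSERT INTO versions (globalId, artifactId, versionId) VALUES (?, ?, ?)",
    (1, ARTIFACT, 1),
)

# New-version path: versionId is computed from the rows already stored.
conn.execute(
    "INSERT INTO versions (globalId, artifactId, versionId) "
    "VALUES (?, ?, (SELECT MAX(versionId) + 1 FROM versions WHERE artifactId = ?))",
    (2, ARTIFACT, ARTIFACT),
)

rows = conn.execute(
    "SELECT globalId, versionId FROM versions ORDER BY globalId"
).fetchall()
```

Both paths end up with consecutive version numbers, but the second one does extra work per message. Whether that extra work matters in practice depends on indexes and data volume, which this sketch does not model.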
My questions
Would it please be possible to fix the processing of new versions? It is an unusual use case, but apparently a possible one. I think it is slightly similar to the crash loops caused by globalIds that you fixed a year ago in #1500. Is there a way to prevent such user behavior in the Apicurio setup?
In case of such issues, is it possible to keep the session connected longer? It starts here:
2022-07-11 15:00:20 database: connecting session #4 to mem:registry_db
ends 54 seconds later in my case:
2022-07-11 15:01:14 database: disconnecting session #4
2022-07-11 15:01:14 database: disconnected session #4
and none of the settings I tried for H2 DB parameters or quarkus.datasource parameters had any effect on the lifetime of the session.
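The 54-second figure follows directly from the two timestamps quoted above; a minimal check:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# Timestamps copied from the "connecting" and "disconnecting" lines for session #4.
connected = datetime.strptime("2022-07-11 15:00:20", FMT)
disconnected = datetime.strptime("2022-07-11 15:01:14", FMT)

lifetime_seconds = int((disconnected - connected).total_seconds())
print(lifetime_seconds)  # 54
```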
Top GitHub Comments
Thanks a lot for getting back @petolexa ! Appreciated! And happy that we solved the mystery 🙂
Hi @petolexa ! Thanks for sharing those results!
I think this analysis is not correct. From the events you shared, it looks pretty clear that the pod is getting killed because of probe failures (not the other way around). This also explains the “disconnections” you are observing; I now understand that they are caused by the Pod receiving a SIGTERM. I encourage you to tweak the probes to have a much more relaxed frequency and timeouts as a first step.
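To get a feel for how relaxed the probes need to be, here is a rough sketch. It assumes the time a failing probe tolerates is approximately `initialDelaySeconds + periodSeconds * failureThreshold` (the standard Kubernetes probe fields), which is a simplification that ignores `timeoutSeconds`. All numbers below are hypothetical, not taken from the reporter's deployment.

```python
import math


def probe_budget_seconds(initial_delay: int, period: int, failure_threshold: int) -> int:
    """Approximate time a container may stay unhealthy before it is restarted."""
    return initial_delay + period * failure_threshold


def min_failure_threshold(startup_seconds: int, initial_delay: int, period: int) -> int:
    """Smallest failureThreshold whose budget covers the given startup time."""
    remaining = max(0, startup_seconds - initial_delay)
    return max(1, math.ceil(remaining / period))


# Hypothetical probe settings: 15 s initial delay, probe every 10 s, 3 failures allowed.
budget = probe_budget_seconds(initial_delay=15, period=10, failure_threshold=3)

# Hypothetical topic replay that needs 120 s before the registry can answer probes.
needed = min_failure_threshold(startup_seconds=120, initial_delay=15, period=10)
```

With these made-up values the budget is 45 seconds, so a replay taking 120 seconds would be interrupted; raising the failure threshold to 11 (or lengthening the period or initial delay) would cover it.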