Importing a lot of versions of a schema from KafkaSQL causes crash loop


Hello,

we’ve run into an issue that is partly caused by a user’s mistake but might affect others, so I would like to describe it and ask for your help or advice. We run Apicurio v2.2.4 as a Docker image on a k8s cluster, with KafkaSQL storage underneath.

The cause

One of the users of our Apicurio instance uses the schema registry in such a way that they send a PUT request with the same content for every request involving their schema: PUT /api/artifacts/com.example.MySchema1. We are working on improving their process; this is of course not the standard use case. But so far it has left us with 10000+ versions of this schema.

It also means that every version is a Kafka message to be processed.
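
One way the calling side could avoid creating a version per request is to remember a hash of the last content it sent and skip the PUT when nothing changed. The sketch below is only an illustration: `DedupingPublisher` and `put_fn` are hypothetical names, `put_fn` stands in for whatever actually issues the HTTP request, and the local SHA-256 is not claimed to match the hash the registry stores.

```python
import hashlib


def content_hash(schema_text: str) -> str:
    """SHA-256 hex digest of the raw schema text (no canonicalization)."""
    return hashlib.sha256(schema_text.encode("utf-8")).hexdigest()


class DedupingPublisher:
    """Hypothetical client-side guard: skip the PUT when content is unchanged."""

    def __init__(self, put_fn):
        # put_fn(artifact_id, schema_text) is assumed to wrap the real HTTP PUT.
        self._put = put_fn
        self._last_hash = {}  # artifact_id -> hash of the last content sent

    def publish(self, artifact_id: str, schema_text: str) -> bool:
        digest = content_hash(schema_text)
        if self._last_hash.get(artifact_id) == digest:
            return False  # unchanged: no request, so no new version
        self._put(artifact_id, schema_text)
        self._last_hash[artifact_id] = digest
        return True


# Usage with a stub in place of the HTTP call.
sent = []
publisher = DedupingPublisher(lambda artifact_id, text: sent.append(artifact_id))
schema = '{"type": "record", "name": "MySchema1", "fields": []}'
results = [publisher.publish("com.example.MySchema1", schema) for _ in range(3)]
```

Only the first of the three identical calls reaches the stub. The cache here lives in process memory, so a restarted client would send the content once more.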

The issue

When our Apicurio pods are restarted, Apicurio loads and processes the messages from the Kafka topic. When it reaches the messages with the (many) new versions of the problematic schema, they are processed in such a way that all database sessions disconnect before processing finishes, and the Apicurio pod crashes.

This is the trace from the h2:mem database processing one of the versions:

2022-07-11 15:01:12 DEBUG <> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Processing Kafka message with UUID: e0fb998a-e294-4cc2-9997-0f4e73150296
2022-07-11 15:01:12 jdbc[3]: 
/*SQL l:59 #:1*/SELECT value FROM sequences WHERE name = ? AND tenantId = ? {1: 'globalId', 2: '_'};
2022-07-11 15:01:12 jdbc[3]: 
/*SQL l:81 #:1*/MERGE INTO sequences (tenantId, name, value) KEY (tenantId, name) VALUES(?, ?, ?) {1: '_', 2: 'globalId', 3: 52609075};
2022-07-11 15:01:12 jdbc[3]: 
/*SQL */COMMIT;
2022-07-11 15:01:12 jdbc[3]: 
/*SQL */COMMIT;
2022-07-11 15:01:12 DEBUG <_> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Clearing tenant id after message processed
2022-07-11 15:01:12 DEBUG <> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Kafka message successfully processed. Notifying listeners of response.
2022-07-11 15:01:12 DEBUG <> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Processing Kafka message with UUID: 36f5c6bf-7df1-4e28-a1df-46cf36d89aef
2022-07-11 15:01:12 jdbc[3]: 
/*SQL l:76 #:1*/SELECT c.contentId FROM content c WHERE c.contentHash = ? AND c.tenantId = ? {1: 'fa5d4b21ae39e9e978a7f1d06b7ed4d6dfdc1840c2e27eefd33181579ee8260f', 2: '_'};
2022-07-11 15:01:12 DEBUG <_> [io.apicurio.registry.storage.impl.sql.AbstractSqlRegistryStorage] (KSQL Kafka Consumer Thread) Updating artifact null com.example.MySchema1 with a new version (content).
2022-07-11 15:01:12 jdbc[3]: 
/*SQL l:315 #:1*/SELECT a.*, v.contentId, v.globalId, v.version, v.versionId, v.state, v.name, v.description, v.labels, v.properties, v.createdBy AS modifiedBy, v.createdOn AS modifiedOn FROM artifacts a JOIN versions v ON a.tenantId = v.tenantId AND a.latest = v.globalId WHERE a.tenantId = ? AND a.groupId = ? AND a.artifactId = ? {1: '_', 2: '__$GROUPID$__', 3: 'com.example.MySchema1'};
2022-07-11 15:01:12 jdbc[3]: 
/*SQL l:305 #:1 t:13*/INSERT INTO versions (globalId, tenantId, groupId, artifactId, version, versionId, state, name, description, createdBy, createdOn, labels, properties, contentId) VALUES (?, ?, ?, ?, ?, (SELECT MAX(versionId) + 1 FROM versions WHERE tenantId = ? AND groupId = ? AND artifactId = ?), ?, ?, ?, ?, ?, ?, ?, ?) {1: 52609075, 2: '_', 3: '__$GROUPID$__', 4: 'com.example.MySchema1', 5: NULL, 6: '_', 7: '__$GROUPID$__', 8: 'com.example.MySchema1', 9: 'ENABLED', 10: 'MySchema1', 11: NULL, 12: '', 13: TIMESTAMP '2022-07-11 11:50:25.212', 14: NULL, 15: NULL, 16: 605};
2022-07-11 15:01:12 jdbc[3]: 
/*SQL l:134 #:1*/UPDATE versions SET version = (SELECT versionId FROM versions WHERE tenantId = ? AND globalId = ?) WHERE tenantId = ? AND globalId = ? {1: '_', 2: 52609075, 3: '_', 4: 52609075};
2022-07-11 15:01:12 jdbc[3]: 
/*SQL l:85 #:1*/UPDATE artifacts SET latest = ? WHERE tenantId = ? AND groupId = ? AND artifactId = ? {1: 52609075, 2: '_', 3: '__$GROUPID$__', 4: 'com.example.MySchema1'};
2022-07-11 15:01:12 jdbc[3]: 
/*SQL l:176 #:1*/SELECT v.*, a.type FROM versions v JOIN artifacts a ON v.tenantId = a.tenantId AND v.groupId = a.groupId AND v.artifactId = a.artifactId WHERE v.tenantId = ? AND v.globalId = ? {1: '_', 2: 52609075};
2022-07-11 15:01:12 jdbc[3]: 
/*SQL */COMMIT;
2022-07-11 15:01:12 jdbc[3]: 
/*SQL */COMMIT;
2022-07-11 15:01:12 DEBUG <_> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Clearing tenant id after message processed
2022-07-11 15:01:12 DEBUG <> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Kafka message successfully processed. Notifying listeners of response.

This is how it disconnects during processing:

2022-07-11 15:01:14 DEBUG <> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Processing Kafka message with UUID: fd634a51-ba81-4064-81a6-3ffe3772eb43
2022-07-11 15:01:14 database: disconnecting session #3
2022-07-11 15:01:14 database: disconnected session #3
2022-07-11 15:01:14 database: disconnecting session #4
2022-07-11 15:01:14 database: disconnected session #4
2022-07-11 15:01:14 database: disconnecting session #5
2022-07-11 15:01:14 database: disconnected session #5
2022-07-11 15:01:14 database: disconnecting session #6
2022-07-11 15:01:14 database: disconnected session #6
2022-07-11 15:01:14 database: disconnecting session #7
2022-07-11 15:01:14 database: disconnected session #7
2022-07-11 15:01:14 database: disconnecting session #8
2022-07-11 15:01:14 database: disconnected session #8
2022-07-11 15:01:14 database: disconnecting session #9
2022-07-11 15:01:14 database: disconnected session #9
2022-07-11 15:01:14 database: disconnecting session #10
2022-07-11 15:01:14 database: disconnected session #10
2022-07-11 15:01:14 database: disconnecting session #11
2022-07-11 15:01:14 INFO <_> [io.apicurio.registry.storage.impl.sql.AbstractSqlRegistryStorage] (KSQL Kafka Consumer Thread) SqlRegistryStorage constructed successfully.  JDBC URL: jdbc:h2:mem:registry_db;DB_CLOSE_ON_EXIT=FALSE;TRACE_LEVEL_SYSTEM_OUT=2
2022-07-11 15:01:14 database: disconnected session #11
2022-07-11 15:01:14 database: disconnecting session #12
2022-07-11 15:01:14 database: disconnected session #12
2022-07-11 15:01:14 database: disconnecting session #13
2022-07-11 15:01:14 DEBUG <_> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Clearing tenant id after message processed
2022-07-11 15:01:14 database: disconnected session #13
2022-07-11 15:01:14 DEBUG <> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Registry exception detected: java.lang.RuntimeException: java.sql.SQLException: This pool is closed and does not handle any more connections!
2022-07-11 15:01:14 DEBUG <> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Processing Kafka message with UUID: 5b551d28-dc44-4f44-84b0-b88f37c98840
2022-07-11 15:01:14 database: disconnecting session #14
2022-07-11 15:01:14 database: disconnected session #14
2022-07-11 15:01:14 database: disconnecting session #15
2022-07-11 15:01:14 database: disconnected session #15
2022-07-11 15:01:14 database: disconnecting session #16
2022-07-11 15:01:14 database: disconnected session #16
2022-07-11 15:01:14 INFO <_> [io.apicurio.registry.storage.impl.sql.AbstractSqlRegistryStorage] (KSQL Kafka Consumer Thread) SqlRegistryStorage constructed successfully.  JDBC URL: jdbc:h2:mem:registry_db;DB_CLOSE_ON_EXIT=FALSE;TRACE_LEVEL_SYSTEM_OUT=2
2022-07-11 15:01:14 DEBUG <_> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Clearing tenant id after message processed
2022-07-11 15:01:14 database: disconnecting session #17
2022-07-11 15:01:14 DEBUG <> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Registry exception detected: java.lang.RuntimeException: java.sql.SQLException: This pool is closed and does not handle any more connections!

And this is how the container crashes:

2022-07-11 15:01:14 DEBUG <> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Processing Kafka message with UUID: 477a0bde-97df-4b6f-acf5-c06287bd3944
2022-07-11 15:01:14 DEBUG <_> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Clearing tenant id after message processed
2022-07-11 15:01:14 DEBUG <> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Unexpected exception detected: Error injecting javax.transaction.TransactionManager io.quarkus.narayana.jta.runtime.interceptor.TransactionalInterceptorBase.transactionManager
2022-07-11 15:01:14 DEBUG <> [io.quarkus.runtime.ExecutorRecorder$2] (Shutdown thread) loop: 1, remaining: 60000000000, intervalRemaining: 5000000000, interruptRemaining: 10000000000
2022-07-11 15:01:14 DEBUG <> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Processing Kafka message with UUID: 349dd267-6841-4327-99f7-8189d3b33c2b
2022-07-11 15:01:14 DEBUG <_> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Clearing tenant id after message processed
2022-07-11 15:01:14 DEBUG <> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Unexpected exception detected: Error injecting javax.transaction.TransactionManager io.quarkus.narayana.jta.runtime.interceptor.TransactionalInterceptorBase.transactionManager
2022-07-11 15:01:14 DEBUG <> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Processing Kafka message with UUID: b56f0d56-c151-4db0-a396-4abae8222033
2022-07-11 15:01:14 DEBUG <_> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Clearing tenant id after message processed
2022-07-11 15:01:14 DEBUG <> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Unexpected exception detected: Error injecting javax.transaction.TransactionManager io.quarkus.narayana.jta.runtime.interceptor.TransactionalInterceptorBase.transactionManager
2022-07-11 15:01:14 DEBUG <> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Processing Kafka message with UUID: 5b24233f-8159-4218-a399-420ba19b26cd
2022-07-11 15:01:14 DEBUG <_> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Clearing tenant id after message processed
2022-07-11 15:01:14 DEBUG <> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Unexpected exception detected: Error injecting javax.transaction.TransactionManager io.quarkus.narayana.jta.runtime.interceptor.TransactionalInterceptorBase.transactionManager
2022-07-11 15:01:14 DEBUG <> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Processing Kafka message with UUID: 7fba2dbb-b75f-4aac-a69e-ef58956ba62e
2022-07-11 15:01:14 DEBUG <_> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Clearing tenant id after message processed
2022-07-11 15:01:14 DEBUG <> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Unexpected exception detected: Error injecting javax.transaction.TransactionManager io.quarkus.narayana.jta.runtime.interceptor.TransactionalInterceptorBase.transactionManager
2022-07-11 15:01:14 DEBUG <> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Processing Kafka message with UUID: cac5e98f-0a9c-4041-8bca-707b9aaa18f3
2022-07-11 15:01:14 DEBUG <_> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Clearing tenant id after message processed
2022-07-11 15:01:14 DEBUG <> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Unexpected exception detected: Error injecting javax.transaction.TransactionManager io.quarkus.narayana.jta.runtime.interceptor.TransactionalInterceptorBase.transactionManager
2022-07-11 15:01:14 DEBUG <> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Processing Kafka message with UUID: cfe0f411-1ebb-44de-b1cb-9a1316cb6ec7
2022-07-11 15:01:14 DEBUG <_> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Clearing tenant id after message processed
2022-07-11 15:01:14 DEBUG <> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Unexpected exception detected: Error injecting javax.transaction.TransactionManager io.quarkus.narayana.jta.runtime.interceptor.TransactionalInterceptorBase.transactionManager
2022-07-11 15:01:14 DEBUG <> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Processing Kafka message with UUID: 2f0d0e2e-374c-4727-9eee-87c4fe62a477
2022-07-11 15:01:14 DEBUG <_> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Clearing tenant id after message processed
2022-07-11 15:01:14 DEBUG <> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Unexpected exception detected: Error injecting javax.transaction.TransactionManager io.quarkus.narayana.jta.runtime.interceptor.TransactionalInterceptorBase.transactionManager
2022-07-11 15:01:14 DEBUG <> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Processing Kafka message with UUID: a403b190-68d7-48ef-a394-46a583fdb1f2
2022-07-11 15:01:14 DEBUG <_> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Clearing tenant id after message processed
2022-07-11 15:01:14 DEBUG <> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Unexpected exception detected: Error injecting javax.transaction.TransactionManager io.quarkus.narayana.jta.runtime.interceptor.TransactionalInterceptorBase.transactionManager
2022-07-11 15:01:14 DEBUG <> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Processing Kafka message with UUID: 5bb6bb57-01c3-4dfd-b4ce-2b0005470bdb
2022-07-11 15:01:14 INFO <> [io.quarkus.bootstrap.runner.Timing] (Shutdown thread) apicurio-registry-storage-kafkasql stopped in 0.115s
2022-07-11 15:01:14 DEBUG <_> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Clearing tenant id after message processed
2022-07-11 15:01:14 DEBUG <> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Unexpected exception detected: Error injecting javax.transaction.TransactionManager io.quarkus.narayana.jta.runtime.interceptor.TransactionalInterceptorBase.transactionManager

What I’ve noticed is that it only crashes when processing Kafka messages with newly added versions. There is no problem processing a large number of versions when I import schemas into a clean topic via the export/import APIs. Imported versions are processed slightly differently, and that causes no issues:

2022-07-11 15:00:29 DEBUG <> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Processing Kafka message with UUID: 6d71e375-24cd-4a2f-844b-d97d520f9b48
2022-07-11 15:00:29 jdbc[3]: 
/*SQL l:103 #:1*/SELECT COUNT(a.artifactId) FROM artifacts a WHERE a.tenantId = ? AND a.groupId = ? AND a.artifactId = ? {1: '_', 2: '__$GROUPID$__', 3: 'com.example.MySchema1'};
2022-07-11 15:00:29 jdbc[3]: 
/*SQL l:72 #:1*/SELECT COUNT(globalId) FROM versions WHERE globalId = ? AND tenantId = ? {1: 52604860, 2: '_'};
2022-07-11 15:00:29 jdbc[3]: 
/*SQL l:211 #:1*/INSERT INTO versions (globalId, tenantId, groupId, artifactId, version, versionId, state, name, description, createdBy, createdOn, labels, properties, contentId) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?) {1: 52604860, 2: '_', 3: '__$GROUPID$__', 4: 'com.example.MySchema1', 5: '6585', 6: 6585, 7: 'ENABLED', 8: 'MySchema1', 9: NULL, 10: '', 11: TIMESTAMP '2022-06-28 12:11:50.741', 12: NULL, 13: NULL, 14: 605};
2022-07-11 15:00:29 INFO <_> [io.apicurio.registry.storage.impl.sql.AbstractSqlRegistryStorage] (KSQL Kafka Consumer Thread) Artifact version entity imported successfully.
2022-07-11 15:00:29 jdbc[3]: 
/*SQL */COMMIT;
2022-07-11 15:00:29 jdbc[3]: 
/*SQL */COMMIT;
2022-07-11 15:00:29 DEBUG <_> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Clearing tenant id after message processed
2022-07-11 15:00:29 DEBUG <> [io.apicurio.registry.storage.impl.kafkasql.sql.KafkaSqlSink] (KSQL Kafka Consumer Thread) Kafka message successfully processed. Notifying listeners of response.
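
The visible difference between the two traces is that the import path inserts an explicit versionId, while the new-version path derives it with a `SELECT MAX(versionId) + 1` subquery (and then runs further statements). The sketch below reproduces only that difference, on SQLite instead of H2 and with a cut-down, hypothetical `versions` table. It does not show that the subquery is the cause of the crash.

```python
import sqlite3

# In-memory SQLite stands in for the H2 database; the schema is reduced to three columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE versions (globalId INTEGER PRIMARY KEY, artifactId TEXT, versionId INTEGER)"
)


def import_version(global_id, artifact_id, version_id):
    # Import path: versionId is supplied by the caller.
    conn.execute(
        "INSERT INTO versions (globalId, artifactId, versionId) VALUES (?, ?, ?)",
        (global_id, artifact_id, version_id),
    )


def add_new_version(global_id, artifact_id):
    # New-version path: versionId is derived from the existing rows of the artifact.
    conn.execute(
        "INSERT INTO versions (globalId, artifactId, versionId) "
        "VALUES (?, ?, (SELECT MAX(versionId) + 1 FROM versions WHERE artifactId = ?))",
        (global_id, artifact_id, artifact_id),
    )


import_version(1, "com.example.MySchema1", 1)
for gid in range(2, 6):
    add_new_version(gid, "com.example.MySchema1")

version_ids = [
    row[0] for row in conn.execute("SELECT versionId FROM versions ORDER BY globalId")
]
```

Both paths end up with consecutive version ids; the new-version path simply does more work per message.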

My questions

Would it please be possible to fix the processing of new versions? It is an unusual use case, but apparently a possible one. I think it is somewhat similar to the crash loops caused by globalIds that you fixed a year ago in #1500. Is there a way to prevent such user behavior in the Apicurio setup?

In case of such issues, is it possible to keep the session connected longer? It starts here:

2022-07-11 15:00:20 database: connecting session #4 to mem:registry_db

ends 54 seconds later in my case:

2022-07-11 15:01:14 database: disconnecting session #4
2022-07-11 15:01:14 database: disconnected session #4

and none of the H2 DB parameters or quarkus.datasource parameters I tried let me change the lifetime of the session.
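
The 54-second figure can be read straight off the H2 trace lines. A minimal sketch, assuming the line format shown above; `session_lifetimes` is a hypothetical helper name:

```python
import re
from datetime import datetime

# Matches "connecting" and "disconnecting" lines, but not "disconnected".
LINE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) database: "
    r"(connecting|disconnecting) session #(\d+)"
)


def session_lifetimes(lines):
    """Return {session_id: seconds between 'connecting' and 'disconnecting'}."""
    opened, lifetimes = {}, {}
    for line in lines:
        match = LINE.match(line)
        if not match:
            continue
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        session = int(match.group(3))
        if match.group(2) == "connecting":
            opened[session] = stamp
        elif session in opened:
            lifetimes[session] = (stamp - opened[session]).total_seconds()
    return lifetimes


log = [
    "2022-07-11 15:00:20 database: connecting session #4 to mem:registry_db",
    "2022-07-11 15:01:14 database: disconnecting session #4",
    "2022-07-11 15:01:14 database: disconnected session #4",
]
print(session_lifetimes(log))  # {4: 54.0}
```

Running it over a full log would show whether every session ends at the same moment, which points to one external event and not to per-session timeouts.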

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 20 (10 by maintainers)

Top GitHub Comments

andreaTP commented, Sep 17, 2022 (1 reaction)

Thanks a lot for getting back, @petolexa! Appreciated! And happy that we solved the mystery 🙂

andreaTP commented, Sep 16, 2022 (1 reaction)

Hi @petolexa! Thanks for sharing those results!

“only show repeated liveness and healthiness issue as the result of the crash”

I think this analysis is not correct. From the events you shared, it looks pretty clear that the pod is getting killed because of probe failures (not the other way around). This also explains the “disconnections” you are observing; I now understand that they are caused by the pod receiving a SIGTERM.

As a first step, I encourage you to tweak the probes to use a much more relaxed frequency and more generous timeouts.
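
To get a feel for what "more relaxed" means, the sketch below estimates how long a continuously failing container survives under a given probe configuration. The field names mirror the Kubernetes probe fields and the defaults are the Kubernetes defaults; the relaxed values and the 300-second replay time are made-up numbers, and the formula is a rough estimate that ignores probe timeouts.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    # Mirrors initialDelaySeconds, periodSeconds, timeoutSeconds, failureThreshold.
    initial_delay_seconds: int = 0
    period_seconds: int = 10
    timeout_seconds: int = 1  # kept for completeness; not part of the rough estimate
    failure_threshold: int = 3

    def approx_budget_seconds(self) -> int:
        """Rough time before failureThreshold consecutive failures have accumulated."""
        return self.initial_delay_seconds + self.period_seconds * self.failure_threshold


def tolerates(probe: Probe, replay_seconds: int) -> bool:
    """True if the estimated budget covers the time needed to replay the topic."""
    return probe.approx_budget_seconds() >= replay_seconds


default_probe = Probe()
relaxed_probe = Probe(
    initial_delay_seconds=60, period_seconds=30, timeout_seconds=10, failure_threshold=10
)
replay_seconds = 300  # hypothetical time to replay the Kafka topic on startup

print(default_probe.approx_budget_seconds(), tolerates(default_probe, replay_seconds))
print(relaxed_probe.approx_budget_seconds(), tolerates(relaxed_probe, replay_seconds))
```

With the defaults the estimate is about 30 seconds, so a pod that needs minutes to replay its topic would be restarted before it finishes, and the replay would start again on the next attempt.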
