dotnet process crashes when using wildcard topics
Description
We are developing a medium-sized application that uses Kafka as a publish-subscribe message broker. The application crashes when we subscribe several consumers using wildcard topics. If we do not use wildcard topics, everything works as expected.
Most of the time the crash happens a few seconds after subscribing the consumers. The reported cause varies among access violation, stack overflow and heap corruption, and the faulting module is usually librdkafka.DLL, although other DLLs such as ntdll.DLL sometimes appear.
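The three crash causes named above correspond to standard Windows NTSTATUS exception codes, two of which (0xc00000fd and 0xc0000374) show up in the event log entries later in this report. As a reading aid, here is a minimal sketch of a lookup helper. The class and method names (`CrashCodes`, `Describe`) are hypothetical and not part of Confluent.Kafka or librdkafka, and the table lists only the three codes relevant here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of any Kafka library): maps the Windows
// exception codes relevant to this report to their NTSTATUS names.
public static class CrashCodes
{
    private static readonly Dictionary<uint, string> Known = new Dictionary<uint, string>
    {
        { 0xC0000005, "STATUS_ACCESS_VIOLATION" },
        { 0xC00000FD, "STATUS_STACK_OVERFLOW" },
        { 0xC0000374, "STATUS_HEAP_CORRUPTION" },
    };

    // Returns the NTSTATUS name, or a placeholder for codes not in the table.
    public static string Describe(uint code) =>
        Known.TryGetValue(code, out var name) ? name : $"unknown (0x{code:x8})";
}
```

For example, `CrashCodes.Describe(0xc00000fd)` returns `STATUS_STACK_OVERFLOW`, which matches the librdkafka.DLL event log entry, and `CrashCodes.Describe(0xc0000374)` returns `STATUS_HEAP_CORRUPTION`, which matches the ntdll.dll entry.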
We have verified that exactly the same code, running exactly the same versions of the Confluent.Kafka NuGet package and dotnet, crashes only on Windows. If the code is executed on a Linux host, it never crashes.
We have also verified that this crash happens with at least versions 0.11.4, 0.11.6 and 1.0.0-RC4 of the Confluent.Kafka NuGet package.
We have prepared a toy example below that reproduces the crash.
How to reproduce
With version 1.0.0-RC4 of the NuGet package, using dotnet 2.2.105, the following code reproduces the error on Windows 10 x64:
using System;
using System.Threading;
using Confluent.Kafka;

class Program
{
    static void Main(string[] args)
    {
        // Four consumers, each in its own consumer group.
        var conf1 = new ConsumerConfig
        {
            GroupId = "test-consumer-group1",
            BootstrapServers = "localhost:9092",
            AutoOffsetReset = AutoOffsetReset.Latest
        };
        var conf2 = new ConsumerConfig
        {
            GroupId = "test-consumer-group2",
            BootstrapServers = "localhost:9092",
            AutoOffsetReset = AutoOffsetReset.Latest
        };
        var conf3 = new ConsumerConfig
        {
            GroupId = "test-consumer-group3",
            BootstrapServers = "localhost:9092",
            AutoOffsetReset = AutoOffsetReset.Latest
        };
        var conf4 = new ConsumerConfig
        {
            GroupId = "test-consumer-group4",
            BootstrapServers = "localhost:9092",
            AutoOffsetReset = AutoOffsetReset.Latest
        };
        var c1 = new ConsumerBuilder<Ignore, string>(conf1).Build();
        var c2 = new ConsumerBuilder<Ignore, string>(conf2).Build();
        var c3 = new ConsumerBuilder<Ignore, string>(conf3).Build();
        var c4 = new ConsumerBuilder<Ignore, string>(conf4).Build();
        // The leading '^' marks each subscription as a regex (wildcard) topic.
        c1.Subscribe("^tenants\\.a77b7bec-c9d5-468c-89d8-cc3dc293354f\\.plants\\.a77b7bec-c9d5-468c-89d8-cc3dc293354f\\.notifications\\..*");
        c2.Subscribe("^tenants\\.a77b7bec-c9d5-468c-89d8-cc3dc293354f\\.plants\\.a77b7bec-c9d5-468c-89d8-cc3dc293354f\\.notifications\\..*");
        c3.Subscribe("^tenants\\.a77b7bec-c9d5-468c-89d8-cc3dc293354f\\.plants\\.a77b7bec-c9d5-468c-89d8-cc3dc293354f\\.notifications\\..*");
        c4.Subscribe("^tenants\\.a77b7bec-c9d5-468c-89d8-cc3dc293354f\\.plants\\.a77b7bec-c9d5-468c-89d8-cc3dc293354f\\.notifications\\..*");
        Console.WriteLine("All consumers subscribed.");
        CancellationTokenSource cts = new CancellationTokenSource();
        Console.CancelKeyPress += (_, e) => {
            e.Cancel = true; // prevent the process from terminating.
            cts.Cancel();
        };
        while (true)
        {
            try
            {
                var cr1 = c1.Consume(cts.Token);
                Console.WriteLine($"1. Consumed message '{cr1.Value}' at: '{cr1.TopicPartitionOffset}'.");
                var cr2 = c2.Consume(cts.Token);
                Console.WriteLine($"2. Consumed message '{cr2.Value}' at: '{cr2.TopicPartitionOffset}'.");
                var cr3 = c3.Consume(cts.Token);
                Console.WriteLine($"3. Consumed message '{cr3.Value}' at: '{cr3.TopicPartitionOffset}'.");
                var cr4 = c4.Consume(cts.Token);
                Console.WriteLine($"4. Consumed message '{cr4.Value}' at: '{cr4.TopicPartitionOffset}'.");
            }
            catch (ConsumeException e)
            {
                Console.WriteLine($"Error occurred: {e.Error.Reason}");
            }
            catch (OperationCanceledException)
            {
                // Ctrl+C was pressed: leave the loop so the consumers below get closed.
                break;
            }
        }
        c1.Close();
        c2.Close();
        c3.Close();
        c4.Close();
    }
}
The program prints “All consumers subscribed.” and then dies after a few seconds. If the topics are replaced by “tenants\.a77b7bec-c9d5-468c-89d8-cc3dc293354f\.plants\.a77b7bec-c9d5-468c-89d8-cc3dc293354f\.notifications\.a” (with the other three ending in b, c and d respectively), the application does not crash.
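To make the difference between the crashing and non-crashing variants explicit, here is a minimal sketch that derives both kinds of subscription strings from the same prefix. The `NotificationTopics` class and its methods are hypothetical names introduced for illustration. The sketch relies on librdkafka treating a subscription that starts with `^` as a regular expression, and it assumes the literal topic names use plain dots, which the report does not confirm.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: builds the wildcard pattern and the literal topic names.
public static class NotificationTopics
{
    // Prefix shared by all notification topics (IDs copied from the report).
    private const string Prefix =
        "tenants.a77b7bec-c9d5-468c-89d8-cc3dc293354f" +
        ".plants.a77b7bec-c9d5-468c-89d8-cc3dc293354f" +
        ".notifications.";

    // Wildcard variant: '^' marks the subscription as a regex; dots are escaped.
    public static string WildcardPattern() => "^" + Regex.Escape(Prefix) + ".*";

    // Literal variant (assumption: real topic names use plain dots).
    public static string[] LiteralTopics(params string[] suffixes) =>
        suffixes.Select(s => Prefix + s).ToArray();
}
```

Under these assumptions, `c1.Subscribe(NotificationTopics.WildcardPattern())` would correspond to the crashing setup, while subscribing each consumer to one entry of `NotificationTopics.LiteralTopics("a", "b", "c", "d")` would correspond to the setup that does not crash.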
In the actual project, the way we set up the consumers and handle the messages is much more involved (we use several async methods), but this minimal example reproduces the error in exactly the same way.
We can provide more details on our setup and even a zip file with the project if needed.
Checklist
Please provide the following information:
- A complete (i.e. we can run it), minimal program demonstrating the problem. No need to supply a project file.
- Confluent.Kafka nuget version. Crash reproduced using versions 0.11.4, 0.11.6 and 1.0.0-RC4.
- Apache Kafka version. Kafka 2.1.0, Zookeeper 3.4.12 and java 1.8.0_181.
- Client configuration.
- Operating system. dotnet version 2.2.105 running under Windows 10 x64. The issue is not reproducible in Ubuntu 16.04.5 LTS even using exactly the same versions for everything.
- Provide logs (with “debug” : “…” as necessary in configuration). Application crashes without printing anything to console. The Windows application event log shows the following information:
- If the crash happens in librdkafka (most of the time):
Name of the application with errors: dotnet.exe, version: 2.2.27207.3, timestamp: 0x5c0ab1b7
Name of the module with errors: librdkafka.DLL, version: 0.0.0.0, timestamp: 0x5c99628f
Exception code: 0xc00000fd
Error offset: 0x00000000000cb36e
Identifier of the process with errors: 0x6938
Application with errors start time: 0x01d4eeb93b8ac8e1
Path to the application with errors: C:\Program Files\dotnet\dotnet.exe
Path to the application module with errors: C:\Users\vmartin\.nuget\packages\librdkafka.redist\1.0.0\runtimes\win-x64\native\librdkafka.DLL
Report identifier: 879113b6-44ff-4e07-94ce-1094048db72a
- If the crash happens in ntdll.dll (sometimes):
Name of the application with errors: dotnet.exe, version: 2.2.27207.3, timestamp: 0x5c0ab1b7
Name of the module with errors: ntdll.dll, version: 10.0.17763.404, timestamp: 0xbf6ea104
Exception code: 0xc0000374
Error offset: 0x00000000000faf89
Identifier of the process with errors: 0x6a54
Application with errors start time: 0x01d4eedb61eb4cea
Path to the application with errors: C:\Program Files\dotnet\dotnet.exe
Path to the application module with errors: C:\WINDOWS\SYSTEM32\ntdll.dll
Report identifier: 00e2b779-0858-46e6-b15b-590d80884540
- Provide broker log excerpts. No log lines are usually written to the log of the broker when the crash happens.
- Critical issue.
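The checklist asks for logs with a “debug” setting in the client configuration. As a rough sketch of how that could be enabled in the reproduction above, the `Debug` property of `ConsumerConfig` accepts a comma-separated list of librdkafka debug contexts. The `DebugConfig` class name and the particular contexts chosen here (`cgrp,topic,metadata`) are assumptions for illustration; the report does not say which contexts, if any, were tried.

```csharp
using Confluent.Kafka;

// Hypothetical helper: same settings as the reproduction, plus debug logging.
public static class DebugConfig
{
    public static ConsumerConfig Create(string groupId) => new ConsumerConfig
    {
        GroupId = groupId,
        BootstrapServers = "localhost:9092",
        AutoOffsetReset = AutoOffsetReset.Latest,
        // Assumption: consumer group, topic and metadata contexts are the most
        // relevant ones for a wildcard-subscription problem.
        Debug = "cgrp,topic,metadata"
    };
}
```

Replacing `conf1` with `DebugConfig.Create("test-consumer-group1")` (and likewise for the other three) should make librdkafka emit its debug lines for those contexts, which might show what the client was doing just before the crash.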
Issue Analytics
- Created 4 years ago
- Comments: 15 (6 by maintainers)
Top GitHub Comments
I am seeing the same problem on a Windows 10 machine running version 1.0.1. We are using lz4 compression on the producer side (as mentioned in #482). Our consumer application creates two consumer objects that use wildcard subscriptions (with different topic patterns) on startup. Shortly afterwards, the application crashes, sometimes due to a StackOverflow and other times due to a MemoryAccessViolation.
@edenhill Have you been able to reproduce this yet? I cannot see any issues in the librdkafka repo that seem related to this bug.
We hit this issue today. I can confirm the behavior: it crashes on Windows if there are more than two consumers using wildcard subscriptions. As it crashes without any (managed) exception or stack trace, it took me a while to figure out what’s going on.