failed to get Partitioned metadata : Policies not found for mytenant/mynamespace namespace
Describe the bug
We have 4 Pulsar clusters running on Kubernetes with geo-replication. Everything worked well. Before going to prod we did some load testing, and during one of those runs (the last one at the time of writing), the Pulsar brokers started looping on exceptions like:
13:30:24.053 [pulsar-io-21-3] WARN org.apache.pulsar.client.impl.BinaryProtoLookupService - [non-persistent://mytenant/mynamespace/mytopic-98f088294f0c7a509ab4b1a5412b79308a75d50e] failed to get Partitioned metadata : Policies not found for mytenant/mynamespace namespace
java.util.concurrent.CompletionException: org.apache.pulsar.client.api.PulsarClientException$BrokerMetadataException: Policies not found for mytenant/mynamespace
Note: we have a lot of non-persistent, non-partitioned topics in that namespace (23k). Too many?
To Reproduce
Well, difficult. We got this “by chance” during a load test and I cannot figure out what caused it.
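One thing that might help narrow it down is to trigger the same lookup by hand and see whether the broker answers or returns the same error. A minimal sketch, assuming the standard pulsar-admin CLI is available on a broker pod; the topic name is a made-up placeholder since the real names are hashed:
# hypothetical topic name, replace with one taken from the broker warnings
pulsar-admin topics lookup non-persistent://mytenant/mynamespace/some-topic
# the failing client call asks for partitioned metadata, which can also be queried directly
pulsar-admin topics get-partitioned-topic-metadata non-persistent://mytenant/mynamespace/some-topic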
During my investigation so far I tried many things, including:
pulsar-admin namespaces policies mytenant/mynamespace
which returns the policies without any problem. But I really want to understand what happened, not just restart the cluster and start fresh.
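Since pulsar-admin sees the policies while the brokers intermittently do not, it may also be worth reading the policies znode straight from the configuration store. A rough sketch, assuming the default /admin/policies layout in the configuration-store ZooKeeper and a placeholder host name:
# read the namespace policies from the configuration store (global ZooKeeper)
zkCli.sh -server global-zk-1:2181 get /admin/policies/mytenant/mynamespace
If that znode is missing or empty on a configuration-store node while pulsar-admin still answers (possibly from a broker-side cache), that would point at the metadata store rather than the brokers themselves.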
Everything looks fine (according to logs):
- zookeeper
- global zookeeper (used for geo replication)
- bookkeeper (I checked even if we do not persist those topics)
Only the brokers, in all regions, log a lot of these warnings (maybe not that much per topic, but with so many topics it adds up).
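For completeness, this is roughly how those checks can be run from the command line. A sketch only, assuming default ports, that ZooKeeper's four-letter-word commands are enabled, and placeholder host names:
# quorum status of the local and the global (configuration) ZooKeeper ensembles
echo srvr | nc zk-1 2181
echo srvr | nc global-zk-1 2181
# end-to-end broker health check (produce/consume round trip on an internal probe topic)
pulsar-admin brokers healthcheck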
Any clue on how to investigate / fix it (in case it happens in prod once we make that move)?
Top GitHub Comments
Getting the issue again, with all kinds of topics:
And for the latter, I checked that all 3 zk instances have the same content for
/namespace/<mytenant>/<mynamespace>
When I do a
pulsar-admin topics list tenant/namespace
I get an HTTP 500 Internal Server Error response.
Checking the logs, I can see on one broker (the one handling the request):
On the other 2 brokers I get:
and
On zookeeper, on all 3 nodes, if I check the content of
/namespace/tenant/namespace
I get:
So I indeed have some bundles that have “disappeared” from the brokers. But there was no broker crash or ZooKeeper issue whatsoever.
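For anyone hitting the same thing, one way to see which bundles currently have an owner is to walk the ownership znodes in the local ZooKeeper. A sketch, assuming the default /namespace/<tenant>/<namespace> ownership layout; the host name and bundle range below are placeholders:
# list the bundle-ownership nodes for the namespace
zkCli.sh -server zk-1:2181 ls /namespace/tenant/namespace
# each child is a bundle range; its data should hold the URL of the owning broker
zkCli.sh -server zk-1:2181 get /namespace/tenant/namespace/0x00000000_0x40000000
As far as I understand, these ownership nodes are ephemeral and only created when a broker acquires a bundle, so a bundle with no entry is not necessarily broken on its own; the problem is when lookups for it then fail instead of triggering a new assignment.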
But I do not see any errors in the ZooKeeper logs, apart maybe from things like:
but those are info logs.
Still a mystery for me… But I learn something every time, so I am now going to rebuild the cluster and check a few more things before using it, to make sure it is all OK.
We were getting this exact error when trying to deploy a source (after adding a ZooKeeper cluster just for configuration, giving us a local ZooKeeper cluster and a configuration ZooKeeper cluster). I started exploring the local ZooKeeper cluster and found that two of our brokers had different entries. It looked like this:
So, we tried deploying the source from a broker that wasn’t host07 or host08, and it succeeded without the exception.
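In case it is useful to others, divergence like that can be spotted by listing the same znode against every ensemble member, and against whatever ZooKeeper connect string each broker is actually configured with. A sketch; zk-1/zk-2/zk-3 are placeholder host names and tenant/namespace has to be replaced with the real names:
for zk in zk-1 zk-2 zk-3; do
  echo "== $zk =="
  # list the bundle-ownership entries as seen through this server
  zkCli.sh -server $zk:2181 ls /namespace/tenant/namespace
done
If the listings differ, a broker pointed at the wrong (or a lagging) ensemble member would see a different set of entries, which is consistent with what we saw on host07 and host08.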