Namerd request to Consul "times out" if DTab stored in Consul is malformed
Issue Type:
- Bug report
- Feature request
What happened:
When Namerd requests DTabs from Consul and has not already cached a valid DTab configuration, it stalls (as if it’s waiting for data from Consul) if the stored DTab is malformed. The logs do not display a warning that the DTab is malformed, and the Namerd Admin console hangs waiting for a response from the Namerd node.
E 1220 00:26:45.879 UTC THREAD27: adminhttp
com.twitter.finagle.CancelledRequestException: request cancelled. Remote Info: Not Available
What you expected to happen:
I expect an error in the Namerd logs saying it can’t parse the DTab and any request via the Admin console to return that error.
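To illustrate the kind of fail-fast behaviour described above, here is a minimal sketch of a Dtab sanity check that either returns the parsed entries or raises an error naming the bad entry. It is not Namerd's or Finagle's parser: `check_dtab` and `DtabParseError` are hypothetical names, and the grammar is deliberately simplified (it only checks that each `;`-separated entry looks like `/prefix => destination`). A check like this could also be run before writing a DTab into Consul.

```python
import re

# Simplified path pattern: "/" or one or more "/segment" parts.
# This is an assumption for illustration, not Finagle's full Dtab grammar.
_PATH = re.compile(r"(/[A-Za-z0-9_.#$%:*-]+)+|/")


class DtabParseError(ValueError):
    """Raised when the text does not look like a Dtab (hypothetical name)."""


def check_dtab(text):
    """Return a list of (prefix, destination) pairs or raise DtabParseError."""
    entries = []
    for index, raw in enumerate(text.split(";")):
        entry = raw.strip()
        if not entry:
            continue  # tolerate a trailing semicolon or blank entries
        if "=>" not in entry:
            raise DtabParseError(f"entry {index}: missing '=>' in {entry!r}")
        prefix, dest = (part.strip() for part in entry.split("=>", 1))
        if not _PATH.fullmatch(prefix):
            raise DtabParseError(f"entry {index}: bad prefix {prefix!r}")
        if not dest:
            raise DtabParseError(f"entry {index}: empty destination")
        entries.append((prefix, dest))
    return entries
```

The point of the sketch is only that a malformed value produces an immediate, descriptive error that can be logged and returned to the caller, rather than leaving the request pending.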
How to reproduce it (as minimally and precisely as possible):
- Create a Consul cluster.
- Create a Consul KV entry with an invalid DTab.
- Create a Namerd config that uses Consul Storage via that Consul keyspace.
- Start Namerd.
- Go to Namerd Admin console. Attempt to access the DTab.
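The steps above can be sketched as a small script. This is a rough illustration under several assumptions: Consul listens on port 8500 and Namerd's `io.l5d.httpController` on port 4180 (as in the output and config in this report), the DTab namespace is called `default`, and the malformed value `this is not a dtab` is a made-up example. The Consul KV endpoint (`/v1/kv/<key>`) and the Namerd endpoint (`/api/1/dtabs/<namespace>`) are used as documented for those HTTP APIs; the function names are hypothetical.

```python
import socket
import urllib.error
import urllib.request

CONSUL_PORT = 8500  # assumption: default Consul HTTP port, as used in this report
NAMERD_PORT = 4180  # assumption: io.l5d.httpController port from this report's config


def consul_put_request(host, path_prefix, namespace, dtab_text):
    """Build a PUT against Consul's KV API for the key Namerd reads."""
    key = f"{path_prefix.strip('/')}/{namespace}"
    url = f"http://{host}:{CONSUL_PORT}/v1/kv/{key}"
    return urllib.request.Request(url, data=dtab_text.encode("utf-8"), method="PUT")


def namerd_get_request(host, namespace):
    """Build a GET for one DTab namespace from Namerd's HTTP controller."""
    url = f"http://{host}:{NAMERD_PORT}/api/1/dtabs/{namespace}"
    return urllib.request.Request(url, method="GET")


def seed_bad_dtab(consul_host, namespace="default"):
    """Store a deliberately malformed DTab (made-up value) in Consul."""
    req = consul_put_request(consul_host, "/namerd/dtabs", namespace, "this is not a dtab")
    urllib.request.urlopen(req, timeout=10).close()


def query_namerd(namerd_host, namespace="default", timeout_s=10):
    """Ask Namerd for the DTab and report whether it answered in time."""
    try:
        req = namerd_get_request(namerd_host, namespace)
        with urllib.request.urlopen(req, timeout=timeout_s) as resp:
            return f"namerd answered with HTTP {resp.status}"
    except urllib.error.HTTPError as err:
        return f"namerd returned an error: HTTP {err.code}"
    except (socket.timeout, urllib.error.URLError) as err:
        return f"no usable response within {timeout_s}s: {err}"
```

To follow the order of the steps, call `seed_bad_dtab("10.26.8.67")` before starting Namerd, then `query_namerd("10.26.8.67")` once it is up. With the behaviour described in this report, the second call would be expected to hit the client-side timeout rather than return an error from Namerd.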
Anything else we need to know?:
Environment:
- This has occurred since we first started using Namerd (0.9.x) and is still present in the latest version (1.3.4)
- Running everything on AWS ECS, with Consul versions 0.9.3 and 1.0.2 (tried both). However, we’ve experienced the same problem via Docker Compose.
Here’s the Consul info (just in case):
/ # consul info -http-addr 10.26.8.35:8500
agent:
    check_monitors = 0
    check_ttls = 0
    checks = 4
    services = 4
build:
    prerelease =
    revision = 112c060
    version = 0.9.3
consul:
    bootstrap = true
    known_datacenters = 1
    leader = true
    leader_addr = 10.26.8.35:8300
    server = true
raft:
    applied_index = 1479
    commit_index = 1479
    fsm_pending = 0
    last_contact = 0
    last_log_index = 1479
    last_log_term = 2
    last_snapshot_index = 0
    last_snapshot_term = 0
    latest_configuration = [{Suffrage:Voter ID:10.26.8.35:8300 Address:10.26.8.35:8300}]
    latest_configuration_index = 1
    num_peers = 0
    protocol_version = 2
    protocol_version_max = 3
    protocol_version_min = 0
    snapshot_version_max = 1
    snapshot_version_min = 0
    state = Leader
    term = 2
runtime:
    arch = amd64
    cpu_count = 2
    goroutines = 85
    max_procs = 2
    os = linux
    version = go1.9
serf_lan:
    coordinate_resets = 0
    encrypted = false
    event_queue = 0
    event_time = 2
    failed = 0
    health_score = 0
    intent_queue = 0
    left = 0
    member_time = 2
    members = 2
    query_queue = 0
    query_time = 1
serf_wan:
    coordinate_resets = 0
    encrypted = false
    event_queue = 0
    event_time = 1
    failed = 0
    health_score = 0
    intent_queue = 0
    left = 0
    member_time = 1
    members = 1
    query_queue = 0
    query_time = 1
/ # consul members
Node                                                         Address          Status  Type    Build  Protocol  DC                              Segment
master-1                                                     10.26.8.35:8301  alive   server  0.9.3  2         qa-feature-something-something  <all>
qa-services-feature-something-something-i-0fb6030d99d904dc9  10.26.8.67:8301  alive   client  0.9.3  2         qa-feature-something-something  <default>
And our Namerd configuration (keep in mind the IPs are generated at deployment time):
admin:
  ip: 0.0.0.0
  port: 9991
interfaces:
- kind: io.l5d.thriftNameInterpreter
  ip: 0.0.0.0
  port: 4100
- kind: io.l5d.httpController
  ip: 0.0.0.0
  port: 4180
namers:
- kind: io.l5d.consul
  host: 10.26.8.67
  prefix: /consul
  includeTag: true
storage:
  kind: io.l5d.consul
  host: 10.26.8.67
  pathPrefix: /namerd/dtabs
Issue Analytics
- State:
- Created 6 years ago
- Comments:13 (8 by maintainers)
Top GitHub Comments
@hawkw OK, success:
The malformed Dtab was:
And the logs show:
And the UI request failed (instead of hung waiting forever):
Okay, I’ve figured out what’s going wrong here. Should have a fix ready soon! 😃