[BUG] - Existing certificate secret name is not being picked up.
Describe the bug
When using certificate type existing and providing a secret name in the Nebari config file, the secret name isn’t passed into the generated Terraform files.
Expected behavior
We should be able to use an existing secret for our TLS certificate.
OS and architecture in which you are running Nebari
AWS, EKS
How to Reproduce the problem?
Run nebari deploy -c nebari-config.yaml --dns-provider cloudflare --dns-auto-provision with the following block in the Nebari config file:
certificate:
  type: existing
  secret_name: my-tls-certificate-secret
Command output
[terraform]: # module.kubernetes-ingress.kubernetes_manifest.tlsstore_default[0] will be updated in-place
[terraform]: ~ resource "kubernetes_manifest" "tlsstore_default" {
[terraform]: ~ object = {
[terraform]: ~ spec = {
[terraform]: ~ defaultCertificate = {
[terraform]: ~ secretName = "my-tls-certificate-secret" -> ""
[terraform]: }
[terraform]: }
[terraform]: # (3 unchanged elements hidden)
[terraform]: }
[terraform]: # (1 unchanged attribute hidden)
[terraform]: }
Versions and dependencies used.
No response
Compute environment
AWS
Integrations
No response
Anything else?
Researching past changes, it appears the issue might have originated from this change: https://github.com/nebari-dev/nebari/pull/1421/files#diff-a77d45450ea0a454e8b346914975631f93dc83a04d3886ec9e99977703a93164L191
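To make the suspected failure mode concrete, here is a minimal Python sketch of the pass-through step that appears to be missing. It is not Nebari's actual code: the function name ingress_input_vars and the shape of the returned dictionary are assumptions for illustration. Only the certificate block keys come from the config above, and the certificate-secret-name variable name is taken from the discussion in the comments.

```python
from typing import Any, Dict


def ingress_input_vars(config: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: build Terraform input variables for the ingress module."""
    certificate = config.get("certificate", {})
    input_vars: Dict[str, Any] = {}
    if certificate.get("type") == "existing":
        # The step that seems to be missing: forward the user's secret name.
        input_vars["certificate-secret-name"] = certificate["secret_name"]
    return input_vars


config = {
    "certificate": {
        "type": "existing",
        "secret_name": "my-tls-certificate-secret",
    }
}
print(ingress_input_vars(config))
```

If a step like this is skipped, the Terraform variable falls back to its default, which would match the secretName change to "" shown in the plan output.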
Issue Analytics
- State:
- Created 10 months ago
- Comments:5 (3 by maintainers)
The secret is there, and everything works so long as the TLSStore resource is updated to look for it. It just looks like the secret_name parameter from nebari_config.yaml isn’t being passed through to Terraform, so it’s always empty after an apply.
That secretName is an empty string after Terraform runs, even though it is an input variable that, based on everything I can find in the code and docs as well as known working behavior in prior versions of QHub, would work as expected. It simply appears to be getting dropped somewhere in the middle at this point. My best guess is the change made here, which was the last place I found in the history where that variable was explicitly populated. @abilal-mss and I observed the same behavior on a fresh nebari install yesterday, so I don’t think it necessarily has anything to do with the environment that experienced all the volume issues, nor can I guess how those could be related, but crazier things have happened.
Also, as a semi-related but way less important potential bug with that TLSStore, there’s a condition on its creation that suggests it’s not meant to be there at all when the certificate-secret-name variable is null. But that variable is being defaulted to an empty string further upstream, so I’m not sure it ever is null, or at least I haven’t seen a case where that condition evaluates to false in my testing. Not sure it matters, but just FYI.
Regarding the volume mounting issue, for the first day or so it was only the conda-store NFS volume failing to mount, which was preventing Jupyter single-user pods as well as dask workers for any existing user environments from spinning up. It wasn’t 100% failing at first: it started off with intermittent failures, then gradually increased in consistency throughout the day to the point that no new pods would schedule. Besides the obvious investigations (pod logs for everything that seemed relevant, checking the nodes’ available storage, confirming the conda NFS service was running and accessible, redeploying pretty much everything in the nebari namespace), I wasn’t having much luck pinning down a cause.
When it reached the point that everything was failing I was mostly in fire-fighting mode and just had to get things back up, which is when I started terminating nodes to get fresh instances. Whether that fixed it or a reboot of the node would have sufficed I can’t say, but I’m not sure there’s really much difference at the end of the day.
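On the TLSStore creation condition: the difference between a null default and an empty-string default can be sketched in Python, with None standing in for Terraform's null. The function names below are hypothetical and only model the behavior described above, not the actual Terraform module.

```python
from typing import Optional


def should_create_tlsstore(secret_name: Optional[str]) -> bool:
    """Model of a 'create only when the variable is not null' condition."""
    return secret_name is not None


def default_upstream(secret_name: Optional[str]) -> str:
    """Model of an upstream default that turns a missing value into ''."""
    return secret_name if secret_name is not None else ""


# With the upstream default applied, the null check can never be False.
print(should_create_tlsstore(None))                    # False
print(should_create_tlsstore(default_upstream(None)))  # True
```

If the intent is to skip the TLSStore when no secret is configured, either the upstream default would need to stay null or the condition would need to treat an empty string the same as null.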
Thanks for the additional context, @sblair-metrostar!
@viniciusdc @eskild, @sblair-metrostar’s analysis of the issue and its origin sounds valid to me. What do you think?