Disaster Recovery in BAF
Describe the bug
As part of DR testing, I tried to recover a BAF deployment but could not recover BAF (Fabric) from a Kubernetes-level failure.
To Reproduce
Steps to reproduce the behavior:
- Install a Fabric network using BAF with one orderer and two organizations (1 peer)
- After the network deploys successfully, change the Kubernetes configuration of every organization in network.yaml to point to another working Kubernetes cluster.
- Run deploy-network.yaml
- deploy-network.yaml fails with the following error:
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: If you are using a module and expect the file to exist on the remote, see the remote_src option
fatal: [localhost]: FAILED! => {"changed": false, "msg": "Could not find or access './build/crypto-config/peerOrganizations/org1-net/peers/peer0.org1-net/msp/cacerts/ca-org1-net-7054.pem'\nSearched in:\n\t/home/blockchain-automation-framework/platforms/hyperledger-fabric/configuration/roles/create/crypto/peer/files/./build/crypto-config/peerOrganizations/org1-net/peers/peer0.org1-net/msp/cacerts/ca-org1-net-7054.pem\n\t/home/blockchain-automation-framework/platforms/hyperledger-fabric/configuration/roles/create/crypto/peer/./build/crypto-config/peerOrganizations/org1-net/peers/peer0.org1-net/msp/cacerts/ca-org1-net-7054.pem\n\t/home/blockchain-automation-framework/platforms/hyperledger-fabric/configuration/roles/create/crypto/peer/tasks/files/./build/crypto-config/peerOrganizations/org1-net/peers/peer0.org1-net/msp/cacerts/ca-org1-net-7054.pem\n\t/home/blockchain-automation-framework/platforms/hyperledger-fabric/configuration/roles/create/crypto/peer/tasks/./build/crypto-config/peerOrganizations/org1-net/peers/peer0.org1-net/msp/cacerts/ca-org1-net-7054.pem\n\t/home/blockchain-automation-framework/platforms/shared/configuration/../../hyperledger-fabric/configuration/files/./build/crypto-config/peerOrganizations/org1-net/peers/peer0.org1-net/msp/cacerts/ca-org1-net-7054.pem\n\t/home/blockchain-automation-framework/platforms/shared/configuration/../../hyperledger-fabric/configuration/./build/crypto-config/peerOrganizations/org1-net/peers/peer0.org1-net/msp/cacerts/ca-org1-net-7054.pem on the Ansible Controller.\nIf you are using a module and expect the file to exist on the remote, see the remote_src option"}
PLAY RECAP *******************************************************************************************************************************************************************************************************************
localhost : ok=309 changed=99 unreachable=0 failed=1 skipped=435 rescued=0 ignored=0
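The failing task looks for a certificate on the Ansible controller, not on the cluster, so one quick check before re-running the playbook is whether that file still exists locally. Below is a minimal sketch of such a check. The relative path and search directories are copied from the error message above; the helper name `find_crypto_file` is hypothetical and not part of BAF.

```python
from pathlib import Path


def find_crypto_file(rel_path, search_dirs):
    """Return the first existing file at base/rel_path, or None if it is absent everywhere."""
    for base in search_dirs:
        candidate = Path(base) / rel_path
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    # Relative path taken from the error message in this issue.
    rel = ("build/crypto-config/peerOrganizations/org1-net/peers/"
           "peer0.org1-net/msp/cacerts/ca-org1-net-7054.pem")
    # Two of the directories Ansible reported searching (from the same message).
    root = "/home/blockchain-automation-framework/platforms/hyperledger-fabric/configuration"
    roots = [root, root + "/roles/create/crypto/peer/files"]
    found = find_crypto_file(rel, roots)
    print(found if found else "certificate not found on the controller")
```

If the file is missing on the controller, the playbook would presumably need the crypto material restored or regenerated before the peer tasks can succeed; that reading is an assumption drawn from the error text and is not confirmed in this issue.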
Expected behavior
The deployment of the network should go through.
Environment (please complete the following information):
- OS: From docker container
- Cloud environment: AKS
- K8S Version: 1.17.11
Additional context
I also ran the above test on the same Kubernetes cluster the network was deployed on, by deleting the namespace of one organization (ca, ca-tools, peer, pvc, services, etc. are deleted). When I run deploy-network.yaml after deleting the namespace, I get the following error:
Getting secrets from Vault Server: http://vault-test.eastus.azurecontainer.io:8200
{ "errors": [ "permission denied" ] }
ERROR: unable to retrieve vault login token: {
"errors": [
"permission denied"
]
}
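To tell a rejected login apart from other failures when scripting around this, the response body can be inspected directly. The sketch below assumes the body is a standard Vault JSON response: an `errors` list on failure (as in the output above) or an `auth.client_token` field on success. The helper name `parse_vault_login` is hypothetical.

```python
import json


def parse_vault_login(body):
    """Return (client_token, errors) extracted from a Vault login response body."""
    data = json.loads(body)
    errors = data.get("errors") or []
    auth = data.get("auth") or {}
    return auth.get("client_token"), errors


if __name__ == "__main__":
    # Error body exactly as printed in this issue.
    token, errors = parse_vault_login('{ "errors": [ "permission denied" ] }')
    if token is None:
        print("login rejected:", ", ".join(errors))
```

A "permission denied" at this step means Vault refused the login credentials themselves. Whether recreating the namespace invalidated them is not established in this issue.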
Issue Analytics
- State:
- Created 3 years ago
- Comments:9 (9 by maintainers)
Awesome. On a related note, I was also testing BAF with Velero backup and restore. I hit a bug with Velero restore that was fixed recently: https://github.com/vmware-tanzu/velero/issues/3027. I will test the Velero approach as well and document it.
Hi @sivaramsk, my scenario is not exactly the same. The scenario I have researched is a complete shutdown of the cluster (scaling deployments to 0) followed by a restart of the same cluster, without re-running the network.yaml. I think your suggestion of pointing to a new cluster for one or more organizations is also very valid. I'll discuss this with the team.
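For anyone who wants to try the shutdown-and-restart scenario, here is a rough sketch of scaling deployments to 0 with `kubectl scale`. The namespace names are assumptions (`org1-net` appears in the error above, `org2-net` is a guess), and the sketch only covers Deployments, so any workloads running as other resource types would need separate handling. It prints the commands by default rather than running them.

```python
import subprocess


def scale_commands(namespaces, replicas):
    """Build one `kubectl scale` command per namespace for all Deployments in it."""
    return [
        ["kubectl", "scale", "deployment", "--all",
         f"--replicas={replicas}", "--namespace", ns]
        for ns in namespaces
    ]


def run(commands, dry_run=True):
    """Print the commands, or execute them when dry_run is False."""
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Hypothetical namespaces; replace with the ones in your network.yaml.
    run(scale_commands(["org1-net", "org2-net"], replicas=0))
```

Restarting would use the same commands with the original replica count, which this sketch does not record.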