(rds): AWS::RDS::DBInstance should not have EngineVersion property set for Aurora clusters
Describe the bug
Currently, when upgrading RDS Aurora between major versions, a runtime error occurs that leaves the CloudFormation stack in an unrecoverable UPDATE_ROLLBACK_FAILED state. This is because CDK sets the EngineVersion property on AWS::RDS::DBInstance, which the CloudFormation documentation says you should not set when using an Aurora cluster:
Amazon Aurora
Not applicable. The version number of the database engine to be used by the DB instance is managed by the DB cluster.
When upgrading between versions that require downtime, the deployment fails with this error:
The stack named BlimmerTestAuroraUpgradeStack failed to deploy: UPDATE_ROLLBACK_FAILED (The following resource(s) failed to update: [DatabaseCluster68FC2945, DatabaseClusterInstance1C566869D]. ): The specified DB Instance is a member of a cluster. Modify the DB engine version for the DB Cluster using the ModifyDbCluster API (Service: Rds, Status Code: 400, Request ID: 9998e162-bb47-4ff0-a6ad-91665e964ff2), DB cluster isn’t available for modification with status upgrading. (Service: Rds, Status Code: 400, Request ID: 5b967f09-58a2-42dc-aa20-3a8bffbf705a)
Here’s a test stack with the event log:
The worst part about this bug is that the database stack is left in the unrecoverable UPDATE_ROLLBACK_FAILED state, which means that you have to either:
a) Attempt to complete the rollback. This is impossible because the cluster actually upgrades even though the CloudFormation steps fail, and RDS does not allow rolling back from the new major version to the old one.
b) Delete the stack. This is obviously not ideal, because databases should not be deleted in most cases. Worse, you can’t update the DeletionPolicy (RemovalPolicy in CDK) to retain the database while deleting the stack, because a stack in UPDATE_ROLLBACK_FAILED cannot be updated.
Expected Behavior
I expected to be able to upgrade a DatabaseCluster between supported major versions via the engine property on DatabaseCluster.
Current Behavior
As mentioned above, if you try to upgrade an Aurora cluster between major versions, you’ll encounter this error, which leaves the stack in an unrecoverable state.
The stack named BlimmerTestAuroraUpgradeStack failed to deploy: UPDATE_ROLLBACK_FAILED (The following resource(s) failed to update: [DatabaseCluster68FC2945, DatabaseClusterInstance1C566869D]. ): The specified DB Instance is a member of a cluster. Modify the DB engine version for the DB Cluster using the ModifyDbCluster API (Service: Rds, Status Code: 400, Request ID: 9998e162-bb47-4ff0-a6ad-91665e964ff2), DB cluster isn’t available for modification with status upgrading. (Service: Rds, Status Code: 400, Request ID: 5b967f09-58a2-42dc-aa20-3a8bffbf705a)
Reproduction Steps
- Create a new stack with an older major version of Aurora Postgres:
import * as cdk from 'aws-cdk-lib';
import { Construct } from 'constructs';
import { DatabaseCluster, DatabaseClusterEngine, AuroraPostgresEngineVersion } from 'aws-cdk-lib/aws-rds';
import { Vpc } from 'aws-cdk-lib/aws-ec2';

export class CdkBugReportsStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    new DatabaseCluster(this, 'DatabaseCluster', {
      engine: DatabaseClusterEngine.auroraPostgres({
        version: AuroraPostgresEngineVersion.VER_10_18,
      }),
      instanceProps: {
        vpc: new Vpc(this, 'Vpc'),
      },
    });
  }
}
- Run cdk deploy on the stack above.
- Update to a newer major version of Aurora Postgres. At the time of writing, 10.18 -> 13.4 is a valid upgrade target:
import * as cdk from 'aws-cdk-lib';
import { Construct } from 'constructs';
import { DatabaseCluster, DatabaseClusterEngine, AuroraPostgresEngineVersion } from 'aws-cdk-lib/aws-rds';
import { Vpc } from 'aws-cdk-lib/aws-ec2';

export class CdkBugReportsStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    new DatabaseCluster(this, 'DatabaseCluster', {
      engine: DatabaseClusterEngine.auroraPostgres({
        version: AuroraPostgresEngineVersion.VER_13_4, // Changed
      }),
      instanceProps: {
        vpc: new Vpc(this, 'Vpc'),
      },
    });
  }
}
- Run cdk deploy again.
- Observe: the stack update fails with the aforementioned error, leaving the database stack in an unrecoverable state.
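Because the offending property is already present in the synthesized template, the problem can be caught before anything is deployed. Below is a minimal sketch of such a pre-deploy check. It assumes the template JSON is read from cdk.out after cdk synth, or obtained with Template.fromStack(stack).toJSON() from aws-cdk-lib/assertions. The function name clusterInstancesWithEngineVersion and the trimmed sample template are hypothetical and are included only for illustration.

```typescript
// Minimal shape of a synthesized CloudFormation template.
// Assumption: only the fields this check reads are modeled.
interface CfnResourceJson {
  Type: string;
  Properties?: Record<string, unknown>;
}
interface CfnTemplateJson {
  Resources?: Record<string, CfnResourceJson>;
}

/**
 * Hypothetical helper: returns the logical IDs of AWS::RDS::DBInstance
 * resources that belong to a DB cluster (DBClusterIdentifier is set)
 * but still set EngineVersion.
 */
export function clusterInstancesWithEngineVersion(template: CfnTemplateJson): string[] {
  return Object.entries(template.Resources ?? {})
    .filter(
      ([, res]) =>
        res.Type === 'AWS::RDS::DBInstance' &&
        res.Properties?.DBClusterIdentifier !== undefined &&
        res.Properties?.EngineVersion !== undefined,
    )
    .map(([logicalId]) => logicalId);
}

// Usage against a hand-written, trimmed template shaped like the one in this report.
const sample: CfnTemplateJson = {
  Resources: {
    DatabaseCluster68FC2945: {
      Type: 'AWS::RDS::DBCluster',
      Properties: { EngineVersion: '10.18' }, // fine: the cluster owns the version
    },
    DatabaseClusterInstance1C566869D: {
      Type: 'AWS::RDS::DBInstance',
      Properties: { DBClusterIdentifier: { Ref: 'DatabaseCluster68FC2945' }, EngineVersion: '10.18' },
    },
  },
};
const offenders = clusterInstancesWithEngineVersion(sample);
console.log(offenders);
```

In CI, failing the build when offenders is non-empty would flag affected stacks before an upgrade is attempted.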
Possible Solution
Short Term Workaround
As a short-term solution, users can reach into the L1 constructs and remove the EngineVersion property like this:
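import { CfnDBInstance } from 'aws-cdk-lib/aws-rds';
// `cluster` is the DatabaseCluster construct from the stack above (assign it to a variable first).
const cfnInstances = cluster.node.children.filter((child): child is CfnDBInstance => child instanceof CfnDBInstance);
if (cfnInstances.length === 0) {
  throw new Error("Couldn't pull CfnDBInstances from the L1 constructs!");
}
// Aurora manages the engine version on the cluster, so clear it on every instance.
cfnInstances.forEach((cfnInstance) => { cfnInstance.engineVersion = undefined; });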
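The same workaround can also be applied stack-wide with a CDK Aspect, so every cluster in the stack is covered without reaching into each one individually. This is a sketch, not a tested fix. RemoveAuroraInstanceEngineVersion is a hypothetical name, and the aspect assumes that any CfnDBInstance with dbClusterIdentifier set belongs to an Aurora cluster. It uses addPropertyDeletionOverride to drop EngineVersion from the synthesized template.

```typescript
import { App, Aspects, IAspect, Stack } from 'aws-cdk-lib';
import { Template } from 'aws-cdk-lib/assertions';
import { Vpc } from 'aws-cdk-lib/aws-ec2';
import {
  AuroraPostgresEngineVersion,
  CfnDBInstance,
  DatabaseCluster,
  DatabaseClusterEngine,
} from 'aws-cdk-lib/aws-rds';
import { IConstruct } from 'constructs';

// Hypothetical aspect (not part of aws-cdk-lib): strips EngineVersion from
// every DB instance that belongs to a DB cluster.
class RemoveAuroraInstanceEngineVersion implements IAspect {
  public visit(node: IConstruct): void {
    // Assumption: a CfnDBInstance with dbClusterIdentifier set is an Aurora
    // cluster member; standalone instances are left untouched.
    if (node instanceof CfnDBInstance && node.dbClusterIdentifier !== undefined) {
      // Removes the property from the synthesized template entirely.
      node.addPropertyDeletionOverride('EngineVersion');
    }
  }
}

// Usage: the stack from the reproduction steps, with the aspect applied.
const app = new App();
const stack = new Stack(app, 'AuroraUpgradeStack');
new DatabaseCluster(stack, 'DatabaseCluster', {
  engine: DatabaseClusterEngine.auroraPostgres({
    version: AuroraPostgresEngineVersion.VER_13_4,
  }),
  instanceProps: { vpc: new Vpc(stack, 'Vpc') },
});
Aspects.of(stack).add(new RemoveAuroraInstanceEngineVersion());

// Synthesize and inspect the DB instances (aspects run during synthesis).
const dbInstances = Template.fromStack(stack).findResources('AWS::RDS::DBInstance');
for (const [logicalId, resource] of Object.entries(dbInstances)) {
  console.log(logicalId, resource.Properties?.EngineVersion ?? '(no EngineVersion)');
}
```

Because the aspect filters on dbClusterIdentifier, standalone (non-Aurora) DB instances in the same stack keep their EngineVersion, in line with the CFN documentation's note that the property only becomes "Not applicable" for Aurora.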
I’ve tested removing the EngineVersion property from an existing CloudFormation stack/DBInstance. It shows the following diff:
Stack BlimmerTestAuroraUpgradeStack
Resources
[~] AWS::RDS::DBInstance DatabaseCluster/Instance1 DatabaseClusterInstance1C566869D
 └─ [-] EngineVersion
     └─ 10.18
The change was applied with (what appears to be) no changes to the actual instances:
BlimmerTestAuroraUpgradeStack: creating CloudFormation changeset...
BlimmerTestAuroraUpgradeStack | 0/3 | 11:31:22 AM | UPDATE_IN_PROGRESS | AWS::CloudFormation::Stack | BlimmerTestAuroraUpgradeStack User Initiated
BlimmerTestAuroraUpgradeStack | 0/3 | 11:31:29 AM | UPDATE_IN_PROGRESS | AWS::RDS::DBInstance | DatabaseCluster/Instance1 (DatabaseClusterInstance1C566869D)
BlimmerTestAuroraUpgradeStack | 1/3 | 11:31:32 AM | UPDATE_COMPLETE | AWS::RDS::DBInstance | DatabaseCluster/Instance1 (DatabaseClusterInstance1C566869D)
BlimmerTestAuroraUpgradeStack | 2/3 | 11:31:33 AM | UPDATE_COMPLETE_CLEA | AWS::CloudFormation::Stack | BlimmerTestAuroraUpgradeStack
BlimmerTestAuroraUpgradeStack | 3/3 | 11:31:34 AM | UPDATE_COMPLETE | AWS::CloudFormation::Stack | BlimmerTestAuroraUpgradeStack
I verified in the RDS console that the instance didn’t shut down or log any other events during this cdk deploy.
So I believe all Aurora DatabaseCluster users can safely apply this workaround without downtime. However, I’ve only tested it on my test cluster, so your mileage may vary.
Note that this should only be done for Aurora clusters, per the AWS::RDS::DBInstance CFN documentation.
Longer Term Fix
CDK should detect when the engine passed is an Aurora RDS engine and, in that case, should not set the EngineVersion property on AWS::RDS::DBInstance.
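Here is a minimal sketch of that detection. It uses a stand-in interface that mirrors the engineType and engineVersion.fullVersion fields exposed by CDK's cluster engines (an assumption for this sketch). The helper name instanceEngineVersion is hypothetical and is not CDK's actual implementation; it only illustrates the proposed rule.

```typescript
// Minimal stand-in for the engine fields CDK's cluster engines expose
// (engineType and engineVersion.fullVersion); assumption for this sketch.
interface ClusterEngineLike {
  readonly engineType: string;
  readonly engineVersion?: { readonly fullVersion: string };
}

/**
 * Hypothetical helper: the EngineVersion to put on each AWS::RDS::DBInstance
 * in a cluster. For Aurora engines the cluster owns the version, so return
 * undefined and the property is left out of the synthesized template.
 */
function instanceEngineVersion(engine: ClusterEngineLike): string | undefined {
  // Aurora engine types are 'aurora', 'aurora-mysql' and 'aurora-postgresql'.
  const isAurora = engine.engineType === 'aurora' || engine.engineType.startsWith('aurora-');
  return isAurora ? undefined : engine.engineVersion?.fullVersion;
}

const auroraPg: ClusterEngineLike = { engineType: 'aurora-postgresql', engineVersion: { fullVersion: '13.4' } };
const plainPg: ClusterEngineLike = { engineType: 'postgres', engineVersion: { fullVersion: '13.4' } };

console.log(instanceEngineVersion(auroraPg)); // undefined
console.log(instanceEngineVersion(plainPg)); // 13.4
```

At the time of writing, every engine accepted by DatabaseCluster is an Aurora engine, so in practice this rule would stop emitting EngineVersion on cluster instances entirely, while the non-Aurora branch keeps the current behavior.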
Additional Information/Context
I confirmed this issue by testing in my AWS accounts. I also have an internal ticket (case #10630169951) opened with AWS Support about this issue. In addition to a CDK fix, it seems this could be handled more elegantly on the CloudFormation side.
CDK CLI Version
2.38.1 (build a5ced21)
Framework Version
No response
Node.js Version
16 LTS
OS
MacOS
Language
Typescript
Language Version
No response
Other information
Because of the unrecoverable state in which the DatabaseCluster stack is left, I’d strongly recommend making this a P1 bug.
Thanks all for raising this issue. We are hitting this in production as well.
It seems like this issue impacts a significant number of customers, and I’ve tagged it as P1, which means it should be on our near-term roadmap.
We welcome community contributions! If you are able, we encourage you to contribute (https://github.com/aws/aws-cdk/blob/master/CONTRIBUTING.md) a bug fix or new feature to the CDK. If you decide to contribute, please start an engineering discussion in this issue to ensure there is a commonly understood design before submitting code. This will minimize the number of review cycles and get your code merged faster.