cdk version 2.33 onwards is getting stuck
Describe the bug
I am trying to deploy an S3 bucket using 2.32.1 and it works just fine. My CDK app is written in TypeScript (Node v16) and is run from Jenkins inside a Docker container.
Jenkins is running CDK CLI version 2.44.0. When I upgrade the packages in package.json to 2.33.0 or later, the same deployment command gets stuck and the pipeline hangs.
Am I missing something? Are there any breaking changes in 2.33.0? I couldn’t find any useful information in the release notes.
Thanks, Gal
Expected Behavior
The latest versions of the CDK packages (aws-cdk in devDependencies and aws-cdk-lib in dependencies) should work, so that I can deploy the S3 bucket with them.
Current Behavior
With the CDK packages at version 2.32.1 everything works and I am able to deploy the S3 bucket. After upgrading to version 2.33.0 or any later version, cdk synth/diff/deploy hangs.
Reproduction Steps
The Jenkins pipeline runs inside Docker containers. The Docker server is installed on the Jenkins agent. The first container in the pipeline is based on Python 3.8. Inside it, another Docker container based on Node.js v16 (Alpine) runs with CDK CLI version 2.44.0 installed.
This is the package.json:
```
{
  "name": "general",
  "version": "0.1.0",
  "bin": {
    "general": "bin/general.js"
  },
  "scripts": {
    "build": "tsc",
    "watch": "tsc -w",
    "test": "jest",
    "cdk": "cdk"
  },
  "devDependencies": {
    "@types/jest": "^27.5.2",
    "@types/node": "^10.17.27",
    "@types/prettier": "2.6.0",
    "aws-cdk": "2.32.1",
    "jest": "^27.5.1",
    "ts-jest": "^27.1.4",
    "ts-node": "^10.9.1",
    "typescript": "~3.9.7"
  },
  "dependencies": {
    "aws-cdk-lib": "2.32.1",
    "constructs": "^10.0.0",
    "@aws-cdk/aws-glue-alpha": "^2.32.1-alpha.0",
    "source-map-support": "^0.5.21"
  }
}
```
Possible Solution
No response
Additional Information/Context
No response
CDK CLI Version
2.44.0
Framework Version
No response
Node.js Version
16
OS
Ubuntu 18/20
Language
Typescript
Language Version
No response
Other information
No response
The CDK behavior is as follows:
- `autoDeleteObjects` creates a Custom Resource that will clear the bucket on stack deletion.
- […] `cdk.out` directory as part of asset staging. This is the same for all assets. The directory these files are copied into depends on the hash of all source files going into it, so the source bundle needs to be complete before this step can start.

The change was:
- […] `node_modules` directory. This was actually incorrect, as the `node_modules` directory should be considered a read-only repository of library code. So we changed the code generation to be moved to the system’s temporary directory.
- […] `overlayfs` file system.
- […] `$TMP` dir back to a location inside a Docker volume mount)

The problem was:
- […] `0` bytes.
- […] `copyFile` function keeps on retrying the call to copy more and more bytes over, getting `0` every time, and waiting until the copy is complete. This never finishes, and so the build appears to hang.
- […] `0`, allowing the copy to succeed.

Full props to @nburtsev for figuring this out. I’m not sure I myself would have been able to put all of this together.
In summary:
The CDK does not directly communicate with the kernel–we just perform filesystem copies. Bugs in the interaction of other pieces of software cause the file copy to loop endlessly if the right combination of circumstances is hit.
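To make the failure mode concrete, here is a minimal sketch of a copy loop that keeps asking for more bytes. It is not the CDK's or Node's actual implementation: `CopyChunk`, `copyWithStallGuard`, and `maxStalls` are hypothetical names used only for illustration. A loop without the stall counter would spin forever when every call reports `0` bytes; the guarded version gives up with an error instead.

```typescript
// Hypothetical model of a chunked copy. `CopyChunk` stands in for the
// low-level call that reports how many bytes it managed to copy.
type CopyChunk = (offset: number, remaining: number) => number;

// Copies `size` bytes by calling `copyChunk` until everything is copied.
// Unlike a naive retry loop, it gives up after `maxStalls` consecutive
// calls that report 0 bytes, instead of looping endlessly.
function copyWithStallGuard(size: number, copyChunk: CopyChunk, maxStalls = 3): number {
  let copied = 0;
  let stalls = 0;
  while (copied < size) {
    const n = copyChunk(copied, size - copied);
    if (n > 0) {
      copied += n;
      stalls = 0; // progress was made, so reset the stall counter
      continue;
    }
    stalls += 1;
    if (stalls >= maxStalls) {
      throw new Error(`copy stalled at ${copied}/${size} bytes after ${stalls} zero-byte calls`);
    }
  }
  return copied;
}

// A healthy copy that moves at most 4 bytes per call.
const healthy: CopyChunk = (_offset, remaining) => Math.min(4, remaining);
// A broken copy that always reports 0 bytes, as in the scenario described above.
const broken: CopyChunk = () => 0;
```

With `healthy`, the loop finishes after a few calls; with `broken`, it throws after three zero-byte calls rather than hanging the build.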
Hi @rix0rrr
Thanks for your help with this issue! It was very helpful after we spent long days or even weeks on it.
Can you please give us a high-level description of the communication between the CDK and the Linux kernel? What was changed in the CDK, and how is it related to the kernel version?
In addition, I think it’s very important to add validation that checks all the system requirements are met when I install my CDK project’s dependencies (via pip, npm or other tools), and to throw as clear an exception as possible, so that at least we will have a clue next time.
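One way such a validation could look, as a rough sketch only: a preflight step that copies a small probe file from the system temp directory into the target directory in a child process, and treats a timeout as a failure. `canCopyFromTmp` and its parameters are hypothetical and not part of the CDK; the sketch assumes the hang can be reproduced by a plain file copy from the temp directory.

```typescript
import { spawnSync } from "child_process";
import * as fs from "fs";
import * as os from "os";
import * as path from "path";

// Hypothetical preflight helper (not part of the CDK): copies a small probe
// file from the system temp directory into `targetDir` in a child process,
// and returns false if the copy fails or does not finish within `timeoutMs`.
function canCopyFromTmp(targetDir: string, timeoutMs = 5000): boolean {
  const probeDir = fs.mkdtempSync(path.join(os.tmpdir(), "copy-probe-"));
  const src = path.join(probeDir, "probe.txt");
  const dest = path.join(targetDir, `copy-probe-${process.pid}.txt`);
  fs.writeFileSync(src, "probe");
  try {
    // With `node -e`, extra arguments start at process.argv[1].
    const script = "require('fs').copyFileSync(process.argv[1], process.argv[2])";
    const result = spawnSync(process.execPath, ["-e", script, src, dest], { timeout: timeoutMs });
    // `status` is null when the child was killed (for example by the timeout).
    return result.status === 0 && fs.readFileSync(dest, "utf8") === "probe";
  } finally {
    // Clean up the probe files whether or not the copy succeeded.
    fs.rmSync(probeDir, { recursive: true, force: true });
    fs.rmSync(dest, { force: true });
  }
}
```

A pipeline could call `canCopyFromTmp(process.cwd())` before `cdk synth` and fail fast with a clear message when it returns `false`, rather than waiting on a hung build.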