index.json in S3 grows indefinitely and causes errors
Description
Publishing a 3 MB package saves the tarball to S3 on its own, but also embeds it in index.json:
(src/put/publish.js)
json['dist-tags'][tag] = version;
json._attachments[`${name}-${version}.tgz`] = pkg._attachments[`${name}-${version}.tgz`];
json.versions[version] = versionData;
...
await storage.put(
`${name}/${version}.tgz`,
json._attachments[`${name}-${version}.tgz`].data, // eslint-disable-line no-underscore-dangle
'base64',
);
await storage.put(
`${name}/index.json`,
JSON.stringify(json),
);
If you publish 100 times, index.json grows to roughly 300 MB, and reading it back will fail or be grossly inefficient:
const pkgBuffer = await storage.get(`${name}/index.json`);
json = JSON.parse(pkgBuffer.toString());
Would the solution be to clear out json._attachments prior to saving to index.json?
What recommendations do you have, since I have already run into the issue? Can I simply delete the entire S3 bucket if I don't care about past releases?
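A minimal sketch of the idea in the question above: write the tarball first, then save index.json without the base64 payload. The helper `withoutAttachments`, the `publish` wrapper, and the in-memory `storage` stand-in are assumptions made for illustration, not the project's actual code. Only the `storage.put(key, data, encoding)` call shape is taken from the snippet in the issue.

```javascript
// Hypothetical helper (not part of the project): returns a copy of the
// package metadata with the base64 tarball payloads removed.
function withoutAttachments(json) {
  return { ...json, _attachments: {} };
}

// In-memory stand-in for the S3 storage wrapper (assumption for this sketch).
const saved = new Map();
const storage = {
  async put(key, data, encoding) {
    saved.set(key, encoding ? Buffer.from(data, encoding) : Buffer.from(data));
  },
};

// Hypothetical publish step that mirrors the order used in the issue's snippet.
async function publish(name, version, json) {
  const file = `${name}-${version}.tgz`;
  // Store the tarball on its own first, while the payload is still available.
  await storage.put(
    `${name}/${version}.tgz`,
    json._attachments[file].data, // eslint-disable-line no-underscore-dangle
    'base64',
  );
  // Then store the index without the payload so it does not grow per publish.
  await storage.put(
    `${name}/index.json`,
    JSON.stringify(withoutAttachments(json)),
  );
}
```

With this ordering, each publish adds only version metadata to index.json, while the tarballs stay in their own objects. Whether existing clients rely on `_attachments` being present in the stored index is not something this sketch checks.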
Issue Analytics
- State:
- Created 6 years ago
- Comments:6 (4 by maintainers)
Top GitHub Comments
This behaviour exists because I had to piece together how npm works. It turns out they used CouchDB, which is why I copied over that functionality.
They hit the same performance issues and appear to have taken the attachments out of the package.json themselves.
Merged your proposed fix @ganapativs and tagged a release. Give it a go and let me know how you get on.
Sorry for the delay on this, it has been a crazy past couple of months.
@jonsharratt I'm guessing the GET request now removes attachments, but does the PUT request still try to return the patched index.json as its response without removing attachments?