Exposing all of npm through dat to dep

Hi @watilde. Dep prompted me to revive an old project I never finished. I have pushed new commits to https://github.com/mafintosh/dat-npm and am running it on a server.

The goal is to have a single Dat with all of NPM in it. There are two steps. Step 1 is to collect all of the NPM registry metadata into a single hyperdb (hypercore). We expose it as a hyperdb like this: https://github.com/mafintosh/dat-npm/blob/master/get.js (note this is broken because of a hyperdb bug but will be fixed soon). The data returned looks like this:

{
  "0.0.1": {
    "dependencies": {
      "xml2js-expat": "0.2.0"
    },
    "devDependencies": {}
  },
  "0.1.0": {
    "dependencies": {
      "xml2js-expat": "0.2.0"
    },
    "devDependencies": {}
  },
  "0.2.2": {
    "dependencies": {
      "xml2js-expat": "0.2.0"
    },
    "optionalDependencies": {},
    "devDependencies": {
      "mocha": "0.x.x"
    }
  },
  "0.2.3": {
    "dependencies": {
      "xml2js-expat": "0.2.x"
    },
    "devDependencies": {
      "mocha": "0.x.x"
    }
  }
}
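
To make the shape concrete, here is a minimal sketch of how a client might read one version's dependencies out of a record like the one above. The helper name dependenciesOf is made up for illustration and is not part of dat-npm or dep; the sample data is copied from the record shown above.

```javascript
// Hypothetical helper, not part of dat-npm or dep.
// `record` maps each published version to its dependency fields,
// exactly as in the JSON shown above.
function dependenciesOf (record, version) {
  const entry = record[version]
  if (!entry) return null // this version is not in the record
  return Object.assign({}, entry.dependencies)
}

// Two entries copied from the record above.
const record = {
  '0.2.2': {
    dependencies: { 'xml2js-expat': '0.2.0' },
    optionalDependencies: {},
    devDependencies: { mocha: '0.x.x' }
  },
  '0.2.3': {
    dependencies: { 'xml2js-expat': '0.2.x' },
    devDependencies: { mocha: '0.x.x' }
  }
}

console.log(dependenciesOf(record, '0.2.3'))
```

Note that the values are version ranges (for example 0.2.x), so a resolver would still need semver range matching on top of this lookup.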

Step 2 is to distribute the NPM tarballs over Dat. Because the tarballs are very large, we do not want to download them all up front. We want to download them on-demand. Dat does not have a mechanism yet for requesting files on-demand, so I was thinking we could have a simple REST API where you could do GET https://npm.datproject.org/request@1.2.1 and the response would return 200 OK when the request@1.2.1 tarball has been added to the npm-dat. Then the client program could request the tarball from the Dat repository and it would be there.
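
A rough sketch of what the client side of that request could look like, using Node's built-in https module. The endpoint is only the one proposed above and may not exist in this form; the function name requestTarball and the injectable get parameter are assumptions made for illustration.

```javascript
const https = require('https')

// Endpoint taken from the proposal above; it is not a documented service.
const BASE_URL = 'https://npm.datproject.org/'

// Ask the server to add name@version to the tarball Dat and call back once
// it answers 200 OK. `get` defaults to https.get and can be swapped for a
// stub so the function can be exercised without a network.
function requestTarball (name, version, cb, get = https.get) {
  const url = BASE_URL + name + '@' + version
  const req = get(url, function (res) {
    res.resume() // discard the body; only the status code matters here
    if (res.statusCode === 200) return cb(null, url)
    cb(new Error('unexpected status ' + res.statusCode + ' for ' + url))
  })
  req.on('error', cb)
}
```

After the callback fires without an error, the client would go on to read the tarball from the Dat repository. That part is left out here because the layout of the tarball archive is not described in this thread.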

To integrate with dep I have some questions:

  • How would a user tell dep to use Dat for modules instead of npm? Maybe a --dat flag in the CLI, and it would use Dat for everything? Or a dat: true setting in a config somewhere? A user could also specify it one at a time for each dependency, but I think it would be cool to tell dep to use Dat for all modules too.
  • Is the version metadata above enough for dep to do recursive module resolution, or do you need other info from package.json?
  • In the dep code, how much work would it be to integrate the Step 1 hyperdb as a registry for metadata?
  • In the dep code, how much work would it be to integrate the Step 2 Dat as the source of tarballs?

For the dep integration I imagine an algorithm like this:

  • User types dep install --dat
  • dep reads package.json, does hyperdb.get('/modules/<pkg>', cb) for each one
  • dep recursively resolves metadata using hyperdb
  • Each time dep needs a tarball, it would GET https://npm.datproject.org/module@version
  • When the response comes back 200 OK, dep would then read the tarball from the Dat
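
The steps above can be sketched roughly as follows. This is not dep's code: install, io.db, io.pickVersion and io.requestTarball are illustrative names, db.get is assumed to hand back the version map directly (the real hyperdb may wrap values differently), and the package app-lib in the demo is invented.

```javascript
// Sketch of the proposed `dep install --dat` flow. All names are illustrative.
// io.db.get(key, cb)              -> cb(err, record), record = { version: { dependencies } }
// io.pickVersion(range, versions) -> stands in for semver range matching
// io.requestTarball(name, v, cb)  -> stands in for GET https://npm.datproject.org/<name>@<v>
function install (deps, io, done) {
  const resolved = {} // "name@version" -> true; avoids repeated work and cycles
  const queue = Object.keys(deps).map(name => [name, deps[name]])

  function next (err) {
    if (err) return done(err)
    if (queue.length === 0) return done(null, Object.keys(resolved))
    const [name, range] = queue.shift()
    io.db.get('/modules/' + name, function (err, record) {
      if (err) return done(err)
      if (!record) return done(new Error('no metadata for ' + name))
      const version = io.pickVersion(range, Object.keys(record))
      if (!version) return done(new Error('no version of ' + name + ' matches ' + range))
      const id = name + '@' + version
      if (resolved[id]) return next()
      resolved[id] = true
      // Queue this version's own dependencies, then ask for its tarball.
      const child = record[version].dependencies || {}
      for (const childName of Object.keys(child)) queue.push([childName, child[childName]])
      io.requestTarball(name, version, next)
    })
  }
  next()
}

// In-memory stand-ins so the sketch runs without hyperdb or a network.
const records = {
  '/modules/app-lib': { '1.0.0': { dependencies: { 'xml2js-expat': '0.2.0' } } },
  '/modules/xml2js-expat': { '0.2.0': { dependencies: {} } }
}
const requested = []
const installIo = {
  db: { get: (key, cb) => cb(null, records[key]) },
  pickVersion: (range, versions) => (versions.includes(range) ? range : null), // exact match only
  requestTarball: (name, version, cb) => { requested.push(name + '@' + version); cb(null) }
}

let installed
install({ 'app-lib': '1.0.0' }, installIo, function (err, list) {
  if (err) throw err
  installed = list
})
console.log(installed)
```

The I/O is passed in as plain functions so the traversal can be tried against fakes. A real integration would replace pickVersion with proper semver range matching and would read the tarball from the Dat after each successful request.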

In the future we can replace the npm.datproject.org server with a pure-Dat solution, but for now the REST API is a way we can deploy this sooner.

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Reactions: 9
  • Comments: 19 (3 by maintainers)

Top GitHub Comments

2 reactions
maxogden commented, Oct 28, 2017

OK it’s all imported and live updating from the changes feed:

max@useast:/mnt/bigdisk/dat-npm$ tail -f server.log
[dat-npm] GET https://registry.npmjs.org/chrome-devtools-frontend/-/chrome-devtools-frontend-1.0.512372.tgz
[dat-npm] wrote /modules/chrome-devtools-frontend, seq=3028012
[dat-npm] GET https://registry.npmjs.org/ideohint/-/ideohint-1.7.1.tgz
[dat-npm] wrote /modules/ideohint, seq=3028013
[dat-npm] GET https://registry.npmjs.org/build-react-svg/-/build-react-svg-0.0.2.tgz
[dat-npm] wrote /modules/build-react-svg, seq=3028014
[dat-npm] GET https://registry.npmjs.org/staff-queue-check-dialog-vkm/-/staff-queue-check-dialog-vkm-1.0.13.tgz
[dat-npm] wrote /modules/staff-queue-check-dialog-vkm, seq=3028015
[dat-npm] GET https://registry.npmjs.org/glkit/-/glkit-0.0.1087.tgz
[dat-npm] wrote /modules/glkit, seq=3028016
[dat-npm] GET https://registry.npmjs.org/glkit/-/glkit-0.0.1088.tgz
[dat-npm] wrote /modules/glkit, seq=3028017
[dat-npm] GET https://registry.npmjs.org/zeromq-broker/-/zeromq-broker-0.0.20.tgz
[dat-npm] wrote /modules/zeromq-broker, seq=3028018
^A^C
max@useast:/mnt/bigdisk/dat-npm$ ls
get.js  index.js  node_modules  npm-meta.db  npm-tarballs.db  package.json  README.md  server.js  server.log  test.js
max@useast:/mnt/bigdisk/dat-npm$ du -sh npm-tarballs.db/
2.1T	npm-tarballs.db/
max@useast:/mnt/bigdisk/dat-npm$ du -sh npm-meta.db/
9.6G	npm-meta.db/

2 reactions
maxogden commented, Sep 7, 2017

@watilde I accidentally crashed NPM yesterday (sorry NPM) 😂

[Image: screenshot, 2017-09-06 6:09 PM]

I changed my approach. Now I am downloading all tarballs as I process the NPM changes feed. This is the way to integrate it now: https://github.com/mafintosh/dat-npm/blob/master/get.js

The changes feed processor will take a while to finish, so not all modules will show up in the query yet. For each change, it downloads all tarballs and writes them to the tarballs hyperdrive, then writes the metadata to the metadata hyperdb. So if the metadata record is there, then all the tarballs will be ready. The example get.js looks up the ‘pushpop’ module, whose record has already been written. Here’s what the server logs look like:

[dat-npm] GET http://registry.npmjs.org/strider-deconst-content/-/strider-deconst-content-1.1.2.tgz
[dat-npm] GET http://registry.npmjs.org/strider-deconst-content/-/strider-deconst-content-1.2.0.tgz
[dat-npm] GET http://registry.npmjs.org/strider-deconst-content/-/strider-deconst-content-1.2.1.tgz
[dat-npm] GET http://registry.npmjs.org/strider-deconst-content/-/strider-deconst-content-1.2.2.tgz
[dat-npm] GET http://registry.npmjs.org/strider-deconst-content/-/strider-deconst-content-1.2.3.tgz
[dat-npm] GET http://registry.npmjs.org/strider-deconst-content/-/strider-deconst-content-1.2.4.tgz
[dat-npm] GET http://registry.npmjs.org/strider-deconst-content/-/strider-deconst-content-1.2.5.tgz
[dat-npm] wrote /modules/strider-deconst-content, seq=286302
[dat-npm] GET http://registry.npmjs.org/babel-preset-es2015-riot/-/babel-preset-es2015-riot-1.0.0.tgz
[dat-npm] GET http://registry.npmjs.org/babel-preset-es2015-riot/-/babel-preset-es2015-riot-1.0.1.tgz
[dat-npm] GET http://registry.npmjs.org/babel-preset-es2015-riot/-/babel-preset-es2015-riot-1.0.2.tgz
[dat-npm] GET http://registry.npmjs.org/babel-preset-es2015-riot/-/babel-preset-es2015-riot-1.0.3.tgz
[dat-npm] GET http://registry.npmjs.org/babel-preset-es2015-riot/-/babel-preset-es2015-riot-1.0.4.tgz
[dat-npm] GET http://registry.npmjs.org/babel-preset-es2015-riot/-/babel-preset-es2015-riot-1.1.0.tgz
[dat-npm] wrote /modules/babel-preset-es2015-riot, seq=286303
[dat-npm] GET http://registry.npmjs.org/exalt/-/exalt-1.0.0.tgz
[dat-npm] wrote /modules/exalt, seq=286304
[dat-npm] GET http://registry.npmjs.org/gridsystem/-/gridsystem-0.1.0.tgz
[dat-npm] GET http://registry.npmjs.org/gridsystem/-/gridsystem-0.2.0.tgz
[dat-npm] wrote /modules/gridsystem, seq=286305
[dat-npm] GET http://registry.npmjs.org/exalted/-/exalted-1.0.0.tgz
[dat-npm] wrote /modules/exalted, seq=286306
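
The ordering described above (all tarballs first, metadata last) is what lets a client treat a metadata record as proof that the tarballs are ready. A minimal sketch of that ordering, assuming a simplified change object and hypothetical fetchTarball, writeTarball and writeMetadata functions; none of these are taken from the dat-npm source.

```javascript
// Illustrative only: fetchTarball, writeTarball and writeMetadata are
// hypothetical stand-ins, and `change` is a simplified shape, not the
// raw npm changes feed document.
function processChange (change, io, cb) {
  const versions = Object.keys(change.versions)
  let i = 0

  function nextTarball (err) {
    if (err) return cb(err) // metadata is never written after a failed tarball
    if (i === versions.length) {
      // Every tarball is stored, so the metadata record can now be published.
      return io.writeMetadata('/modules/' + change.name, change.versions, cb)
    }
    const version = versions[i++]
    io.fetchTarball(change.name, version, function (err, data) {
      if (err) return cb(err)
      io.writeTarball(change.name, version, data, nextTarball)
    })
  }
  nextTarball()
}

// Fakes that record the order of operations instead of touching disk or network.
const feedLog = []
const feedIo = {
  fetchTarball: (name, version, cb) => { feedLog.push('GET ' + name + '@' + version); cb(null, Buffer.alloc(0)) },
  writeTarball: (name, version, data, cb) => cb(null),
  writeMetadata: (key, versions, cb) => { feedLog.push('wrote ' + key); cb(null) }
}

const change = {
  name: 'gridsystem',
  versions: { '0.1.0': { dependencies: {} }, '0.2.0': { dependencies: {} } }
}
processChange(change, feedIo, function (err) { if (err) throw err })
console.log(feedLog)
```

The recorded order mirrors the server log above: one GET per version, followed by a single metadata write for the module.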

I’ll report back when the import is finished. @watilde is this enough for you to integrate into dep?
