Get list of modified files between two commits

Given two commits x and y (identified by their hash), it is possible to list all files which have been modified in y relative to x using the following command:

$ git diff --stat x y

I am currently working on a project which already uses isomorphic-git and would greatly benefit from this feature (I am trying to obtain the set of files that changed from one commit to the next).

I took a look at walkBeta1 and statusMatrix, but neither seems to satisfy my needs. Issue #251 is similar, but seems to cover only unstaged changes.

Is there a way of obtaining this information without, for example, iterating over the output of listFiles and comparing file hashes (e.g. MD5)?

Issue Analytics

Created 4 years ago
Reactions: 6
Comments: 20 (8 by maintainers)

Top GitHub Comments

8 reactions

wmhilton commented, May 19, 2019

Weeellllll… yeah so technically you can do it with walkBeta1. (I think you can do anything with walkBeta1 with enough work.) Here you are!

This example compares the current commit with the previous commit, but hopefully it’s clear how to repurpose it to compare any two commits.

#!/usr/bin/env node

// @ts-check
const fs = require('fs')
const { log, plugins, walkBeta1, TREE } = require('isomorphic-git')
plugins.set('fs', fs)

async function main () {

  // Use git log to get the SHA-1 object ids of the previous two commits
  const commits = await log({ dir: process.cwd(), depth: 2 })
  const oids = commits.map(commit => commit.oid)

  // Make TREE objects for the first and last commits
  const A = TREE({ fs, gitdir: `${process.cwd()}/.git`, ref: oids[0] })
  const B = TREE({ fs, gitdir: `${process.cwd()}/.git`, ref: oids[oids.length - 1] })

  // Get a list of the files that changed
  let results = await walkBeta1({
    trees: [A, B],
    map: async function ([A, B]) {

      // Ignore directories
      if (A.fullpath === '.') return
      await A.populateStat();
      if (A.type === 'tree') return
      await B.populateStat();
      if (B.type === 'tree') return

      // Figure out the SHA-1 object ids.
      await A.populateHash();
      await B.populateHash();

      // Skip pairs where the oids are the same
      if (A.oid === B.oid) return

      // Otherwise return the oids
      return {
        fullpath: A.fullpath,
        A: A.oid,
        B: B.oid
      }

    }
  })
  console.log(results)
}

main()

Let me know how that works. Maybe we can use it as the starting point for writing a full-fledged git diff command.
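
As a rough sketch of what a fuller diff could build on: once each commit has been reduced to a mapping from file path to blob oid (for instance by collecting the walker results), every path can be classified as added, removed, or modified. The helper name `diffSnapshots`, the `Map` inputs, and the oids below are all made up for illustration and are not part of isomorphic-git.

```javascript
// Hypothetical helper, NOT an isomorphic-git API.
// `before` and `after` are Maps of file path -> blob oid for two commits.
function diffSnapshots (before, after) {
  const changes = []
  // Visit every path that appears in either snapshot, in a stable order
  const paths = [...new Set([...before.keys(), ...after.keys()])].sort()
  for (const path of paths) {
    const a = before.get(path)
    const b = after.get(path)
    if (a === b) continue // same oid: file is unchanged
    if (a === undefined) changes.push({ path, type: 'added' })
    else if (b === undefined) changes.push({ path, type: 'removed' })
    else changes.push({ path, type: 'modified' })
  }
  return changes
}

// Made-up snapshots for illustration
const before = new Map([
  ['README.md', 'aaa111'],
  ['src/index.js', 'bbb222'],
  ['old.txt', 'ccc333']
])
const after = new Map([
  ['README.md', 'aaa111'],
  ['src/index.js', 'ddd444'],
  ['new.txt', 'eee555']
])

console.log(diffSnapshots(before, after))
```

Comparing oids rather than file contents works because git content-addresses blobs: two files with the same oid have the same content.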

3 reactions

KrishnaPG commented, Jun 1, 2019

Thank you @TomasHubelbauer and @kpj.

I do not know the semantics of these functions (I am still familiarizing myself with this package), but based on the example code above, this is what the intention seems to be:

await Promise.all([
  // Return the populateHash() promise so that Promise.all also waits for the hash
  A.populateStat().then(() => (A.type !== 'tree' ? A.populateHash() : undefined)),
  B.populateStat().then(() => (B.type !== 'tree' ? B.populateHash() : undefined))
])

The benchmarking certainly helps. The difference should be clearly visible, especially when the git tree is large.

@TomasHubelbauer's point is valid: this would not make the underlying code execute faster. It only removes unnecessary serialization in the calling code, so results come back sooner and the code appears to run faster. (The reason this works: although browser and Node.js user code runs on a single thread, the layers underneath, such as IndexedDB and the file system drivers, are written in C/C++ and are mostly multi-threaded, so they can serve multiple read requests in parallel, usually in proportion to the number of processor cores. Writes are a different story. There are also in-memory buffers and queues used internally to serve frequent reads. Our parallel Promise code simply takes advantage of what is already there under the hood, which gives a smoother experience for the user and the UI.)
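
To illustrate the difference between awaiting reads one after another and starting them together, here is a small sketch that uses a fake read in place of real file or database access. It counts how many reads are in flight at the same time. The names `makeTracker` and `maxConcurrency` are invented for this example, and the timer delay only simulates I/O latency.

```javascript
// Fake asynchronous "read" that tracks how many reads overlap.
// Nothing here touches the file system; setTimeout stands in for I/O latency.
function makeTracker () {
  let inFlight = 0
  const tracker = {
    max: 0,
    read (value) {
      inFlight++
      tracker.max = Math.max(tracker.max, inFlight)
      return new Promise(resolve => setTimeout(() => {
        inFlight--
        resolve(value)
      }, 5))
    }
  }
  return tracker
}

// Run a strategy against a fresh tracker and report the peak overlap
async function maxConcurrency (strategy) {
  const tracker = makeTracker()
  await strategy(tracker.read)
  return tracker.max
}

// One read at a time: the second starts only after the first has finished
const sequential = async read => {
  await read('a')
  await read('b')
}

// Both reads are started before either is awaited
const concurrent = async read => {
  await Promise.all([read('a'), read('b')])
}

async function demo () {
  return {
    sequential: await maxConcurrency(sequential),
    concurrent: await maxConcurrency(concurrent)
  }
}

demo().then(result => console.log(result)) // { sequential: 1, concurrent: 2 }
```

The work done per read is identical in both strategies; only the amount of overlap the lower layers are allowed to exploit changes.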

Because of this, any remaining performance bottleneck is more likely to come from CPU-bound computation than from I/O (DB/FS). Here that means things like calculating hashes or encrypting/decrypting, which end up running synchronously on the single thread no matter what we do, unless the underlying code is explicitly designed to take advantage of multiple threads and cores (e.g. service workers etc.), which is usually hard to do in JS.

Also, given the conditional check in the example code, if (A.oid === B.oid) return, it would be worth returning a value from the promise in which A.populateHash() is called and comparing those returned values, to ensure the result is correct in every if/else branch.
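
A minimal sketch of that idea, using mock entries rather than real walker entries: each branch resolves to the oid (or to undefined for a directory), and the comparison is made on the resolved values. `mockEntry`, `populate`, and `changed` are hypothetical names; only populateStat, populateHash, type, and oid mirror the example code above, and their real behavior in isomorphic-git may differ.

```javascript
// Mock stand-in for a walker entry; the real entries perform object/file I/O.
function mockEntry (type, oid, delayMs = 5) {
  const wait = () => new Promise(resolve => setTimeout(resolve, delayMs))
  return {
    type: undefined,
    oid: undefined,
    async populateStat () { await wait(); this.type = type },
    async populateHash () { await wait(); this.oid = oid }
  }
}

// Stat one entry and, unless it is a directory, hash it.
// Resolves to the oid, or to undefined for a directory.
async function populate (entry) {
  await entry.populateStat()
  if (entry.type === 'tree') return undefined
  await entry.populateHash()
  return entry.oid
}

// Populate both entries concurrently, then compare the resolved values
async function changed (A, B) {
  const [oidA, oidB] = await Promise.all([populate(A), populate(B)])
  if (oidA === undefined || oidB === undefined) return false // directory: skip
  return oidA !== oidB
}

changed(mockEntry('blob', 'abc'), mockEntry('blob', 'def'))
  .then(result => console.log(result)) // true
```

Because `populate` awaits the hash before resolving, Promise.all cannot settle while a hash is still pending, and the directory case is handled explicitly instead of relying on an unset oid.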

The async package has more rigorous and structured methods (such as waterfall, filter, series, parallel etc.) to compose these asynchronous flows into complex hierarchies.

A few additional references along these lines, for anyone interested:

IndexedDB best practices - Keeping your app performant
Breaking the Borders of IndexedDB (https://hacks.mozilla.org/2014/06/breaking-the-borders-of-indexeddb/)
Also, failing early is an important aspect that is relevant here. There is a good discussion on Stack Overflow: Fail early.