Get list of modified files between two commits

Given two commits x and y (identified by their hash), it is possible to list all files which have been modified in y relative to x using the following command:

$ git diff --stat x y

I am currently working on a project which already uses isomorphic-git and would greatly benefit from this feature (I am trying to obtain the set of files which were changed from one commit to the next).

I took a look at walkBeta1 and statusMatrix, but neither seems able to satisfy my needs. Issue #251 is similar, but seems to relate only to unstaged changes.

Is there a way of obtaining this information without, e.g., iterating over the output of listFiles and comparing file (MD5) hashes?

Issue Analytics

  • State: open
  • Created 4 years ago
  • Reactions: 6
  • Comments: 20 (8 by maintainers)

Top GitHub Comments

8 reactions
wmhilton commented, May 19, 2019

Weeellllll… yeah so technically you can do it with walkBeta1. (I think you can do anything with walkBeta1 with enough work.) Here you are!

This example compares the current commit with the previous commit, but hopefully it’s clear how to repurpose it to compare any two commits.

#!/usr/bin/env node

// @ts-check
const fs = require('fs')
const { log, plugins, walkBeta1, TREE } = require('isomorphic-git')
plugins.set('fs', fs)

async function main () {

  // Use git log to get the SHA-1 object ids of the previous two commits
  const commits = await log({ dir: process.cwd(), depth: 2 })
  const oids = commits.map(commit => commit.oid)

  // Make TREE objects for the first and last commits
  const A = TREE({ fs, gitdir: `${process.cwd()}/.git`, ref: oids[0] })
  const B = TREE({ fs, gitdir: `${process.cwd()}/.git`, ref: oids[oids.length - 1] })

  // Get a list of the files that changed
  let results = await walkBeta1({
    trees: [A, B],
    map: async function ([A, B]) {

      // Ignore directories
      if (A.fullpath === '.') return
      await A.populateStat();
      if (A.type === 'tree') return
      await B.populateStat();
      if (B.type === 'tree') return

      // Figure out the SHA-1 object ids.
      await A.populateHash();
      await B.populateHash();

      // Skip pairs where the oids are the same
      if (A.oid === B.oid) return

      // Otherwise return the oids
      return {
        fullpath: A.fullpath,
        A: A.oid,
        B: B.oid
      }

    }
  })
  console.log(results)
}

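// Report failures instead of leaving an unhandled promise rejection
main().catch(err => {
  console.error(err)
  process.exit(1)
})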

Let me know how that works. Maybe we can use it as the starting point for writing a full-fledged git diff command.
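
Once each commit's files are reduced to a path-to-oid mapping, the comparison itself is small. The following is only a sketch under that assumption: `diffOidMaps` is a hypothetical helper (not part of isomorphic-git), the two input objects stand in for whatever a walk like the one above collects, and the oids are made up. It labels each path as added, removed, or modified, which is roughly the information a diff command would need.

```javascript
// Hypothetical helper, not part of isomorphic-git.
// before/after: plain objects mapping file path -> blob oid for each commit.
function diffOidMaps (before, after) {
  const changes = []
  const paths = new Set([...Object.keys(before), ...Object.keys(after)])
  for (const path of [...paths].sort()) {
    const a = before[path]
    const b = after[path]
    if (a === b) continue // same oid on both sides: unchanged
    if (a === undefined) changes.push({ path, status: 'added' })
    else if (b === undefined) changes.push({ path, status: 'removed' })
    else changes.push({ path, status: 'modified' })
  }
  return changes
}

// Made-up oids, for illustration only
const before = { 'README.md': 'aaa111', 'src/index.js': 'bbb222', 'old.txt': 'ccc333' }
const after = { 'README.md': 'aaa111', 'src/index.js': 'ddd444', 'new.txt': 'eee555' }
console.log(diffOidMaps(before, after))
```

Comparing oids rather than file contents keeps this cheap, since git has already hashed every blob.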

3 reactions
KrishnaPG commented, Jun 1, 2019

Thank you @TomasHubelbauer and @kpj.

I do not know the semantics of these functions (I am still familiarizing myself with this package), but based on the above example code, this is what the intention seems to be:

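// Run both stat-then-hash chains in parallel. Returning populateHash()
// makes Promise.all wait for the hashes as well as the stats.
await Promise.all([
  A.populateStat().then(() => { if (A.type !== 'tree') return A.populateHash() }),
  B.populateStat().then(() => { if (B.type !== 'tree') return B.populateHash() })
])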

Benchmarking certainly helps. When the git tree is large, the difference should be clearly visible.

@TomasHubelbauer's point is valid: this would not make the underlying code execute faster. It would only remove unnecessary bottlenecks, so results come back sooner and the code appears to run faster. The reason this works is that, while browser and Node.js user code runs on a single thread, the underlying layers (IndexedDB, file system drivers and so on), which are written in C/C++, are largely multi-threaded and can serve multiple read requests in parallel, usually in proportion to the number of processor cores. Writes are a different story. There are also in-memory buffers and queues used internally to serve frequent reads. So parallel Promise code only takes advantage of what is already there under the hood, which makes for a smoother experience for the user and the UI.
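
To make the overlap concrete, here is a small sketch that records the order in which requests are issued and completed. `fakeRead` is a stand-in built on `setTimeout`, not a real disk or IndexedDB call, and all names and delays are assumptions chosen for illustration.

```javascript
// Stand-in for an asynchronous read (disk, IndexedDB, ...); not real I/O.
function fakeRead (name, ms, events) {
  events.push(`start ${name}`)
  return new Promise(resolve => {
    setTimeout(() => {
      events.push(`end ${name}`)
      resolve(name)
    }, ms)
  })
}

// One read at a time: B is not requested until A has finished.
async function readSequentially (events) {
  const a = await fakeRead('A', 10, events)
  const b = await fakeRead('B', 30, events)
  return [a, b]
}

// Both reads are requested up front, so their waiting periods overlap.
function readInParallel (events) {
  return Promise.all([fakeRead('A', 10, events), fakeRead('B', 30, events)])
}

async function demo () {
  const seq = []
  await readSequentially(seq)
  console.log('sequential:', seq.join(', '))

  const par = []
  await readInParallel(par)
  console.log('parallel:  ', par.join(', '))
}

demo()
```

In the sequential log each read starts only after the previous one ends, while in the parallel log both starts appear before either end. The amount of work is the same in both cases; only the waiting overlaps.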

Because of this, any remaining performance bottleneck is more likely to come from CPU-bound computation than from I/O (DB/FS). Here that means things like calculating hashes or encrypting and decrypting, which end up running synchronously on the single thread no matter what we do, unless the underlying code is explicitly designed to take advantage of multiple threads and cores (e.g. service workers), which is usually hard to do in JS.

Also, given the conditional check if (A.oid === B.oid) return in the example code, it would be worth returning a value from the promise (where A.populateHash() is called) and comparing the results, to ensure a correct result in every if/else case.
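
One way to read that suggestion is to have each promise resolve to the value being compared, so the comparison happens only after both sides have settled. The sketch below assumes that reading. `mockEntry` only mimics the populateStat/populateHash shape used in the example code and is not the isomorphic-git class; `oidOf` and `hasChanged` are hypothetical names.

```javascript
// Mock entry that mimics the populateStat/populateHash shape from the
// example code; a stand-in for illustration, not the isomorphic-git class.
function mockEntry (type, oid) {
  return {
    type: undefined,
    oid: undefined,
    async populateStat () { this.type = type },
    async populateHash () { this.oid = oid }
  }
}

// Resolves to the entry's oid, or to null for directories (trees).
async function oidOf (entry) {
  await entry.populateStat()
  if (entry.type === 'tree') return null
  await entry.populateHash()
  return entry.oid
}

// Resolves to true only when both sides are files with different oids.
async function hasChanged (A, B) {
  const [a, b] = await Promise.all([oidOf(A), oidOf(B)])
  if (a === null || b === null) return false // mirror the "ignore directories" rule
  return a !== b
}

// Made-up oids, for illustration only
hasChanged(mockEntry('blob', 'abc123'), mockEntry('blob', 'def456'))
  .then(changed => console.log('changed:', changed))
```

Because the oids are the resolved values, the comparison cannot accidentally read a field that has not been populated yet.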

The async package has more rigorous and structured methods (such as waterfall, filter, series, parallel etc.) to compose these asynchronous flows into complex hierarchies.

A few additional references along these lines, for anyone interested:

