Consider using a managed Git backend

TL;DR

I’ve done some experiments with a managed Git backend (instead of libgit2).

In all scenarios, performance is better with the managed backend. In the best case, throughput increases more than tenfold. The prototype source code is available here: https://github.com/qmfrederik/quamotion.gitversioning.

There are other managed Git backends (GitNet, GitSharp, NGit) available.

| Backend | Repository  | Mean      | Error     | StdDev    |
|---------|-------------|-----------|-----------|-----------|
| managed | Cuemon      | 6.585 ms  | 0.0977 ms | 0.0914 ms |
| libgit2 | Cuemon      | 12.946 ms | 0.1425 ms | 0.1333 ms |
| managed | SuperSocket | 7.583 ms  | 0.1067 ms | 0.0998 ms |
| libgit2 | SuperSocket | 61.295 ms | 0.8168 ms | 0.7640 ms |
| managed | local       | 1.938 ms  | 0.0161 ms | 0.0151 ms |
| libgit2 | local       | 24.805 ms | 0.3896 ms | 0.3253 ms |
| managed | xunit       | 57.891 ms | 0.6076 ms | 0.5386 ms |
| libgit2 | xunit       | 62.054 ms | 0.9426 ms | 0.7871 ms |
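
To make the "more than tenfold" claim easy to check, here is a small sketch that derives the speedup per repository from the mean times in the table. The names `results` and `speedups` are only illustrative; the numbers are copied from the benchmark above.

```python
# Mean times in milliseconds, copied from the benchmark table.
results = {
    "Cuemon": {"managed": 6.585, "libgit2": 12.946},
    "SuperSocket": {"managed": 7.583, "libgit2": 61.295},
    "local": {"managed": 1.938, "libgit2": 24.805},
    "xunit": {"managed": 57.891, "libgit2": 62.054},
}


def speedups(table):
    """Return how many times faster the managed backend is, per repository."""
    return {repo: t["libgit2"] / t["managed"] for repo, t in table.items()}


for repo, factor in speedups(results).items():
    print(f"{repo}: {factor:.1f}x")
```

This works out to roughly 2.0x for Cuemon, 8.1x for SuperSocket, 12.8x for the local repository and 1.1x for xunit, so the best case is the local repository and the smallest gain is on xunit.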

Why

Nerdbank.GitVersioning uses LibGit2Sharp as its back-end. It comes with a few drawbacks:

  • Performance - libgit2 is a general-purpose library and is perhaps not geared towards read-only scenarios like nbgv; P/Invoke adds overhead, too.
  • Maintainability - it looks like development on LibGit2Sharp has stalled a bit (the last commit is from April this year)
  • Portability - there’s a very long list of issues related to nbgv not working on various Linux distributions

What

My goal was to implement a minimal viable Git backend which you can use to calculate the Git height. That’s all.

This includes

  • Read commits, trees and blobs from a local Git repository
  • Support for ‘packed’ Git repositories (i.e. what you get when you call git gc or after a fresh Git clone) and deltified objects
  • In-memory caching
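
To give an idea of how little is needed for the unpacked case, here is a minimal sketch of reading a loose object and pulling the tree and parent IDs out of a commit. It is in Python rather than C# and is not the prototype's code; `read_loose_object` and `parse_commit` are hypothetical names. It relies only on the on-disk format: a loose object is a zlib-compressed "<type> <size>\0" header followed by the body. Packed and deltified objects are not handled here.

```python
import zlib
from pathlib import Path


def read_loose_object(git_dir, object_id):
    """Read one loose (unpacked) object; returns (type, body)."""
    path = Path(git_dir) / "objects" / object_id[:2] / object_id[2:]
    raw = zlib.decompress(path.read_bytes())
    header, _, body = raw.partition(b"\x00")
    kind, size = header.split(b" ")
    if int(size) != len(body):
        raise ValueError("object size does not match its header")
    return kind.decode("ascii"), body


def parse_commit(body):
    """Return (tree_id, [parent_ids]) from the body of a commit object."""
    tree, parents = None, []
    for line in body.split(b"\n"):
        if not line:  # a blank line separates the headers from the message
            break
        key, _, value = line.partition(b" ")
        if key == b"tree":
            tree = value.decode("ascii")
        elif key == b"parent":
            parents.append(value.decode("ascii"))
    return tree, parents
```

Walking history is then a matter of calling `read_loose_object` on each parent ID in turn; an in-memory cache keyed by object ID avoids reading the same object twice.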

I’ve also applied some of the suggestions related to performance made by @filipnavara and @djluck, such as

  • Using tree IDs to check whether files have changed, instead of parsing the full version.json contents
  • Using the .NET Core JSON API instead of Newtonsoft.Json
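
The object-ID comparison can be sketched roughly like this. It is a deliberately simplified model and not nbgv's actual algorithm: it assumes the height is the length of the longest chain of commits in which version.json has the same object ID as at HEAD, and it ignores path filters and the other configuration knobs. `Commit`, `git_height` and the example graph are hypothetical.

```python
from typing import Dict, List, NamedTuple, Optional


class Commit(NamedTuple):
    parents: List[str]
    version_blob_id: Optional[str]  # object ID of version.json in this commit


def git_height(commits: Dict[str, Commit], head: str) -> int:
    """Longest chain of commits, ending at head, sharing head's version.json ID."""
    target = commits[head].version_blob_id
    heights: Dict[str, int] = {}
    stack = [head]  # explicit stack, so deep histories do not hit recursion limits
    while stack:
        cid = stack[-1]
        if cid in heights:
            stack.pop()
            continue
        # Only follow parents whose version.json is byte-for-byte unchanged.
        relevant = [p for p in commits[cid].parents
                    if commits[p].version_blob_id == target]
        pending = [p for p in relevant if p not in heights]
        if pending:
            stack.extend(pending)
        else:
            heights[cid] = 1 + max((heights[p] for p in relevant), default=0)
            stack.pop()
    return heights[head]


# a and b carry version.json "v1"; c, d, e and the merge commit f carry "v2".
example = {
    "a": Commit([], "v1"),
    "b": Commit(["a"], "v1"),
    "c": Commit(["b"], "v2"),
    "d": Commit(["c"], "v2"),
    "e": Commit(["c"], "v2"),
    "f": Commit(["d", "e"], "v2"),
}
print(git_height(example, "f"))
```

The point is that comparing two object IDs is a cheap equality check, so the JSON only has to be parsed when the ID actually changes.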

Not yet attempted / areas for further exploration

  • Using Git’s commit-graph file to walk the commit graph (freshly cloned GitHub repositories do not appear to have a commit-graph file)
  • The git tree and version.json objects are fully loaded into memory before parsing them; we can probably further improve performance by only reading the data we actually need.
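
As an illustration of "only reading the data we actually need", here is a sketch that scans a tree object entry by entry and stops as soon as it finds the wanted name, instead of loading the whole tree first. It assumes the stream is positioned at the start of the already-decompressed tree body and that the repository uses SHA-1 (20-byte object IDs). Each entry is "<mode> <name>\0" followed by the raw object ID. `find_tree_entry` is a hypothetical helper, not part of the prototype.

```python
import io
from typing import BinaryIO, Optional


def find_tree_entry(stream: BinaryIO, wanted: bytes) -> Optional[str]:
    """Return the hex object ID of the entry named `wanted`, or None."""
    while True:
        header = bytearray()
        while True:
            byte = stream.read(1)
            if not byte:
                if header:
                    raise ValueError("truncated tree entry")
                return None  # clean end of tree, name not found
            if byte == b"\x00":
                break
            header += byte
        _mode, _, name = bytes(header).partition(b" ")
        object_id = stream.read(20)  # raw SHA-1, not hex
        if len(object_id) != 20:
            raise ValueError("truncated object ID")
        if name == wanted:
            return object_id.hex()  # stop here; later entries are never read


# A hand-built tree body with three entries, for illustration only.
tree_body = (
    b"100644 README.md\x00" + bytes([0x11]) * 20
    + b"100644 version.json\x00" + bytes([0xAB]) * 20
    + b"40000 src\x00" + bytes([0x22]) * 20
)
print(find_tree_entry(io.BytesIO(tree_body), b"version.json"))
```

In a real implementation the stream would presumably sit on top of an incremental zlib decompressor, so that the tail of a large tree is never even inflated.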

Validation & lessons learned

  • It can be done: I ran tests on three popular GitHub repositories which use nbgv, and Git height calculation seems to work.
  • Keep it simple: most of the GitHub repositories which use nbgv do so in a very simple way - a standard version.json file in the repository root, no path filters, and so on. nbgv has a lot of configuration knobs, which may impact performance.
  • Packed repositories have different performance characteristics than unpacked repositories: it turns out that the performance of freshly cloned GitHub repositories is very different from that of local repositories, because all files are stored in Git packs. It took some time to get the performance on par with libgit2 for repos with a large Git height (like xunit), but it looks good now.
  • Room for improvement: I’m sure there’s still room for improvement if you want to squeeze out extra performance.

What’s next

Obviously, that’s up to the maintainers of this repository. Personally, I’ve spent too much time on getting LibGit2Sharp working on the platforms I care about (Visual Studio Code on Ubuntu, to name one) and will want to move to a purely managed build task for calculating Git height. My preference is to keep using nbgv, so I can take this further and open a PR if there’s interest in getting it merged.

@filipnavara mentioned he has a repository with a very large Git height (> 1500 IIRC). It’d be interesting to run the benchmarks on that repository, too, in both an ‘unpacked’ and a packed state, and see how the managed implementation holds up (I’m guessing running benchmarks will uncover some bugs, too).

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 3
  • Comments: 13 (8 by maintainers)

Top GitHub Comments

2 reactions
filipnavara commented, Sep 4, 2020

Thanks for giving this a try. In fact I was planning to prototype this myself so you saved me quite a bit of time. I’ll likely get to try it early next week and I’ll report back the results.

1 reaction
qmfrederik commented, Sep 8, 2020

@AArnott

> So would we put the managed git implementation directly into this repo? That is probably fine.

That’s what I would suggest at the moment.

> Another place to look for a good base is SourceLink, which I believe has a readonly managed git implementation already.

Yes, I initially started from the code which is in the SourceLink repository. But it’s a very light implementation - it supports reading the URL of the remote repository and the name of the branch you’re on, and that’s pretty much it. There’s no support for reading commits, trees or blobs.

> So the long-term value I see in a managed git impl is primarily around working everywhere instead of being limited to where libgit2sharp has native binary support. And if libgit2sharp ever finishes removing their native HTTPS dependency, that would be resolved too.

Yes, but there are other issues caused by libgit2sharp:

  1. Support for Visual Studio Code on Linux is still broken (because it uses OmniSharp, which uses Mono, which has different native library loading rules - #417)
  2. There’s the libcurl dependency, too.
  3. Then there’s support for new architectures (e.g. macOS on ARM, Windows on ARM, Alpine Linux on ARM, perhaps Android if it evolves into a desktop OS, and so on). You’ll get it all for free when you do a managed implementation (and dotnet/core follows).

Here’s what the timeline for linux-arm64 looked like:

  • 28-mar-2019 through 5-may-2019: libgit2/libgit2sharp#1686
  • 11-jun-2019 through now: libgit2/libgit2sharp#1686 (never merged)
  • 9-sep-2019/28-oct-2019 through 20-nov-2019: libgit2/libgit2sharp#1714, libgit2/libgit2sharp#1732 and libgit2/libgit2sharp#1741
  • 28-oct-2019 through 21-nov-2019: libgit2/libgit2sharp#1686

So, that’s 7 months to get it done. You’d get it ‘for free’ with a managed implementation.

Honestly, I never want to go there again. There’s the complexity of having to raise PRs across three repositories with two different owners, and the complexity of debugging this (an MSBuild task which runs a managed wrapper around native code, which can execute in .NET, .NET Core or Mono, with all the different loading rules).

I think that time is better spent building a performant, managed, read-only Git implementation.

> If you share the concern about this, let’s push on libgit2sharp one more time to see what they’re thinking, and give me a few more days to put together all the nbgv perf improvements already in the pipe. Then you can see if you’re still motivated to put in the work.

It’s always good to reach out to libgit2sharp and see what they are up to. libgit2/libgit2sharp.nativebinaries#96 was merged a while ago, but I’m not sure what the impact is.
