`git://` protocol support

I have a project that uses one of your downstreams (antora). Due to current infrastructure limitations (gitolite + cgit), there are three ways to fetch a project:

1. Using an ssh key (no usernames / passwords / oauth): obviously not OK, since mirroring is meant to be allowed.
2. “Dumb” HTTP (cgit): incompatible with current isomorphic-git.
3. The git:// protocol: does not seem to be supported right now.

The error when attempting option 3 is: `error: Content source uses an unsupported transport protocol: git://[...]`.

Please add support for the git protocol, as its absence makes isomorphic-git impossible to use under the circumstances listed above, as well as similar ones (e.g. ssh+git-daemon only setups, etc.).

CosmicToast commented, Dec 26, 2018

For github, I highly suspect it’s 0%, but there are other git hosts out there besides github/gitlab 😃 For my repo, it doesn’t make a significant difference (based on below calculations). Here’s a (relatively short) analysis of git clone performance:

Abstract

It is suspected that, because it cannot prepare custom packs, the “dumb http” git protocol will perform worse than git-daemon, as well as be less reliable. We test the former and discuss the latter. The data indicates that the flat performance hit of using git is greater than that of dumb http (though both are negligible), but that git scales significantly better for larger repositories.

Methodology

We take two repositories: Alpine’s user-handbook repository, and the linux kernel.

We then create a directory into which we locally cache both of them:

mkdir ~/git
cd ~/git
git clone --bare git://git.alpinelinux.org/docs/user-handbook.git
git clone --bare https://github.com/torvalds/linux.git

We then enable “dumb http” support:

for f in user-handbook linux
do
  cd $f.git
  mv hooks/post-update.sample hooks/post-update
  chmod +x hooks/post-update
  ./hooks/post-update
  cd ..
done
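
For context, the stock `post-update.sample` hook does nothing more than run `git update-server-info`, which writes the static index files (`info/refs` and `objects/info/packs`) that a dumb HTTP client reads first. The following is a minimal sketch on a throwaway repository; the `demo` directory and the commit identity are placeholders invented for the illustration, not part of the benchmark.

```shell
# Throwaway demo: what the post-update hook leaves behind for dumb HTTP clients.
demo=$(mktemp -d)
git init -q "$demo/src"
git -C "$demo/src" -c user.name=demo -c user.email=demo@example.invalid \
    commit -q --allow-empty -m "initial commit"
git clone -q --bare "$demo/src" "$demo/repo.git"

# The stock post-update.sample hook simply runs this command.
git -C "$demo/repo.git" update-server-info

# A dumb HTTP client discovers branches from this static file.
cat "$demo/repo.git/info/refs"
```

Because these files are only regenerated when the hook runs, a repository served this way needs the hook enabled before every later push, not just once.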

Then we run a static http server and git-daemon (note: the following two commands are run in separate terminals):

python -m http.server
git daemon --base-path=. --export-all --reuseaddr --informative-errors

Each repository is cloned three times into tmpfs, and each clone is timed, using the following commands:

# `time` covers only the clone; the cleanup that follows is not measured.
c_git() { time git clone -q "git://localhost/$1" "$1-git" && rm -rf "$1-git"; }
c_http() { time git clone -q "http://localhost:8000/$1" "$1-http" && rm -rf "$1-http"; }

This approach eliminates variables such as read/write speed (reads are from cache, writes are to RAM), network speed (everything happens over lo) and similar, allowing us to measure protocol overhead specifically.
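
The repeated runs can also be scripted. The sketch below assumes bash; `run_timed` is a hypothetical helper, not something the benchmark itself used. It runs a command a given number of times, prints the wall-clock seconds of each run, and removes the clone directory between runs so that cleanup stays outside the measurement.

```shell
# run_timed RUNS DEST CMD...: run CMD RUNS times, printing the elapsed seconds of each run.
# DEST is deleted after every run, outside the timed region.
run_timed() {
  local runs=$1 dest=$2
  shift 2
  local TIMEFORMAT='%R'   # make the bash `time` keyword print only real (wall-clock) seconds
  local i
  for ((i = 1; i <= runs; i++)); do
    # The command's own output is discarded; only the `time` report reaches stdout.
    { time "$@" >/dev/null 2>&1; } 2>&1
    rm -rf -- "$dest"
  done
}
```

For example, `run_timed 3 linux.git-git git clone -q git://localhost/linux.git linux.git-git` would print three timings, one per line.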

Data

Small Repository with Git

| Run Number | Time (s) | % Relative to Mean |
|---|---|---|
| 1 | 0.072 | 104% |
| 2 | 0.063 | 91% |
| 3 | 0.073 | 106% |
| Mean | 0.069 | 100% |
| Sigma | 0.0043 | 6.2% |

Small Repository with Dumb HTTP

| Run Number | Time (s) | % Relative to Mean |
|---|---|---|
| 1 | 0.032 | 94% |
| 2 | 0.032 | 94% |
| 3 | 0.038 | 112% |
| Mean | 0.034 | 100% |
| Sigma | 0.0026 | 7.6% |

Large Repository with Git

| Run Number | Time (s) | % Relative to Mean |
|---|---|---|
| 1 | 363.03 | 99.9% |
| 2 | 366.22 | 100.8% |
| 3 | 360.52 | 99.2% |
| Mean | 363.26 | 100% |
| Sigma | 1.98 | 0.5% |

Large Repository with Dumb HTTP

| Run Number | Time (s) | % Relative to Mean |
|---|---|---|
| 1 | 459.26 | 100.9% |
| 2 | 452.58 | 99.5% |
| 3 | 452.48 | 99.5% |
| Mean | 454.77 | 100% |
| Sigma | 2.99 | 0.7% |
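
The per-table statistics can be recomputed from the three run times. The sketch below uses a hypothetical helper named `spread`; it is not the original author's tooling. It prints the mean, the mean absolute deviation, and the population standard deviation. When recomputed this way, the "Sigma" rows above appear to sit closer to the mean absolute deviation than to the standard deviation: for the large git run the former is about 1.98 s and the latter about 2.33 s.

```shell
# spread T1 T2 ...: print mean, mean absolute deviation (mad) and population stddev.
spread() {
  printf '%s\n' "$@" | awk '
    { t[NR] = $1; sum += $1 }
    END {
      mean = sum / NR
      for (i = 1; i <= NR; i++) {
        d = t[i] - mean
        abs_dev += (d < 0 ? -d : d)   # accumulates |t - mean|
        sq_dev  += d * d              # accumulates (t - mean)^2
      }
      printf "mean=%.4f mad=%.4f stddev=%.4f\n", mean, abs_dev / NR, sqrt(sq_dev / NR)
    }'
}

# Large repository over git://, run times taken from the table above.
spread 363.03 366.22 360.52
```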

Mean Summary and Comparisons

| Checkout Size | Checkout Type | Time (s) | % Relative to Alternative Type |
|---|---|---|---|
| Small | Git | 0.069 | 203% |
| Small | HTTP | 0.034 | 49% |
| Large | Git | 363.26 | 80% |
| Large | HTTP | 454.77 | 125% |
Analysis and Conclusion

The data shows that “git” has a significant initial overhead cost, but that it scales significantly better than “dumb http”. However, the scaling, while linear, appears to be lower than 1:1, meaning that as repositories get larger and larger, this becomes less important, though this may be due to I/O bottlenecking (even on tmpfs). It is also notable that git has reliably smaller relative deviations, suggesting it is more consistent.

It is doubtful that repositories will get much larger than 2 GB, so we can consider the HTTP overhead over git to be at least 25%. While git is shown as slower for smaller repositories, that difference is mostly negligible. As such, the git-daemon-based protocol should be preferred over “dumb” http for read operations.

Additional Notes Regarding Reliability

Observing the behavior of the HTTP server, we can see that each object is downloaded separately using HTTP GET. This would normally not be a problem, but large repositories store their objects in packs, and a pack is downloaded the same way, as a single GET request, which will not cope well with packet loss. Whether or not this applies to the git protocol is unknown, and should be investigated separately.
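
To make the one-GET-per-object behavior concrete: over dumb HTTP the repository is just a static file tree, and a loose object lives at `objects/<first two hex digits>/<remaining digits>` below the repository URL. The sketch below builds that URL with a hypothetical helper, `loose_object_url`; the object ID in the example is a made-up placeholder, not a real object.

```shell
# loose_object_url BASE_URL OBJECT_ID: URL a dumb HTTP client would GET for one loose object.
loose_object_url() {
  local base=${1%/} oid=$2   # strip one trailing slash from the base URL, if present
  printf '%s/objects/%s/%s\n' "$base" "${oid:0:2}" "${oid:2}"
}

# Placeholder object ID, for illustration only.
loose_object_url http://localhost:8000/linux.git 0123456789abcdef0123456789abcdef01234567
```

Packs are fetched the same way, from `objects/pack/`, after the client has read the list in `objects/info/packs`.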

Issues

Unfortunately, it is not possible to calculate initial per-protocol overhead, nor graph the increase in time based on commit-byte. This is because git does not offer a protocol-less cloning mechanism (my understanding is that the file-based one still has more overhead than cp). If one wanted to make this more rigorous, one would write an extension to git-clone that would only perform cp(1), and use that as the control, as well as making a single-commit single-empty-file repository to get a reliable 0. With that, it would become possible to calculate and plot the actual protocol overhead / commit-byte. There’s also a lack of sample data: this should be repeated with a statistically significant, randomly selected set of repositories (but I’m lazy and time constrained).
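
If such a control measurement existed, the proposed metric would reduce to a one-line calculation. The sketch below only illustrates that arithmetic; `overhead_per_byte` and its three arguments (protocol clone time, control copy time, repository size in bytes) are hypothetical, and no measured values are implied.

```shell
# overhead_per_byte T_PROTOCOL T_CONTROL BYTES:
# extra seconds per byte that the protocol costs over the plain-copy control.
overhead_per_byte() {
  awk -v tp="$1" -v tc="$2" -v n="$3" 'BEGIN { printf "%g\n", (tp - tc) / n }'
}
```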

Implementation Notes

All of the above tests were run on an i5-5200U laptop with 8GB of RAM, that was otherwise idle. If attempting to reproduce, I recommend adjusting the system fd limit, increasing filetree caching aggressiveness, and minimizing swappiness, to avoid hitting unnecessary I/O. If one happens to have additional RAM, everything should be in tmpfs, and the rm -rf step can be skipped (don’t forget to increment indexes in that case).

CosmicToast commented, Jan 27, 2019

git:// has nothing in common with ssh. The git protocol is implemented by git-daemon(1), which listens on a separate TCP port and rewrites the requested path based on the arguments provided to it (e.g. --base-path and co; see the example benchmark above for how to invoke it). The ssh protocol is implemented similarly to local files (e.g. git clone /srv/something), with the path being what goes after the : (in user@host:path format) or after the first non-protocol / (in ssh://host/path format).

The git:// protocol provides no authentication whatsoever, and intentionally so.

For more details on git://, please see https://git-scm.com/docs/git-daemon.
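
The URL forms described in this comment can be told apart with a few shell patterns. The sketch below is a simplified illustration, not how git or isomorphic-git parses remotes; `classify_remote` is a hypothetical helper, and it assumes every URL carries a path and that the scp-like form includes a user.

```shell
# classify_remote URL: print "transport host path" for the URL forms discussed above.
classify_remote() {
  local url=$1 rest
  case $url in
    git://*)
      rest=${url#git://}
      echo "git ${rest%%/*} /${rest#*/}" ;;   # path starts at the first non-protocol /
    ssh://*)
      rest=${url#ssh://}
      echo "ssh ${rest%%/*} /${rest#*/}" ;;
    *://*)
      echo "other - $url" ;;                  # any other scheme, e.g. http://
    *@*:*)
      echo "ssh ${url%%:*} ${url#*:}" ;;      # scp-like form: path is what follows the :
    *)
      echo "local - $url" ;;                  # plain filesystem path
  esac
}

classify_remote git://localhost/linux.git
classify_remote user@host:srv/repo.git
```

Only the first branch reaches git-daemon; both ssh forms end up running git on the remote host against a filesystem path.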