`git://` protocol support


I have a project that uses one of your downstreams (antora). Due to current infrastructure limitations (gitolite + cgit), there are three ways to fetch a project:

  1. Using an ssh key (no usernames / passwords / oauth) - obviously not ok, since mirroring is meant to be allowed.
  2. “dumb” http (cgit) - something that is incompatible with current isomorphic-git.
  3. the git:// protocol - which does not seem to be supported right now

Attempting option 3 fails with: error: Content source uses an unsupported transport protocol: git://[...].

Please add support for the git protocol. Without it, isomorphic-git is impossible to use under the circumstances listed above, as well as in similar ones (e.g. ssh + git-daemon only setups).

Issue Analytics

  • State: open
  • Created 5 years ago
  • Reactions: 2
  • Comments: 18 (11 by maintainers)

Top GitHub Comments

6 reactions
CosmicToast commented, Dec 26, 2018

For github, I highly suspect it’s 0%, but there are other git hosts out there besides github/gitlab 😃 For my repo, it doesn’t make a significant difference (based on below calculations). Here’s a (relatively short) analysis of git clone performance:

Abstract

It is suspected that due to its lack of ability to prepare custom packs, the “dumb http” git protocol will perform worse than git-daemon, as well as be less reliable. We test the former and discuss the latter. The data indicates that the flat performance hit due to using git is greater than the one for dumb http (though both are negligible), but that git scales significantly better for larger repositories.

Methodology

We take two repositories: Alpine’s user-handbook repository, and the linux kernel.

We then create a directory into which we locally cache both of them:

mkdir ~/git
cd ~/git
git clone --bare git://git.alpinelinux.org/docs/user-handbook.git
git clone --bare https://github.com/torvalds/linux.git

We then enable “dumb http” support:

for f in user-handbook linux
do
  cd $f.git
  mv hooks/post-update.sample hooks/post-update
  chmod +x hooks/post-update
  ./hooks/post-update
  cd ..
done

Then we run a static http server and git-daemon (note: the following two commands are run in separate terminals):

python -m http.server
git daemon --base-path=. --export-all --reuseaddr --informative-errors

Each repository is cloned three times into tmpfs, and each clone is timed using the following commands:

c_git() { time git clone -q "git://localhost/$1" "$1-git" && rm -rf "$1-git"; }
c_http() { time git clone -q "http://localhost:8000/$1" "$1-http" && rm -rf "$1-http"; }

This approach means that we can eliminate variables such as read/write speed (reads are from cache, writes are to ram), network speed (everything happens over lo) and similar - allowing us to measure specifically protocol overhead.
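The repeated timing can also be scripted. The following is only a sketch: `time_runs` is a hypothetical helper (not part of the original benchmark), and it assumes GNU date, whose %N format yields nanoseconds.

```shell
# Run a command N times and print the wall-clock time of each run in milliseconds.
# Assumption: GNU date (the %N nanosecond format is not available everywhere).
time_runs() {
  runs=$1
  shift
  i=1
  while [ "$i" -le "$runs" ]; do
    start=$(date +%s%N)
    "$@" >/dev/null 2>&1          # the command under test; its output is discarded
    end=$(date +%s%N)
    echo "run $i: $(( (end - start) / 1000000 )) ms"
    i=$(( i + 1 ))
  done
}
```

For example, `time_runs 3 git ls-remote git://localhost/linux.git` would time three ref listings against the git-daemon started above. Timing full clones this way needs the target directory removed between runs, as c_git and c_http do.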

Data

Small Repository with Git

| Run Number | Time (s) | % Relative to Mean |
| --- | --- | --- |
| 1 | 0.072 | 104% |
| 2 | 0.063 | 91% |
| 3 | 0.073 | 106% |
| Mean | 0.069 | 100% |
| Sigma | 0.0043 | 6.2% |

Small Repository with Dumb HTTP

| Run Number | Time (s) | % Relative to Mean |
| --- | --- | --- |
| 1 | 0.032 | 94% |
| 2 | 0.032 | 94% |
| 3 | 0.038 | 112% |
| Mean | 0.034 | 100% |
| Sigma | 0.0026 | 7.6% |

Large Repository with Git

| Run Number | Time (s) | % Relative to Mean |
| --- | --- | --- |
| 1 | 363.03 | 99.9% |
| 2 | 366.22 | 100.8% |
| 3 | 360.52 | 99.2% |
| Mean | 363.26 | 100% |
| Sigma | 1.98 | 0.5% |

Large Repository with Dumb HTTP

| Run Number | Time (s) | % Relative to Mean |
| --- | --- | --- |
| 1 | 459.26 | 100.9% |
| 2 | 452.58 | 99.5% |
| 3 | 452.48 | 99.5% |
| Mean | 454.77 | 100% |
| Sigma | 2.99 | 0.7% |
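For anyone re-deriving the Mean and Sigma rows, here is a small sketch using awk. `run_stats` is a hypothetical helper name. Note that the Sigma values above appear to match the mean absolute deviation rather than the (population) standard deviation, so the sketch prints both.

```shell
# Print the mean, mean absolute deviation and population standard deviation
# of the numbers passed as arguments.
run_stats() {
  printf '%s\n' "$@" | awk '
    { x[NR] = $1; sum += $1 }
    END {
      mean = sum / NR
      for (i = 1; i <= NR; i++) {
        d = x[i] - mean
        abs_sum += (d < 0 ? -d : d)   # accumulates |x - mean|
        sq_sum  += d * d              # accumulates (x - mean)^2
      }
      printf "mean=%.2f mean_abs_dev=%.2f stddev=%.2f\n", mean, abs_sum / NR, sqrt(sq_sum / NR)
    }'
}

run_stats 363.03 366.22 360.52
# prints: mean=363.26 mean_abs_dev=1.98 stddev=2.33
```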

Mean Summary and Comparisons

| Checkout Size | Checkout Type | Time (s) | % Relative to Alternative Type |
| --- | --- | --- | --- |
| Small | Git | 0.069 | 203% |
| Small | HTTP | 0.034 | 49% |
| Large | Git | 363.26 | 80% |
| Large | HTTP | 454.77 | 125% |
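The last column is simply one mean divided by the other. A minimal sketch of that arithmetic (`relative_pct` is a hypothetical helper name):

```shell
# Express the first time as a percentage of the second, rounded to a whole percent.
relative_pct() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.0f%%\n", a / b * 100 }'
}

relative_pct 454.77 363.26   # large repository, HTTP relative to git: prints 125%
relative_pct 0.069 0.034     # small repository, git relative to HTTP: prints 203%
```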

Analysis and Conclusion

The data shows that “git” has a higher fixed overhead (0.069 s versus 0.034 s for the small repository), but that it scales significantly better than “dumb http”. However, the scaling, while linear, appears to be lower than 1:1 - meaning that as repositories get larger and larger, this becomes less important, though this may be due to I/O bottlenecking (even on tmpfs). It is also notable that git has smaller relative deviations (6.2% versus 7.6% for the small repository, 0.5% versus 0.7% for the large one), suggesting it is more consistent.

It is doubtful that repositories will get much larger than 2 GB, so we can consider http overhead over git to be at least 25%. While git is shown as slower for smaller repositories, that difference is mostly negligible (about 35 ms here). As such, the git-daemon-based protocol should be preferred over “dumb” http for read operations.

Additional Notes Regarding Reliability

Observing the behavior of the HTTP server, we can see that each object is downloaded separately using HTTP GET. This would normally not be a problem, but because of the nature of large repositories, it downloads the pack like this - something that will not deal well with packet loss. Whether or not this applies to the git protocol is unknown, and should be investigated separately.
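To illustrate the request pattern, a "dumb" HTTP remote is just a static copy of the repository, so a loose object lives at objects/, then the first two hex digits of its id, then the remaining 38. The sketch below builds such a URL. `loose_object_url` is a hypothetical helper, and the base URL is the local test server from the methodology. Packed objects are fetched as whole pack files instead, which this sketch does not cover.

```shell
# Build the URL a dumb-HTTP client would GET for one loose object.
# $1 = repository base URL, $2 = 40-character hex object id.
loose_object_url() {
  base=${1%/}                  # drop a trailing slash, if any
  oid=$2
  rest=${oid#??}               # everything after the first two characters
  printf '%s/objects/%s/%s\n' "$base" "${oid%"$rest"}" "$rest"
}

loose_object_url http://localhost:8000/linux.git 4b825dc642cb6eb9a060e54bf8d69288fbee4904
# prints: http://localhost:8000/linux.git/objects/4b/825dc642cb6eb9a060e54bf8d69288fbee4904
```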

Issues

Unfortunately, it is not possible to calculate initial per-protocol overhead, nor graph the increase in time based on commit-byte. This is because git does not offer a protocol-less cloning mechanism (my understanding is that the file-based one is still greater than cp). If one wanted to make this more rigorous, one would write an extension to git-clone that would only perform cp(1), and use that as the control, as well as making a single-commit single-empty-file repository to get a reliable 0. With that, it would become possible to calculate and plot the actual protocol overhead / commit-byte. There’s also a lack of sample data - this should be repeated with a statistically significant, randomly selected set of repositories (but I’m lazy and time constrained).

Implementation Notes

All of the above tests were run on an i5-5200U laptop with 8 GB of RAM that was otherwise idle. If attempting to reproduce, I recommend adjusting the system fd limit and increasing filetree caching aggressiveness - as well as minimizing swappiness, to avoid hitting unnecessary IO. If one happens to have additional RAM, everything should be in tmpfs, and the rm -rf step can be skipped (don’t forget to increment indexes in that case).

3 reactions
CosmicToast commented, Jan 27, 2019

git:// has nothing in common with ssh. The git protocol is implemented by git-daemon(1), listens on a separate TCP port, and has the path rewritten based on the arguments provided to it (e.g. --base-path and co, see the example benchmark above for how to invoke it). The ssh protocol is implemented similarly to local files (e.g. git clone /srv/something), with the path being what goes after the : (in user@host:path format) or after the first non-protocol / (in ssh://host/path format).
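A rough sketch of how the URL forms mentioned above could be told apart. `remote_transport` is a hypothetical helper, not part of git or isomorphic-git, and it only approximates git's own rules (git treats host:path as ssh only when no slash appears before the first colon).

```shell
# Classify a remote by the transport its URL implies.
remote_transport() {
  url=$1
  case $url in
    git://*)             echo git ;;      # served by git-daemon
    ssh://*)             echo ssh ;;
    http://*|https://*)  echo http ;;
    file://*)            echo local ;;
    *://*)               echo unknown ;;
    *:*)
      # scp-like user@host:path, unless a slash precedes the first colon
      case ${url%%:*} in
        */*) echo local ;;
        *)   echo ssh ;;
      esac ;;
    *)                   echo local ;;    # plain path such as /srv/something
  esac
}
```

With this, `remote_transport git://localhost/linux.git` reports git, while `remote_transport user@host:path` and `remote_transport ssh://host/path` both report ssh.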

The git:// protocol provides no authentication whatsoever, and intentionally so.

For more details on git://, please see https://git-scm.com/docs/git-daemon.
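To make the lack of authentication concrete, here is a sketch of the first thing a git:// client sends after connecting to git-daemon (TCP port 9418 by default): a single pkt-line naming the service, the repository path and the host, with no credentials anywhere. `git_daemon_request` is a hypothetical helper name, and it assumes ASCII-only arguments so that character counts equal byte counts.

```shell
# Emit the initial git:// request: 4 hex digits giving the total length,
# followed by "git-upload-pack <path>", a NUL, "host=<host>" and a final NUL.
git_daemon_request() {
  path=$1
  host=$2
  # 4 length digits + "git-upload-pack" (15) + space + path + NUL + "host=" (5) + host + NUL
  len=$(( 4 + 15 + 1 + ${#path} + 1 + 5 + ${#host} + 1 ))
  printf '%04x' "$len"
  printf 'git-upload-pack %s\0host=%s\0' "$path" "$host"
}
```

Piping the output through `od -c`, for example `git_daemon_request /linux.git localhost | od -c`, makes the NUL separators visible.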
