housekeeping: Remove large files from git history

  • image files
  • pdf files
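
set -xe

REPO_URL=https://github.com/intel/dffml
# Branch to rewrite; abort with a message if BRANCH is not set in the environment
BRANCH=${BRANCH:?set BRANCH to the branch to rewrite}
# mktemp -d already creates an empty directory, so no rm -rf/mkdir is needed
TEMPDIR=$(mktemp -d)
GIT_FILTER_REPO_PATHS=$(mktemp)

cd "$TEMPDIR"
git init
git remote add origin "$REPO_URL"
git fetch origin "$BRANCH"
# Create a local branch named $BRANCH so the log commands below can resolve it
git checkout -b "$BRANCH" "origin/$BRANCH"
git log -n 3 --oneline
# Paths of image/PDF files anywhere in the branch's history; sort ends the
# pipeline, so an empty result does not abort the script under set -e
git log --name-only --format= "$BRANCH" | grep -E '\.(png|jpe?g|gif|pdf)$' | sort -u
cat > "$GIT_FILTER_REPO_PATHS" <<'EOF'
glob:*.gif
glob:*.png
glob:*.jpg
glob:*.jpeg
glob:*.pdf
EOF
# --force is needed because this is not a fresh clone
git filter-repo --force --invert-paths --paths-from-file "$GIT_FILTER_REPO_PATHS"
# filter-repo removes the origin remote, so fetch the original branch again to compare
git fetch "$REPO_URL" "$BRANCH"
git diff --stat FETCH_HEAD "$BRANCH"
# Should print nothing now that the files are gone from the rewritten history
git log --name-only --format= "$BRANCH" | grep -E '\.(png|jpe?g|gif|pdf)$' | sort -u
git branch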
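
The script only fetches, and therefore only rewrites, the branch named by BRANCH. Other branches and tags on the remote still point at the old history, and a plain git clone downloads all of them. A minimal sketch for finding refs whose history still touches image or PDF paths, assuming a local repository that has fetched every ref (refs_with_binaries is a hypothetical helper, not part of git or git-filter-repo):

```sh
#!/bin/sh
set -e

# refs_with_binaries DIR: print every branch, tag, or remote-tracking ref in
# the repository at DIR whose history still touches an image or PDF path.
# (Hypothetical helper; the extension list mirrors the filter-repo globs.)
refs_with_binaries() {
  git -C "$1" for-each-ref --format='%(refname)' refs/heads refs/tags refs/remotes |
    while read -r ref; do
      # --name-only with an empty --format lists only the paths each commit touched
      if git -C "$1" log --name-only --format= "$ref" |
        grep -qE '\.(png|jpe?g|gif|pdf)$'; then
        echo "$ref"
      fi
    done
}

# Demo on a throwaway repository: "clean" branches off before logo.png is added.
demo=$(mktemp -d)
git -C "$demo" init -q
printf 'readme\n' > "$demo/README"
git -C "$demo" add README
git -C "$demo" -c user.name=demo -c user.email=demo@example.com commit -q -m init
git -C "$demo" branch clean
printf 'not really a png\n' > "$demo/logo.png"
git -C "$demo" add logo.png
git -C "$demo" -c user.name=demo -c user.email=demo@example.com commit -q -m "add logo"
refs_with_binaries "$demo"
```

Running it inside a fresh clone of the remote (where every branch appears under refs/remotes/origin) checks roughly what a new clone would still download.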

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 7 (7 by maintainers)

Top GitHub Comments

pdxjohnny commented, Mar 13, 2022 (1 reaction)

The repo is still far too big, and it is not obvious why: a fresh clone is 592M, and 581M of that is the .git directory (a single 578M pack).

$ git clone https://github.com/intel/dffml test-dffml-2022-03-13-10-06-no-gifs-supposed-to-be
Cloning into 'test-dffml-2022-03-13-10-06-no-gifs-supposed-to-be'...
remote: Enumerating objects: 108172, done.
remote: Counting objects: 100% (8778/8778), done.
remote: Compressing objects: 100% (399/399), done.
remote: Total 108172 (delta 8463), reused 8394 (delta 8377), pack-reused 99394
Receiving objects: 100% (108172/108172), 577.15 MiB | 13.44 MiB/s, done.
Resolving deltas: 100% (86371/86371), done.
$ cd test-dffml-2022-03-13-10-06-no-gifs-supposed-to-be
$ du -h --threshold=400k -a .
2.9M    ./.git/objects/pack/pack-3bddcee5979f5e78334ceae49f719fa4643e3c7f.idx
578M    ./.git/objects/pack/pack-3bddcee5979f5e78334ceae49f719fa4643e3c7f.pack
581M    ./.git/objects/pack
581M    ./.git/objects
581M    ./.git
920K    ./dffml
1.5M    ./docs/images/BSidesPDX_2019_Down_The_Dependency_Rabbit_Hole.pdf
3.3M    ./docs/images/GSoC_2019_Models.pdf
536K    ./docs/images/Theory_Operation_and_Application_of_Neural_Networks.pdf
5.7M    ./docs/images
6.3M    ./docs
936K    ./examples/ffmpeg/input.mp4
960K    ./examples/ffmpeg
720K    ./examples/notebooks/transferlearning.ipynb
868K    ./examples/notebooks
2.3M    ./examples
652K    ./model
592M    .
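
The working tree accounts for only about 11M of the 592M; the rest is the 578M pack, that is, history. A common way to see what such a pack holds is to list the largest blobs reachable from any ref, by piping git rev-list --objects --all into git cat-file --batch-check. A minimal sketch (largest_blobs is a hypothetical helper name):

```sh
#!/bin/sh
set -e

# largest_blobs DIR [N]: print the N (default 10) largest blobs reachable from
# any ref in the repository at DIR as "<size-in-bytes> <path>", biggest first.
largest_blobs() {
  git -C "$1" rev-list --objects --all |
    git -C "$1" cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
    sed -n 's/^blob //p' |
    sort -k1,1nr |
    head -n "${2:-10}"
}

# Demo: big.bin is deleted in the second commit but still lives in history.
demo=$(mktemp -d)
git -C "$demo" init -q
head -c 200000 /dev/zero > "$demo/big.bin"
printf 'hello\n' > "$demo/small.txt"
git -C "$demo" add big.bin small.txt
git -C "$demo" -c user.name=demo -c user.email=demo@example.com commit -q -m "add files"
git -C "$demo" rm -q big.bin
git -C "$demo" -c user.name=demo -c user.email=demo@example.com commit -q -m "delete big.bin"
largest_blobs "$demo" 1
```

Sizes are uncompressed object sizes, so they will not add up exactly to the pack size, but the ranking should still point at the paths that dominate it. If GIFs that the script was meant to remove show up, some ref other than the rewritten branch is still referencing them.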

pdxjohnny commented, Jun 22, 2022

While diffing manifest-main against manifest, these binary files show up as removed:

$ git diff origin/manifest | grep Binary
Binary files a/docs/examples/webhook/images/github_settings.png and /dev/null differ
Binary files a/docs/examples/webhook/images/localhost_run.png and /dev/null differ
Binary files a/docs/examples/webhook/images/ngrok_out.png and /dev/null differ
Binary files a/docs/images/how-to-read-ci-tests.png and /dev/null differ
Binary files a/docs/images/maintainance-arch.png and /dev/null differ
Binary files a/docs/images/oarch.jpg and /dev/null differ
Binary files a/docs/images/rubric-table.png and /dev/null differ
Binary files a/docs/images/website-before.png and /dev/null differ
Binary files a/examples/MNIST/image1.png and /dev/null differ
Binary files a/examples/MNIST/image2.png and /dev/null differ
Binary files a/examples/MNIST/image3.png and /dev/null differ
Binary files a/examples/MNIST/image4.png and /dev/null differ
Binary files a/examples/flower17/buttercup.jpg and /dev/null differ
Binary files a/examples/flower17/daisy.jpg and /dev/null differ
Binary files a/examples/flower17/pansy.jpg and /dev/null differ
Binary files a/examples/flower17/tigerlily.jpg and /dev/null differ
Binary files a/examples/shouldi/shouldi.jpg and /dev/null differ
$ git diff origin/manifest | grep Binary | awk '{print $3}' | sed -e 's/a\///'
docs/examples/webhook/images/github_settings.png
docs/examples/webhook/images/localhost_run.png
docs/examples/webhook/images/ngrok_out.png
docs/images/how-to-read-ci-tests.png
docs/images/maintainance-arch.png
docs/images/oarch.jpg
docs/images/rubric-table.png
docs/images/website-before.png
examples/MNIST/image1.png
examples/MNIST/image2.png
examples/MNIST/image3.png
examples/MNIST/image4.png
examples/flower17/buttercup.jpg
examples/flower17/daisy.jpg
examples/flower17/pansy.jpg
examples/flower17/tigerlily.jpg
examples/shouldi/shouldi.jpg
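
The awk '{print $3}' step splits on whitespace, so a path containing a space would be cut short. A minimal alternative sketch uses a single sed expression that keeps the whole path and only matches deletions (a/<path> and /dev/null). It writes the result to a file in the format git filter-repo --paths-from-file accepts, where a line without a glob: or regex: prefix is treated as a literal path. binary_paths and paths_file are hypothetical names:

```sh
#!/bin/sh
set -e

# binary_paths: read `git diff` output on stdin and print the path of each
# binary file that was deleted, i.e. "Binary files a/<path> and /dev/null differ".
binary_paths() {
  sed -n 's|^Binary files a/\(.*\) and /dev/null differ$|\1|p'
}

# Demo with two lines copied from the diff output above.
paths_file=$(mktemp)
binary_paths > "$paths_file" <<'EOF'
Binary files a/docs/images/oarch.jpg and /dev/null differ
Binary files a/examples/MNIST/image1.png and /dev/null differ
EOF
cat "$paths_file"

# The list could then drive a rewrite like the one in the script at the top
# (requires git-filter-repo, so it is left commented out here):
#   git filter-repo --force --invert-paths --paths-from-file "$paths_file"
```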

Top Results From Across the Web

Reduce repository size - GitLab Docs
To remove large files from tagged releases, force push your changes to all tags on GitLab: git push origin --force 'refs/tags/*'.
Remove a Large File from Commit History in Git - Baeldung
In this tutorial, we'll learn how to remove large files from the commit history of a git repository using various tools.
Cleaning up git history - Stack Overflow
Linked. -1 · Remove huge pushed data from git repo history · 78 · Completely remove files from Git repo and remote on...
Tutorial: Removing Large Files from Git | by Erin Hoffman
Interactive Rebase for Removing Large Files. Conceptually what we're doing here is looking back through the Git history, finding the commit ...
How can I remove a large file from my commit history?
If you've committed a large file to your repository that takes up a large amount of disk space, simply removing it in a...
