How to remove sensitive information from a GitHub repository

Hello everyone! In this episode, I would like to talk about GitHub and how to remove sensitive information that was accidentally uploaded there.

Alternative video link (for Russia): https://vk.com/video-149273431_456239077

This is a fairly common problem. When publishing project code on GitHub, developers forget to remove credentials: logins, passwords, tokens. What should you do once the leak is discovered? First of all, these credentials must be changed urgently.

What was publicly available on the Internet cannot be completely removed: the data gets indexed and copied by various systems. But wiping it from github.com is possible.

Why is it not enough to just delete the file in the GitHub repository? The problem is that the file's change history remains, and the old contents are still visible there. Surprisingly, there is still no tool in the GitHub web interface to remove the history of a file. You have to use third-party utilities; one of them is git-filter-repo.
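To make the problem concrete, here is a small sketch that reproduces it in a throwaway local repository. The file name secrets.txt, the fake password and the demo committer identity are made up for illustration; nothing here touches GitHub.

```shell
#!/bin/sh
set -eu

# Throwaway repository in a temporary directory (all names are hypothetical)
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q .

# Commit a file containing a fake secret, then delete it in a second commit
echo "password=hunter2" > secrets.txt
git add secrets.txt
git -c user.name=demo -c user.email=demo@example.com commit -q -m "add config"
git rm -q secrets.txt
git -c user.name=demo -c user.email=demo@example.com commit -q -m "remove secrets"

# The file is gone from the working tree, but every commit that touched it
# is still listed in the history...
history_count=$(git log --all --oneline -- secrets.txt | wc -l | tr -d ' ')
# ...and the old content can still be read from the previous commit
leaked=$(git show HEAD~1:secrets.txt)

echo "commits touching secrets.txt: $history_count"
echo "recovered content: $leaked"
```

Anyone who can clone the repository can do the same, which is why deleting the file in a new commit does not help.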

First you need to install this utility. On Linux (I'm using Ubuntu 20.04.4 LTS), the easiest way to install it is with pip. Pip is a package-management system written in Python that is used to install and manage software packages.

python3 -m pip install git-filter-repo
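After installing, it may be worth checking that git can actually find the tool. The sketch below assumes a per-user pip install, which on Linux typically places the script in ~/.local/bin; that location is an assumption and may differ on your system.

```shell
#!/bin/sh
# git runs "git filter-repo" by looking for an executable named
# git-filter-repo on PATH, so check for that name directly.
if command -v git-filter-repo >/dev/null 2>&1; then
    filter_repo_status="found"
    echo "git-filter-repo found at: $(command -v git-filter-repo)"
else
    filter_repo_status="missing"
    # Assumed location for a per-user pip install; adjust if yours differs
    echo "git-filter-repo is not on PATH; try: export PATH=\"\$HOME/.local/bin:\$PATH\""
fi
```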

Also, you will have to get a personal access token from https://github.com/settings/tokens, because git push over HTTPS no longer accepts password authentication.

Next, in the script below, set username, repository and path_to_delete, then execute it.

username="your-github-username"
repository="your-github-repository"
path_to_delete="file-you-want-to-delete-completely.txt"

# Work on a fresh clone; by default filter-repo refuses to run on anything else
git clone "https://github.com/$username/$repository"
cd "$repository"
# Rewrite the history, keeping everything except the given path
git filter-repo --invert-paths --path "$path_to_delete"

# filter-repo removes the origin remote, so add it back before pushing
git remote add origin "https://github.com/$username/$repository.git"
git push origin --force --all
#    Username for 'https://github.com': <your-github-username>
#    Password for 'https://<your-github-username>@github.com': <github-token>

git push origin --force --tags
#    Username for 'https://github.com': <your-github-username>
#    Password for 'https://<your-github-username>@github.com': <github-token>

# Remove the local clone
cd ../
rm -rf "$repository"

As you can see, the repository is cloned, filter-repo does some magic, and then the changes are force-pushed back to the repository using GitHub token authentication. As a result, the file at path_to_delete will be completely removed from your repository, including its history.
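If you want to confirm the result, one option is to check the rewritten clone before the final rm -rf step. The sketch below defines a hypothetical helper, path_in_history (not part of git or filter-repo), and exercises it in a throwaway repository; the file names are made up for the demo.

```shell
#!/bin/sh
set -eu

# Exit status 0 if any commit on any ref still touches the given path
path_in_history() {
    [ -n "$(git log --all --oneline -- "$1")" ]
}

# Throwaway repository to exercise the helper (hypothetical names)
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q .
echo "hello" > kept.txt
git add kept.txt
git -c user.name=demo -c user.email=demo@example.com commit -q -m "add kept.txt"

# A committed file is reported as present, an unknown path as absent
if path_in_history kept.txt; then kept_result="present"; else kept_result="absent"; fi
if path_in_history never-committed.txt; then other_result="present"; else other_result="absent"; fi

echo "kept.txt: $kept_result"
echo "never-committed.txt: $other_result"
```

In the script above, you could call path_in_history "$path_to_delete" right after the filter-repo step; if it still reports the path as present, do not push yet.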

But again, I want to stress that this is not leak protection, because the leak has already occurred. This is an attempt to reduce the damage. The data must be considered compromised and must be changed, if possible.
