50 Shades of Git: Remotes and Authentication
Introduction
Git is a source control tool that almost all engineers use in their work. It enables us to make changes to a project's code base collaboratively. However, Git can be a headache at times. In a CI environment, it sometimes does not behave the way it does locally. Moreover, we sometimes follow best practices without knowing much about how they work. This gap, together with the limited debugging capabilities on CI, makes issues even harder to resolve.
In this blog post, we are going to fill a bit of that gap. To be more specific, we are looking into how different ways of configuring a remote may affect the way Git authenticates with the server.
Background
A Git server refers to the server on which a repo is hosted. It can be Github (github.com), Gitlab (gitlab.com), Bitbucket (bitbucket.org), or a self-hosted server (ex. gitlab.company.com).
A remote in Git refers to a repo (hosted on a Git server) on which team members collaborate, ex. https://github.com/trinhngocthuyen/cocoapods-ezplugin.
A fetch action downloads changes (to branches or tags) from a remote. A push action transfers your local changes to a remote. These actions are done by the git fetch and git push commands respectively.
A typical workflow would be:
- (1) An engineer fetches the latest changes (using the git fetch command) from remotes.
- (2) He/she makes changes on top of the latest changes locally.
- (3) He/she pushes his/her changes (using the git push command) to remotes.
Conflicts may arise in steps (1) or (3). Engineers have to resolve them and sometimes try again.
Depending on preference, some may use git pull in step (1). Under the hood, git pull is by default just git fetch followed by git merge.
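To make the fetch-then-merge relationship concrete, here is a minimal sketch that runs entirely offline. It assumes git is installed; the throwaway directories and the Demo identity are made up for illustration, and a local repo stands in for a remote hosted on a Git server.

```shell
set -eu

work="$(mktemp -d)"   # throwaway sandbox; nothing here touches your real repos

# "upstream" plays the role of the remote hosted on a Git server.
git init -q "$work/upstream"
git -C "$work/upstream" -c user.name=Demo -c user.email=demo@example.com \
  commit -q --allow-empty -m "initial commit"

# The local working copy, whose origin points at that remote.
git clone -q "$work/upstream" "$work/local"

# Someone else adds a commit on the remote.
git -C "$work/upstream" -c user.name=Demo -c user.email=demo@example.com \
  commit -q --allow-empty -m "second commit"

# The two halves of a default git pull, run separately:
git -C "$work/local" fetch -q origin                  # download the new commit
git -C "$work/local" merge -q --ff-only FETCH_HEAD    # integrate it

upstream_head="$(git -C "$work/upstream" rev-parse HEAD)"
local_head="$(git -C "$work/local" rev-parse HEAD)"
echo "upstream: $upstream_head"
echo "local:    $local_head"
```

After the merge, both repos point at the same commit, which is the state a plain git pull would have produced.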
Configuring a Remote
A remote is denoted by a URL. This URL contains information about the transport protocol (SSH, HTTP/HTTPS, FTP…). Below are some valid examples:
ssh://git@github.com/trinhngocthuyen/cocoapods-ezplugin.git
git@github.com:trinhngocthuyen/cocoapods-ezplugin.git
https://github.com/trinhngocthuyen/cocoapods-ezplugin.git
To view the remotes of a repo, run git remote -v.
$ git remote -v
origin https://github.com/trinhngocthuyen/ezactions.git (fetch)
origin https://github.com/trinhngocthuyen/ezactions.git (push)
We can configure a different URL for the push action. This is done by running git remote set-url --push.
$ git remote set-url --push origin https://github.com/trinhngocthuyen/foo.git
$ git remote -v
origin https://github.com/trinhngocthuyen/ezactions.git (fetch)
origin https://github.com/trinhngocthuyen/foo.git (push)
Alternatively, we can use git config remote.<remote_name>.pushurl to alter the push URL.
$ git config remote.origin.pushurl https://github.com/trinhngocthuyen/bar.git
$ git remote -v
origin https://github.com/trinhngocthuyen/ezactions.git (fetch)
origin https://github.com/trinhngocthuyen/bar.git (push)
We can configure more than one remote per repo. This is common in open-source projects where each engineer forks the repo: he/she pushes changes to his/her fork but still wants to keep the fork up to date with the main repo. Multiple remotes are also useful when you work with mirrors (for example, one public repo on Github/Gitlab, and one private repo on your company server). However, we shall not dive into the details of that topic.
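As a quick illustration of the fork setup, the sketch below registers two remotes in a throwaway repo. The remote names follow a common convention, and both URLs are hypothetical placeholders rather than real projects.

```shell
set -eu

repo="$(mktemp -d)"
git init -q "$repo"

# origin: your own fork, where you push your work (hypothetical URL).
git -C "$repo" remote add origin https://github.com/your-name/some-project.git
# upstream: the main repo you keep in sync with (hypothetical URL).
git -C "$repo" remote add upstream https://github.com/main-org/some-project.git

remote_count="$(git -C "$repo" remote | wc -l | tr -d ' ')"
upstream_url="$(git -C "$repo" config --get remote.upstream.url)"
echo "remotes configured: $remote_count"
git -C "$repo" remote -v
```

With this setup, git fetch upstream brings in the main repo's changes, while git push origin publishes to the fork.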
A fetch/push/clone is associated with a remote. Prior to the action, Git authenticates with the server (ex. Github) and then performs further steps if applicable. Therefore, the credentials used for authentication are closely tied to the remote configuration. Those credentials could be an SSH key, a username/password pair, or an access token. In the following section, we’ll look into how such credentials play a role in authentication.
Remotes and Credentials for Authentication
Authentication with a Git server works similarly whether you are cloning, fetching from, or pushing to a remote. For convenience, we take the fetch action as a typical example. If you take a closer look at how Gitlab CI/CD or Github Actions implements checkout, you should see steps in this order:
- Initializing the project (with git init)
- Cd to that project
- Then, configuring the origin remote
- Then, fetching the origin and checking out a given commit
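The steps above can be sketched as follows. This is a simplified illustration, not the exact commands either CI provider runs: a local repo stands in for the server so the example works offline, and the Demo identity and directory names are assumptions.

```shell
set -eu

work="$(mktemp -d)"

# A local repo stands in for the server-side repo (assumption for the demo).
git init -q "$work/server"
git -C "$work/server" -c user.name=Demo -c user.email=demo@example.com \
  commit -q --allow-empty -m "commit to check out"
commit="$(git -C "$work/server" rev-parse HEAD)"

# 1. Initialize the project
git init -q "$work/project"
# 2. Cd to that project
cd "$work/project"
# 3. Configure the origin remote
git remote add origin "$work/server"
# 4. Fetch the origin and check out the given commit
git fetch -q origin
git checkout -q --detach "$commit"

checked_out="$(git rev-parse HEAD)"
echo "checked out $checked_out"
```

With a real server, step 4 is where authentication happens, which is why the way the origin remote is configured in step 3 matters.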
SSH
Using SSH to connect with a Git server is a common practice. A remote used with SSH looks like this:
git@github.com:trinhngocthuyen/cocoapods-ezplugin.git
When fetching from such a remote, Git opens an SSH connection to the server under the hood. This is when authentication kicks in. As you know, it requires a pair of public and private keys. The public key is added to the server (ex. Github). The private one is kept by the user and used for authentication. This key, in OpenSSH, is known as the “identity key”, and its location is specified by the IdentityFile option. By default, the following files are used:
~/.ssh/id_rsa
~/.ssh/id_ecdsa
~/.ssh/id_ed25519
...
If you have them configured, you can test the connection by running ssh -T git@<server>:
$ ssh -T git@github.com
Hi trinhngocthuyen! You've successfully authenticated, but GitHub does not provide shell access.
$ ssh -T git@gitlab.com
Welcome to GitLab, @trinhngocthuyen!
Using different keys for different servers
Some choose to use different keys for Github, Gitlab, or their company server.
~/.ssh/id_rsa_github
~/.ssh/id_rsa_gitlab
~/.ssh/id_rsa_company
To add a key to the authentication agent, use ssh-add:
$ ssh-add ~/.ssh/id_rsa_github
$ ssh -T git@github.com
Hi trinhngocthuyen! You've successfully authenticated, but GitHub does not provide shell access.
Manually loading keys like this has two downsides:
- First, those keys are wiped when the system or the SSH agent restarts, so we need to load them again. There are some tips to automate this task (ie. loading keys) upon Mac login. Yet, the workaround is not quite universal.
- Second, more than one key may work, and the one that gets used may not be the one that should be used. For example, if all three keys above work for Github, the chosen one might not be the one you expect. I know this example seems a bit extreme, but my point is that it increases management costs.
A more proper approach is to use the SSH config (located in ~/.ssh/config). This way, you can configure which key is used for which server.
Host github.com
IdentityFile ~/.ssh/id_rsa_github
Host gitlab.com
IdentityFile ~/.ssh/id_rsa_gitlab
Host gitlab.company.com
IdentityFile ~/.ssh/id_rsa_company
Using different keys for different repos
This is usually the case for CI. When running on CI, you should be mindful of what you write outside of the project directory. For self-hosted runners, files written outside of this directory might persist across jobs. This issue happens a lot with Shell (MacOS) runners.
Two main drawbacks when such files are not properly cleaned up are:
- Subsequent executions get affected. For example, a runner picks up a job that writes the SSH config above and exits. When it then picks up another job, belonging to another team that handles SSH differently, this leftover file may lead to unexpected behaviors.
- Sensitive files could be leaked. For example, if you write your key to ~/.ssh/id_rsa and forget to clean it up properly, another employee in your company can just run a job that dumps this file (if it exists) to obtain your key.
Therefore, a best practice is to stick to the project directory or any directory that is guaranteed to be cleaned up by the CI/CD infra.
Then, in this case, we can instruct Git to use the key via the core.sshCommand config (see: reference):
$ git config core.sshCommand "ssh -o IdentitiesOnly=yes -i <path/to/key> -F /dev/null"
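Here is a minimal sketch of how this fits the "stay inside the project directory" rule. The key location (.ci/deploy_key) is a hypothetical choice, and the temporary directory merely stands in for the CI project directory.

```shell
set -eu

project="$(mktemp -d)"   # stands in for the CI project directory
git init -q "$project"

# Hypothetical key location inside the project directory; on CI the key
# content would typically be written there from a secret variable.
key_path="$project/.ci/deploy_key"

# Without --global, the setting lands in $project/.git/config, so it is
# removed together with the project directory.
git -C "$project" config core.sshCommand \
  "ssh -o IdentitiesOnly=yes -i $key_path -F /dev/null"

configured="$(git -C "$project" config --local --get core.sshCommand)"
echo "$configured"

# For a one-off command, the GIT_SSH_COMMAND environment variable has the
# same effect without writing any config:
#   GIT_SSH_COMMAND="ssh -i $key_path -F /dev/null" git fetch origin
```

Because nothing is written to ~/.ssh, a later job on the same runner does not inherit the key or the SSH options.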
HTTP/HTTPS
There is no problem if the repo is public. The remote URL is just like the web URL to the repo. For convenience, let’s call this kind of URL “bare URL”.
https://github.com/trinhngocthuyen/public-repo
Now, we only care about how to fetch from a private repo.
Git authenticates with the server using a username and password, or a token. We can see a token as a username/password pair where the password is the token and the username is, on many servers, just anything you want (ex. x-access-token, gitlab-token…). Therefore, we can treat these two roughly the same.
Using username/password in the remote URL
An HTTP/HTTPS remote that allows us to fetch successfully looks like this:
https://<username>:<password>@github.com/trinhngocthuyen/private-repo
This turns out to be the approach Gitlab CI/CD adopts. If you run git remote -v in a Gitlab job, you should see a URL of this form, with the job's credentials embedded in it.
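To see what a credential-bearing remote looks like from Git's side, the sketch below configures one in a throwaway repo. The username and token are placeholders, not real credentials, and no network access is involved.

```shell
set -eu

repo="$(mktemp -d)"
git init -q "$repo"

username="x-access-token"   # placeholder username
password="dummy-token"      # placeholder, not a real token

git -C "$repo" remote add origin \
  "https://${username}:${password}@github.com/trinhngocthuyen/private-repo"

# The URL, credentials included, is stored verbatim in .git/config and is
# readable by anyone who can read the project directory.
stored_url="$(git -C "$repo" config --get remote.origin.url)"
echo "$stored_url"
```

This is convenient on CI, where the token is short-lived, but it is worth keeping in mind before using the same approach with a long-lived personal token.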
Using http.extraheader config
Github Checkout Action adopts a different approach. It uses the http.extraheader config to carry the credentials for authentication, and the remote URL is just a bare URL.
https://github.com/trinhngocthuyen/private-repo
Taking a closer look at the logs of the checkout step, we notice the command that sets up the authentication. The masked content *** is actually the base64-encoded string of x-access-token:<token> (see: src/git-auth-helper.ts#L57-L60).
You can easily try out this approach locally by:
- First, creating an access token
- Then, configuring http.extraheader
- Then, trying to fetch a private repo
$ git config http.extraheader "Authorization: Basic $(echo -n x-access-token:<TOKEN> | base64)"
$ git fetch https://github.com/trinhngocthuyen/private-repo
Note: If you’re using Bash to encode <username>:<password>, be careful with the trailing newline. It should be echo -n <username>:<password> | base64 instead of echo <username>:<password> | base64.
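The effect of the trailing newline is easy to see with a placeholder credential. The sketch below uses printf '%s', which, like echo -n, emits no trailing newline and behaves the same across shells; user:pass is a made-up value.

```shell
set -eu

credentials="user:pass"   # placeholder for <username>:<password>

# No trailing newline: this is the value the server expects.
good="$(printf '%s' "$credentials" | base64)"

# A plain echo appends a newline, which becomes part of the encoded value.
bad="$(echo "$credentials" | base64)"

echo "without newline: $good"   # dXNlcjpwYXNz
echo "with newline:    $bad"    # dXNlcjpwYXNzCg==
```

The two strings differ only at the end, which makes this mistake easy to miss when the value is masked in CI logs.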
In case you want to configure this for Github only, use http.https://github.com/.extraheader instead of http.extraheader.
$ git config http.https://github.com/.extraheader "Authorization: Basic <base64(username:password)>"
This approach also works for other servers (Gitlab, Bitbucket…) as long as they support basic authentication.
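To check which header Git would send to a given URL, git config --get-urlmatch can be used. The sketch below is a local experiment in a throwaway repo; the header value is a placeholder (the base64 of dummy:dummy) and the Gitlab URL is hypothetical.

```shell
set -eu

repo="$(mktemp -d)"
git init -q "$repo"

# Placeholder header; a real one would carry base64(<username>:<password>).
header="Authorization: Basic ZHVtbXk6ZHVtbXk="

# Scope the header to github.com only.
git -C "$repo" config "http.https://github.com/.extraheader" "$header"

# Ask Git which http.extraheader value applies to a given URL.
for_github="$(git -C "$repo" config --get-urlmatch http.extraheader \
  https://github.com/trinhngocthuyen/private-repo)"
for_gitlab="$(git -C "$repo" config --get-urlmatch http.extraheader \
  https://gitlab.com/some-group/some-repo || true)"

echo "github.com: $for_github"
echo "gitlab.com: ${for_gitlab:-<no header>}"
```

Scoping the header this way avoids sending a Github token to unrelated hosts that the same job happens to fetch from.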
Username/password prompts
If you fetch from a remote with a bare URL (without a username/password), Git prompts you for a username and password. Let’s say we input x-access-token for the username and the access token for the password. Then, it successfully fetches from this remote.
$ git fetch https://github.com/trinhngocthuyen/private-repo
Username for 'https://github.com': x-access-token
Password for 'https://x-access-token@github.com': my-token-goes-here
From https://github.com/trinhngocthuyen/private-repo
* branch HEAD -> FETCH_HEAD
Let’s say you are a MacOS user. Now, you fetch from this remote again. This time, you are able to perform the fetch without seeing the username/password prompts.
$ git fetch https://github.com/trinhngocthuyen/private-repo
From https://github.com/trinhngocthuyen/private-repo
* branch HEAD -> FETCH_HEAD
This behavior is due to the fact that Git caches the credentials. When enabling Git traces by setting the variable GIT_TRACE=1, you should see what handles the credentials cache.
$ GIT_TRACE=1 git fetch https://github.com/trinhngocthuyen/private-repo
09:22:03.977378 git.c:460 trace: built-in: git fetch https://github.com/trinhngocthuyen/private-repo
09:22:03.978347 run-command.c:655 trace: run_command: GIT_DIR=.git git remote-https https://github.com/trinhngocthuyen/private-repo https://github.com/trinhngocthuyen/private-repo
09:22:03.992273 git.c:750 trace: exec: git-remote-https https://github.com/trinhngocthuyen/private-repo https://github.com/trinhngocthuyen/private-repo
09:22:03.992846 run-command.c:655 trace: run_command: git-remote-https https://github.com/trinhngocthuyen/private-repo https://github.com/trinhngocthuyen/private-repo
09:22:04.464215 run-command.c:655 trace: run_command: 'git credential-osxkeychain get'
09:22:04.509220 git.c:750 trace: exec: git-credential-osxkeychain get
09:22:04.510059 run-command.c:655 trace: run_command: git-credential-osxkeychain get
09:22:04.993732 run-command.c:655 trace: run_command: 'git credential-osxkeychain store'
09:22:05.038985 git.c:750 trace: exec: git-credential-osxkeychain store
09:22:05.039730 run-command.c:655 trace: run_command: git-credential-osxkeychain store
09:22:05.506154 run-command.c:655 trace: run_command: git rev-list --objects --stdin --not --all --quiet --alternate-refs
From https://github.com/trinhngocthuyen/private-repo
* branch HEAD -> FETCH_HEAD
09:22:05.547164 run-command.c:1524 run_processes_parallel: preparing to run up to 1 tasks
09:22:05.547195 run-command.c:1551 run_processes_parallel: done
09:22:05.547216 run-command.c:655 trace: run_command: git maintenance run --auto --no-quiet
09:22:05.565672 git.c:460 trace: built-in: git maintenance run --auto --no-quiet
It is git credential-osxkeychain that does the magic on MacOS. In the first successful fetch, the command git credential-osxkeychain store saves the credentials to Keychain. In subsequent fetches, Git runs git credential-osxkeychain get to retrieve the credentials for authentication.
You can easily verify this by checking the corresponding item in Keychain Access, or by running git credential-osxkeychain get:
$ printf 'host=github.com\nprotocol=https\n' | git credential-osxkeychain get
password=ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
username=x-access-token
Git credential storage
What happened in the preceding section is that the credentials were handled by a “credential storage”. On MacOS, Git comes with the osxkeychain helper, which allows caching such info in Keychain.
If you also observe the same behavior (ie. Git remembers your credentials), then you probably have a credential helper in place. To see the current credential storage:
$ git config credential.helper
osxkeychain
In fact, for me, osxkeychain is set as the credential storage by the system Git config (located in /System/Volumes/Data/usr/local/etc/gitconfig):
$ git config --system --list
credential.helper=osxkeychain
There are several built-in options besides osxkeychain (see: reference):
- cache: short-lived cache in memory (15 minutes by default)
- store: credentials persist in a plain-text file, ~/.git-credentials
- …
You can try out these options by overriding the config:
$ git config credential.helper cache
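The store helper can also be exercised directly, which shows the same store/get exchange seen in the trace above. The sketch below points it at a throwaway file through its --file option, so ~/.git-credentials is left untouched; the credential is a dummy value.

```shell
set -eu

store_file="$(mktemp)"   # throwaway file instead of ~/.git-credentials

# "store": hand a credential to the helper, as Git does after a successful login.
printf 'protocol=https\nhost=github.com\nusername=x-access-token\npassword=dummy-token\n' \
  | git credential-store --file "$store_file" store

# "get": ask the helper for a credential matching the protocol and host.
answer="$(printf 'protocol=https\nhost=github.com\n' \
  | git credential-store --file "$store_file" get)"

echo "$answer"
cat "$store_file"   # the credential sits in the file as plain text
```

The plain-text file is the reason the store option is convenient for experiments but a poor fit for shared machines.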
Using url.<base>.insteadOf config
This config is really useful, especially in a CI environment.
For Git-based dependencies in the project (declared in Gemfile, Podfile, etc.), engineers may choose to use SSH URLs because those work for them locally. In a CI environment, those URLs possibly won’t work if the CI provider does not use SSH for authentication (ex. Github, Gitlab). Changing those URLs to the HTTP/HTTPS format, unfortunately, might cause issues locally.
A simple solution to mitigate this issue is to use the url.<base>.insteadOf config. This way, a URL in one format is rewritten into the expected one.
Using this config is a very common practice to make your CI executions robust. Therefore, you might sometimes see code like this on CI:
# For Github Actions
$ git config --global url."https://x-access-token:${{ secrets.GITHUB_TOKEN }}@github.com/".insteadOf "git@github.com:"
# For Gitlab CI
$ git config --global url."https://gitlab-ci-token:${CI_JOB_TOKEN}@gitlab.com/".insteadOf "git@gitlab.com:"
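The rewrite can be observed without any network access. The sketch below is a repo-local variant of the rule (the CI snippets use --global) in a throwaway repo, with a dummy token; git remote get-url reports the URL after insteadOf has been applied, while the stored value stays untouched.

```shell
set -eu

repo="$(mktemp -d)"
git init -q "$repo"

token="dummy-token"   # placeholder; on CI this would come from a secret

# Repo-local rewrite rule: SSH-style Github URLs become HTTPS URLs with a token.
git -C "$repo" config \
  "url.https://x-access-token:${token}@github.com/.insteadOf" "git@github.com:"

# A remote declared with an SSH URL, as engineers would use locally.
git -C "$repo" remote add origin git@github.com:trinhngocthuyen/cocoapods-ezplugin.git

raw_url="$(git -C "$repo" config --get remote.origin.url)"
effective_url="$(git -C "$repo" remote get-url origin)"

echo "configured: $raw_url"
echo "effective:  $effective_url"
```

The declared URL never changes, so the same Gemfile or Podfile keeps working locally over SSH and on CI over HTTPS.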
Conclusion
In this blog post, we covered some areas of how Git authenticates with the server. We also mentioned some best practices when working with SSH and HTTP/HTTPS remotes. Although some practices are not really relevant to local development, they are quite common in CI integration. Given that different CI providers may adopt different approaches (ex. Github using the .extraheader config, Gitlab using token-based remotes, CircleCI using SSH), knowing how they work helps you be less confused by the workflows.
At the end of the day, good engineering quality comes from not only excelling at domain knowledge but also being proficient in your day-to-day tools, in my opinion.