Gerrit for Contributing to Open Source

Code review, most fortunately, has come to be widely accepted as necessary for large-scale software development. Not only does it help locate bugs in the program's behavior preemptively, but it also (in best practice) ensures that at least one person other than the code's author understands the text of the code itself. Code review is also held in high regard by FLOSS principles: a significant portion of the thesis of the "open-source" splinter group of the free software movement is simply that code will be better if it is widely reviewable. The free software movement itself holds that the “reviewability” of all used code is a moral imperative.

As an organization with significant investment in developing our own (frequently proprietary) code, Belvedere finds that reviewing free software contributions takes on several other dimensions. We want to ensure that any code publicly posted under our organization's name accurately reflects our shared standards of code quality. We also have significant investment in our proprietary algorithms remaining trade secrets. Thus, before we can contribute code upstream, Belvedere needs to minimize the risk that proprietary algorithms could be posted to the public internet.

This blog post explains how we integrated an internal code review system with a public GitHub account, and includes all of the code needed to set up something similar in your own organization. It is our hope that this will allow other companies with similar investments in large proprietary codebases to also contribute non-proprietary code upstream.

Here at Belvedere, we use gerrit to host our code review, both for our internal software projects and for the FLOSS projects to which we contribute. It understands the git protocol and provides a modern web front end for doing the review itself. Gerrit supplies its own git implementation (called jgit), which does sometimes lead to strangeness, but it allows us to put commits in a "holding area" (called refs/for/...) until they pass code review and are merged into the appropriate branch. Reviewing can also be done locally using ssh commands to communicate with the gerrit server. Some packages (such as magit-gerrit for Emacs) take advantage of this, allowing for integration with your editor of choice.

The replication plugin for gerrit allows gerrit to automatically push particular reference paths to another git repository.

Here is our replication.config (note that in this snippet and all the others in this post, hard-coded values that you would have to adjust are in all caps and surrounded by angle brackets <LIKETHIS>).

[remote "github"]
url = git@github.com:<YOURGITHUBACCOUNT>/${name}.git
push = +refs/heads/*:refs/heads/*
projects = ^(?!All-Projects).*

All-Projects is a special project that gerrit uses only as a place from which other projects inherit permissions, so it should be excluded by your regex.
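To make the effect of that regex concrete, here is a minimal sketch in bash. The negative lookahead in ^(?!All-Projects).* accepts any project name that does not begin with "All-Projects", which a plain prefix test can approximate. The function name is_replicated_project and the sample project names are hypothetical and exist only for illustration; they are not part of gerrit or the replication plugin.

```shell
#!/bin/bash

# Hypothetical helper (not part of gerrit): succeeds when a project name
# would pass the filter ^(?!All-Projects).* from replication.config.
# The lookahead rejects only names that START with "All-Projects",
# so a prefix test is an equivalent check for illustration purposes.
is_replicated_project() {
    local name=$1
    [[ $name != All-Projects* ]]
}

# Sample names are made up for this example.
for name in All-Projects my-library tools/All-Projects-docs; do
    if is_replicated_project "$name"; then
        echo "replicate: $name"
    else
        echo "skip:      $name"
    fi
done
```

Note that the filter is anchored at the start of the name, so a project that merely contains "All-Projects" somewhere later in its path would still be replicated.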

It is important that you tell the replication plugin to push only refs under "refs/heads"; otherwise it will try to replicate out to the public internet both the commits that have not yet passed review and the comments on those commits.
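As a rough illustration of why the push line is restricted this way, the sketch below classifies a few ref names the way the source side of +refs/heads/*:refs/heads/* would. The function would_replicate_ref is hypothetical, and the sample refs are assumptions about a typical gerrit repository (gerrit keeps changes under review beneath refs/changes/ and project settings in refs/meta/config); the exact refs on your server may differ.

```shell
#!/bin/bash

# Hypothetical helper: succeeds when a ref matches the source pattern
# refs/heads/* used in the push line of replication.config.
would_replicate_ref() {
    case $1 in
        refs/heads/*) return 0 ;;   # reviewed, merged branch heads
        *)            return 1 ;;   # everything else stays internal
    esac
}

# Sample refs are assumptions made for this example.
for ref in refs/heads/master refs/changes/34/1234/1 refs/meta/config; do
    if would_replicate_ref "$ref"; then
        echo "public:   $ref"
    else
        echo "internal: $ref"
    fi
done
```

Only the branch head is treated as public here; the change under review and the project configuration stay on the internal server.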

That takes care of pushing any successfully reviewed and merged changes up to GitHub, but we still need to get the repositories into our gerrit server in the first place and keep the projects updated.

You will need a system account whose ssh key is authorized by the gerrit server to create new projects and to keep those projects up to date, along with a directory in which to store a copy of the repositories that can be manipulated with standard git. Belvedere keeps this on the same server as our gerrit instance, but that is by no means necessary.

To add a GitHub project to the gerrit instance, run this script in the aforementioned directory, with the GitHub URI as its sole argument:

#!/bin/bash

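# Derive the repository name from the URI, dropping any trailing ".git".
almostreponame=$(basename "$1")
reponame=${almostreponame%.git}

git clone "$1"
cd "$reponame" || exit 1
git remote add opengerrit "ssh://<YOUROPENGERRITSERVER>:29418/$reponame"
# Create the project on the gerrit server, then push every branch to it.
ssh -p 29418 localhost gerrit create-project --name "$reponame"
git push --all opengerrit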
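The name derivation at the top of that script is easy to check in isolation. The following sketch wraps the same two steps (basename, then stripping a trailing .git) in a function; the name repo_name_from_uri and the example URIs are hypothetical and are not taken from our setup.

```shell
#!/bin/bash

# Hypothetical helper: print the repository name for a clone URI, using
# the same basename and suffix-stripping steps as the script above.
repo_name_from_uri() {
    local uri=$1
    local name
    name=$(basename "$uri")          # keep everything after the last "/"
    printf '%s\n' "${name%.git}"     # drop a trailing ".git", if present
}

# Example URIs are made up for illustration.
repo_name_from_uri "git@github.com:example/widgets.git"
repo_name_from_uri "https://github.com/example/widgets"
```

Both calls print widgets, so the ssh and https forms of a URI map to the same project name on the gerrit server.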

Finally, in order to keep your gerrit server’s copies of the repos up to date, run the following script as the aforementioned user on a regular cron schedule. Make sure the user has write access to /var/log/gerrit-update.log or the log file of your choosing.

#!/bin/bash

# Ask the gerrit server for the names of all of its projects.
projects=($(ssh -p 29418 localhost 'gerrit ls-projects'))

cd <DIRECTORY WHERE YOU PUT A COPY OF YOUR REPOS WITH THE SCRIPT ABOVE>

for project in "${projects[@]}"
do
    if [ -d "$project" ]
    then
        cd "$project" || continue
        # List local branches, stripping the two-character marker column.
        branches=($(git branch --no-color --no-column | cut -c 3-))
        for branch in "${branches[@]}"
        do
            # Pick up changes merged in gerrit, then rebase onto upstream.
            git pull opengerrit "$branch:$branch"
            git pull --rebase origin "$branch:$branch"
        done
        git push --all opengerrit
        cd ..
    else
        echo "missing $project directory" >> /var/log/gerrit-update.log
    fi
done

This walks through the branches of the repository, rebases the merged changes in your gerrit server on top of the most recent updates in the public version, and then pushes the rebased updates to your gerrit server to try to keep the repo history as consistent as possible.
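The branch walk depends on turning the output of git branch into bare branch names. As a small sketch of that step, the function below applies the same cut -c 3- filter to canned input, so it can be tried without a repository. The function name list_branch_names and the sample branch names are hypothetical.

```shell
#!/bin/bash

# Hypothetical helper: strip the two-character marker column that
# "git branch" prints before each name ("* " for the current branch,
# two spaces for every other branch).
list_branch_names() {
    cut -c 3-
}

# Canned input standing in for: git branch --no-color --no-column
printf '* master\n  feature-x\n' | list_branch_names
```

This prints master and feature-x on separate lines. If parsing human-oriented output feels fragile, git for-each-ref --format='%(refname:short)' refs/heads/ lists the same names without any marker column.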

This setup allows developers to push to upstream FLOSS repositories in the same way as to internal repositories, while also greatly reducing the risk of accidentally pushing proprietary code to the public internet. Belvedere is excited to have such a system set up so that we can begin to increase our FLOSS presence and help to improve some of the publicly available libraries that we use in production daily.