MultiGitRepository

From APIDesign

Revision as of 09:31, 13 April 2018

Using a single Git repository is certainly more comfortable than working with multiple Git repositories. On the other hand, distributed development can hardly be performed in a single repository (unless you believe in a single Blockchain for the whole source code on the planet). How does one orchestrate multiple Git repositories to work together? That is the thousand-dollar question various teams seek an answer to! For example, Robert Munteanu gave a talk about that at GeeCON 2017 in Prague. Let's assume we have a project split into multiple Git repositories. What are the options?

1 Remember non-Distributed Version Control Systems?
2 Gates for Correctness
3 Single Integration Repository
4 Always Correct vs. Ultimately Correct
5 Single vs. Multi: Where's the difference?

Remember non-Distributed Version Control Systems?

There used to be times when people were afraid of distributed version control systems like Mercurial or Git. Users of CVS or Subversion couldn't understand how one could develop and commit in parallel without integrating into the tip of the development branch! If each developer or team of developers has its own tip, where is the truth?

These days we know where the truth is: there is a master repository somewhere, and whatever the tip is there is the truth. There can of course be multiple repositories, people are free to fork GitHub repositories like crazy, and some may even agree that one of the forks is more important. Yet, unless the fork overtakes the original repository, the truth will always remain in the master repository.

The situation with multiple repositories isn't that different. History repeats itself on a new level. It is just necessary to explain to the poor users of a single Mercurial or Git repository that there is nothing to be afraid of.

Gates for Correctness

A typical GitHub workflow uses pull requests (PRs) and some integration with Travis or another form of ContinuousIntegration, which is usually well integrated with the review tool. As soon as one creates a PR, the continuous builder runs the tests and marks the PR as valid or broken. This greatly contributes to the stability of the master branch - it is almost impossible to break it by merging in PRs.

On the other hand, please note that before your PR gets merged it may contain as many broken (i.e. not fully correct) commits as you wish. It is quite common that one makes changes to the system, pushes them to a branch of one's own repository fork, and creates a PR just to find out that while the functionality is OK, there are other things that need to be polished (formatting and proper spacing being my favorite). One then adds a few more commits to polish the non-semantic problems of the code.

What I'd like to point out is: it is absolutely OK to have commits which are broken, as long as they get fixed before merging into the master branch. We are going to transplant this observation to the MultiGitRepository case.

Single Integration Repository

Just like there is the master branch in each Git repository where all the commits have to ultimately end up (be merged), there has to be such an integration point in the MultiGitRepository scenario as well. That means there has to be a single integration repository which references all the other repositories and identifies the exact commits which were integrated.

One can use Git submodules for that, but other approaches that uniquely identify the changesets work as well (GraalVM uses a tool called MX which keeps these references in a special file called suite.py). All that is important is to have a single version of the truth - a single place that uniquely and completely identifies all the source code split among all the repositories.
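To make the "single place" idea concrete, here is a minimal sketch of what such a list of references could look like and how one might check that each reference really identifies an exact commit. Everything in it is an assumption made for illustration: the MANIFEST layout, the repository names, the URLs, and the hashes are invented, and this is not the format MX uses in suite.py. The sketch also assumes 40-character SHA-1 commit hashes.

```python
import re

# Hypothetical manifest kept in the integration repository.
# Names, URLs, and hashes are made up for this example.
MANIFEST = {
    "core": {
        "url": "https://example.org/git/core.git",
        "commit": "0123456789abcdef0123456789abcdef01234567",
    },
    "tools": {
        "url": "https://example.org/git/tools.git",
        "commit": "89abcdef0123456789abcdef0123456789abcdef",
    },
}

# Assumption: commits are identified by full 40-character SHA-1 hashes.
FULL_SHA1 = re.compile(r"[0-9a-f]{40}")


def unpinned(manifest):
    """Return the names of repositories not pinned to a full commit hash."""
    return sorted(
        name
        for name, entry in manifest.items()
        if not FULL_SHA1.fullmatch(entry.get("commit", ""))
    )
```

A reference such as "master" would be reported by unpinned, because a branch name moves over time and therefore does not uniquely identify a changeset.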

As in the single repository case, it is good to have a gate: an automated check that verifies, for every PR to be merged into the master branch of the integration repository, that everything is still OK, still consistent. Such a Travis or other ContinuousIntegration job checks out all the dependent repositories at the appropriate revisions (they are stored somewhere in the integration repository) and runs the tests. If they pass, the PR is eligible for being merged. That guarantees the master branch of the integration repository is always correct.
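The gate described above might be sketched roughly as follows. This is only an illustration under stated assumptions: the manifest argument is a hypothetical mapping from repository name to a "url" and a pinned "commit", the "work" directory is arbitrary, and test_command stands for whatever command runs the project's tests. The only real tools involved are git clone and git checkout --detach.

```python
import subprocess


def checkout_commands(manifest, workdir="work"):
    """Build the git commands that fetch every repository at its pinned commit."""
    commands = []
    for name in sorted(manifest):
        entry = manifest[name]
        target = f"{workdir}/{name}"
        commands.append(["git", "clone", entry["url"], target])
        # Detach HEAD at the exact commit recorded in the integration repository.
        commands.append(
            ["git", "-C", target, "checkout", "--detach", entry["commit"]]
        )
    return commands


def run_command(command):
    """Run one command and return its exit code."""
    return subprocess.run(command).returncode


def gate(manifest, test_command, run=run_command):
    """Return True only if every checkout and the test command succeed."""
    for command in checkout_commands(manifest) + [test_command]:
        if run(command) != 0:
            return False  # stop at the first failure; the PR is not eligible
    return True
```

The run parameter is there so that the gate logic can be exercised without touching the network; a real ContinuousIntegration job would simply use the default.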

"What happens in the individual repositories meanwhile?" may be your question. Well, anything. Things may even get broken there, but please note that this was also true in the single repository case. A single repository could also contain broken commits - all that mattered was to fix them before integrating. The same applies to the MultiGitRepository case: all that matters is that the changes from a single repository are correct before they get integrated (which means updating the appropriate commit reference in the integration repository, creating a PR, and merging it into the master branch of the integration repository). And they have to be correct, as we have a gate in the integration repository which would refuse our PR otherwise!
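Integrating a change therefore boils down to moving one commit reference forward and proposing that as a PR. A small sketch of that step, assuming a hypothetical manifest that maps each repository name to a "url" and a pinned "commit" (the helper names bump and changed are invented for this example):

```python
import copy


def bump(manifest, name, new_commit):
    """Return a copy of the manifest with one repository moved to new_commit."""
    if name not in manifest:
        raise KeyError(name)
    updated = copy.deepcopy(manifest)  # leave the currently integrated state untouched
    updated[name]["commit"] = new_commit
    return updated


def changed(old, new):
    """Map each repository whose reference moved to its (old, new) commits."""
    return {
        name: (old[name]["commit"], new[name]["commit"])
        for name in old
        if name in new and old[name]["commit"] != new[name]["commit"]
    }
```

The result of changed is roughly what a reviewer of the integration PR would see: which repositories move, and from which commit to which. Whether the move is accepted is still decided by the gate, not by this bookkeeping.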

Of course, individual teams working on the non-integration repositories are encouraged to run tests and have their own gates. However, such tests give just a hint; they aren't the ultimate source of truth. Developers working on a branch of a single repository are supposed to execute tests before making commits, yet they cannot expect such tests to guarantee that their code can be merged into the master branch without any changes. In the same way, whatever happens in your own repository doesn't guarantee anything with respect to the integration repository in the MultiGitRepository case.

Only when the final PR in the integration repository gets merged can one claim that we have a new truth, one which has just moved forward.

Always Correct vs. Ultimately Correct

TBD

Single vs. Multi: Where's the difference?

TBD

Retrieved from "http://wiki.apidesign.org/wiki/MultiGitRepository"
