JaroslavTulach: /* Appendix B: Who Shall Integrate it All? */ - 2018-05-22 07:53:26

Appendix B: Who Shall Integrate it All?

←Older revision Revision as of 07:53, 22 May 2018
Line 134: Line 134:
As horrible as the above sounds, it is no different from the ''single repository'' case with many independent branches. When there is massive parallel development on the branches, making sure the branches can be merged together is a tough task as well. Being an ''integration guy'' responsible for merging branches of teams who work on their own branches and completely ignore the other ones would be as unwanted a position as in the above ''multi integration guy'' case.
As horrible as the above sounds, it is no different from the ''single repository'' case with many independent branches. When there is massive parallel development on the branches, making sure the branches can be merged together is a tough task as well. Being an ''integration guy'' responsible for merging branches of teams who work on their own branches and completely ignore the other ones would be as unwanted a position as in the above ''multi integration guy'' case.
-
Luckily there is a simple and well-known solution that we use daily while working on ''single repositories'': every developer (or at least every team) is responsible for ensuring their own pull requests get merged into the integration point - e.g. the '''master''' branch. Focusing on merging one branch (i.e. a single variable) into a relatively fixed target eliminates the [[NP-Complete]]ness just like the [[RangeDependenciesAnalysed#Avoiding_NP-Completeness|recording compile time dependencies]] does for the module configuration problem. There is no reason not to use the same roles in the [[MultiGitRepository]] setup. As such:
+
Luckily there is a simple and well-known solution that we use daily while working on ''single repositories'': every developer (or at least every team) is responsible for ensuring their own pull requests get merged into the integration point - e.g. the '''master''' branch. Focusing on merging one branch (i.e. a single variable) into a relatively fixed target eliminates the [[NP-Complete]]ness just like [[RangeDependenciesAnalysed#Avoiding_NP-Completeness|recording compile time dependencies]] does for the module configuration problem. There is no reason not to use the same roles in the [[MultiGitRepository]] setup. As such:
Every developer should be his own ''integration guy''! It is every developer's responsibility to make sure his changes eventually end up in the ''integration repository''.
Every developer should be his own ''integration guy''! It is every developer's responsibility to make sure his changes eventually end up in the ''integration repository''.

JaroslavTulach: /* Appendix B: Who Shall Integrate it All? */ - 2018-05-22 07:53:07

Appendix B: Who Shall Integrate it All?

←Older revision Revision as of 07:53, 22 May 2018
Line 134: Line 134:
As horrible as the above sounds, it is no different from the ''single repository'' case with many independent branches. When there is massive parallel development on the branches, making sure the branches can be merged together is a tough task as well. Being an ''integration guy'' responsible for merging branches of teams who work on their own branches and completely ignore the other ones would be as unwanted a position as in the above ''multi integration guy'' case.
As horrible as the above sounds, it is no different from the ''single repository'' case with many independent branches. When there is massive parallel development on the branches, making sure the branches can be merged together is a tough task as well. Being an ''integration guy'' responsible for merging branches of teams who work on their own branches and completely ignore the other ones would be as unwanted a position as in the above ''multi integration guy'' case.
-
Luckily there is a simple and well-known solution that we use daily while working on ''single repositories'': every developer (or at least every team) is responsible for ensuring their own pull requests get merged into the integration point - e.g. the '''master''' branch. Focusing on merging one branch (i.e. a single variable) into a relatively fixed target eliminates the [[NP-Complete]]ness just like the [[RangeDependenciesAnalysed]] showcase does for the module configuration problem. There is no reason not to use the same roles in the [[MultiGitRepository]] setup. As such:
+
Luckily there is a simple and well-known solution that we use daily while working on ''single repositories'': every developer (or at least every team) is responsible for ensuring their own pull requests get merged into the integration point - e.g. the '''master''' branch. Focusing on merging one branch (i.e. a single variable) into a relatively fixed target eliminates the [[NP-Complete]]ness just like the [[RangeDependenciesAnalysed#Avoiding_NP-Completeness|recording compile time dependencies]] does for the module configuration problem. There is no reason not to use the same roles in the [[MultiGitRepository]] setup. As such:
Every developer should be his own ''integration guy''! It is every developer's responsibility to make sure his changes eventually end up in the ''integration repository''.
Every developer should be his own ''integration guy''! It is every developer's responsibility to make sure his changes eventually end up in the ''integration repository''.

JaroslavTulach at 07:34, 22 May 2018 - 2018-05-22 07:34:57

←Older revision Revision as of 07:34, 22 May 2018
Line 99: Line 99:
| '''master''' branch is bug free
| '''master''' branch is bug free
| '''master''' branch in the ''integration repository'' and all referenced ''slave repository'' versions are bug free
| '''master''' branch in the ''integration repository'' and all referenced ''slave repository'' versions are bug free
 +
|-
 +
| integration guy
 +
| every developer is responsible for merging their own changes into the '''master''' branch
 +
| every developer is responsible for merging a reference to their own changes into the ''integration repository''
|}
|}
Line 115: Line 119:
The advantage of using the '''master''' branch for collaborative development is the simplicity of producing daily builds ready for quality checks or publishing to a [[Maven]] snapshot repository. Most projects do that for the '''master''' branch. In the [[MultiGitRepository]] setup each ''slave repository'' gets such infrastructure automatically. Again, these are just temporary builds, not fully correct from the global ''integration repository'' perspective, but for many uses they are good enough: many teams use such temporary bits for manual sanity checks to be sure everything is OK before they include their work (e.g. the id of their repository's latest '''master''' commit) into the ''integration repository''.
The advantage of using the '''master''' branch for collaborative development is the simplicity of producing daily builds ready for quality checks or publishing to a [[Maven]] snapshot repository. Most projects do that for the '''master''' branch. In the [[MultiGitRepository]] setup each ''slave repository'' gets such infrastructure automatically. Again, these are just temporary builds, not fully correct from the global ''integration repository'' perspective, but for many uses they are good enough: many teams use such temporary bits for manual sanity checks to be sure everything is OK before they include their work (e.g. the id of their repository's latest '''master''' commit) into the ''integration repository''.
 +
 +
==== Appendix B: Who Shall Integrate it All? ====
 +
 +
My colleague Stefan wrote: ''The task of integrating the individual repositories into the integration repository on a daily/weekly base is another issue. If the integration tests just work fine, there is no issue to worry about. But if the integration tests fail, who is responsible for fixing the integration?''
 +
 +
Who is responsible for getting your pull request integrated these days (in a ''single repository'' setup)? You. In the same way, you shall be responsible for making sure the '''eventually correct''' changes become part of the truth - i.e. that they get integrated into the ''integration repository'' (in the [[MultiGitRepository]] setup).
 +
 +
Such propagation can be automated. There can be an automatic continuous job which brings the most recent '''master''' commit from each ''slave repository'' into the ''integration'' one. Most of the time this works fine, however there are situations when it doesn't and manual intervention is required:
 +
* a conflicting behavior between ''slave repositories'' which causes integration tests to fail
 +
* a global change that needs to be orchestrated among multiple ''slave repositories''
 +
 +
One idea is to ''name a person to fix the integration issues''. However such an ''integration guy'' is going to have quite a hard time. Orchestrating changes coming from different teams that focus only on their own part may be very challenging. If the teams are constantly submitting more code, moving ahead and potentially increasing incompatibilities, finding commits that work together may become almost impossible. Just like [[LibraryReExportIsNPComplete|finding a configuration of modules to work with others]] is known to be [[NP-Complete]], finding the versions of ''slave repositories'' that work together is [[NP-Complete]] as well.
 +
 +
As horrible as the above sounds, it is no different from the ''single repository'' case with many independent branches. When there is massive parallel development on the branches, making sure the branches can be merged together is a tough task as well. Being an ''integration guy'' responsible for merging branches of teams who work on their own branches and completely ignore the other ones would be as unwanted a position as in the above ''multi integration guy'' case.
 +
 +
Luckily there is a simple and well-known solution that we use daily while working on ''single repositories'': every developer (or at least every team) is responsible for ensuring their own pull requests get merged into the integration point - e.g. the '''master''' branch. Focusing on merging one branch (i.e. a single variable) into a relatively fixed target eliminates the [[NP-Complete]]ness just like the [[RangeDependenciesAnalysed]] showcase does for the module configuration problem. There is no reason not to use the same roles in the [[MultiGitRepository]] setup. As such:
 +
 +
Every developer should be his own ''integration guy''! It is every developer's responsibility to make sure his changes eventually end up in the ''integration repository''.
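
The automated propagation job and the "one variable at a time" idea from this appendix can be sketched roughly as follows. This is only an illustration under stated assumptions, not the setup of any real project: the function name <code>propagate</code>, the dictionaries mapping a ''slave repository'' name to a commit id, and the <code>gate</code> callable (standing in for the integration tests) are all hypothetical.

```python
def propagate(pins, heads, gate):
    """Try to advance each pinned repository to its latest master commit.

    pins  -- hypothetical dict: repository name -> commit id currently
             referenced by the integration repository
    heads -- hypothetical dict: repository name -> latest master commit id
    gate  -- callable taking a candidate dict of pins and returning True
             when the (assumed) integration tests pass for that combination
    """
    accepted = dict(pins)
    rejected = {}
    for name in sorted(heads):
        if accepted.get(name) == heads[name]:
            continue  # nothing new in this repository
        # Change a single variable: only this repository's reference moves.
        candidate = dict(accepted)
        candidate[name] = heads[name]
        if gate(candidate):
            accepted = candidate
        else:
            # Needs manual attention by the team owning the repository.
            rejected[name] = heads[name]
    return accepted, rejected
```

Each run calls the gate at most once per repository instead of searching through all combinations of commits. A rejected update corresponds to the cases listed above where manual interaction is required; in this sketch it simply stays with the team that owns the repository.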

JaroslavTulach: /* Single Integration Repository */ - 2018-05-04 11:07:27

Single Integration Repository

←Older revision Revision as of 11:07, 4 May 2018
Line 30: Line 30:
As in the single repository case, it is [[good]] to have a gate. An automated check that verifies with every [[PR]] to be merged into '''master''' branch of the ''integration repository'' that everything is still OK, still consistent. Such [[Travis]] or other [[ContinuousIntegration]] test checks out all the dependent repositories at their appropriate revisions (they are stored somewhere in the ''integration repository'') and runs the test. If it passes, the [[PR]] is eligible for being merged. That guarantees the '''master''' branch of the ''integration repository'' is always correct.
As in the single repository case, it is [[good]] to have a gate. An automated check that verifies with every [[PR]] to be merged into '''master''' branch of the ''integration repository'' that everything is still OK, still consistent. Such [[Travis]] or other [[ContinuousIntegration]] test checks out all the dependent repositories at their appropriate revisions (they are stored somewhere in the ''integration repository'') and runs the test. If it passes, the [[PR]] is eligible for being merged. That guarantees the '''master''' branch of the ''integration repository'' is always correct.
-
''What happens in the individual repositories meanwhile?'' may be your question. Well, anything. Things may even get ''broken'' there, but please note that this was also the case in the single repository setup: there could be ''broken'' commits in the meantime, and all that mattered was to fix them before ''integrating''. The same applies to the [[MultiGitRepository]] case: all that matters is that the changes from a single repository are correct before they get ''integrated'' (which means updating the appropriate commit references in the ''integration repository'', creating a [[PR]] and merging it into the '''master''' branch of the integration repository). And they have to be correct, as we have a gate in the ''integration repository'' which would refuse our [[PR]] otherwise!
+
''What happens in the individual repositories meanwhile?'' may be your question. Well, anything. Things may even get ''broken'' (from a [[#Appendix_A:_Local_Collaboration_Area|global perspective]]) there, but please note that this was also the case in the single repository setup: there could be ''broken'' commits in the meantime, and all that mattered was to fix them before ''integrating''. The same applies to the [[MultiGitRepository]] case: all that matters is that the changes from a single repository are correct before they get ''integrated'' (which means updating the appropriate commit references in the ''integration repository'', creating a [[PR]] and merging it into the '''master''' branch of the integration repository). And they have to be correct, as we have a gate in the ''integration repository'' which would refuse our [[PR]] otherwise!
Of course individual teams working on the non-integration ''slave repositories'' are encouraged to run tests and have their own gates. However such tests give just a hint; they aren't the ultimate source of truth. Developers working on a branch of a single repository are likewise advised to execute tests before making commits, yet they cannot expect such tests to guarantee their code can be merged into the '''master''' branch without any changes. In the same way, regardless of what happens in your ''slave repository'', nothing can be guaranteed with respect to the ''integration repository'' in the [[MultiGitRepository]] case.
Of course individual teams working on the non-integration ''slave repositories'' are encouraged to run tests and have their own gates. However such tests give just a hint; they aren't the ultimate source of truth. Developers working on a branch of a single repository are likewise advised to execute tests before making commits, yet they cannot expect such tests to guarantee their code can be merged into the '''master''' branch without any changes. In the same way, regardless of what happens in your ''slave repository'', nothing can be guaranteed with respect to the ''integration repository'' in the [[MultiGitRepository]] case.
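
The gate described above (check out every dependent repository at the revision recorded in the ''integration repository'', then run the tests) might look roughly like the following sketch. The manifest format (one <code>name url commit</code> triple per line) and the function names are assumptions made for illustration; the text only says the revisions are "stored somewhere" in the ''integration repository''. The only real commands used are <code>git clone</code> and <code>git checkout --detach</code>.

```python
import subprocess
from pathlib import Path


def parse_pins(text):
    """Parse a hypothetical manifest: '<name> <clone-url> <commit>' per line."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, url, commit = line.split()
        pins[name] = (url, commit)
    return pins


def checkout_commands(pins, workdir):
    """Build the git commands that materialize every pinned revision."""
    commands = []
    for name, (url, commit) in sorted(pins.items()):
        target = str(Path(workdir) / name)
        commands.append(["git", "clone", url, target])
        commands.append(["git", "-C", target, "checkout", "--detach", commit])
    return commands


def run_gate(pins, workdir, test_command):
    """Check out all pinned revisions, then run the tests in workdir.

    Returns True when the test command exits with status 0, which would
    make the PR eligible for merging.
    """
    for command in checkout_commands(pins, workdir):
        subprocess.run(command, check=True)
    return subprocess.run(test_command, cwd=workdir).returncode == 0
```

Keeping the command construction separate from its execution makes the pinning logic easy to inspect without network access; a [[ContinuousIntegration]] job would call <code>run_gate</code> for every [[PR]] against the ''integration repository''.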

JaroslavTulach at 11:01, 4 May 2018 - 2018-05-04 11:01:16

←Older revision Revision as of 11:01, 4 May 2018
Line 87: Line 87:
| branches in the repository
| branches in the repository
| even '''master''' branches in ''slave'' repositories
| even '''master''' branches in ''slave'' repositories
 +
|-
 +
| origin of team sanity builds
 +
| a dedicated branch with full featured [[ContinuousIntegration]]
 +
| best to use the '''master''' branch of a ''slave'' repository (has [[ContinuousIntegration]] by default)
|-
|-
| ultimate gate
| ultimate gate
Line 109: Line 113:
In some sense the '''master''' branch of a ''slave repository'' is another temporary ''collaboration area''. When you need to collaborate in the single repository setup, you create a branch and let multiple members of a team commit into such branch. Only when the work is done, it gets integrated into the final commit destination - e.g. '''master''' branch of that repository. However in case of [[MultiGitRepository]] setup, a team may easily collaborate in the '''master''' branch of their repository. Until a reference to the latest commit is integrated into the ''integration repository'' all such work is just a temporary ''collaboration''.
In some sense the '''master''' branch of a ''slave repository'' is another temporary ''collaboration area''. When you need to collaborate in the single repository setup, you create a branch and let multiple members of a team commit into such branch. Only when the work is done, it gets integrated into the final commit destination - e.g. '''master''' branch of that repository. However in case of [[MultiGitRepository]] setup, a team may easily collaborate in the '''master''' branch of their repository. Until a reference to the latest commit is integrated into the ''integration repository'' all such work is just a temporary ''collaboration''.
-
The advantage of using the '''master''' branch for collaborative development is the simplicity of producing daily builds ready for quality checks or publishing to a [[Maven]] snapshot repository. Most projects do that for the '''master''' branch. In the [[MultiGitRepository]] setup each ''slave repository'' gets such infrastructure automatically. Again, these are just temporary builds, not fully correct from the global ''integration repository'' perspective, but for many uses (like for our [[GraalJS]] team) they are good enough.
+
 
 +
The advantage of using the '''master''' branch for collaborative development is the simplicity of producing daily builds ready for quality checks or publishing to a [[Maven]] snapshot repository. Most projects do that for the '''master''' branch. In the [[MultiGitRepository]] setup each ''slave repository'' gets such infrastructure automatically. Again, these are just temporary builds, not fully correct from the global ''integration repository'' perspective, but for many uses they are good enough: many teams use such temporary bits for manual sanity checks to be sure everything is OK before they include their work (e.g. the id of their repository's latest '''master''' commit) into the ''integration repository''.

JaroslavTulach at 10:44, 4 May 2018 - 2018-05-04 10:44:59

←Older revision Revision as of 10:44, 4 May 2018
Line 1: Line 1:
-
Using a single [[Git]] repository is certainly more comfortable than working with multiple [[Git]] repositories. On the other hand, [[distributed development]] can hardly be performed in a single repository (unless you believe in a single [[Blockchain]] for the whole source code on the planet). How to orchestrate multiple [[Git]] repositories to work together? That is the thousand-dollar question various teams seek an answer to! For example there was a talk at [[GeeCON]] 2017 in [[Prague]] about that by [https://2017.geecon.cz/schedule-day2/ Robert Munteanu]. Let's assume we have a project split into multiple [[Git]] repositories. What are the options?
+
Using a single [[Git]] repository is certainly more comfortable than working with multiple [[Git]] repositories. On the other hand, [[distributed development]] can hardly be performed in a single repository (unless you believe in a single [[Blockchain]] for the whole source code on the planet). How to orchestrate multiple [[Git]] repositories to work together? That is the thousand-dollar question various teams seek an answer to! For example there was a talk at [[GeeCON]] 2017 in [[Prague]] about that by [https://twitter.com/rombert/status/989069576825647105 Robert Munteanu]:
 +
 
 +
 
 +
{{#ev:youtube|5-kB0ux5kBA}}
 +
 
 +
Let's assume we have a project split into multiple [[Git]] repositories. What are the options?
==== Remember non-Distributed Version Control Systems? ====
==== Remember non-Distributed Version Control Systems? ====
Line 78: Line 83:
| done on branches or in forks
| done on branches or in forks
| anything done in ''slave'' repositories before '''master''' in the ''integration repository'' references it
| anything done in ''slave'' repositories before '''master''' in the ''integration repository'' references it
 +
|-
 +
| collaborative areas
 +
| branches in the repository
 +
| even '''master''' branches in ''slave'' repositories
|-
|-
| ultimate gate
| ultimate gate
Line 89: Line 98:
Don't be afraid to work in a [[MultiGitRepository]] setup. With a single ''integration repository'' it is not complicated at all!
Don't be afraid to work in a [[MultiGitRepository]] setup. With a single ''integration repository'' it is not complicated at all!
 +
 +
==== Appendix A: Local Collaboration Area ====
 +
 +
[https://twitter.com/rombert/status/989069576825647105 Robert commented] that having a broken '''master''' branch in a ''slave repository'' is bad for collaboration. That is indeed true! When I wrote about ''broken'', I meant ''broken'' from a global perspective, not from the perspective of the ''slave repository''.
 +
 +
Let's envision a team using the ''slave repository'' approach that develops, for example, [[GraalJS]]. Then there could be a [[GraalVM]] ''integration repository'' that includes the [[GraalJS]] one and integrates it together with other [[language]]s. In such a situation a commit in [[GraalJS]] may break the [[TruffleInteropUsability|interop]] functionality between [[GraalJS]] and some other language. But one will not know for sure until the change gets integrated into the [[GraalVM]] ''integration repository''.
 +
 +
From the overall perspective the ''slave repository'' may get into a ''broken'' state. However from a local perspective, there is no reason to have a broken '''master''' branch in any repository, right? There are tests and we develop via PRs and merge only when everything is (locally) green, right?
 +
 +
In some sense the '''master''' branch of a ''slave repository'' is another temporary ''collaboration area''. When you need to collaborate in the single repository setup, you create a branch and let multiple members of a team commit into such branch. Only when the work is done, it gets integrated into the final commit destination - e.g. '''master''' branch of that repository. However in case of [[MultiGitRepository]] setup, a team may easily collaborate in the '''master''' branch of their repository. Until a reference to the latest commit is integrated into the ''integration repository'' all such work is just a temporary ''collaboration''.
 +
 +
The advantage of using the '''master''' branch for collaborative development is the simplicity of producing daily builds ready for quality checks or publishing to a [[Maven]] snapshot repository. Most projects do that for the '''master''' branch. In the [[MultiGitRepository]] setup each ''slave repository'' gets such infrastructure automatically. Again, these are just temporary builds, not fully correct from the global ''integration repository'' perspective, but for many uses (like for our [[GraalJS]] team) they are good enough.

JaroslavTulach: /* The Scalability Problem */ - 2018-04-25 07:36:03

The Scalability Problem

←Older revision Revision as of 07:36, 25 April 2018
Line 53: Line 53:
-
The holy grail is a system designed around [[BackwardCompatible]] [[API]]s. Then most of the commits are just locally important and you can save a lot of the testing. It is enough to run limited sanity tests, verify binary compatibility with [[SigTest]] and delay the thorough checks for later. The system has to become '''eventually correct''' before the changes get merged into the '''master''' branch of the ''integration repository'', right?
+
The nirvana lies in a system properly designed around [[BackwardCompatible]] [[API]]s. Then most of the commits are just locally important and you can save a lot of the testing. It is enough to run limited sanity tests, verify binary compatibility with [[SigTest]] and delay the thorough checks for later. The system has to become '''eventually correct''' before the changes get merged into the '''master''' branch of the ''integration repository'', right?
In a system with [[API]]s designed for [[distributed development]], going through overall testing for each commit is clearly a waste of resources.
In a system with [[API]]s designed for [[distributed development]], going through overall testing for each commit is clearly a waste of resources.
-
 
-
 
==== Single vs. Multi: Where's the difference? ====
==== Single vs. Multi: Where's the difference? ====

JaroslavTulach: /* Tight Coupling */ - 2018-04-25 07:32:33

Tight Coupling

←Older revision Revision as of 07:32, 25 April 2018
Line 43: Line 43:
The '''always correct''' approach may be a better choice if the repositories are separate, but their [[proximity]] is close. For example the repositories may be split for licensing reasons. Then it is very common that a technical change cross-cuts both of these repositories and one needs to integrate both parts of the change at once. Given the nature of such tight coupling, the '''always correct''' integration policy seems to reduce hassles on balance compared to the '''eventually correct''' approach. It is better to have longer gate times to verify each commit properly than to force people to do that manually (which they would have to do almost every time anyway).
The '''always correct''' approach may be a better choice if the repositories are separate, but their [[proximity]] is close. For example the repositories may be split for licensing reasons. Then it is very common that a technical change cross-cuts both of these repositories and one needs to integrate both parts of the change at once. Given the nature of such tight coupling, the '''always correct''' integration policy seems to reduce hassles on balance compared to the '''eventually correct''' approach. It is better to have longer gate times to verify each commit properly than to force people to do that manually (which they would have to do almost every time anyway).
-
This nicely shows the importance of [[API]] for the economy of your project. If you have two repositories isolated by a [[BackwardCompatible]] [[API]], you can start practicing [[distributed development]] - i.e. disconnect the repositories a bit by using the '''eventually correct''' approach. If you don't bother with maintaining a [[BackwardCompatible]] [[API]], you immediately increase coupling and you have to treat them as one - preferably using the '''always correct''' approach.
+
This nicely shows the importance of [[API]] for the economy of your project. If you have two repositories isolated by a [[BackwardCompatible]] [[API]], you can start practicing [[distributed development]] - i.e. disconnect the repositories a bit by using the '''eventually correct''' approach. If you don't bother with maintaining a [[BackwardCompatible]] [[API]], you immediately increase coupling and you have to treat them as one - and burn [[CPU]] cycles on the '''always correct''' verifications.
==== The Scalability Problem ====
==== The Scalability Problem ====

JaroslavTulach: /* Single vs. Multi: Where's the difference? */ - 2018-04-24 05:56:29

Single vs. Multi: Where's the difference?

←Older revision Revision as of 05:56, 24 April 2018
Line 79: Line 79:
| temporary work
| temporary work
| done on branches or in forks
| done on branches or in forks
-
| anything done in ''slave'' repositories before ''integration repository'' references it
+
| anything done in ''slave'' repositories before '''master''' in the ''integration repository'' references it
|-
|-
| ultimate gate
| ultimate gate

JaroslavTulach: /* Single vs. Multi: Where's the difference? */ - 2018-04-24 05:56:09

Single vs. Multi: Where's the difference?

←Older revision Revision as of 05:56, 24 April 2018
Line 87: Line 87:
| bug free system
| bug free system
| '''master''' branch is bug free
| '''master''' branch is bug free
-
| '''master''' branch in the ''integration repository'' is bug free
+
| '''master''' branch in the ''integration repository'' and all referenced ''slave repository'' versions are bug free
|}
|}
Don't be afraid to work in a [[MultiGitRepository]] setup. With a single ''integration repository'' it is not complicated at all!
Don't be afraid to work in a [[MultiGitRepository]] setup. With a single ''integration repository'' it is not complicated at all!