LibraryReExportIsNPComplete
From APIDesign
(→Proof) |
|||
(42 intermediate revisions not shown.) | |||
Line 1: | Line 1: | ||
- | + | <skin>monobook</skin> | |
- | + | Reusing libraries produced by others is essential aspect of [[DistributedDevelopment]]. It simplifies [[Time To Market]], it reduces long term cost of ownership and leads to creation of [[Good Technology|good technologies]]. However it does not come for free. Read details or directly jump to the [[LibraryReExportIsNPComplete#Implications|implications]] that shall improve your every day development habits. | |
+ | |||
+ | This page starts by describing a way to convert any [[3SAT]] problem to a solution of finding whether there is a way to satisfy all dependencies of a library in a repository of libraries. Thus proving that the later problem is [[wikipedia::NP-complete|NP-Complete]]. Then it describes the importance of such observations on our [[DistributedDevelopment|development practices]]. | ||
+ | |||
+ | There are similar observations for other module systems ([[RPM]] and [[Debian]], see the external references section), with almost identical proof. The only difference is that both [[RPM]] and [[Debian]] allow easy way to specify negation by use of ''obsolete'' directive (thus it is easy to map the [[3SAT]] formula). The unique feature of [[LibraryReExportIsNPComplete|this]] proof is that it does not need negation at all. Instead it deals with re-export of an [[API]]. As re-export of [[API]]s is quite common in software development, it brings implications of this kind of problem closer to reality. | ||
+ | |||
+ | == [[3SAT]] == | ||
The problem of satisfying a logic formula remains '''NP'''-complete even if all expressions are written in ''[[wikipedia::conjunctive normal form]]'' with 3 variables per clause (3-CNF), yielding the '''3SAT''' problem. This means the expression has the form: | The problem of satisfying a logic formula remains '''NP'''-complete even if all expressions are written in ''[[wikipedia::conjunctive normal form]]'' with 3 variables per clause (3-CNF), yielding the '''3SAT''' problem. This means the expression has the form: | ||
Line 11: | Line 17: | ||
where each <math>x_{ab}</math> is a variable <math>v_i</math> or a negation of a variable <math>\neg v_i</math>. Each variable <math>v_i</math> can appear multiple times in the expression. | where each <math>x_{ab}</math> is a variable <math>v_i</math> or a negation of a variable <math>\neg v_i</math>. Each variable <math>v_i</math> can appear multiple times in the expression. | ||
- | == | + | == Library Versioning Terminology == |
- | Let <math>A, B, C, ...</math> denote | + | Let <math>A, B, C, ...</math> denote various modules and their [[API]]s. |
- | Let <math> | + | Let <math>A_{1.0}, A_{1.1}, A_{1.7}, A_{1.11}</math> denote compatible versions of module <math>A</math>. |
- | Let <math> | + | Let <math>A_{1.0}, A_{2.0}, A_{3.1}</math> denote incompatible versions of module <math>A</math>. |
- | Let <math>A_{x.y} > B_{u.v}</math> denote the fact that version ''x.y'' of | + | Let <math>A_{x.y} > B_{u.v}</math> denote the fact that version ''x.y'' of module A depends on version ''u.v'' of module ''B''. |
- | Let <math>A_{x.y} \gg B_{u.v}</math> denote the fact that version ''x.y'' of | + | Let <math>A_{x.y} \gg B_{u.v}</math> denote the fact that version ''x.y'' of module A depends on version ''u.v'' of module ''B'' and that it re-exports module ''B'''s [[API]] to users of own [[API]]. |
Let ''Repository'' <math>R=(M,D)</math> be any set of modules with their various versions and their dependencies on other modules with or without re-export. | Let ''Repository'' <math>R=(M,D)</math> be any set of modules with their various versions and their dependencies on other modules with or without re-export. | ||
Line 27: | Line 33: | ||
Let C be a ''Configuration'' in a repository <math>R=(M,D)</math>, if | Let C be a ''Configuration'' in a repository <math>R=(M,D)</math>, if | ||
<math>C \subseteq M</math>, where following is satisfied: | <math>C \subseteq M</math>, where following is satisfied: | ||
- | :<math>\forall A_{x.y} \in C, \forall A_{x.y} \gg B_{u.v} \in D \Rightarrow \exists w >= v \wedge B_{u.w} \in C</math> | + | # '''re-exported dependency is satisfied''' with some compatible version: <math>\forall A_{x.y} \in C, \forall A_{x.y} \gg B_{u.v} \in D \Rightarrow \exists w >= v \wedge B_{u.w} \in C</math> |
- | :<math>\forall A_{x.y} \in C, \forall A_{x.y} > B_{u.v} \in D \Rightarrow \exists w >= v B_{u.w} \in C</math> | + | # '''each dependency is satisfied''' with some compatible version: <math>\forall A_{x.y} \in C, \forall A_{x.y} > B_{u.v} \in D \Rightarrow \exists w >= v \wedge B_{u.w} \in C</math> |
- | :Let there be two chains of re-exported dependencies <math>A_{p.q} \gg ... \gg B_{x.y}</math> and <math>A_{p.q} \gg ... \gg B_{u.v}</math> then <math>x = u \wedge y = v</math> | + | # '''each imported object has just one meaning''' for each importer: Let there be two chains of re-exported dependencies <math>A_{p.q} \gg ... \gg B_{x.y}</math> and <math>A_{p.q} \gg ... \gg B_{u.v}</math> then <math>x = u \wedge y = v</math> |
- | + | == Module Dependency Problem == | |
- | = | + | Let there be a repository <math>R=(M,D)</math> and a module <math>A \in M</math>. Does there exist a configuration <math>C</math> in the repository <math>R</math>, such that the module <math>A \in C</math>, e.g. the module can be enabled? |
- | Let there be [[ | + | == Conversion of [[3SAT]] to Module Dependencies Problem == |
+ | |||
+ | Let there be [[3SAT]] formula with with variables <math>v_1, ..., v_m</math> as defined above. | ||
Let's create a repository of modules <math>R</math>. For each variable <math>v_i</math> let's create two modules <math>M^i_{1.0}</math> and <math>M^i_{2.0}</math>, which are mutually incompatible and put them into repository <math>R</math>. | Let's create a repository of modules <math>R</math>. For each variable <math>v_i</math> let's create two modules <math>M^i_{1.0}</math> and <math>M^i_{2.0}</math>, which are mutually incompatible and put them into repository <math>R</math>. | ||
Line 41: | Line 49: | ||
For each formula | For each formula | ||
<math>(x_{i1} \vee x_{i2} \vee x_{i3})</math> | <math>(x_{i1} \vee x_{i2} \vee x_{i3})</math> | ||
- | let's create a module <math>F^i</math> that will have three compatible versions. Each of them will depend on one variable's module. In case the variable is used with negation, it will depend on version ''2.0'', otherwise on version ''1.0''. So for | + | let's create a module <math>F^i</math> that will have three compatible versions. Each of them will depend on one variable's module. In case the variable is used with negation, it will depend on version ''2.0'', otherwise on version ''1.0''. So for formula |
:<math>v_a \vee \neg v_b \vee \neg v_c</math> | :<math>v_a \vee \neg v_b \vee \neg v_c</math> | ||
we will get: | we will get: | ||
Line 47: | Line 55: | ||
:<math>F^i_{1.2} \gg M^b_{2.0}</math> | :<math>F^i_{1.2} \gg M^b_{2.0}</math> | ||
:<math>F^i_{1.3} \gg M^c_{2.0}</math> | :<math>F^i_{1.3} \gg M^c_{2.0}</math> | ||
- | All these modules and dependencies | + | All these modules and dependencies are added into repository <math>R</math> |
- | Now we will create a module <math>T_{1.0}</math> that depends all formulas: | + | Now we will create a module <math>T_{1.0}</math> that depends on all formulas: |
:<math>T_{1.0} \gg F^1_{1.0}</math> | :<math>T_{1.0} \gg F^1_{1.0}</math> | ||
:<math>T_{1.0} \gg F^2_{1.0}</math> | :<math>T_{1.0} \gg F^2_{1.0}</math> | ||
Line 56: | Line 64: | ||
and add this module as well as its dependencies into repository <math>R</math>. | and add this module as well as its dependencies into repository <math>R</math>. | ||
- | '''Claim''': There <math>\exists</math> a configuration | + | '''Claim''': There <math>\exists C</math> (a configuration) of repository <math>R</math> and <math>T_{1.0} \in C</math> <math>\Longleftrightarrow</math> there is a solution to the [[3SAT]] formula. |
== Proof == | == Proof == | ||
- | "<math>\Leftarrow</math>": Let's have an evaluation of each variable to either true or false that evaluates the whole [[ | + | "<math>\Leftarrow</math>": Let's have an evaluation of each variable to either true or false that evaluates the whole [[3SAT]] formula to true. Then |
:<math>C = \{ T_{1.0} \} \bigcup</math> | :<math>C = \{ T_{1.0} \} \bigcup</math> | ||
:<math>\{ M^i_{1.0} : v_i \} \bigcup \{M^i_{2.0} : \neg v_i \} \bigcup</math> | :<math>\{ M^i_{1.0} : v_i \} \bigcup \{M^i_{2.0} : \neg v_i \} \bigcup</math> | ||
:<math>\{ F^i_{1.1} : x_{i1} \} \bigcup \{ F^i_{1.2} : \neg x_{i1} \wedge x_{i2} \} \bigcup \{ F^i_{1.3} : \neg x_{i1} \wedge \neg x_{i2} \wedge x_{i3} \}</math> | :<math>\{ F^i_{1.1} : x_{i1} \} \bigcup \{ F^i_{1.2} : \neg x_{i1} \wedge x_{i2} \} \bigcup \{ F^i_{1.3} : \neg x_{i1} \wedge \neg x_{i2} \wedge x_{i3} \}</math> | ||
- | It is clear from the definition that each <math>M^i</math> and <math>F^i</math> can be in the <math>C</math> just in one version. Now it is important to ensure that each module is present always at least in one version. This is easy for <math>M^i</math> as its <math>v_i</math> needs to be true or false, and that means one of <math>M^i_{1.0}</math> or <math>M^i_{2.0}</math> will be included. Can there be a <math>F^i</math> which is not included? Only if <math>\neg x_{i1} \wedge \neg x_{i2} \wedge \neg x_{i3}</math> but that would mean the whole or would evaluate to false and as a result also the [[ | + | It is clear from the definition that each <math>M^i</math> and <math>F^i</math> can be in the <math>C</math> just in one version. Now it is important to ensure that each module is present always at least in one version. This is easy for <math>M^i</math> as its <math>v_i</math> needs to be true or false, and that means one of <math>M^i_{1.0}</math> or <math>M^i_{2.0}</math> will be included. Can there be a <math>F^i</math> which is not included? Only if <math>\neg x_{i1} \wedge \neg x_{i2} \wedge \neg x_{i3}</math> but that would mean the whole ''3-or'' would evaluate to false and as a result also the [[3SAT]] formula would evaluate to false. This means that dependencies of <math>T_{1.0}</math> on <math>F^i</math> modules are satisfied. Are also dependencies of every <math>F^i_{1.q}</math> satisfied? From all the three versions, there is just one <math>F^i_{1.q}</math>, the one its <math>x_{iq}</math> evaluates to true. However <math>x_{iq}</math> can either be without negation, and as such <math>F^i_{1.q}</math> depends on <math>M^j_{1.0}</math> which is included as <math>v_j</math> is true. Or <math>x_{iq}</math> contains negation, and as such <math>F^i_{1.q}</math> depends on <math>M^j_{2.0}</math> which is included as <math>v_j</math> is false. |
+ | |||
+ | "<math>\Rightarrow</math>": Let's have a <math>C</math> configuration satisfies all dependencies of <math>T_{1.0}</math>. Can we also find positive valuation of [[3SAT]] formula? | ||
+ | |||
+ | For <math>i</math>-th ''3-or'' there is <math>T_{1.0} \gg F^i_{1.0}</math> dependency which is satisfied. That means <math>F^i_{1.1} \in C \vee F^i_{1.2} \in C \vee F^i_{1.3} \in C</math> - at least one version of <math>F^i</math> module is present in the configuration. The one <math>F^i</math> that has the satisfied dependency reexports <math>M^j_{1.0}</math> (which means <math>v_j = true</math>) or <math>M^j_{2.0}</math> (which means <math>v_j = false</math>). Anyway each <math>i</math> ''3-or'' evaluates to <math>true</math>. | ||
+ | |||
+ | The only remaining question is whether a <math>C</math> configuration can force truth variable <math>v_j</math> to be true in one ''3-or'' and false in another. However that would mean that there is re-export via <math>T_{1.0} \gg F^i_{1.x} \gg M^j_{1.0}</math> and also another one via <math>T_{1.0} \gg F^p_{1.u} \gg M^j_{2.0}</math>. However those two ''chain of dependencies'' ending in different versions of <math>M^j</math> cannot be in one <math>C</math> as that breaks the last condition of configuration definition (each imported object has just one meaning). Thus each <math>M^j</math> is represented only by one version and each <math>v_j</math> is evaluated either to true or false, but never both. | ||
+ | |||
+ | The [[3SAT]] formula's evaluation based on the configuration <math>C</math> is consistent and satisfies the formula. | ||
+ | |||
+ | '''qed'''. | ||
+ | |||
+ | == Polemics == | ||
+ | |||
+ | One of the critiques raised during the LtU review (linked in external sources) is that the kind of situation cannot happen in practise. Surprisingly it can. [[OSGi]] and its [[RangeDependencies]] lead naturally into [[NP-Complete]] problems. [[RangeDependencies|Read more]]... | ||
+ | |||
+ | == Implications == | ||
+ | |||
+ | If there is a [[Module system|repository of modules]] in various (incompatible) versions, with mutual dependencies and re-export of their [[API]]s, then deciding whether some of them can be enabled is [[wikipedia::NP-complete|NP-complete]] problem. | ||
+ | |||
+ | As [[wikipedia::NP-complete|NP-complete]] problems are hard to solve, it is usually our best desire to avoid them in real life situations. What does that mean in case one decides to practise [[DistributedDevelopment]] (and it is [[TheAPIBook|inevitable that software for 21st century needs]] this development style)? If you want to avoid headaches with finding the right configuration of various version of the libraries that allows to execute them together, then stick with following simple rules. | ||
+ | |||
+ | ==== Be Compatible! ==== | ||
+ | |||
+ | If you develop your own libraries in [[BackwardCompatibility|backward compatible]] way, you can always select the most recent version of each library. That is the configuration you are looking for. It is easy to find (obviously) and also it is the most desirable, as it delivers the most modern features and bugfixes that users of such libraries want. | ||
+ | |||
+ | ==== Reuse with Care! ==== | ||
+ | |||
+ | If you happen to reuse libraries (and you should because reuse lowers [[Time To Market]], just like it did for me when I was publishing my first [[First Amoeba Video|animated movie]]), then choose such libraries that can be trusted to [[evolution|evolve]] compatibly. | ||
+ | |||
+ | ==== Hide Incompatibilities! ==== | ||
+ | |||
+ | If you happen to reuse a library that cannot be trusted to keep its [[BackwardCompatibility]], then do whatever you can to not re-export its [[API]]s! This has been discussed in [[Chapter 10]], Cooperating with Other [[API]]s, but in short: If you hide such library for internal use and do not export any of its interfaces, you can use whatever version of library you want (even few years older) and nobody shall notice. Moreover in many [[module system]]s there can even be multiple versions of the same library in case they are not re-exported. | ||
+ | |||
+ | ==== Explicit Re-export ==== | ||
+ | |||
+ | Looks like there is a way to eliminate the NP-Completeness by disabling implicit re-export. See [[LibraryWithoutImplicitExportIsPolynomial]]. However this works only in a system with standardized versioning policy | ||
+ | and without use of [[RangeDependencies]]. | ||
+ | |||
+ | == Conclusion == | ||
+ | |||
+ | Avoid complexities and [[wikipedia::NP-complete|NP-complete]] problems. Learn to develop in [[BackwardCompatibility|backward compatible]] way. Reading [[TheAPIBook]] is a perfect entry point into such compatible software design of the 21st century. | ||
+ | |||
+ | == External Links == | ||
+ | |||
+ | # discussion at [http://lambda-the-ultimate.org/node/3588 Lambda the Ultimate] | ||
+ | # LtU guys pointed out that the proof has already been published: [http://people.debian.org/~dburrows/model.pdf D. Burrows, Modelling and Resolving Software Dependencies] | ||
+ | # [[Equinox]] is said to [http://blog.bjhargrave.com/2008/03/equinox-and-google-summer-of-code.html use SAT4J solver] | ||
+ | # EDOS Project seems to find similar proof: See section 3.2 in [http://www.edos-project.org/xwiki/bin/download/Main/D2-1/edos-wp2d1.pdf edos-wp2d1.pdf] | ||
+ | |||
+ | <!-- Added correct link to D. Burrows article --> |
Current revision
Reusing libraries produced by others is essential aspect of DistributedDevelopment. It simplifies Time To Market, it reduces long term cost of ownership and leads to creation of good technologies. However it does not come for free. Read details or directly jump to the implications that shall improve your every day development habits.
This page starts by describing a way to convert any 3SAT problem to a solution of finding whether there is a way to satisfy all dependencies of a library in a repository of libraries. Thus proving that the later problem is NP-Complete. Then it describes the importance of such observations on our development practices.
There are similar observations for other module systems (RPM and Debian, see the external references section), with almost identical proof. The only difference is that both RPM and Debian allow easy way to specify negation by use of obsolete directive (thus it is easy to map the 3SAT formula). The unique feature of this proof is that it does not need negation at all. Instead it deals with re-export of an API. As re-export of APIs is quite common in software development, it brings implications of this kind of problem closer to reality.
Contents |
3SAT
The problem of satisfying a logic formula remains NP-complete even if all expressions are written in wikipedia::conjunctive normal form with 3 variables per clause (3-CNF), yielding the 3SAT problem. This means the expression has the form:
- ...
where each x_{ab} is a variable v_{i} or a negation of a variable . Each variable v_{i} can appear multiple times in the expression.
Library Versioning Terminology
Let A,B,C,... denote various modules and their APIs.
Let A_{1.0},A_{1.1},A_{1.7},A_{1.11} denote compatible versions of module A.
Let A_{1.0},A_{2.0},A_{3.1} denote incompatible versions of module A.
Let A_{x.y} > B_{u.v} denote the fact that version x.y of module A depends on version u.v of module B.
Let denote the fact that version x.y of module A depends on version u.v of module B and that it re-exports module B's API to users of own API.
Let Repository R = (M,D) be any set of modules with their various versions and their dependencies on other modules with or without re-export.
Let C be a Configuration in a repository R = (M,D), if , where following is satisfied:
- re-exported dependency is satisfied with some compatible version:
- each dependency is satisfied with some compatible version:
- each imported object has just one meaning for each importer: Let there be two chains of re-exported dependencies and then
Module Dependency Problem
Let there be a repository R = (M,D) and a module . Does there exist a configuration C in the repository R, such that the module , e.g. the module can be enabled?
Conversion of 3SAT to Module Dependencies Problem
Let there be 3SAT formula with with variables v_{1},...,v_{m} as defined above.
Let's create a repository of modules R. For each variable v_{i} let's create two modules and , which are mutually incompatible and put them into repository R.
For each formula let's create a module F^{i} that will have three compatible versions. Each of them will depend on one variable's module. In case the variable is used with negation, it will depend on version 2.0, otherwise on version 1.0. So for formula
we will get:
All these modules and dependencies are added into repository R
Now we will create a module T_{1.0} that depends on all formulas:
- ...
and add this module as well as its dependencies into repository R.
Claim: There (a configuration) of repository R and there is a solution to the 3SAT formula.
Proof
"": Let's have an evaluation of each variable to either true or false that evaluates the whole 3SAT formula to true. Then
It is clear from the definition that each M^{i} and F^{i} can be in the C just in one version. Now it is important to ensure that each module is present always at least in one version. This is easy for M^{i} as its v_{i} needs to be true or false, and that means one of or will be included. Can there be a F^{i} which is not included? Only if but that would mean the whole 3-or would evaluate to false and as a result also the 3SAT formula would evaluate to false. This means that dependencies of T_{1.0} on F^{i} modules are satisfied. Are also dependencies of every satisfied? From all the three versions, there is just one , the one its x_{iq} evaluates to true. However x_{iq} can either be without negation, and as such depends on which is included as v_{j} is true. Or x_{iq} contains negation, and as such depends on which is included as v_{j} is false.
"": Let's have a C configuration satisfies all dependencies of T_{1.0}. Can we also find positive valuation of 3SAT formula?
For i-th 3-or there is dependency which is satisfied. That means - at least one version of F^{i} module is present in the configuration. The one F^{i} that has the satisfied dependency reexports (which means v_{j} = true) or (which means v_{j} = false). Anyway each i 3-or evaluates to true.
The only remaining question is whether a C configuration can force truth variable v_{j} to be true in one 3-or and false in another. However that would mean that there is re-export via and also another one via . However those two chain of dependencies ending in different versions of M^{j} cannot be in one C as that breaks the last condition of configuration definition (each imported object has just one meaning). Thus each M^{j} is represented only by one version and each v_{j} is evaluated either to true or false, but never both.
The 3SAT formula's evaluation based on the configuration C is consistent and satisfies the formula.
qed.
Polemics
One of the critiques raised during the LtU review (linked in external sources) is that the kind of situation cannot happen in practise. Surprisingly it can. OSGi and its RangeDependencies lead naturally into NP-Complete problems. Read more...
Implications
If there is a repository of modules in various (incompatible) versions, with mutual dependencies and re-export of their APIs, then deciding whether some of them can be enabled is NP-complete problem.
As NP-complete problems are hard to solve, it is usually our best desire to avoid them in real life situations. What does that mean in case one decides to practise DistributedDevelopment (and it is inevitable that software for 21st century needs this development style)? If you want to avoid headaches with finding the right configuration of various version of the libraries that allows to execute them together, then stick with following simple rules.
Be Compatible!
If you develop your own libraries in backward compatible way, you can always select the most recent version of each library. That is the configuration you are looking for. It is easy to find (obviously) and also it is the most desirable, as it delivers the most modern features and bugfixes that users of such libraries want.
Reuse with Care!
If you happen to reuse libraries (and you should because reuse lowers Time To Market, just like it did for me when I was publishing my first animated movie), then choose such libraries that can be trusted to evolve compatibly.
Hide Incompatibilities!
If you happen to reuse a library that cannot be trusted to keep its BackwardCompatibility, then do whatever you can to not re-export its APIs! This has been discussed in Chapter 10, Cooperating with Other APIs, but in short: If you hide such library for internal use and do not export any of its interfaces, you can use whatever version of library you want (even few years older) and nobody shall notice. Moreover in many module systems there can even be multiple versions of the same library in case they are not re-exported.
Explicit Re-export
Looks like there is a way to eliminate the NP-Completeness by disabling implicit re-export. See LibraryWithoutImplicitExportIsPolynomial. However this works only in a system with standardized versioning policy and without use of RangeDependencies.
Conclusion
Avoid complexities and NP-complete problems. Learn to develop in backward compatible way. Reading TheAPIBook is a perfect entry point into such compatible software design of the 21st century.
External Links
- discussion at Lambda the Ultimate
- LtU guys pointed out that the proof has already been published: D. Burrows, Modelling and Resolving Software Dependencies
- Equinox is said to use SAT4J solver
- EDOS Project seems to find similar proof: See section 3.2 in edos-wp2d1.pdf