CompilerOptimizations

From APIDesign

Current revision (05:44, 26 March 2022)

Do you remember the time when we were still coding in C++ and used real compilers that produced not just ByteCode but real machine code, executed by the target CPU? I do, at least a bit. For some time during my university studies I was developing an implementation of an SQL database for Novell.

That was the time when compilers needed to perform optimizations. Those days are gone: JavaC just emits ByteCode, and only later, when the code is really executed, does the HotSpot virtual machine perform optimizations. DynamicCompilation makes this possible. JavaC does not need to optimize anything; everything can be done later, with much greater knowledge about the execution environment.

Yet in the old days, when compiler output was consumed directly by the hardware CPU and there was no chance to optimize anything later, everything had to be done during compilation. At that time various C++ compilers competed to produce the fastest, most optimized code. The competition must have been quite hard, as they often tried to optimize too much and sometimes even overoptimized. I remember that from time to time I was getting some mysterious error in my program that vanished as soon as I realized (usually after many hours of debugging) what the cause could be and disabled some optimization switches.

For a while I believed that problems of this kind could not happen with JavaC; however, I was probably wrong. Recently I needed to prevent an object from being garbage collected and wrote the following code:

Code from CompilerSurprisesTest.java:

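import java.lang.ref.Reference;
import java.lang.ref.WeakReference;
import org.junit.Test;

public class CompilerSurprisesTest {
    // A weak reference does not keep the string alive by itself.
    Reference<String> cache;

    public String factory() {
        String value = new String("Can I disappear?");
        cache = new WeakReference<String>(value);
        return value;
    }

    @Test
    public void checkThatTheValueCanDisapper() {
        String retValue = factory();
        retValue = null; // drop the only strong reference
        assertGC("Nobody holds the string value anymore. " +
                "It can be GCed.", cache);
    }
}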
 

The assertGC method comes from our JUnit extension library called NbJUnit; it tries as hard as it can to remove the object pointed to by the reference from memory. In the previous code snippet it works fine. In the following snippet the GC cannot succeed, as the local strong reference is not cleared:

Code from CompilerSurprisesTest.java:

    @Test
    public void obviouslyWithoutClearingTheReferenceItCannotBeGCed() {
        String retValue = factory();
// commented out:        retValue = null;
        assertNotGC("The reference is still on stack. " +
                "It cannot be GCed.", cache);
    }
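
For readers who do not know NbJUnit, a helper in the spirit of assertGC might look roughly like the sketch below. This is not the NbJUnit implementation: the class name GcProbe, the method tryToCollect and the retry strategy are all assumptions made for illustration. It only shows the general idea of asking the VM to collect garbage several times and checking whether the weak reference has been cleared.

```java
import java.lang.ref.Reference;
import java.lang.ref.WeakReference;

// Hypothetical helper, NOT the NbJUnit code: it repeatedly requests a GC
// and reports whether the referent of the given reference has been cleared.
final class GcProbe {
    private GcProbe() {
    }

    /**
     * Returns true if ref.get() is null after at most 'attempts' GC requests.
     * System.gc() is only a hint to the VM, so a false result does not prove
     * that the object is strongly reachable.
     */
    static boolean tryToCollect(Reference<?> ref, int attempts) {
        for (int i = 0; i < attempts && ref.get() != null; i++) {
            // Allocate a bit of garbage to encourage the collector.
            byte[] pressure = new byte[1 << 20];
            System.gc();
            try {
                Thread.sleep(10); // give the collector a moment to run
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return ref.get() == null;
    }
}
```

An assertGC-like check would then fail the test when tryToCollect returns false, and an assertNotGC-like check would fail when it returns true.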
 

So far, so good. This code behaves exactly as expected. It leads to the conclusion that if a variable defined in a method body holds a reference to your object, the object cannot be garbage collected until the method execution ends. OK, now guess: will the following test succeed or fail?

Code from CompilerSurprisesTest.java:

boolean yes = true;
@Test
public void canItBeGCedSurprisingly() {
    String retValue;
    if (yes) {
        retValue = factory();
    }
    assertGC("Can be GCed, as retValue is not on stack!!!!", cache);
}
 

To my great surprise the object can really be garbage collected, even though there is a local variable pointing to it! This is an example of a surprising (over)optimization by JavaC or HotSpot. It turns out that, in spite of being declared for the whole method, the variable is not used outside of the if block, and as such JavaC allocates its space on the stack only for the execution of the if branch. This is quite surprising behaviour. It is easy to fix, yet surprising:

Code from CompilerSurprisesTest.java:

boolean ok = true;
@Test
public void canItBeGCedIfInitialized() {
    String retValue = null;
    if (ok) {
        retValue = factory();
    }
    assertNotGC("Cannot be GCed as retValue is now on stack", cache);
}
 

The fix is easy; however, the consequences of my finding are really horrible. It means that compiler optimizations are not as invisible as they should be. People can rely on them or be hurt by them. They can influence the predictability of our code and make it do something other than the programmer would expect. This may be a flaw of the compiler or of the language design, yet NetBeans probably relies on the expected behaviour (e.g. that having an uninitialized local variable is enough to hold a reference while a method is being executed) quite a lot. We know that from time to time our tests fail unexpectedly and inexplicably, and it may be due to this randomness. Usually everything is OK, but now and then, on machines with too powerful virtual machines, too many cores, too little memory, etc., the GC can kick in while the method is running and release the reference, causing our tests to fail because of an unexpected situation.

Liam noted in ljnelson's blog note that it is enough to make the variable final and the problem goes away. True, final helps; I've just tried that:

Code from CompilerSurprisesTest.java:

@Test public void properUseOfFinalFixesTheProblem() {
    final String retValue;
    if (yes) {
        retValue = factory();
    } else {
        retValue = null;
    }
    assertNotGC("Cannot be GCed, now the retValue is on stack", cache);
}
 

However, the same code without final works as well. It is enough to initialize the variable in both branches of the if statement to prevent garbage collection of the object referenced by the retValue variable:

Code from CompilerSurprisesTest.java:

@Test public void properInitializationFixesTheProblem() {
    String retValue;
    if (yes) {
        retValue = factory();
    } else {
        retValue = null;
    }
    assertNotGC("Cannot be GCed, now the retValue is on stack", cache);
}
 

This very likely means that the compiler puts the variable onto the stack of the topmost block where it is guaranteed to be fully initialized. That is why we need a hint to warn developers about declarations of not fully initialized non-primitive variables, as those can be a source of memory leaks.

I believe that the original motivation for compiler optimizations is to speed up program execution without affecting its behaviour. However, this is often just a distant dream: from time to time the optimizations change execution semantics, and as soon as that happens they become part of the API of our languages and their libraries!
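
Besides fully initializing the variable, Java 9 and newer offer a way to state the intent explicitly: java.lang.ref.Reference.reachabilityFence keeps its argument strongly reachable until the call is reached, regardless of how the compiler or the virtual machine treats the local variable. The following is only a sketch under that assumption. The class KeepAlive and the method survivesWhileFenced are hypothetical names, and the factory method merely mirrors the one used in the tests above.

```java
import java.lang.ref.Reference;
import java.lang.ref.WeakReference;

// Hypothetical example class; requires Java 9 or newer.
final class KeepAlive {
    static WeakReference<String> cache;

    private KeepAlive() {
    }

    // Mirrors the factory() method from the tests above.
    static String factory() {
        String value = new String("Can I disappear?");
        cache = new WeakReference<>(value);
        return value;
    }

    /** Returns true if the cached value is still present after a GC request. */
    static boolean survivesWhileFenced() {
        String retValue = factory();
        try {
            System.gc();
            return cache.get() != null;
        } finally {
            // Guarantees that retValue stays strongly reachable up to here,
            // so the weak reference cannot be cleared before this point.
            Reference.reachabilityFence(retValue);
        }
    }
}
```

The try/finally shape follows the usage pattern recommended in the reachabilityFence documentation. It makes the lifetime of the reference part of the code rather than a side effect of how local variables are laid out.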
