CompilerOptimizations

From APIDesign

Revision as of 07:27, 17 October 2008 by JaroslavTulach (Talk | contribs)

Do you remember the time when we were still coding in C++ and used real compilers, producing not just ByteCode but real machine code executed by the target CPU? I do remember, at least a bit, because for a certain time during my university studies I made my living developing an implementation of an SQL database for Novell.

That was the time when compilers needed to perform optimizations. Those days are gone now: JavaC just emits ByteCode, and only later, when the code is really executed, does the HotSpot virtual machine perform optimizations. DynamicCompilation makes this possible. JavaC does not need to optimize anything; everything can be done later, with much greater knowledge about the execution environment.

Yet in the old days, when compiler output was directly consumed by the hardware CPU and there was no chance to optimize anything later, everything had to be done during compilation. At that time various C++ compilers competed to produce the fastest, most optimized code. The competition must have been quite hard, as they often tried to optimize too much and sometimes even overoptimized. I remember that from time to time I got some mysterious error in my program that vanished as soon as I realized (usually after many hours of debugging) what the cause could be and disabled some optimization switches.

For a while I believed that problems of this kind could not happen with JavaC; however, I was probably wrong. Recently I needed to prevent an object from being garbage collected and wrote the following code:

Code from CompilerSurprisesTest.java:
See the whole file.

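import java.lang.ref.Reference;
import java.lang.ref.WeakReference;
import org.junit.Test;

// assertGC comes from the NbJUnit library mentioned below
public class CompilerSurprisesTest {
    // weak reference: by itself it does not keep the string alive
    Reference<String> cache;
 
    public String factory() {
        String value = new String("Can I disappear?");
        cache = new WeakReference<String>(value);
        return value;
    }
 
    @Test
    public void checkThatTheValueCanDisapper() {
        String retValue = factory();
        retValue = null; // drop the only strong reference
        assertGC("Nobody holds the string value anymore. " +
                "It can be GCed.", cache);
    }
}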
 

The assertGC method comes from our JUnit extension library called NbJUnit; it tries as hard as it can to remove the object pointed to by the reference from memory. In the previous code snippet it works fine. In the following code snippet the GC cannot succeed, as the local strong reference is not cleared:

Code from CompilerSurprisesTest.java:
See the whole file.

    @Test
    public void obviouslyWithoutClearingTheReferenceItCannotBeGCed() {
        String retValue = factory();
// commented out:        retValue = null;
        assertNotGC("The reference is still on stack. " +
                "It cannot be GCed.", cache);
    }
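
For readers without NbJUnit at hand, the idea of "trying hard" to collect an object can be sketched in plain Java. This is only an illustration under stated assumptions: the class GcProbe and the method tryToCollect are hypothetical names, and the real assertGC in NbJUnit is not shown in this article and may well work differently. The sketch repeatedly requests a collection, adds a little memory pressure, and checks whether the weak reference has been cleared.

```java
import java.lang.ref.Reference;
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not the NbJUnit implementation.
public class GcProbe {
    /**
     * Returns true as soon as the referent has been cleared,
     * false if it survived the given number of attempts.
     */
    public static boolean tryToCollect(Reference<?> ref, int attempts) {
        List<byte[]> pressure = new ArrayList<>();
        for (int i = 0; i < attempts; i++) {
            if (ref.get() == null) {
                return true;
            }
            System.gc();                        // only a request, the VM may ignore it
            pressure.add(new byte[64 * 1024]);  // modest memory pressure
        }
        return ref.get() == null;
    }

    public static void main(String[] args) {
        Object strong = new Object();
        Reference<Object> held = new WeakReference<>(strong);
        // 'strong' is used after the call, so the object stays reachable during it.
        System.out.println("collected while held: " + tryToCollect(held, 10));
        System.out.println("still here: " + strong);

        Reference<Object> loose = new WeakReference<>(new Object());
        // Nothing holds this referent strongly; whether and when it is cleared is up to the VM.
        System.out.println("collected when unreferenced: " + tryToCollect(loose, 10));
    }
}
```

Because System.gc() is only a hint, a false result from such a helper means "not collected during these attempts", not "can never be collected".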
 

So far, so good. This code behaves exactly as expected. It leads to the conclusion that if a variable defined in a method body holds a reference to your object, the object cannot be garbage collected until the method execution ends. OK, now guess: will the following test succeed or fail?

Code from CompilerSurprisesTest.java:
See the whole file.

boolean yes = true;
@Test
public void canItBeGCedSurprisingly() {
    String retValue;
    if (yes) {
        retValue = factory();
    }
    assertGC("Can be GCed, as retValue is not on stack!!!!", cache);
}
 

To my great surprise the object can really be garbage collected, even though there is a local variable pointing to it! This is an example of a surprising (over)optimization by JavaC or HotSpot. It turns out that, in spite of being declared for the whole method, the variable is not used outside of the if block, and as such JavaC seems to allocate its space on the stack only for the execution of the if branch. This is quite surprising behaviour. It is easy to fix, yet surprising:

Code from CompilerSurprisesTest.java:
See the whole file.

boolean ok = true;
@Test
public void canItBeGCedIfInitialized() {
    String retValue = null;
    if (ok) {
        retValue = factory();
    }
    assertNotGC("Cannot be GCed as retValue is now on stack", cache);
}
 

The fix is easy; however, the consequences of my finding are really horrible. NetBeans may rely quite a lot on the expected behaviour (i.e. that having a local variable, even an uninitialized one, is enough to keep the object alive). From time to time our tests fail, and it may be due to this randomness. Usually everything is OK, but occasionally, on machines with too powerful virtual machines, too many cores, too little memory, etc., the GC can kick in while the method is running and release the reference, causing our tests to fail because of an unexpected situation.

Maybe we will need to perform a complete audit of the NetBeans sources to eliminate the use of uninitialized local variables. And all of this just because compiler optimizations seem to have become a thing that external API users can depend on. Compiler optimizations seem to be part of the API of our libraries!

Additional Findings

Liam noted on ljnelson's blog that it is enough to make the variable final and the problem goes away. True, final helps; I have just tried it:

Code from CompilerSurprisesTest.java:
See the whole file.

@Test public void properUseOfFinalFixesTheProblem() {
    final String retValue;
    if (yes) {
        retValue = factory();
    } else {
        retValue = null;
    }
    assertNotGC("Cannot be GCed, now the retValue is on stack", cache);
}
 

However, the same code without final works as well. It is enough to initialize the variable in both branches of the if statement to prevent garbage collection of the reference held by the retValue variable:

Code from CompilerSurprisesTest.java:
See the whole file.

@Test public void properInitializationFixesTheProblem() {
    String retValue;
    if (yes) {
        retValue = factory();
    } else {
        retValue = null;
    }
    assertNotGC("Cannot be GCed, now the retValue is on stack", cache);
}
 

This very likely means that the compiler puts the variable into the topmost block where it is guaranteed to be fully initialized. That is why we need a hint to warn developers about declarations of not fully initialized non-primitive variables, as those can be a source of memory leaks.
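
Until such a hint exists, the intent "keep this object alive up to here" can also be stated explicitly instead of relying on how a local variable happens to be initialized. A minimal sketch, assuming Java 9 or newer (which postdates this article): Reference.reachabilityFence is a real JDK method, while the class KeepAliveSketch and the method stillCachedWhileFenced are hypothetical names made up for this illustration.

```java
import java.lang.ref.Reference;
import java.lang.ref.WeakReference;

// Hypothetical illustration; only Reference.reachabilityFence is JDK API.
public class KeepAliveSketch {
    static Reference<String> cache;

    public static String factory() {
        String value = new String("Can I disappear?");
        cache = new WeakReference<>(value);
        return value;
    }

    /** Reports whether the cached value was still present after a GC request. */
    public static boolean stillCachedWhileFenced() {
        String retValue = factory();
        try {
            System.gc();                 // only a request to the VM
            return cache.get() != null;  // evaluated before the fence below
        } finally {
            // Keeps retValue strongly reachable at least until this point,
            // regardless of how the compiler treats the local variable.
            Reference.reachabilityFence(retValue);
        }
    }

    public static void main(String[] args) {
        System.out.println(stillCachedWhileFenced()); // prints true
    }
}
```

The fence does not depend on initialization order or on the final modifier, so it documents the requirement directly at the place where the object must still be alive.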
