
Modular Java SE

From APIDesign

Revision as of 07:47, 23 October 2012 by JaroslavTulach

I like puzzles that tease my mind (a bit). Last week I was introduced to one: modularize the JDK (as described on Mark Reinhold's blog). This page will capture my thoughts on this topic.

There can be many reasons for having a modular JDK, but to simplify things, let's stick with one: we want to reach a point in the future when it will be enough to download just a limited subset of the JDK to execute an applet, application, server, etc.


False Expectations

Sometimes people expect to get better performance just by modularizing their application. This is probably a false expectation, at least in the initial step.

By trivially splitting a JAR into ten smaller ones, one can only increase the amount of work done by the system. Instead of opening just one JAR and reading the list of its entries, one needs to do this operation ten times, which is obviously slower, especially when the operating system caches are empty (e.g. after boot). Also, the communication between the newly separated pieces of the application requires some overhead. Not much, but certainly something.

I faced this when I split the monolithic openide.jar (a JAR containing the majority of NetBeans APIs) back in 2005 (see the project page for more details). When I divided the big JAR into fifteen smaller ones, the start time of NetBeans IDE increased by 5%. I spent the whole release looking for the reasons of this slowdown and managed to eliminate it somehow, but the proper fix was still waiting to be discovered.

These days the NetBeans IDE starts faster than it did before its modularization: we improved the infrastructure and made it more module friendly. We created various caches (for the content of the META-INF/MANIFEST.MF files of all modules, for META-INF/services, for classes loaded during start, for the layout of files on disk, NetBeansLayers, etc.) and these days (NetBeans IDE 6.5 or 6.7) we don't open the modularized JARs at all. Thus we have the deployment benefits claimed in the manifesto of modular programming, while at runtime the system behaves like a monolithic application.
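To illustrate the kind of cache meant here, below is a minimal Java sketch, not the NetBeans implementation: a build step opens every module JAR once and copies its main manifest attributes into a single properties file, so a later startup reads one file instead of opening every JAR. The class and method names, the cache file name and the "<jar>/<attribute>" key format are assumptions made up for this example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

public class ManifestCacheSketch {
    // Build time: open each module JAR once and copy its main manifest
    // attributes into one cache, keyed as "<jar name>/<attribute name>".
    public static void writeCache(List<Path> jars, Path cacheFile) {
        Properties cache = new Properties();
        try {
            for (Path jar : jars) {
                try (JarFile jf = new JarFile(jar.toFile())) {
                    Manifest mf = jf.getManifest();
                    if (mf == null) {
                        continue; // no manifest, nothing to cache
                    }
                    for (Map.Entry<Object, Object> e : mf.getMainAttributes().entrySet()) {
                        cache.setProperty(jar.getFileName() + "/" + e.getKey(), String.valueOf(e.getValue()));
                    }
                }
            }
            try (OutputStream out = Files.newOutputStream(cacheFile)) {
                cache.store(out, "manifest cache");
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    // Startup: read a single file instead of opening every JAR.
    public static Properties readCache(Path cacheFile) {
        Properties cache = new Properties();
        try (InputStream in = Files.newInputStream(cacheFile)) {
            cache.load(in);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return cache;
    }

    // Demo helper: writes a JAR that contains nothing but its manifest.
    static Path createJar(Path dir, String name, String title) throws IOException {
        Manifest mf = new Manifest();
        mf.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        mf.getMainAttributes().put(Attributes.Name.IMPLEMENTATION_TITLE, title);
        Path jar = dir.resolve(name);
        new JarOutputStream(Files.newOutputStream(jar), mf).close();
        return jar;
    }

    // Builds two tiny module JARs, caches their manifests and looks one value up.
    public static String demo() {
        try {
            Path dir = Files.createTempDirectory("modules");
            List<Path> jars = List.of(createJar(dir, "base.jar", "base"), createJar(dir, "beans.jar", "beans"));
            Path cacheFile = dir.resolve("manifests.properties");
            writeCache(jars, cacheFile);
            return readCache(cacheFile).getProperty("beans.jar/Implementation-Title");
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints beans
    }
}
```

A real container would also have to invalidate the cache when a JAR changes; the sketch leaves that out.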

Modularization really pays off (we can easily deprecate obsolete modules and make the system smaller), but it may take a little while. If you are seeking immediate improvements in terms of milliseconds spent loading a Hello World! application, you had better refactor your code and classes. Modularization is not going to help you. Modularization is for those who seek long-term improvements in deployment, the ability to deprecate, and faster evolution of the framework.

Infrastructure

Motto: I don't like to do useless work. As a result, I always look for some test that ensures my work really leads somewhere (like the guards described in the section Path of a lost warrior in Chapter 11).

What is the foremost check to ensure your code is split into pieces? Well, each piece needs to compile separately. Thus, before modularization, I tweak the build infrastructure to make sure it really compiles the pieces and not everything at once.


To do this, you very likely don't want to mess with the location of your sources in your version control system. That would be premature, as the final division into units is not yet known, and your version history would be full of useless moves and renames. Luckily, Ant offers a powerful enough way to define sets of files and feed them into the compiler.

The following part of build.xml defines three groups of sources: applet, beans and the rest - called base.

    <!-- this is the core of the separation - definition
         of what classes belong to what compilation group.
    -->
    <selector id="applet">
        <or>
            <filename name="java/beans/AppletInitializer*"/>
            <filename name="java/applet/**"/>
            <filename name="sun/applet/**"/>
            <filename name="META-INF/services/sun.beans.AppletProxy"/>
        </or>
    </selector>
    <selector id="beans">
        <and>
            <or>
                <filename name="java/beans/**"/>
                <filename name="sun/beans/**"/>
                <filename name="com/sun/beans/**"/>
            </or>
            <none>
                <selector refid="applet"/>
            </none>
        </and>
    </selector>
 
    <selector id="base">
        <none>
            <selector refid="applet"/>
            <selector refid="beans"/>
        </none>
    </selector>

Please note that the selectors refer to each other. The beans group explicitly says it wants nothing from the applet group, and the base group is defined solely as everything not included in the previous groups.
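The precedence encoded by these selectors can also be checked outside of Ant. The following Java sketch (the class name is hypothetical and it is not part of the build) mimics the three groups with java.nio.file.PathMatcher globs. For the file paths used here Java's glob syntax behaves like Ant's patterns, but the two syntaxes are not identical in general.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.List;

public class SelectorSketch {
    private static PathMatcher glob(String pattern) {
        return FileSystems.getDefault().getPathMatcher("glob:" + pattern);
    }

    // Same patterns as the "applet" selector.
    private static final List<PathMatcher> APPLET = List.of(
            glob("java/beans/AppletInitializer*"),
            glob("java/applet/**"),
            glob("sun/applet/**"),
            glob("META-INF/services/sun.beans.AppletProxy"));

    // Same patterns as the <or> part of the "beans" selector.
    private static final List<PathMatcher> BEANS = List.of(
            glob("java/beans/**"),
            glob("sun/beans/**"),
            glob("com/sun/beans/**"));

    private static boolean matchesAny(List<PathMatcher> matchers, Path path) {
        return matchers.stream().anyMatch(m -> m.matches(path));
    }

    // Applet is checked first, beans gets what applet left, base gets the rest,
    // mirroring the <none> clauses of the Ant selectors.
    public static String groupOf(String relativePath) {
        Path path = Path.of(relativePath);
        if (matchesAny(APPLET, path)) {
            return "applet";
        }
        if (matchesAny(BEANS, path)) {
            return "beans";
        }
        return "base";
    }

    public static void main(String[] args) {
        for (String f : List.of("java/beans/AppletInitializer.java", "java/beans/Introspector.java",
                "java/applet/Applet.java", "java/lang/String.java")) {
            System.out.println(f + " -> " + groupOf(f));
        }
    }
}
```

Note that java/beans/AppletInitializer.java lands in applet even though it also matches the beans patterns, exactly because beans excludes everything selected by applet.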

Then you need to run the Java compiler on each of these groups. An important step is to disable the search functionality of javac. By default the compiler looks for additional classes in the sourcepath and loads them as necessary. This needs to be prevented, as it might accidentally load classes from another group of sources. To do this, use the sourcepath="" parameter:

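<!-- Compile one group: ${module} is applet, beans or base.
     sourcepath="" stops javac from pulling in missing classes
     from the sources of other groups; bootclasspath points to the
     base.jar built by this script instead of the JDK's own classes. -->
<javac
  bootclasspath="${build.dir}/base.jar"
  sourcepath=""
  destdir="${build.dir}/classes/${module}"
  classpath="${module.cp}"
  includejavaruntime="false"
  includeantruntime="false"
>
  <!-- all merged sources; the selector keeps only this group's files -->
  <src refid="src.path"/>
  <selector refid="${module}"/>
</javac>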

With infrastructure like this one, you can start splitting your project apart.
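Why the search functionality of javac matters can be demonstrated with the javax.tools API (a JDK, not just a JRE, is required). The sketch below is hypothetical and not part of the build script: rather than reproducing Ant's sourcepath="" setting, it restricts -sourcepath to one group's own directory. A file Uses.java in a base directory references Helper.java placed in an applet directory. With only the own group on the source path, javac reports the missing class; once the other group is visible, javac silently compiles Helper as well, which is exactly the kind of leak the separated compilation is meant to catch.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

public class GroupCompileSketch {
    // Runs javac on a single file with an explicit -sourcepath; 0 means success.
    static int compile(Path source, String sourcePath, Path classPath, Path outDir) {
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler(); // null on a runtime without javac
        if (javac == null) {
            throw new IllegalStateException("a JDK with javac is required");
        }
        ByteArrayOutputStream messages = new ByteArrayOutputStream(); // swallow diagnostics
        return javac.run(null, messages, messages,
                "-sourcepath", sourcePath,
                "-classpath", classPath.toString(),
                "-d", outDir.toString(),
                source.toString());
    }

    // Returns {exit code with only the own group on the source path,
    //          exit code with the other group's sources visible too}.
    public static int[] demo() {
        try {
            Path root = Files.createTempDirectory("groups");
            Path base = Files.createDirectories(root.resolve("base"));
            Path applet = Files.createDirectories(root.resolve("applet"));
            Path empty = Files.createDirectories(root.resolve("empty"));
            Path uses = Files.writeString(base.resolve("Uses.java"),
                    "public class Uses { Helper helper = new Helper(); }\n");
            Files.writeString(applet.resolve("Helper.java"), "public class Helper { }\n");

            int strict = compile(uses, base.toString(), empty,
                    Files.createDirectories(root.resolve("out-strict")));
            int leaky = compile(uses, base + File.pathSeparator + applet, empty,
                    Files.createDirectories(root.resolve("out-leaky")));
            return new int[] {strict, leaky};
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        int[] codes = demo();
        System.out.println("own group only: " + codes[0] + ", other group visible: " + codes[1]);
    }
}
```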

Hudson Builder

There is a Hudson job that builds the whole system, but you should be able to build the sources manually too. Here are the steps to follow:

  • get the OpenJDK sources (see their website for more info):
$ hg fclone http://hg.openjdk.java.net/jdk7/jdk7
  • point the jdk subtree of the repository to our repository:
$ vi jdk/.hg/hgrc
# in the [paths] section, change to:
# default = http://source.apidesign.org/hg/jdk/
  • pull and update to the new version:
$ (cd jdk; hg pull -u)
  • build the sources of the JDK itself. This is tricky, but I succeeded with the following steps (of course you need to download the binary plugs and jibx according to the OpenJDK instructions):
$ MYJDKROOT=`pwd`/..
$ export ALT_BOOTDIR=/usr/java/jdk1.7.0/
$ export ALT_HOTSPOT_IMPORT_PATH=/usr/java/jdk1.7.0/
$ export ALT_BINARY_PLUGS_PATH=$MYJDKROOT/plugs
$ export JAVA_HOME=""
$ export ANT_HOME=/usr/share/java/ant/
$ export ALT_JIBX_LIBS_PATH=$MYJDKROOT/jibx/lib/
$ make sanity
$ make all
  • Now we are ready to work on the separation itself. First, copy all the sources spread around the Hg tree into one directory:
$ cd jdk
$ ant merge-sources
  • Then build your modularized JDK:
$ ANT_OPTS=-mx900M ant clean all

After this initial sequence you can start to play with the OpenJDK sources, modify them, tweak the Ant build script (especially its selectors) and repeat only the last step to make sure that the system still builds.

java.applet and java.beans

The biggest obstacle to creating limited parts of the JDK that really work is defining such pieces, making them independent, and still keeping binary compatibility for the whole Java SE. Let's look at one such problem and seek a solution.

Obviously, you may be interested in using the JavaBeans specification without wanting to know anything about applets. Or you may want to use applets without caring about introspection and the BeanInfos provided by JavaBeans. Is this possible?

Well, there is a problem: the java.beans.AppletInitializer interface. It resides in java.beans, yet its signature contains java.applet.Applet. This means that the java.beans package has a compile-time dependency on java.applet. Does that mean that whoever uses the JavaBeans module in the future needs to install the applet module as well?

No. I have a solution: let's use CodeInjection! Let's change the Beans code not to talk to Applet directly, but rather to create a code slot that the applet module can inject into. Here is the diff against our OpenJDK repository:

The diff is here.

The idea is that when the applet module is not installed, there is no AppletProxy provider, so the application does not reference any types in the applet module. When the applet module is installed, it brings the provider along with its META-INF/services/sun.beans.AppletProxy registration, and from then on the service loader finds it.
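The actual sun.beans.AppletProxy interface is part of the diff and is not reproduced here. The following minimal Java sketch only illustrates the pattern with a hypothetical AppletBridge interface: the beans side asks java.util.ServiceLoader for a provider and falls back to plain behaviour when none is registered.

```java
import java.util.Iterator;
import java.util.ServiceLoader;

public class CodeInjectionSketch {
    // Hypothetical injection slot owned by the beans group. Its signature
    // uses only Object, so compiling it needs no applet classes.
    public interface AppletBridge {
        // Returns true if the bean was an applet and has been initialized.
        boolean initializeIfApplet(Object bean);
    }

    // Asks the ServiceLoader for a provider. An applet module would register
    // one in META-INF/services/CodeInjectionSketch$AppletBridge; without that
    // file the iterator is simply empty.
    static AppletBridge findBridge() {
        Iterator<AppletBridge> it = ServiceLoader.load(AppletBridge.class).iterator();
        return it.hasNext() ? it.next() : null;
    }

    public static String instantiate(Object bean) {
        AppletBridge bridge = findBridge();
        if (bridge != null && bridge.initializeIfApplet(bean)) {
            return "initialized by the applet module";
        }
        return "plain bean";
    }

    public static void main(String[] args) {
        System.out.println(instantiate(new Object()));
    }
}
```

Run on its own, no provider is registered, so the beans side never touches anything from the applet side.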

Sneaking Simplicity In

So things are looking good, with just one problem: there is a static method in the Beans class that takes an AppletInitializer parameter. Right now it is commented out, but for the sake of BackwardCompatibility I need to bring it back. Another puzzle! What shall I do now?

Well, the basic trick is to sneak in simplicity. Of course simplicity can have various meanings, but in this context it means the number of outgoing dependencies. The Beans class is not simple, because it depends on beans as well as applet classes. If we can replace it with some other class that does not depend on applet, we simplify the API. Sometimes this is called conceptual surface: the amount of concepts one needs to understand when dealing with an API. By removing the need for users of a class to know anything about applet, we shrink its surface. Not only that, we also make it compilable without applet being around (which is actually the most important goal when modularizing an API).

The only question is how to simplify the Beans class. Of course, the simplest way is to remove the one static method that references Applet; however, this is horribly backward incompatible, and compatibility for existing clients is our highest priority. Thus the only compile-time option is to deprecate the whole Beans class and replace it with some other, simplified one.

I did that by creating a new BeanFactory class that does not reference applet at all and otherwise contains the same methods as the Beans class.
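The shape of such a replacement can be sketched in Java. This is not the actual BeanFactory code: the names NewFactory, OldBeans and AppletInitializerLike are hypothetical, and AppletInitializerLike merely stands in for the applet-dependent type. The new class carries only the simple signature, while the deprecated class keeps the wide overload and delegates to the new one.

```java
public class SneakSimplicitySketch {
    // Hypothetical stand-in for an applet-dependent type such as AppletInitializer.
    public interface AppletInitializerLike {
        void initialize(Object bean);
    }

    // The simplified replacement: no applet types in any signature, so it
    // compiles without the applet module.
    public static final class NewFactory {
        private NewFactory() { }

        public static Object instantiate(ClassLoader loader, String className) {
            ClassLoader cl = loader != null ? loader : ClassLoader.getSystemClassLoader();
            try {
                return Class.forName(className, true, cl).getDeclaredConstructor().newInstance();
            } catch (ReflectiveOperationException ex) {
                throw new IllegalArgumentException("Cannot instantiate " + className, ex);
            }
        }
    }

    // The old class survives only for BackwardCompatibility and would live in
    // the deprecated module, which is allowed to depend on every other module.
    @Deprecated
    public static final class OldBeans {
        private OldBeans() { }

        public static Object instantiate(ClassLoader loader, String className) {
            return NewFactory.instantiate(loader, className); // plain delegation
        }

        // The overload with the large conceptual surface exists only here.
        public static Object instantiate(ClassLoader loader, String className, AppletInitializerLike init) {
            Object bean = NewFactory.instantiate(loader, className);
            if (init != null) {
                init.initialize(bean);
            }
            return bean;
        }
    }

    public static void main(String[] args) {
        Object list = NewFactory.instantiate(null, "java.util.ArrayList");
        System.out.println(list.getClass().getName()); // prints java.util.ArrayList
    }
}
```

Because the old class only delegates, there is a single implementation to maintain, and new clients never see the applet-dependent overload.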

There Will Be Victims

When modularizing an API, get ready to have some victims: classes that will need to be deprecated. Regardless of how well designed your API is, it will contain classes that are not simple enough, like the Beans class above. Prepare for that and create a trash bin for them: a deprecated module.

The purpose of such a module is to keep BackwardCompatibility only. It will have dependencies on all other modules in your system, and as such it can contain classes that do not fit anywhere else. Users of the previous version of your API should see this module by default, so their existing dependencies are satisfied.

On the other hand, users of your new version need not care about it and should use the classes in the properly modularized APIs, which have a smaller conceptual surface and smaller compile-time and runtime dependencies.

For those interested, here is the final diff of the java.beans and java.applet separation: read it all!

Changing Rules of the Game

I have just found out a trick to prevent the victims mentioned in the previous section. The OpenJDK team is considering a change to the specification of the Java virtual machine that would allow a class with not fully resolved methods to be loaded into the runtime and work fine. Everything is supposed to work until one calls a method that is not resolvable (for example, the AppletInitializer class would be missing for one of the Beans methods); then an exception is thrown.

A very useful trick. It shows what one can achieve when thinking without boundaries and when one is allowed to change the rules of the game. When I did the modularization of NetBeans APIs, I obviously could not change the virtual machine spec. I was left with only one option: I had to sneak simplicity in. I had to deprecate classes with too large a conceptual surface and replace them with similar classes without unneeded dependencies. This was fine for Beans-like classes that no other API referred to (in signatures). However, as this deprecation is transitive, it becomes quite hard to replace java.awt.Image: one would also need to deprecate java.awt.Toolkit and java.awt.Graphics, and so on, transitively. Obviously, it is impossible to get rid of a single method in java.lang.String this way.

Thus it is good that there will be a chance to do partial resolution of classes in the virtual machine. Especially when one faces a sneak-in-simplicity problem with too big a transitive closure, this can be very helpful. On the other hand, I can imagine problems with complicated (i.e. not clueless enough) Javadoc that will need to mention conditional methods and the conditions under which they are allowed to work, etc. That is why in the Beans case it may still be better to sneak the simplicity in, classically deprecate the old class, and provide a modern replacement.

java.awt and javax.swing

I managed to remove the dependency of the base Java on client packages this week too (see the full patch). Things were straightforward; I just had to include the java.beans packages in the client module as well. They heavily depend on java.awt.Image and java.awt.Component, and these classes are really inseparable from the rest of AWT. In the end this should not be that big a problem, as the primary purpose of JavaBeans is to provide means for visual manipulation of plain Java objects, and this indeed needs to happen on some client.

Well, except for some other advanced usages that rely only on java.beans.FeatureDescriptor, like JMX. This subsystem definitely does not belong only to the client; it is even more useful in headless environments, and as such I had to make it work without the JavaBeans API. Again, I used CodeInjection, in this case with a fallback. If the JavaBeans API is loaded into the system, its own Introspector is used. When the system runs without JavaBeans, a simple implementation that identifies getters in the default way (either getXYZ or isXYZ) is used (see the SimpleIntrospector patch).
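The SimpleIntrospector patch itself is not reproduced here. As a rough Java sketch of such a fallback (the class name and the Sample bean are hypothetical), the default naming pattern can be recognized with plain reflection: a public, non-static, parameterless getXYZ method with a non-void result, or an isXYZ method returning primitive boolean. Property names are decapitalized following the same rule as java.beans.Introspector.decapitalize. In the setup described above, such code would be used only when no JavaBeans-based provider is injected.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Map;
import java.util.TreeMap;

public class SimpleIntrospectorSketch {
    // Maps property name -> getter using only the default naming pattern
    // (no BeanInfo lookup, which needs the full JavaBeans implementation).
    public static Map<String, Method> getters(Class<?> type) {
        Map<String, Method> result = new TreeMap<>();
        for (Method m : type.getMethods()) { // public methods, inherited ones included
            if (Modifier.isStatic(m.getModifiers()) || m.getParameterCount() != 0) {
                continue;
            }
            String name = m.getName();
            Class<?> returnType = m.getReturnType();
            String property = null;
            if (name.startsWith("get") && name.length() > 3 && returnType != void.class) {
                property = name.substring(3);
            } else if (name.startsWith("is") && name.length() > 2 && returnType == boolean.class) {
                property = name.substring(2);
            }
            if (property != null) {
                result.put(decapitalize(property), m);
            }
        }
        return result;
    }

    // Same rule as java.beans.Introspector.decapitalize: "Name" -> "name", "URL" stays "URL".
    static String decapitalize(String s) {
        if (s.length() > 1 && Character.isUpperCase(s.charAt(0)) && Character.isUpperCase(s.charAt(1))) {
            return s;
        }
        return Character.toLowerCase(s.charAt(0)) + s.substring(1);
    }

    // Hypothetical bean used for the demo.
    public static class Sample {
        public String getName() { return "duke"; }
        public boolean isActive() { return true; }
        public void setName(String name) { }
    }

    public static void main(String[] args) {
        // Object.getClass() also follows the pattern, hence the "class" property.
        System.out.println(getters(Sample.class).keySet()); // prints [active, class, name]
    }
}
```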

I believe this provides the best balance between BackwardCompatibility (when running with JavaBeans, the behaviour stays the same) and modularity (there is a reasonable default behaviour in a small setup). Actually, the default behaviour is also correct: the advanced definition of getters is possible only if one includes one's own java.beans.BeanInfo next to the bean's class. However, that is possible only if one depends on the JavaBeans specification; then JavaBeans is present in the runtime, the CodeInjection finds its provider, and the behaviour stays fully compatible.

XML SAX and DOM 2

The last part that deserved surgery to get a small Java is the support for XML processing. I did this on Wednesday (see the patch). The XML API does not seem so big, but to my surprise there is a really big implementation hidden behind it. By eliminating it, I reduced the number of Java files in the base module from about 8000 to fewer than 6000. It looks like about a quarter of the base Java is dedicated to processing XML. That is quite a lot, given that Java is useful for other purposes as well. It is good that the XML infrastructure is now a separate module.

There were two problems I had to face: java.util.prefs and java.util.Properties can be read from and written to XML. Again I used CodeInjection, but in this case it may not be necessary. Writing the XML output is easy with a simple java.io.PrintWriter (see the export method in the following patch). The only question is how to stick to the old behaviour as closely as possible. Here I used an ImplementationCompatibilityTest, which writes the values of a Preferences node into two strings (using the new and the old implementation) and then checks that the results are the same (also available in the patch). The test is extensible: every time one notices a divergence in behaviour, one can easily demonstrate it in the test. Then it can be fixed once and for all (stiffening the shape of its implementation amoeba).
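A Java sketch of both ideas, using java.util.Properties rather than Preferences: toXml writes the XML with a plain PrintWriter, and roundTrips is a variant of the compatibility check. Instead of comparing two output strings as described above, it parses the new output with the JDK's existing Properties.loadFromXML and requires exactly the same key/value pairs back. Class and method names are hypothetical, and only the escaping needed for simple keys and values is handled (no line breaks in keys, no control characters).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;
import java.util.TreeSet;

public class PropertiesXmlSketch {
    // New writer: plain PrintWriter, no XML library involved.
    public static String toXml(Properties props) {
        StringWriter buffer = new StringWriter();
        PrintWriter out = new PrintWriter(buffer);
        out.println("<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>");
        // the DOCTYPE specified for the Properties XML format
        out.println("<!DOCTYPE properties SYSTEM \"http://java.sun.com/dtd/properties.dtd\">");
        out.println("<properties>");
        for (String key : new TreeSet<>(props.stringPropertyNames())) { // sorted, so output is stable
            out.println("<entry key=\"" + escape(key) + "\">" + escape(props.getProperty(key)) + "</entry>");
        }
        out.println("</properties>");
        out.flush();
        return buffer.toString();
    }

    // Escapes the characters that are special in XML text and attribute values.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Compatibility check: the existing JDK reader must get back exactly the original entries.
    public static boolean roundTrips(Properties original) {
        Properties parsed = new Properties();
        byte[] xml = toXml(original).getBytes(StandardCharsets.UTF_8);
        try {
            parsed.loadFromXML(new ByteArrayInputStream(xml));
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return parsed.equals(original);
    }

    public static void main(String[] args) {
        Properties p = new Properties();
        p.setProperty("greeting", "Hello World!");
        p.setProperty("expr", "a < b && \"c\" > d");
        System.out.print(toXml(p));
        System.out.println("compatible: " + roundTrips(p));
    }
}
```

Each newly discovered divergence can be added as another set of properties passed to roundTrips, which keeps the check extensible in the spirit of the ImplementationCompatibilityTest.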

Reading XML is slightly harder, and I definitely did not want to do it manually. In the name of cluelessness I wanted to assemble the solution, not write it from scratch. Thus I searched around and found the nanoxml parser: small, a few kilobytes of code, which I used to parse the XML stream. The code keeps the same functionality (again verified with the ImplementationCompatibilityTest), and instead of 2MB of Xerces I need just about 10KB (i.e. cluelessness is good, but it is better to be clueless and small).

Executive Summary

The fourteen days that I dedicated to prototyping the modularization of the JDK are over, and it is time for a little recap.

The first achievement of this effort is narrowing the scope and expectations. Modularization is beneficial, but by itself it cannot speed up the loading time of a Hello World! application.

To get the best from modularization, it is not enough to split the code into modules. One also needs to improve the infrastructure behind it (at least in the long term). A proper runtime container needs various caches to optimize load time, runtime behaviour, etc.

This project demonstrated how to reuse Ant and create a reusable build infrastructure. With such infrastructure it is then easy to experiment with various groupings of classes into modules and immediately verify that these groupings are sane, remain compilable and usable.

The actual changes made to the source code of OpenJDK demonstrate a few important aspects of modularization. First of all, be aware of the benefits of CodeInjection, which allows compile-time and deploy-time separation while keeping runtime cooperation.

Second: get ready for deprecations. Create a module that will contain all the deprecated APIs that don't fit anywhere else. This way it is possible to shrink the whole framework while keeping its BackwardCompatibility.

Last but not least: Don't be afraid to re-implement existing implementations. With the help of ImplementationCompatibilityTests one can create new, simplified implementations that remain totally compatible with previous ones (just like the prototype did for XML).

