InvokeDynamic
From APIDesign
When I was younger I used to believe that having the invokeDynamic instruction in the HotSpot VM could be beneficial. I even argued that the instruction should be used not just for dynamic languages like Ruby but also by core Java to implement lambdas. Now, after spending time implementing lambdas in my Bck2Brwsr VM and seeing things from the other side, I have to admit I was wrong. invokeDynamic is the wrong idea (especially for implementing lambdas).
Benefits
Implementing different languages on top of the HotSpot virtual machine varies in complexity. When John Rose pushed forward his invokeDynamic vision, he claimed that the most problematic thing is to dispatch method calls properly and efficiently. Not every language uses the Java rules. Some support type conversions or implicit arguments. Some can dynamically alter the existing dispatch target or strategy. More about that in the excellent summary Bytecodes meet Combinators (http://blogs.sun.com/jrose/entry/vmil_paper_on_invokedynamic). I really liked that paper and I continue to like it. It matches my functional heart: with MethodHandle (basically a pointer to a method of some signature, optionally paired with a receiver object to call the method on; for example plus would take two ints and return their sum as an int) a method invocation is finally a first-class citizen in the VM. One can do currying & co. - all the goodies functional languages have had for ages.
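To make the plus example concrete, here is a minimal sketch using the standard java.lang.invoke API. The class name PlusHandleDemo and the helper addForty are my own illustrative names, not anything from the paper. It looks up a handle to plus and then binds its first argument, which is the kind of partial application the currying remark refers to.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class PlusHandleDemo {
    // The method the handle points to: two ints in, their sum out.
    static int plus(int a, int b) {
        return a + b;
    }

    // Looks up plus(int, int) and returns a handle of type (int, int) -> int.
    static MethodHandle plusHandle() {
        try {
            return MethodHandles.lookup().findStatic(
                PlusHandleDemo.class, "plus",
                MethodType.methodType(int.class, int.class, int.class));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Binds the first argument to 40, leaving a handle of type (int) -> int,
    // and then invokes that handle with x.
    static int addForty(int x) {
        MethodHandle plus40 = MethodHandles.insertArguments(plusHandle(), 0, 40);
        try {
            return (int) plus40.invokeExact(x);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(addForty(2)); // prints 42
    }
}
```

The handle is an ordinary value: it can be stored, passed around and transformed before anything is called.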
But there is a hidden catch...
Getting Dynamic
The primary goal of John Rose was to support dynamic languages - i.e. languages where one knows (almost) no type information until the program actually runs. That means one can effectively type (in this JVM context: effectively generate bytecode) only when one knows the actual types. To address all these "deferred" needs the new invokeDynamic bytecode instruction was introduced. It does not hardcode the actual invocation. Instead, when it is first executed, it calls back to let the "supervising" software (like your JRuby implementation) analyse the actual call parameters and generate a sequence of MethodHandle transformations (possibly a bit of currying, mostly type conversions) to match the actual types of the method arguments.
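Java source cannot emit an invokeDynamic instruction directly, so the following is only a sketch of the callback: a bootstrap method with the standard (Lookup, String, MethodType) shape, called by hand the way the VM would call it on first execution. The class BootstrapSketch and its methods are hypothetical names standing in for a language runtime.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.ConstantCallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class BootstrapSketch {
    // Target provided by the (hypothetical) language runtime: it works on longs.
    static long plus(long a, long b) {
        return a + b;
    }

    // Bootstrap method: receives the call site's name and type and decides
    // which method handle the site should be linked to.
    static CallSite bootstrap(MethodHandles.Lookup lookup, String name, MethodType callSiteType)
            throws ReflectiveOperationException {
        MethodHandle target = lookup.findStatic(
            BootstrapSketch.class, name,
            MethodType.methodType(long.class, long.class, long.class));
        // Type conversion step: adapt (long, long) -> long to the type the
        // call site asked for, here (int, int) -> long.
        return new ConstantCallSite(target.asType(callSiteType));
    }

    // Simulates by hand the linkage of a call site named "plus" with type
    // (int, int) -> long, followed by one invocation through it.
    static long linkAndCall(int a, int b) {
        try {
            MethodType siteType = MethodType.methodType(long.class, int.class, int.class);
            CallSite site = bootstrap(MethodHandles.lookup(), "plus", siteType);
            return (long) site.dynamicInvoker().invokeExact(a, b);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(linkAndCall(2, 3)); // prints 5
    }
}
```

Note that nothing about the target is fixed in the calling code: the name and the type are only interpreted when the bootstrap method runs.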
Drawbacks
The major problem with invokeDynamic is, well, that it is dynamic! Java is a statically typed language and all variable, field, method and parameter types are known to JavaC before it emits the bytecode. Yet (as JavaC from JDK8 is emulating lambdas with invokeDynamic) it forgets all the derived type information and generates invokeDynamic - which is supposed to do late binding, i.e. find out the right types at invocation time.
One of the key ideas that I had in mind when advocating the use of MethodHandles for the implementation of lambdas was a reduction in the size of the constant pool - you know, the list of referenced symbols like Ljava/lang/String which generally needs to be repeated in every Java class. If lambdas were simulated by inner classes, the constant pools might get enormous (all the symbols might be duplicated in each lambda inner class). With invokeDynamic I was hoping for the pools to be reduced to one shared pool per source file (with as many lambdas as needed).
However, the JDK8 lambdas generate inner classes behind the scenes and on the fly! So the main benefit is, in my opinion, gone.
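To see what happens on the fly, one can call the JDK's lambda bootstrap, java.lang.invoke.LambdaMetafactory.metafactory, by hand. This is a sketch of the linkage step only. The class LambdaLinkageSketch and the method twice are illustrative names, with twice standing in for the synthetic method javac would emit for a lambda body.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.LambdaMetafactory;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.function.IntUnaryOperator;

final class LambdaLinkageSketch {
    // Stands in for the method javac generates from a lambda body.
    static int twice(int x) {
        return 2 * x;
    }

    // Performs by hand the linkage that an invokeDynamic call site for a
    // lambda triggers: the metafactory builds an IntUnaryOperator
    // implementation at runtime that forwards to twice(int).
    static IntUnaryOperator link() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodType intToInt = MethodType.methodType(int.class, int.class);
            MethodHandle body = lookup.findStatic(LambdaLinkageSketch.class, "twice", intToInt);
            CallSite site = LambdaMetafactory.metafactory(
                lookup,                                        // caller context
                "applyAsInt",                                  // interface method to implement
                MethodType.methodType(IntUnaryOperator.class), // factory type: no captured values
                intToInt,                                      // erased interface method type
                body,                                          // implementation to forward to
                intToInt);                                     // instantiated method type
            return (IntUnaryOperator) site.getTarget().invokeExact();
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        IntUnaryOperator op = link();
        System.out.println(op.applyAsInt(21)); // prints 42
        // The implementing class did not exist in any classfile on disk.
        System.out.println(op.getClass());
    }
}
```

All six arguments are statically known to the compiler, yet the implementing class is only produced when this code runs.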
The Problem
The unnecessary loss of types is problematic for VMs that are supposed to run in a restricted environment - e.g. Bck2Brwsr or (as far as I have heard) Java ME 8. In a restricted environment we cannot afford to spend scarce memory and CPU time on generating new classes at runtime. Just-in-time compilation may be too expensive; it is much easier to generate the right execution format ahead of time (both for Bck2Brwsr and for Java ME 8).
Another issue is related to reflection. Method Handles are (due to their dynamic nature) a specific form of reflection. While doing a method lookup one identifies the desired method (or field getter, or setter) by name. One can reference public or private methods. It is not known in advance which methods will be requested - one needs to invoke the bootstrap method to find that out. As such it is really hard to do compile-time optimizations (like shortening method names). Again, a problem for small, limited environments.
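A small sketch of why renaming is unsafe, assuming a hypothetical class LookupByNameSketch: the method name reaches the lookup as an ordinary runtime string, so an ahead-of-time tool cannot tell which methods must keep their names.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class LookupByNameSketch {
    // A private method: still reachable through a lookup made in this class.
    private static String greet() {
        return "hello";
    }

    // Resolves a no-argument static method returning String by its name.
    // If a tool had shortened greet() to g(), callByName("greet") would fail.
    static String callByName(String methodName) {
        try {
            MethodHandle handle = MethodHandles.lookup().findStatic(
                LookupByNameSketch.class, methodName,
                MethodType.methodType(String.class));
            return (String) handle.invokeExact();
        } catch (NoSuchMethodException e) {
            return "no such method: " + methodName;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(callByName("greet")); // prints hello
        System.out.println(callByName("g"));     // prints no such method: g
    }
}
```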
Summary so far
As a result we have an implementation of lambdas that needlessly forgets the type information gained during compilation, re-creates it during each startup, and generates bytecode on the fly. It is even surprising that it performs acceptably (which probably cost the HotSpot team members many nights, but when there is a will, there is a way: John Rose was so motivated to show that invokeDynamic is useful that he did it).
No surprise that InvokeDynamic is not supported by Android's Dalvik VM (and in fact it never should be, unless Android wants to attract Ruby and other dynamic language developers). The Java language does not need it, and if you care about the Java language, forget about invokeDynamic!
Having a Hammer, Every Problem Looks like a Nail!
InvokeDynamic is not a Java (language) feature. It is a HotSpot feature meant to make HotSpot more attractive for languages other than Java. That is of course a good intention. HotSpot is still one of the best performing VMs out there - making it more attractive to non-Java languages makes sense.
However, using InvokeDynamic to implement a core Java8 feature (namely lambdas) is a mistake. I know I advocated that in the past too - but the consequences are horrible. The original Java was easy to port to small devices - InvokeDynamic is not - and as such the whole of Java8 is not portable either!
When one has InvokeDynamic one may be tempted to solve every problem with the help of InvokeDynamic. But sometimes that is overkill.
Solution for the JVM
All Java8 actually needs is to be able to turn a method into an instance of an interface. This specific goal can be achieved using simpler tools than InvokeDynamic. Of course one can generate the necessary inner classes at compile time, but if we want to stick with the efficient (from the point of view of constant pools) way of recording lambdas, the best way is to create a new bytecode instruction specialized for the task.
Something like newFromAMethod that would specify the resulting interface to generate, the method to call and additional parameters to pass to it.
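Since newFromAMethod is only a proposal, there is nothing to run it on. The following sketch shows at source level what I would expect the instruction to be equivalent to. The three ingredients (the interface, the method to call, and the additional captured parameter) are all fixed at compile time. The names StaticLambdaTranslation, lambdaBody and Adapter are hypothetical.

```java
import java.util.function.IntUnaryOperator;

final class StaticLambdaTranslation {
    // The lambda body as a plain method: captured value first,
    // then the argument of the interface method.
    static int lambdaBody(int captured, int x) {
        return captured + x;
    }

    // What a hypothetical newFromAMethod(IntUnaryOperator, lambdaBody, captured)
    // would have to produce: an object implementing the interface that
    // forwards to the method, passing the additional parameter along.
    static final class Adapter implements IntUnaryOperator {
        private final int captured;

        Adapter(int captured) {
            this.captured = captured;
        }

        @Override
        public int applyAsInt(int x) {
            return lambdaBody(captured, x);
        }
    }

    // Source-level equivalent of: IntUnaryOperator op = x -> captured + x;
    static IntUnaryOperator addTo(int captured) {
        return new Adapter(captured);
    }

    public static void main(String[] args) {
        System.out.println(addTo(40).applyAsInt(2)); // prints 42
    }
}
```

With a dedicated instruction the VM could create the Adapter part itself, so the classfile would only need to record the interface, the method reference and the captured values.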
Some may say that adding a new instruction to the JVM needs to be done with care. But those who have seen John's new attempt in the area of Value classes have to realize that adding new instructions to the JVM is no longer taboo. In that case adding newFromAMethod, which closely mimics the necessary Java8 semantics, should be a no-brainer.
Porting that to limited VMs like Bck2Brwsr or Java ME would be way easier, as all the type checks are performed by Javac and the rest is just about wiring the method call. All the information is in the classfile. It is way easier to extract it from there than to execute invokeDynamic's bootstrap method.
Extra Syntax on Top of Existing One
InvokeDynamic was originally introduced as a helper for implementing dynamic languages on top of the JVM. That is indeed a valuable goal from the point of view of the HotSpot team. The question is: could the goal have been satisfied without unnecessarily (from the point of view of the Java language) complicating the JVM specification?
I believe it could have been done. Have you heard about AsmJs? It is an extension of JavaScript designed by Mozilla. It addresses completely different needs (it is an attempt to make JavaScript a more typed language, i.e. something opposite to InvokeDynamic), yet the way it has been introduced is worth analysing.
Rather than extending JavaScript with new keywords (which would be similar to adding new JVM bytecode instructions), the AsmJs designers decided to layer an additional notation on top of the existing JavaScript syntax. For example, if one wants to declare that variable x is an integer, one can do so by:
x = x | 0;
The above is a valid JavaScript assignment and according to the language specification it guarantees that the result of the or operation is a 32-bit integer. Using this to hint to the JavaScript VM that x is an integer is a clever way to embed additional semantics into an existing language. As a result an AsmJs program is parseable by any JavaScript implementation; it just runs way faster on Mozilla's engine than on any other JavaScript implementation.
I believe the same style could have been used for InvokeDynamic. If HotSpot wanted to give Ruby & co. a more effective way to handle method dispatch, there could be some extension of the base ByteCode that Ruby and HotSpot could use to talk to each other without modifying the JVM specification at all.
Such an extension-based approach would probably have been used if the JVM team and the HotSpot team were not one and the same, and if there were more implementations of the JVM spec that were treated seriously (which was the case for AsmJs, as Mozilla needs to negotiate changes to the JavaScript specification with others: Safari, Chrome, etc.).
Diverging Future
It looks like the JDK guys got adrenalized by the "success" of using InvokeDynamic in the recent introduction of lambdas in JDK8 and are willing to boost this kind of innovation in the next JDK release. Adding new instructions to the JVM is no longer taboo (as it was for the first fifteen years of Java's existence): the Value classes proposal wants to add ten(!?) new bytecodes!
Meanwhile, it turned out that invokeDynamic may not be the best way to speed up dynamic languages. The Truffle project's Ruby implementation running on top of an enhanced HotSpot (and using no invokeDynamic) is ten times faster than with invokeDynamic. It turns out that in the future we are likely to have fast dynamic languages on top of HotSpot that do not use invokeDynamic, yet the heavy burden of the invokeDynamic specification remains in the core JVM spec!
Java and the JVM need to get smaller (to compete with emerging and improving competitive technologies; think of V8 and NodeJS), not bigger. Complicating them just makes it harder to port Java to new, small areas of use.
Shouldn't we think twice before repeating the invokeDynamic failure? Shouldn't we fix the invokeDynamic/lambda issue - for example by deprecating or removing invokeDynamic from a future JVM spec and replacing it with a newFromAMethod bytecode?