Bck2BrwsrFlow
From APIDesign
|  (→Flaws) | |||
| (5 intermediate revisions not shown.) | |||
| Line 22: | Line 22: | ||
| Thomas rather suggested to use [http://openjdk.java.net/projects/graal/ Graal's] flow analyzer. Especially to look into the '''GraphBuilderPhase''' class and see how it creates SSA form (using '''FrameStateBuilder''') and also '''BciBlockMapping''' class that is used for creating structured control flow from byte codes. We tried that, but it all seems too connected to the rest of the code and we were unsure how to extract that most easily. | Thomas rather suggested to use [http://openjdk.java.net/projects/graal/ Graal's] flow analyzer. Especially to look into the '''GraphBuilderPhase''' class and see how it creates SSA form (using '''FrameStateBuilder''') and also '''BciBlockMapping''' class that is used for creating structured control flow from byte codes. We tried that, but it all seems too connected to the rest of the code and we were unsure how to extract that most easily. | ||
| - | Still, we were in need of some speed up. As a poor man's solution I decided to eliminate the '''switch''' and optimize at least  | + | Still, we were in need of some speed up. As a poor man's solution I decided to eliminate the '''switch''' and optimize at least backward jumps. [[Bck2Brwsr]] now generates: | 
| <source lang="javascript"> | <source lang="javascript"> | ||
| Line 38: | Line 38: | ||
|   continue X_22; // loop using direct jump |   continue X_22; // loop using direct jump | ||
| } | } | ||
| - | X_69: for (;;) { | + | X_69: for (;;) { IF: if (gt <= 69) { | 
|   return;  |   return;  | ||
| - | }}}} // close all for loops | + | }}}}} // close all for loops | 
| </source> | </source> | ||
| - | The back jump (e.g. '''continue''' ''X_22'') is now a direct [[JavaScript]] control flow instruction that [[V8]] can optimize. Code using '''for'''-loops is going to be faster now. The forward jump still requires  | + | The back jump (e.g. '''continue''' ''X_22'') is now a direct [[JavaScript]] control flow instruction that [[V8]] can optimize. Code using '''for'''-loops is going to be faster now. The forward jump still requires usage of additional variable (e.g. gt = 69; '''break''' IF), but if it is ''near'', it may not need that many compare operations either. | 
| Measurements showed about 30% speed up on our matrix multiplication benchmark (which is of course using '''for'''-cycles). | Measurements showed about 30% speed up on our matrix multiplication benchmark (which is of course using '''for'''-cycles). | ||
| Line 51: | Line 51: | ||
| Using the current technique one can decide to optimize forward jumps or backward jumps. But not both. I am not aware of a language construct to optimize both forward/backward jumps without the control flow analyzer. Currently [[Bck2Brwsr]] optimizes for backward jumps as the common sense suggests that one needs them more when using '''while'''/'''for''' loops. | Using the current technique one can decide to optimize forward jumps or backward jumps. But not both. I am not aware of a language construct to optimize both forward/backward jumps without the control flow analyzer. Currently [[Bck2Brwsr]] optimizes for backward jumps as the common sense suggests that one needs them more when using '''while'''/'''for''' loops. | ||
| - | Another discovered problem revealed limitation of some [[JavaScript]] implementations. We are nesting the '''for''' loops one into another. If there is many of them, the code may get refused. For example [[Rhino]] complained when I tried to nest 268 loops. To mitigate that I limit the number of open loops to 64. When the limit is reached (only twice in the code I know: {{JDK|java/util/zip|Inflater}} and [[Bck2Brwsr]] code generation method) we close all the loops and start again. Fast backward jumps still work ''locally'' | + | Another discovered problem revealed limitation of some [[JavaScript]] implementations. We are nesting the '''for''' loops one into another. If there is many of them, the code may get refused. For example [[Rhino]] complained when I tried to nest 268 loops. To mitigate that I limit the number of open loops to 64. When the limit is reached (only twice in the code I know: {{JDK|java/util/zip|Inflater}} and [[Bck2Brwsr]] code generation method) we close all the loops and start again. Fast backward jumps still work effectively when done ''locally'' - if there is a need to jump between two 64-'''for''' blocks, we need to use the old: ''gt = 22;'' '''continue X_0'''. | 
| - | + | So much about simulating '''goto''' in [[Java]]/[[JavaScript]]. Better ideas? | |
| <comments/> | <comments/> | ||
Current revision
Originally Bck2Brwsr simulated control flow (conditional and non-conditional goto statements in method bodies) via for/switch mapping:
var gt = 0; for (;;) { switch (gt) { case 0: // some instructions // and fallback case 22: // more instructions and test: if (smthng) gt = 69; continue; // and fallback case 67: gt = 22; continue; // a loop case 69: return; }}
This provides almost 1:1 mapping between the ByteCode and its JavaScript representation (the values in the case statements are positions of the instruction in the ByteCode of each method). However Thomas Würthinger (the most knowledgable person about V8 I know) warned me that this is not going to be fast. Not only a jump requires assignment of a variable, continue-jump to beginning of the loop and yet another jump to the appropriate case statement. According to Thomas, switch statement is not well optimized in V8 (as he considered it unimportant usecase when he worked on V8). I tried to convince Thomas to provide special V8 optimization for the above construct - but no luck.
Thomas rather suggested to use Graal's flow analyzer. Especially to look into the GraphBuilderPhase class and see how it creates SSA form (using FrameStateBuilder) and also BciBlockMapping class that is used for creating structured control flow from byte codes. We tried that, but it all seems too connected to the rest of the code and we were unsure how to extract that most easily.
Still, we were in need of some speed up. As a poor man's solution I decided to eliminate the switch and optimize at least backward jumps. Bck2Brwsr now generates:
var gt = 0; X_0: for (;;) { IF: if (gt <= 0) { // some instructions // and fallback } X_22: for (;;) { IF: if (gt <= 22) { // more instructions and test: if (smthng) gt = 69; break IF; // and fallback } X_67: for (;;) { IF: if (gt <= 67) { continue X_22; // loop using direct jump } X_69: for (;;) { IF: if (gt <= 69) { return; }}}}} // close all for loops
The back jump (e.g. continue X_22) is now a direct JavaScript control flow instruction that V8 can optimize. Code using for-loops is going to be faster now. The forward jump still requires usage of additional variable (e.g. gt = 69; break IF), but if it is near, it may not need that many compare operations either.
Measurements showed about 30% speed up on our matrix multiplication benchmark (which is of course using for-cycles).
Flaws
Using the current technique one can decide to optimize forward jumps or backward jumps. But not both. I am not aware of a language construct to optimize both forward/backward jumps without the control flow analyzer. Currently Bck2Brwsr optimizes for backward jumps as the common sense suggests that one needs them more when using while/for loops.
Another discovered problem revealed limitation of some JavaScript implementations. We are nesting the for loops one into another. If there is many of them, the code may get refused. For example Rhino complained when I tried to nest 268 loops. To mitigate that I limit the number of open loops to 64. When the limit is reached (only twice in the code I know: Inflater and Bck2Brwsr code generation method) we close all the loops and start again. Fast backward jumps still work effectively when done locally - if there is a need to jump between two 64-for blocks, we need to use the old: gt = 22; continue X_0.
So much about simulating goto in Java/JavaScript. Better ideas?
<comments/>
 Follow
 Follow 
             
             
            