Bck2BrwsrFlow
From APIDesign
Revision as of 08:56, 13 March 2013
Originally, Bck2Brwsr simulated control flow (conditional and unconditional goto statements in method bodies) via a for/switch mapping:
<source lang="javascript">
var gt = 0;
for (;;) { switch (gt) {
  case 0:
    // some instructions
    // and fall through
  case 22:
    // more instructions and test:
    if (smthng) { gt = 69; continue; }
    // and fall through
  case 67:
    gt = 22; continue; // a loop
  case 69:
    return;
}}
</source>
This provides an almost 1:1 mapping between the ByteCode and its JavaScript representation (the values in the case statements are the positions of the instructions in the ByteCode of each method). However, Thomas Würthinger (the most knowledgeable person about V8 I know) warned me that this is not going to be fast. A jump requires an assignment to a variable, a continue jump to the beginning of the loop, and yet another jump to the appropriate case statement. On top of that, according to Thomas, the switch statement is not well optimized in V8 (he considered it an unimportant use case when he worked on V8). I tried to convince Thomas to provide a special V8 optimization for the above construct, but no luck.
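To make the cost of a jump concrete, here is a minimal runnable sketch of the for/switch style applied to a tiny loop that sums 1..n. The function name sumToSwitch and the case values 0, 8 and 20 are made up for illustration; they only stand in for bytecode positions and are not what Bck2Brwsr would emit for any real method.

```javascript
// Hypothetical example: sum of 1..n in the for/switch dispatch style.
// The case values are invented stand-ins for bytecode offsets.
function sumToSwitch(n) {
  var sum = 0, i = 1;
  var gt = 0;
  for (;;) { switch (gt) {
    case 0:
      // loop test: forward jump to the exit when i > n
      if (i > n) { gt = 20; continue; }
      // otherwise fall through to the loop body
    case 8:
      sum += i;
      i++;
      // backward jump: assign gt, restart the for, dispatch in the switch
      gt = 0; continue;
    case 20:
      return sum;
  }}
}
```

Every iteration of the summing loop pays for the assignment to gt, the continue to the top of the for, and the switch dispatch, which is exactly the overhead described above.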
Thomas instead suggested using Graal's (http://openjdk.java.net/projects/graal/) flow analyzer, especially looking into the GraphBuilderPhase class to see how it creates SSA form (using FrameStateBuilder), and into the BciBlockMapping class, which is used for creating structured control flow from bytecode. We tried that, but it all seemed too connected to the rest of the code, and we were unsure how to extract it easily.
Still, we needed some speedup. As a poor man's solution, I decided to eliminate the switch and optimize at least the backward jumps. Bck2Brwsr now generates:
<source lang="javascript">
var gt = 0;
X_0: for (;;) { IF: if (gt <= 0) {
  // some instructions
  // and fall through
}
X_22: for (;;) { IF: if (gt <= 22) {
  // more instructions and test:
  if (smthng) { gt = 69; break IF; }
  // and fall through
}
X_67: for (;;) { IF: if (gt <= 67) {
  continue X_22; // loop using direct jump
}
X_69: for (;;) {
  return;
}}}} // close all for loops
</source>
The back jump (e.g. continue X_22) is now a direct JavaScript control flow instruction that V8 can optimize, so code using for loops should be faster. The forward jump still requires the additional variable (e.g. gt = 69; break IF), but if the target is near, it does not need many comparison operations either.
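As a rough illustration of the two kinds of jump, the following sketch sums 1..n using nested labelled loops. The function name sumToLabels and the labels X_0 and X_20 are hypothetical and chosen only for this example; the layout follows the pattern shown above but is not actual Bck2Brwsr output.

```javascript
// Hypothetical example: sum of 1..n in the nested labelled-loop style.
// The labels X_0 and X_20 are invented stand-ins for bytecode offsets.
function sumToLabels(n) {
  var sum = 0, i = 1;
  var gt = 0;
  X_0: for (;;) { IF: if (gt <= 0) {
    // forward jump: record the target in gt and leave this block
    if (i > n) { gt = 20; break IF; }
    sum += i;
    i++;
    // backward jump: a direct continue, no assignment to gt needed
    continue X_0;
  }
  X_20: for (;;) {
    return sum;
  }}
}
```

Here the hot path (the summing loop) never touches gt; only the single forward jump out of the loop assigns it, and the target block follows immediately, so no further gt comparisons are needed.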
Measurements showed about a 30% speedup on our matrix multiplication benchmark (which, of course, uses for loops).