Bck2BrwsrMangling

From APIDesign

Current revision (13:44, 27 February 2013)

When translating ByteCode to JavaScript, the Bck2Brwsr project faces a common problem: one needs to find a Good Name in the new world (in this case JavaScript) for meaningful objects of the old world (i.e. Java). This happens whenever one translates a typed language (like Java or C++) to an untyped one (like JavaScript or C): the names need to be mangled, so that the new names can still express method and field overloading.

Like JNI

There is a common mangling scheme specified by JNI for C. The Bck2Brwsr project mimics the specification as closely as possible and extends it only where JVM compatibility differs from Java source compatibility and requires different treatment. The mangling is based on underscore encoding substitution.

Fully Qualified Names

A fully qualified name uses '_' to separate the package names and the class name. The global virtual machine object vm has methods to obtain all (once referenced) classes. One can get a reference to String as:

var clazz = vm.java_lang_String(false);
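
The name of the accessor on vm can be derived mechanically from the Java class name. The following is a minimal sketch only: mangleClassName is a hypothetical helper, not part of Bck2Brwsr, and it assumes that a literal '_' inside a package or class name is escaped as "_1", as JNI does.

```javascript
// Hypothetical helper (an assumption, not the Bck2Brwsr implementation):
// turns a dotted Java class name into the underscore-separated form.
function mangleClassName(fqn) {
  return fqn
    .replace(/_/g, "_1")   // escape literal underscores first (JNI-style)
    .replace(/\./g, "_");  // then turn package separators into '_'
}

var accessorName = mangleClassName("java.lang.String"); // "java_lang_String"
```

Escaping the underscores before replacing the dots matters: done the other way round, the separators themselves would be escaped.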

Methods

  • There is "__" after the name of a method and before its arguments
  • The return type is encoded first, the parameters follow
  • If there is an '_' in the name or argument segment, it gets replaced by "_1"
  • Array signatures start with '[' - that character is replaced by "_3"
  • Object signatures end with ';' - that character is replaced by "_2"

As a result, a call to the method String.substring(int, int), i.e. a method that returns a String and takes two integers as arguments, is written as:

var s = "...";
var r = s.substring__Ljava_lang_String_2II(0, 5);
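
The rules above can be sketched as a small function. This is an illustration under assumptions, not the translator's actual code: mangleSegment and mangleMethod are hypothetical names, and the types are assumed to arrive as JVM descriptors such as "I", "[I" or "Ljava/lang/String;".

```javascript
// Hypothetical helpers illustrating the substitution rules listed above.
function mangleSegment(s) {
  return s
    .replace(/_/g, "_1")   // literal '_'  -> "_1" (must run first)
    .replace(/;/g, "_2")   // ';' closing an object signature -> "_2"
    .replace(/\[/g, "_3")  // '[' opening an array signature  -> "_3"
    .replace(/\//g, "_");  // package separator -> '_'
}

function mangleMethod(name, returnType, paramTypes) {
  // method name, then "__", then return type, then the parameters in order
  return mangleSegment(name) + "__" +
         mangleSegment(returnType) +
         paramTypes.map(mangleSegment).join("");
}

var substringName = mangleMethod("substring", "Ljava/lang/String;", ["I", "I"]);
// substringName is "substring__Ljava_lang_String_2II"
```

Because every literal '_' becomes "_1", a mangled method name never contains "__" itself, so the first "__" always marks the boundary between the name and the signature.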

Static Method

When calling a static method, one first needs to obtain the class. Classes are made available through a global object called "vm". As such, calling String.valueOf(10) is translated to:

var clazz = vm.java_lang_String(false);
var r = clazz.valueOf__Ljava_lang_String_2I(10);

Accessing an Instance Field

To support subclasses defining a field of the same name (as in the case of the InheritanceA and InheritanceB classes), Bck2Brwsr needed to create an accessor method for each field in its declaring class. The accessor prefixes the name of the field with "_" (forming a name that can't clash with mangled method names). The proper way to access the field value defined in the String class is then:

var getValue = vm.java_lang_String(true)._value.call(this);
var newValue = "...";
vm.java_lang_String(true)._value.call(this, newValue);

This can often be simplified to:

var getValue = this._value();
var newValue = "...";
this._value(newValue);

which does the same in most cases. Only when a subclass defines its own field value is the result incorrect. Thus this kind of usage is appropriate when one knows the class is final and can't be subclassed (another reason to follow the ClientAPI advice).
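
The difference between the two forms can be reproduced with plain JavaScript prototypes. The sketch below is a model only: fieldAccessor, the classes A and B, and the storage slot names are all hypothetical stand-ins, and the way Bck2Brwsr really stores fields is not shown here.

```javascript
// Hypothetical accessor factory: no argument reads, one argument writes.
function fieldAccessor(slot) {
  return function (newValue) {
    if (arguments.length === 0) {
      return this[slot];      // getter form
    }
    this[slot] = newValue;    // setter form
  };
}

// Stand-ins for a class and its subclass, both declaring a field "value".
function A() {}
A.prototype._value = fieldAccessor("A.value");

function B() {}
B.prototype = Object.create(A.prototype);
B.prototype.constructor = B;
B.prototype._value = fieldAccessor("B.value"); // shadows A's accessor

var b = new B();
A.prototype._value.call(b, "from A"); // write A's field on the instance
B.prototype._value.call(b, "from B"); // write B's field on the instance

var viaShortcut = b._value();                        // "from B"
var viaDeclaringClass = A.prototype._value.call(b);  // "from A"
```

Code that means the field declared in A but uses the shortcut on an instance of B silently reads B's field, which is why the explicit form with call is the safe one whenever subclasses are possible.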

Accessing a Static Field

Static fields are wrapped by accessors as well. However, accessing them is simpler, as they don't need a proper this argument. One can use:

var getValue = vm.java_lang_String(true)._CASE_INSENSITIVE_ORDER();
var newValue = "...";
vm.java_lang_String(true)._CASE_INSENSITIVE_ORDER(newValue);

Possible Future Work

There is an experimental branch which replaces well-known object types with a single letter:

  • "Ljava_lang_String_2" would become 's'
  • "Ljava_lang_Object_2" would become 'o'

Applying this pattern would shorten the generated JavaScript code. The experiments done so far, however, have not yielded convincing results.
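
To give an idea of the effect, here is a rough sketch of such a substitution applied to an already mangled name. It is a guess at the idea, not the code of the experimental branch: NICKNAMES and shortenSignature are hypothetical names, and only the two nicknames listed above are assumed.

```javascript
// Hypothetical nickname table, limited to the two examples given above.
var NICKNAMES = {
  "Ljava_lang_String_2": "s",
  "Ljava_lang_Object_2": "o"
};

// Replaces well-known types in the signature part of a mangled method name.
function shortenSignature(mangled) {
  var split = mangled.indexOf("__");
  if (split < 0) {
    return mangled; // no signature part, nothing to shorten
  }
  var name = mangled.substring(0, split + 2);
  var signature = mangled.substring(split + 2);
  Object.keys(NICKNAMES).forEach(function (longForm) {
    signature = signature.split(longForm).join(NICKNAMES[longForm]);
  });
  return name + signature;
}

var shortName = shortenSignature("substring__Ljava_lang_String_2II");
// shortName is "substring__sII"
```

The lowercase letters do not collide with the uppercase codes the JVM uses for primitive types, which keeps the shortened signature readable in this sketch.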
