Erasure

From APIDesign

Current revision (07:53, 25 June 2014)

Generics were added to the Java language to improve compile-time type checking. They make it possible to specify the element type of a collection, for example List<Integer> rather than just List. This allows the compiler to differentiate between List<Integer> and List<String>, so it can prevent the programmer from adding an element of the wrong type to the list.

The designers of the Java language wanted the change to be as backward compatible as possible, so using just List is still allowed. Another goal was to minimize changes in the HotSpot JVM. To achieve this, the compiler erases the element type from objects on the heap and uses just the raw class for all instantiations. For example, ArrayList is used for both ArrayList<String> and ArrayList<Integer>.

// Both print true
System.out.println(new ArrayList().getClass() == new ArrayList<String>().getClass());
System.out.println(new ArrayList().getClass() == new ArrayList<Integer>().getClass());

On the other hand, the compiler needs the generics information in the APIs people compile against. It needs to know, for example, that List has one type parameter. As a result, the erasure of generics is more subtle than it first appears: the generic types do not exist from the point of view of the JVM, but they are present from the point of view of JavaC and reflection.
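The two views can be put side by side in a minimal sketch. The class name TypeParameterDemo and the helper declaredTypeVariables are hypothetical, introduced only for this illustration; the reflection calls themselves are standard JDK API.

```java
import java.lang.reflect.TypeVariable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of the original Erasure.java.
class TypeParameterDemo {
    /** Returns the names of the type variables a class declares, as recorded in its class file. */
    static String declaredTypeVariables(Class<?> type) {
        StringBuilder sb = new StringBuilder();
        for (TypeVariable<?> tv : type.getTypeParameters()) {
            if (sb.length() > 0) {
                sb.append(", ");
            }
            sb.append(tv.getName());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Reflection (like JavaC) still knows that List declares one type parameter.
        System.out.println(declaredTypeVariables(List.class)); // prints E

        // The JVM, however, has a single runtime class for every instantiation.
        List<String> strings = new ArrayList<>();
        List<Integer> ints = new ArrayList<>();
        System.out.println(strings.getClass() == ints.getClass()); // prints true
    }
}
```

The declaration of the type parameter survives in the class file; what is lost is the concrete argument chosen for a particular instance.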

The Erasure Doublethink

Erasure is often understood as removal of all information about generics from the compiled class. Such a statement is too strong. Only some type information is erased. Even Oracle's own tutorial on generics, which is mostly correct, sometimes errs in this respect. For example, the page about bridge methods states:

// After type erasure, the Node and MyNode classes become:

public class Node {
    private Object data;
    public Node(Object data) { this.data = data; }
    public void setData(Object data) {
        System.out.println("Node.setData");
        this.data = data;
    }
}
public class MyNode extends Node {
    public MyNode(Integer data) { super(data); }
    public void setData(Integer data) {
        System.out.println("MyNode.setData");
        super.setData(data);
    }
}

The type variable T of class Node (see Oracle's tutorial) is indeed erased from all instances: given an instance of Node, one cannot find out at runtime what its actual type argument is. But the compiler does not erase information about generics from subclass declarations. So after type erasure, the class MyNode actually becomes:

public class MyNode extends Node<Integer> {
    public MyNode(Integer data) { super(data); }
    public void setData(Integer data) {
        System.out.println("MyNode.setData");
        super.setData(data);
    }
}

The type argument Integer is present in MyNode.class and can be retrieved at runtime using reflection. The same is true for method signatures.
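For the subclass case, a minimal sketch of such a reflective lookup might look as follows. Node and MyNode are simplified versions of the tutorial classes, and SuperclassReflectionDemo with its helper superclassTypeArgument is a hypothetical name used only here.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

// Simplified versions of the tutorial classes, reduced to what the demo needs.
class Node<T> {
    private T data;
    Node(T data) { this.data = data; }
    T getData() { return data; }
}

class MyNode extends Node<Integer> {
    MyNode(Integer data) { super(data); }
}

// Hypothetical demo class.
class SuperclassReflectionDemo {
    /** Returns the first type argument of the generic superclass, or null if there is none. */
    static Type superclassTypeArgument(Class<?> type) {
        Type generic = type.getGenericSuperclass();
        if (generic instanceof ParameterizedType) {
            return ((ParameterizedType) generic).getActualTypeArguments()[0];
        }
        return null;
    }

    public static void main(String[] args) {
        // The subclass declaration keeps its type argument in the class file.
        System.out.println(superclassTypeArgument(MyNode.class) == Integer.class); // prints true

        // An instance created directly carries no such information:
        // its class is plain Node, whose superclass is the non-generic Object.
        Node<String> plain = new Node<>("text");
        System.out.println(superclassTypeArgument(plain.getClass())); // prints null
    }
}
```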

Code from Erasure.java:

public static boolean arePositive(Collection<? extends Integer> numbers) {
    for (Integer n : numbers) {
        if (n <= 0) {
            return false;
        }
    }
    return true;
}
 

The type argument Integer (strictly: ? extends Integer) of arePositive is available at runtime. So if it is changed to Number, both reflection and JavaC can see that the type argument is now Number. The JVM, however, does not care: it operates on raw types only, and thus sees only Collection:

Code from Erasure.java:

public static boolean arePositive(Collection<? extends Number> numbers) {
    for (Number n : numbers) {
        if (n.doubleValue() <= 0.0d) {
            return false;
        }
    }
    return true;
}
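The difference between the two views can be checked with reflection. The sketch below is only an illustration: ErasureV1 and ErasureV2 are hypothetical stand-ins for the two versions of the API (the original keeps both in Erasure.java), and ParameterReflectionDemo is likewise an assumed name.

```java
import java.lang.reflect.Method;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.lang.reflect.WildcardType;
import java.util.Collection;

// Hypothetical stand-in for version 1 of the API.
class ErasureV1 {
    public static boolean arePositive(Collection<? extends Integer> numbers) {
        for (Integer n : numbers) {
            if (n <= 0) {
                return false;
            }
        }
        return true;
    }
}

// Hypothetical stand-in for version 2 of the API.
class ErasureV2 {
    public static boolean arePositive(Collection<? extends Number> numbers) {
        for (Number n : numbers) {
            if (n.doubleValue() <= 0.0d) {
                return false;
            }
        }
        return true;
    }
}

class ParameterReflectionDemo {
    private static Method find(Class<?> api) {
        try {
            // The lookup itself uses the erased parameter type.
            return api.getDeclaredMethod("arePositive", Collection.class);
        } catch (NoSuchMethodException ex) {
            throw new IllegalStateException(ex);
        }
    }

    /** The raw parameter type, which is all the JVM works with. */
    static Class<?> erasedParameter(Class<?> api) {
        return find(api).getParameterTypes()[0];
    }

    /** The upper bound of the wildcard, as seen by reflection. */
    static Type wildcardUpperBound(Class<?> api) {
        ParameterizedType param = (ParameterizedType) find(api).getGenericParameterTypes()[0];
        WildcardType wildcard = (WildcardType) param.getActualTypeArguments()[0];
        return wildcard.getUpperBounds()[0];
    }

    public static void main(String[] args) {
        // Identical raw types in both versions ...
        System.out.println(erasedParameter(ErasureV1.class) == erasedParameter(ErasureV2.class));
        // ... but different generic information.
        System.out.println(wildcardUpperBound(ErasureV1.class));
        System.out.println(wildcardUpperBound(ErasureV2.class));
    }
}
```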
 

Generics, Covariance and Contravariance

It is well known that while Covariance and Contravariance work fine from the source compatibility point of view in Java, they are not very BackwardCompatible from the binary point of view. As binary compatibility is one of the most important kinds of compatibility for a compiled language like Java, one might think that using Covariance or Contravariance is impossible. However, it is not (at least not completely): with the help of erasure of generic type information in Java, one can use both variances to one's benefit.

Imagine there is a simple method that operates on a collection of integers:

Code from Erasure.java:

public static boolean arePositive(Collection<? extends Integer> numbers) {
    for (Integer n : numbers) {
        if (n <= 0) {
            return false;
        }
    }
    return true;
}
 

Any user of such an API can call this method with a collection of Integers, for example a list:

Code from ErasureTest.java:

List<Integer> oneToTen = Arrays.asList(2, 4, 6, 8, 10);
boolean positive = arePositive(oneToTen);
System.err.println("positive = " + positive);
assert positive : "All the numbers are positive: " + oneToTen;
 

Later somebody decides that it would be nice to change the method to accept not only integers but all numbers, and creates a new version of the API with the same method accepting a wider parameter type:

Code from Erasure.java:

public static boolean arePositive(Collection<? extends Number> numbers) {
    for (Number n : numbers) {
        if (n.doubleValue() <= 0.0d) {
            return false;
        }
    }
    return true;
}
 

Obviously, this is source compatible, as it is a contravariant extension of the range of values the method can accept. For ordinary callers it is also functionally compatible: the code now calls the doubleValue method, which was available on Integer as well, so the result for any collection of Integers does not change.

The only question is whether code compiled against the old API will link against the new API. Surprisingly (at least for those who remember that Contravariance is not binary compatible), this change is also binary compatible, i.e. it is fully BackwardCompatible! This is caused by the Erasure of generic type information. When compiled into binary form, the parameter in the method signature is just Collection (without any additional generic information), so the JVM sees no change between version 1.0 of the API and version 2.0. Both work on Collections.
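The linkage argument can be approximated in a single program. The following is a sketch under stated assumptions, not the original test: ApiV1 and ApiV2 are hypothetical stand-ins for the two API versions, and a MethodHandle lookup by erased signature is used here to imitate how a compiled caller refers to the method.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.Arrays;
import java.util.Collection;

// Hypothetical stand-in for version 1.0 of the API.
class ApiV1 {
    public static boolean arePositive(Collection<? extends Integer> numbers) {
        for (Integer n : numbers) {
            if (n <= 0) {
                return false;
            }
        }
        return true;
    }
}

// Hypothetical stand-in for version 2.0 of the API.
class ApiV2 {
    public static boolean arePositive(Collection<? extends Number> numbers) {
        for (Number n : numbers) {
            if (n.doubleValue() <= 0.0d) {
                return false;
            }
        }
        return true;
    }
}

class LinkageDemo {
    // The erased signature that a caller compiled against either version refers to.
    static final MethodType ERASED = MethodType.methodType(boolean.class, Collection.class);

    /** Resolves arePositive by its erased signature only and calls it. */
    static boolean callByErasedSignature(Class<?> api, Collection<?> numbers) {
        try {
            MethodHandle handle = MethodHandles.lookup().findStatic(api, "arePositive", ERASED);
            return (boolean) handle.invoke(numbers);
        } catch (Throwable ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(ERASED.toMethodDescriptorString()); // prints (Ljava/util/Collection;)Z
        Collection<Integer> evens = Arrays.asList(2, 4, 6, 8, 10);
        // The same erased lookup succeeds against both versions.
        System.out.println(callByErasedSignature(ApiV1.class, evens)); // prints true
        System.out.println(callByErasedSignature(ApiV2.class, evens)); // prints true
    }
}
```

Because the lookup never mentions Integer or Number, swapping ApiV1 for ApiV2 changes nothing from the caller's side, which is the essence of the binary compatibility claim.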


PS: Tribute for this inventive use of generic type erasure belongs to Tomáš Zezula.

PS: Tribute goes to Rijk van Haaften for discovering that the change is functionally incompatible, as the two versions can be differentiated using reflection on the parameter types of the arePositive method.
