Erasure

From APIDesign

Current revision (07:53, 25 June 2014)

Generics were added to the Java language to improve compile-time type checking. They make it possible to specify the element type of a collection, for example List<Integer> rather than just List. The compiler can then distinguish List<Integer> from List<String> and prevent the programmer from adding an element of the wrong type to the list.

The designers of the Java language wanted the change to be as backward compatible as possible, so using just List is still allowed. Another goal was to minimize changes in the HotSpot JVM. To achieve this, the compiler erases the element type from objects on the heap and uses just the raw class for all instantiations. For example, the same ArrayList class is used for both ArrayList<String> and ArrayList<Integer>.

// Both print true
System.out.println(new ArrayList().getClass() == new ArrayList<String>().getClass());
System.out.println(new ArrayList().getClass() == new ArrayList<Integer>().getClass());

On the other hand, the compiler needs the generics information in the APIs people compile against; for example, it needs to know that List has one type parameter. As a result, erasure of generics is more subtle than it first appears: the generic types do not exist from the point of view of the JVM, but they are present from the point of view of JavaC and reflection.
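That the declaration of a type parameter survives compilation can be checked with reflection. The following is a minimal sketch; the class name TypeParameterDemo and the helper typeParameterNames are made up for this illustration, only Class.getTypeParameters() is a standard JDK call.

```java
import java.lang.reflect.TypeVariable;
import java.util.List;
import java.util.Map;

// Hypothetical demo class, not part of the article's sample code.
class TypeParameterDemo {
    // Returns the declared type parameter names of a class, joined by ", ".
    static String typeParameterNames(Class<?> type) {
        StringBuilder names = new StringBuilder();
        for (TypeVariable<?> variable : type.getTypeParameters()) {
            if (names.length() > 0) {
                names.append(", ");
            }
            names.append(variable.getName());
        }
        return names.toString();
    }

    public static void main(String[] args) {
        // The declarations are stored in the class files of the JDK itself.
        System.out.println(typeParameterNames(List.class)); // E
        System.out.println(typeParameterNames(Map.class));  // K, V
    }
}
```

Only the declaration is visible this way: the sketch says that List has a parameter named E, not what a particular list instance was created with.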

The Erasure Doublethink

Erasure is often understood as removal of all information about generics from the compiled class. Such a statement is too strong. Only some type information is erased. Even Oracle's own tutorial on generics, which is mostly correct, sometimes errs in this respect. For example, the page about bridge methods states:

// After type erasure, the Node and MyNode classes become:

public class Node {
    private Object data;
    public Node(Object data) { this.data = data; }
    public void setData(Object data) {
        System.out.println("Node.setData");
        this.data = data;
    }
}
public class MyNode extends Node {
    public MyNode(Integer data) { super(data); }
    public void setData(Integer data) {
        System.out.println("MyNode.setData");
        super.setData(data);
    }
}

The type variable T of class Node (see Oracle's tutorial) is indeed erased on all instances: given an instance of Node, one cannot find out at runtime what the actual type argument is. But the compiler does not erase information about generics from subclass declarations. So after type erasure, the class MyNode actually becomes:

public class MyNode extends Node<Integer> {
    public MyNode(Integer data) { super(data); }
    public void setData(Integer data) {
        System.out.println("MyNode.setData");
        super.setData(data);
    }
}

The type argument Integer is present in MyNode.class and can be retrieved at runtime using reflection. The same is true for method signatures.
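A minimal sketch of such a reflective lookup follows. Node and MyNode are simplified copies of the tutorial classes, nested in a hypothetical GenericSuperclassDemo class so the example is self-contained; the helper superclassTypeArgument is likewise invented for this illustration.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

// Hypothetical demo class; Node and MyNode are simplified from the tutorial.
class GenericSuperclassDemo {
    static class Node<T> {
        private T data;
        Node(T data) { this.data = data; }
        void setData(T data) { this.data = data; }
    }

    static class MyNode extends Node<Integer> {
        MyNode(Integer data) { super(data); }
        @Override
        void setData(Integer data) { super.setData(data); }
    }

    // Reads the first type argument recorded in the "extends" clause.
    // Assumes the class extends a parameterized type; fails with a
    // ClassCastException otherwise.
    static Type superclassTypeArgument(Class<?> type) {
        ParameterizedType superType = (ParameterizedType) type.getGenericSuperclass();
        return superType.getActualTypeArguments()[0];
    }

    public static void main(String[] args) {
        // The subclass declaration keeps its type argument.
        System.out.println(superclassTypeArgument(MyNode.class).getTypeName()); // java.lang.Integer

        // An instance carries no type argument: both objects share one class.
        Node<String> text = new Node<>("a");
        Node<Integer> number = new Node<>(1);
        System.out.println(text.getClass() == number.getClass()); // true
    }
}
```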

Code from Erasure.java:

public static boolean arePositive(Collection<? extends Integer> numbers) {
    for (Integer n : numbers) {
        if (n <= 0) {
            return false;
        }
    }
    return true;
}
 

The type argument Integer (strictly: ? extends Integer) of arePositive is available at runtime. So if it is changed to Number, both reflection and JavaC can see that the type argument is now Number. The JVM, however, does not care: it operates on raw types only and thus sees only Collection:

Code from Erasure.java:

public static boolean arePositive(Collection<? extends Number> numbers) {
    for (Number n : numbers) {
        if (n.doubleValue() <= 0.0d) {
            return false;
        }
    }
    return true;
}
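The two views of the same method can be compared with reflection. In the sketch below both versions of arePositive are placed side by side in hypothetical nested classes V1 and V2 (in a real project they would be two releases of the same class); getParameterTypes() returns the raw view the JVM links against, while getGenericParameterTypes() returns the view that JavaC and reflection see.

```java
import java.lang.reflect.Method;
import java.util.Collection;

// Hypothetical demo class; V1 and V2 stand in for two releases of the API.
class ReflectionViewDemo {
    static class V1 {
        public static boolean arePositive(Collection<? extends Integer> numbers) {
            for (Integer n : numbers) {
                if (n <= 0) {
                    return false;
                }
            }
            return true;
        }
    }

    static class V2 {
        public static boolean arePositive(Collection<? extends Number> numbers) {
            for (Number n : numbers) {
                if (n.doubleValue() <= 0.0d) {
                    return false;
                }
            }
            return true;
        }
    }

    private static Method find(Class<?> api) {
        try {
            // The lookup itself uses the raw parameter type.
            return api.getMethod("arePositive", Collection.class);
        } catch (NoSuchMethodException ex) {
            throw new IllegalStateException(ex);
        }
    }

    // Raw parameter type, as used by the JVM.
    static String rawParameter(Class<?> api) {
        return find(api).getParameterTypes()[0].getName();
    }

    // Generic parameter type, as recorded for the compiler and reflection.
    static String genericParameter(Class<?> api) {
        return find(api).getGenericParameterTypes()[0].getTypeName();
    }

    public static void main(String[] args) {
        System.out.println(rawParameter(V1.class) + " / " + rawParameter(V2.class));
        System.out.println(genericParameter(V1.class));
        System.out.println(genericParameter(V2.class));
    }
}
```

The first line printed names the same raw type twice; the two generic descriptions differ in their bound.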
 

Generics, Covariance and Contravariance

It is well known that while Covariance and Contravariance work OK from the source compatibility point of view in Java, they are not very BackwardCompatible from the binary point of view. As binary compatibility is one of the most important kinds of compatibility for a compiled language like Java, one could think that using Covariance or Contravariance is impossible. However, it is not (at least not completely): with the help of erasure of generic type information in Java, one can use both variances to one's own benefit.

Imagine there is a simple method that operates on a collection of integers:

Code from Erasure.java:

public static boolean arePositive(Collection<? extends Integer> numbers) {
    for (Integer n : numbers) {
        if (n <= 0) {
            return false;
        }
    }
    return true;
}
 

Any user of such an API can call this method with a collection of Integers:

Code from ErasureTest.java:

List<Integer> oneToTen = Arrays.asList(2, 4, 6, 8, 10);
boolean positive = arePositive(oneToTen);
System.err.println("positive = " + positive);
assert positive : "All the numbers are positive: " + oneToTen;
 

Later somebody decides that it would be nice to change the method to accept not only integers but all numbers, and creates a new version of the API with the same method, accepting a wider parameter type:

Code from Erasure.java:

public static boolean arePositive(Collection<? extends Number> numbers) {
    for (Number n : numbers) {
        if (n.doubleValue() <= 0.0d) {
            return false;
        }
    }
    return true;
}
 

Obviously, this is source compatible, as it is a contravariant extension of the range of values the method can absorb. Without a doubt this is also functionally compatible: the code now calls the doubleValue method, which was available on Integers as well, so there is no change in the runtime contract.

The only question is whether code compiled against the old API will link against the new API. Surprisingly (at least for those who remember that Contravariance is not binary compatible), this change is also binary compatible, i.e. it is fully BackwardCompatible! This is caused by the Erasure of generic type information. When compiled into binary form, the parameter in the method signature is just Collection (without any additional generic information), so the JVM sees no change between version 1.0 of the API and version 2.0. Both work on Collections.
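The claim that the JVM sees no change can be illustrated by computing the erased method descriptor, the string that call sites in compiled classes use for linking. This is only a sketch: ApiV1 and ApiV2 are hypothetical stand-ins for the two releases, and the descriptor is rebuilt from reflection data with java.lang.invoke.MethodType rather than read from the class file.

```java
import java.lang.invoke.MethodType;
import java.lang.reflect.Method;
import java.util.Collection;

// Hypothetical demo class; ApiV1 and ApiV2 stand in for versions 1.0 and 2.0.
class ErasedDescriptorDemo {
    static class ApiV1 {
        public static boolean arePositive(Collection<? extends Integer> numbers) {
            for (Integer n : numbers) {
                if (n <= 0) {
                    return false;
                }
            }
            return true;
        }
    }

    static class ApiV2 {
        public static boolean arePositive(Collection<? extends Number> numbers) {
            for (Number n : numbers) {
                if (n.doubleValue() <= 0.0d) {
                    return false;
                }
            }
            return true;
        }
    }

    // Builds the JVM descriptor of arePositive from its erased types.
    static String descriptor(Class<?> api) {
        for (Method m : api.getDeclaredMethods()) {
            if (m.getName().equals("arePositive")) {
                return MethodType
                        .methodType(m.getReturnType(), m.getParameterTypes())
                        .toMethodDescriptorString();
            }
        }
        throw new IllegalStateException("arePositive not found in " + api);
    }

    public static void main(String[] args) {
        System.out.println(descriptor(ApiV1.class)); // (Ljava/util/Collection;)Z
        System.out.println(descriptor(ApiV2.class)); // (Ljava/util/Collection;)Z
    }
}
```

Since a call site compiled against version 1.0 refers to the method by name and this descriptor only, it resolves against version 2.0 unchanged.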


PS: Tribute for this inventive use of generic type erasure belongs to Tomáš Zezula.

PS: Tribute goes to Rijk van Haaften for discovering the change is functionally incompatible, as it can be differentiated using reflection on parameter types of the arePositive method.
