Tuesday, September 30, 2014

Java Generics

The Java Tutorials kicks off its Generics series with the following POJO (Plain Old Java Object):
public class Box {
   private Object t;

   public Object get() {
      return t;
   }
 
   public void set(Object t) {
      this.t = t;
   }
}
The primary problem with the above class is that there's no type checking at compile time. Suppose, for example, you stored the String "Hello World" in the box.
Box box = new Box();

box.set("Hello World");
Then, somewhere else in your code, you retrieve the Object being stored in the box thinking that it's an Integer.
Integer i = box.get();
Of course, if you tried to do that, the compiler would complain. That's because you're downcasting: you're assigning a reference of a parent class type (i.e., Object) to a variable of a subclass type (i.e., Integer). Whenever you downcast, you have to explicitly state the child type in parentheses, like so:
Integer i = (Integer) box.get();
Now everything seems to be okay. At least the compiler isn't complaining anymore. Unfortunately, if you were to actually run the program, it would throw the following exception:
java.lang.ClassCastException:
  java.lang.String cannot be cast to java.lang.Integer
It's unfortunate because if the compiler had warned us about this issue beforehand, we could have easily fixed it. But since the issue doesn't crop up until the program is actually executed, tracking down where the wrong object was stored can be hard. (This is assuming, of course, that you're working on a large code base.)
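Here's a minimal, self-contained sketch that reproduces the failure described above: the downcast compiles, and the problem only shows up when the code runs. The class name ObjectBoxDemo and the helper downcastFails are made up for illustration.

```java
class ObjectBoxDemo {
   // The same non-generic Box as above: it stores any Object
   static class Box {
      private Object t;
      public Object get() { return t; }
      public void set(Object t) { this.t = t; }
   }

   // Returns true if the downcast fails at runtime with a ClassCastException
   static boolean downcastFails() {
      Box box = new Box();
      box.set("Hello World");
      try {
         Integer i = (Integer) box.get(); // compiles without complaint...
         return false;
      } catch (ClassCastException e) {    // ...but fails when it actually runs
         return true;
      }
   }

   public static void main(String[] args) {
      System.out.println(downcastFails()); // true
   }
}
```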

So let's use generics to fix this program. Generics were one of the new features introduced in Java 5. To make a class generic, you put what's known as a type parameter between angle brackets after the class name.
public class Box<T> {
   private T t;
 
   public T get() {
      return t;
   }
 
   public void set(T t) {
      this.t = t;
   }
}
Note: You don't have to use the letter T as the name of the type parameter. Any valid identifier will do. Of course, you can't use any of Java's reserved words, but if you wanted to make your program confusing by using the name of a common class (like Object), you could:
public class Box<Object> {
   private Object t;
 
   public Object get() {
      return t;
   }
 
   public void set(Object t) {
      this.t = t;
   }
}
The letter T was used because it follows Java's naming convention for type parameters. Type parameter names are typically single uppercase letters, and the letter you choose hints at the parameter's role. For example, the java.util.AbstractMap class has two type parameters, K and V, which stand for key and value.

Here are some of the type parameter names you'll frequently encounter in the Java API:
  • E - Element
  • K - Key
  • N - Number
  • T - Type
  • V - Value
  • S, U, V etc. - 2nd, 3rd, 4th types
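To see the naming convention in action, here's a small sketch of a class with two type parameters, K for the key and V for the value, in the spirit of AbstractMap. The Pair class and PairDemo wrapper are hypothetical, not part of the Java API.

```java
class PairDemo {
   // A hypothetical two-type-parameter class: K is the key type, V is the value type
   static final class Pair<K, V> {
      private final K key;
      private final V value;

      public Pair(K key, V value) {
         this.key = key;
         this.value = value;
      }

      public K getKey()   { return key; }
      public V getValue() { return value; }
   }

   public static void main(String[] args) {
      Pair<String, Integer> entry = new Pair<String, Integer>("answer", 42);
      String k = entry.getKey();    // no cast needed
      Integer v = entry.getValue(); // no cast needed
      System.out.println(k + " = " + v); // answer = 42
   }
}
```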
When you instantiate an object of a generic class, you pass the type argument between angle brackets < >.

Recall that we wanted our box to store the String "Hello World".
Box<String> box = new Box<String>();

box.set("Hello World");
Although it isn't literally what happens, you can think of Java as transforming the Box class into the following:
public class Box {
 
   private String t;
 
   public String get() {
      return t;
   }
 
   public void set(String t) {
      this.t = t;
   }
}
Note: In reality, only one Box class exists at runtime. Java doesn't create a separate Box class just for boxes that hold a String reference, so the following prints true.
public static void main(String[] args) {
   Box<String> strBox = new Box<String>();
   Box<Integer> intBox = new Box<Integer>();

   System.out.println(strBox.getClass() == intBox.getClass()); // true
}
Now when we try to cast the String to an Integer we get a compiler error:
Integer i = (Integer) box.get(); // Compiler Error:
   // cannot cast java.lang.String to java.lang.Integer
Besides type checking at compile time, generics reduce the need for casts. Recall that the original version of the Box class required us to downcast from Object to Integer. With generics we don't have to, as long as the type of the variable matches the box's type argument.
Box<Integer> box = new Box<Integer>();

box.set(123);

Integer i = box.get();
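Putting the last few snippets together, here's a runnable sketch (the wrapper class GenericBoxDemo is just for illustration). It shows that the generic Box needs no casts, rejects the wrong type at compile time, and still has a single runtime class.

```java
class GenericBoxDemo {
   // The generic Box from above
   static class Box<T> {
      private T t;
      public T get() { return t; }
      public void set(T t) { this.t = t; }
   }

   public static void main(String[] args) {
      Box<Integer> intBox = new Box<Integer>();
      intBox.set(123);            // autoboxed to Integer
      Integer i = intBox.get();   // no cast required

      Box<String> strBox = new Box<String>();
      strBox.set("Hello World");
      // strBox.set(123);         // would not compile: an int is not a String

      System.out.println(i);                                      // 123
      System.out.println(strBox.get().length());                  // 11
      System.out.println(strBox.getClass() == intBox.getClass()); // true
   }
}
```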
The great thing about generics is that they let you generalize your code. For example, suppose you wanted to sort arrays of integers and of floating-point numbers using the Selection Sort algorithm. Without generics, we'd have to implement our sort method twice: one that takes an array of ints and another that takes an array of doubles.
import java.util.Arrays;

public class Klass {

   public static void sort(int[] array) {

      for(int i = 0; i < array.length; i++) {
         int temp;
         int indexOfSmallest = i;

         for(int j = array.length - 1; j > i; j--) {
            if(array[j] < array[indexOfSmallest]) {
               indexOfSmallest = j;
            }
         }

         temp = array[indexOfSmallest];
         array[indexOfSmallest] = array[i];
         array[i] = temp;
      }
   }

   public static void sort(double[] array) {

      for(int i = 0; i < array.length; i++) {
         double temp;
         int indexOfSmallest = i;

         for(int j = array.length - 1; j > i; j--) {
            if(array[j] < array[indexOfSmallest]) {
               indexOfSmallest = j;
            }
         }

         temp = array[indexOfSmallest];
         array[indexOfSmallest] = array[i];
         array[i] = temp;
      }
   }

   public static void main(String[] args) {
      int[] intArray = {42, 20, 17, 13, 28, 14, 23, 15};
      System.out.println("Before: " + Arrays.toString(intArray));
      sort(intArray);
      System.out.println("After: " + Arrays.toString(intArray));
      System.out.println();

      double[] dblArray = {42.8, 20.3, 17.6, 13.9, 28.1, 14.0, 23.2, 15.3};
      System.out.println("Before: " + Arrays.toString(dblArray));
      sort(dblArray);
      System.out.println("After: " + Arrays.toString(dblArray));
   }
}
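But with generics we only have to implement our sort method once, as long as we use arrays of the wrapper classes Integer and Double, since type arguments can't be primitive types.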
import java.util.Arrays;

public class Klass {

   public static <N extends Number> void sort(N[] array) {

      for(int i = 0; i < array.length; i++) {
         N temp;
         int indexOfSmallest = i;

         for(int j = array.length - 1; j > i; j--) {
            if(array[j].doubleValue() < array[indexOfSmallest].doubleValue()) {
               indexOfSmallest = j;
            }
         }

         temp = array[indexOfSmallest];
         array[indexOfSmallest] = array[i];
         array[i] = temp;
      }
   }

   public static void main(String[] args) {
      Integer[] intArray = {42, 20, 17, 13, 28, 14, 23, 15};
      System.out.println("Before: " + Arrays.toString(intArray));
      sort(intArray);
      System.out.println("After: " + Arrays.toString(intArray));
      System.out.println();

      Double[] dblArray = {42.8, 20.3, 17.6, 13.9, 28.1, 14.0, 23.2, 15.3};
      System.out.println("Before: " + Arrays.toString(dblArray));
      sort(dblArray);
      System.out.println("After: " + Arrays.toString(dblArray));
   }
}
There are quite a few things to take note of in this example. For starters, notice that you can declare a method as generic even when the class itself is not. When you declare a generic method, the type parameters are specified before the method's return type.
public static <N extends Number> void sort(N[] array) {
This example also demonstrates using an upper-bounded type parameter, but I'll get to that later. For now, let's just focus on how the method is used. Notice that Java infers the type argument of the generic method from the argument passed to it.
Double[] dblArray = {42.8, 20.3, 17.6, 13.9, 28.1, 14.0, 23.2, 15.3};

sort(dblArray);
But if you wanted to make things explicit, you can. Here's what that looks like:
Double[] dblArray = {42.8, 20.3, 17.6, 13.9, 28.1, 14.0, 23.2, 15.3};

Klass.<Double>sort(dblArray);
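As an aside, the Number bound above compares values through doubleValue(), so it only works for numbers. A bound you'll often see used instead (a swap, not what the example above does) is Comparable, which lets the same selection sort handle any type whose instances can compare themselves, such as Integer or String. This is a sketch; the class name ComparableSortDemo is made up. It also shows both inferred and explicit type arguments.

```java
import java.util.Arrays;

class ComparableSortDemo {
   // Same selection sort as above, but bounded by Comparable instead of Number
   public static <T extends Comparable<? super T>> void sort(T[] array) {
      for (int i = 0; i < array.length; i++) {
         int indexOfSmallest = i;
         for (int j = array.length - 1; j > i; j--) {
            if (array[j].compareTo(array[indexOfSmallest]) < 0) {
               indexOfSmallest = j;
            }
         }
         T temp = array[indexOfSmallest];
         array[indexOfSmallest] = array[i];
         array[i] = temp;
      }
   }

   public static void main(String[] args) {
      Integer[] ints = {42, 20, 17, 13};
      sort(ints);                                 // T inferred as Integer
      System.out.println(Arrays.toString(ints));  // [13, 17, 20, 42]

      String[] words = {"pear", "apple", "fig"};
      ComparableSortDemo.<String>sort(words);     // explicit type argument
      System.out.println(Arrays.toString(words)); // [apple, fig, pear]
   }
}
```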
Besides classes and methods, interfaces can be made generic as well.
public interface MyInterface<T> {
}
Enums, however, cannot.
public enum MyEnum <T> { // compiler error
}
Check out the post I did on Java Enums.

Whenever a class implements a generic interface and passes a type parameter through to it, the class itself must also be declared as generic; otherwise that type parameter is undefined.
public interface MyInterface<T> {
}
public class Correct<T> implements MyInterface<T> {
}
public class Wrong implements MyInterface<T> { // compiler error
 // Notice that the class isn't declared as being generic
}
There is an exception to this rule: when you specify a concrete type argument for the generic interface that your class implements, you don't declare the class as generic.
public class MyClass implements MyInterface<Double> {
}
Note: The following...
public class MyClass<Double> implements MyInterface<Double> {
}
...is the same as...
public class MyClass<T> implements MyInterface<T> {
}
...but not...
public class MyClass implements MyInterface<Double> {
}
Don't be confused by this. Let's look at a simple example just to prove that this is true.
public interface MyInterface<T> {
   T get();
   void set(T t);
}
public class Klass<Double> implements MyInterface<Double> {

   private Double dbl;

   @Override
   public Double get() {
      return dbl;
   }

   @Override
   public void set(Double dbl) {
      this.dbl = dbl;
   }

   public static void main(String[] args) {
      Klass klass = new Klass();

      klass.set(1.23);
      klass.set("Hello World");
   }
}
Now let's look at what happens when we specify the type parameter for the generic interface.
public class Klass implements MyInterface<Double> {

   private Double dbl;

   @Override
   public Double get() {
      return dbl;
   }

   @Override
   public void set(Double dbl) {
      this.dbl = dbl;
   }

   public static void main(String[] args) {
      Klass klass = new Klass();

      klass.set(1.23);
      klass.set("Hello World"); // compiler error
   }
}
Notice that when the following is used:
public class Klass<Double> implements MyInterface<Double> {
Here, Double doesn't refer to the Java class java.lang.Double; it's just the name of a type parameter. Therefore, since klass is declared with the raw type Klass, we can pass any type of Object to its set method.
public static void main(String[] args) {
   Klass klass = new Klass();

   klass.set(1.23);
   klass.set("Hello World");
}
On the other hand, if we do the following:
public class Klass implements MyInterface<Double> {
We're specifying that Klass will only accept objects of type Double:
public static void main(String[] args) {
   Klass klass = new Klass();

   klass.set(1.23);
   klass.set("Hello World"); // compiler error
}
When a generic class or interface is used without a type argument, it's known as the raw type of the generic. For instance, the raw type of the generic Box class above is...Box.
Box box = new Box(); // raw type used
If you tried to set an object in this raw box...
box.set("Hello World");
...the compiler would report the following:
Note: Box.java uses unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.
And if you recompile the class with the -Xlint:unchecked option...
javac -Xlint:unchecked Box.java
...you'll get an even more descriptive warning message.
warning: [unchecked] unchecked call to set(T) as a member of the raw type Box
Basically what it's telling you is that Java doesn't have enough information to ensure type safety, which means you could do the following without any compiler errors.
Box box = new Box();

box.set("Hello World");

Integer i = (Integer) box.get();
Recall that we already did this and that it resulted in the following ClassCastException being thrown.
java.lang.ClassCastException:
  java.lang.String cannot be cast to java.lang.Integer
You can prevent the unchecked warnings from appearing by using the SuppressWarnings annotation.
@SuppressWarnings("unchecked")
public static void main(String[] args) {
   Box box = new Box();
 
   box.set("Hello World");

   String str = (String) box.get();
}
Going back to our discussion of raw types: a class or interface that was never declared with a type parameter is not a raw type.
public class Box {

   private Object t;
 
   public Object get() {
      return t;
   }
 
   public void set(Object t) {
      this.t = t;
   }
 
   public static void main(String[] args) {
      Box box = new Box(); // not a raw type
   }
}
Raw types exist mainly so that code written before generics were introduced in Java 5 doesn't break.

You can assign an instance of a generic class to its raw type.
Box<String> strBox = new Box<String>();
Box rawBox = strBox;
Why is this allowed? Recall that when Java compiled the generic Box class...
public class Box<T> {
   private T t;
 
   public T get() {
      return t;
   }
 
   public void set(T t) {
      this.t = t;
   }
}
...it transformed it into the following.
public class Box {
   private Object t;

   public Object get() {
      return t;
   }
 
   public void set(Object t) {
      this.t = t;
   }
}
When the Java compiler compiles a generic class, it goes through a process known as type erasure: it erases the type parameters and replaces each one with its bound, or with Object if the parameter is unbounded (as T is here). You can verify this behavior with a Java decompiler or with javap, but the important thing to remember is that information about type parameters is not available at runtime.
public class Klass<T> {

   private T t;

   public void fooBar (T t) {
      this.t = t;

      System.out.println(String.class); // class java.lang.String
      System.out.println(T.class);      // Compiler Error: 
         // Cannot select from a type variable
   }
}
So why are you allowed to assign an instance of a generic class to its raw type? Because, after type erasure, they both look the same in bytecode. That is,
Box<String> box = new Box<String>();
...looks the same as...
Box<Object> box = new Box<Object>();
...which looks the same as...
Box box = new Box();
It's only when you actually use the generic instance that the Java compiler inserts type information, in the form of casts. So the following...
Box<String> box = new Box<String>();

box.set("Hello World");
String str = box.get();
...would be compiled to...
Box box = new Box();

box.set("Hello World");
String str = (String) box.get();
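You can observe type erasure from inside a running program with reflection. In this sketch (class names hypothetical), Field.getType() reports the erased type of each field: Object for an unbounded type parameter and the bound, Number, for a bounded one.

```java
import java.lang.reflect.Field;

class ErasureDemo {
   static class Box<T> {                      // unbounded: T erases to Object
      private T t;
   }

   static class NumberBox<N extends Number> { // bounded: N erases to Number
      private N n;
   }

   // Returns the erased (runtime) type of a declared field
   static Class<?> erasedFieldType(Class<?> owner, String fieldName) throws NoSuchFieldException {
      Field field = owner.getDeclaredField(fieldName);
      return field.getType();
   }

   public static void main(String[] args) throws NoSuchFieldException {
      System.out.println(erasedFieldType(Box.class, "t"));       // class java.lang.Object
      System.out.println(erasedFieldType(NumberBox.class, "n")); // class java.lang.Number
   }
}
```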
You can also assign a raw type to a specific version of a generic class. You'll just get an unchecked conversion warning when you do so.
Box rawBox = new Box();           
Box<String> strBox = rawBox;     // warning: unchecked conversion
That's because Java can't guarantee the type of object being stored in the raw type, which means you can do the following without any compiler errors.
rawBox.set(Integer.valueOf(3));
String str = strBox.get();
However, as soon as you tried to run the program, you'd get the following ClassCastException:
java.lang.ClassCastException:
  java.lang.Integer cannot be cast to java.lang.String
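Here's the same sequence as a runnable sketch (UncheckedConversionDemo and demo are hypothetical names). It makes visible that the raw set call succeeds and that the exception only surfaces later, at the cast the compiler inserted for strBox.get().

```java
class UncheckedConversionDemo {
   static class Box<T> {
      private T t;
      public T get() { return t; }
      public void set(T t) { this.t = t; }
   }

   // Returns "set ok, get failed" when the bad value slips in through the raw type
   // and the ClassCastException appears only at the hidden cast in strBox.get()
   @SuppressWarnings({"unchecked", "rawtypes"})
   static String demo() {
      Box rawBox = new Box();
      Box<String> strBox = rawBox;      // unchecked conversion
      rawBox.set(Integer.valueOf(3));   // no error: the raw set accepts any Object
      try {
         String str = strBox.get();     // compiler-inserted cast to String fails here
         return "no failure: " + str;
      } catch (ClassCastException e) {
         return "set ok, get failed";
      }
   }

   public static void main(String[] args) {
      System.out.println(demo()); // set ok, get failed
   }
}
```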
What about assigning a reference to a specific version of a generic type to a variable whose type parameter differs from it?

Recall that in Java you can assign an instance of a subclass to a variable of its superclass type. This is called upcasting.
public class Parent {
}
public class Child extends Parent {
}
Parent parent = new Child();
So what about this?
Box<Number> box = new Box<Integer>();
Will this work? It looks pretty reasonable, doesn't it? Both sides are of type Box, and Integer extends Number. Unfortunately, though, this line of code results in the following compiler error.
incompatible types
   required: Box<java.lang.Number>
   found: Box<java.lang.Integer>
So when it comes to specific versions of a generic type, the type argument matters: a Box<Number> variable cannot reference a Box<Integer>, even though Integer is a subtype of Number.
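Why is the compiler so strict? Arrays, unlike generics, are covariant, and this sketch (class name hypothetical) shows the price: a Number[] variable can point at an Integer[], so storing a Double compiles and is only rejected at runtime. Rejecting Box<Number> = Box<Integer> (or List<Number> = List<Integer>) at compile time closes that hole, and erased generics couldn't check it at runtime anyway. The last lines preview the wildcard form covered later in this post.

```java
import java.util.ArrayList;
import java.util.List;

class InvarianceDemo {
   // Arrays ARE covariant, so this compiles, but the bad store is caught only at runtime
   static boolean arrayStoreFails() {
      Number[] numbers = new Integer[1]; // allowed: Integer[] is a subtype of Number[]
      try {
         numbers[0] = 1.5;               // a Double into an Integer[]...
         return false;
      } catch (ArrayStoreException e) {  // ...is rejected by the JVM
         return true;
      }
   }

   public static void main(String[] args) {
      System.out.println(arrayStoreFails()); // true

      List<Integer> ints = new ArrayList<Integer>();
      // List<Number> nums = ints;  // compiler error, which is what stops
      // nums.add(1.5);             // a Double from sneaking into a List<Integer>

      List<? extends Number> view = ints; // allowed with a wildcard
      ints.add(7);
      Number first = view.get(0);
      System.out.println(first);          // 7
   }
}
```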

Let's take this one step further and look at what happens when you overload a method that takes in a generic type.
public class Klass {
   public void fooBar(Box<String> strBox) {
   }

   public void fooBar(Box<Integer> intBox) {
   }
}
Unfortunately, this class won't compile. Because the Java compiler erases type information, both methods end up with the same erasure (javac reports a name clash), so it can't tell them apart. That is, after type erasure, this is what the two methods would look like:
public class Klass {
   public void fooBar(Box strBox) {
   }

   public void fooBar(Box intBox) {
   }
}
Check out the post I did on Method Overloading vs Method Overriding.

There are actually quite a few restrictions on how generics may be used. Here are just a few of them:
  • You cannot instantiate a type parameter using the new keyword.
T t = new T(); // Compiler Error:
   // Type parameter 'T' cannot be instantiated directly
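There is a workaround to this, though, which requires using reflection:
public static <T> T fooBar(Class<T> klass) {
   try {
      // Class.newInstance() is deprecated as of Java 9; call the no-arg constructor instead
      return klass.getDeclaredConstructor().newInstance();
   }
   catch(ReflectiveOperationException exception) {
      // e.g. no accessible no-arg constructor, an abstract class, or the constructor threw
      throw new IllegalArgumentException("Cannot instantiate " + klass.getName(), exception);
   }
}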
Check out the post I did on Exception Handling in Java.

You would then invoke the fooBar method like so:
fooBar(String.class);
  • You cannot instantiate an array of a specific version of a generic type.
Box<String>[] boxes = new Box<String>[3]; // compiler error
But you can instantiate an array of raw types.
Box[] boxes = new Box[3];
Check out the post I did on Java Arrays.
  • You cannot use the type parameter to declare a static variable.
public class Klass<T> {

   private T good;

   private static T bad; // Compiler Error:
      // non-static type variable T cannot be referenced from a static context
}
Likewise, you can't declare a static method that uses the class's type parameter T, such as one that returns an instance of it:
public static T bad(T t) { // Compiler Error
   return t;
}
But you can declare a static generic method, which has its own type parameter, that returns an instance of that type parameter.
public static <T> T good(T t) { // generic method, so this is okay
   return t;
}
Check out the post I did on using the static keyword in Java.
  • A generic class cannot extend Throwable, directly or indirectly.
import java.io.IOException;

public class Box<T> extends IOException { // Compiler Error:
   // Generic class may not extend 'java.lang.Throwable'
}
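Going back to the first restriction (no new T()): on Java 8 or later, a common alternative to passing a Class<T> and using reflection is to pass a factory, java.util.function.Supplier<T>, often written as a constructor reference. This is a sketch; the names SupplierDemo and fill are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

class SupplierDemo {
   // Instead of Class<T> plus reflection, take a factory that knows how to build a T
   static <T> List<T> fill(int count, Supplier<T> factory) {
      List<T> result = new ArrayList<>();
      for (int i = 0; i < count; i++) {
         result.add(factory.get()); // no reflection, no checked exceptions
      }
      return result;
   }

   public static void main(String[] args) {
      List<StringBuilder> builders = fill(3, StringBuilder::new); // constructor reference
      System.out.println(builders.size()); // 3
   }
}
```

The design trade-off: the caller has to supply the factory, but the compiler checks that a matching constructor exists, whereas the reflective version can only fail at runtime.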
You need to be particularly careful with varargs methods whose parameter has a generic type (such as T...), as such methods can lead to what's known as heap pollution. According to The Java Tutorials:
Heap pollution occurs when a variable of a parameterized type refers to an object that is not of that parameterized type.
Recall that when you instantiate an object in Java it is placed in heap memory.

Suppose we had the following method:
public static <T> void fooBar(T... elements) {
   Object[] array = elements;     // arrays are covariant, so this compiles
   array[1] = Integer.valueOf(3); // try to store an Integer where a T is expected
   T t = elements[1];
}
We then exercised that method like so:
public static void main(String[] args) {
   String[] strArray = {"Hello", "World"};

   fooBar(strArray);
}
Notice that the elements in strArray reference objects of type String. We then call the fooBar method, which attempts to pollute the heap by changing one of the array's elements to point to an instance of Integer.

If we were to run this piece of code, it would generate the following exception:
java.lang.ArrayStoreException: java.lang.Integer
Fortunately, Java warns you about possible heap pollution.
Warning: possible heap pollution from parameterized vararg type T
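If you're certain that the method will not pollute the heap, you can annotate it with @SafeVarargs, which suppresses the previous warning. The annotation is only allowed on methods that can't be overridden: static methods, final instance methods, constructors and, since Java 9, private methods.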
@SafeVarargs
public static<T> void fooBar(T... elements) {
}
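Here's a sketch of a varargs method where @SafeVarargs is justified: it only reads from the varargs array and never writes to it or lets it escape. The names SafeVarargsDemo and listOf are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class SafeVarargsDemo {
   // Safe: the varargs array is only read, never modified or exposed to callers
   @SafeVarargs
   static <T> List<T> listOf(T... elements) {
      List<T> result = new ArrayList<>(elements.length);
      for (T element : elements) {
         result.add(element);
      }
      return result;
   }

   public static void main(String[] args) {
      List<String> words = listOf("Hello", "World");
      System.out.println(words); // [Hello, World]
   }
}
```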
Recall that so far, whenever we instantiated a generic type, we specified the type argument on both sides.
Box<String> list = new Box<String>();
Doesn't that seem a little redundant to you? Can't the compiler infer the type argument for the right-hand side by looking at the left? As of Java 7, it can:
Box<String> list = new Box<>();
The pair of angle brackets you see here is called the diamond operator.

To really see the usefulness of the diamond operator we need to remember how Java was able to infer the type of a generic method based on the arguments passed to it.
public class Box<T> {

   public Box(T t) {
      this.t = t;
   }

   private T t;

   public static void main(String[] args) {
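      Box<Number> good = new Box<Number>(Integer.valueOf(123));

      Number num = Integer.valueOf(123);
      good = new Box<>(num);

      // In Java 7 this was a compiler error, because the diamond inferred T from the
      // constructor argument alone: Incompatible types: Required Box<Number>, Found Box<Integer>
      // Since Java 8, inference also considers the target type, so T becomes Number and it compiles
      Box<Number> bad = new Box<>(Integer.valueOf(123));

      // the following produces unchecked call and unchecked conversion warnings
      Box<Number> okay = new Box(Integer.valueOf(123));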
   }
}
You can use what are known as bounds to restrict the type of classes that are allowed when instantiating a generic class or calling a generic method.

If you want to restrict the type parameter to a specific type or a subtype of that type, you can use what's known as an upper bounded parameter.
public class Parent {

   public void foo() {
   }
}
public class Child extends Parent {
}
public static <T extends Parent> void fooBar(T t) {
}
public static void main(String[] args) {
   fooBar(new Parent());
   fooBar(new Child());  // subtype, okay

   fooBar(new Object()); // supertype of Parent, NOT okay
   fooBar(new String()); // neither is this non-related class
}
Because we've restricted the type that can be used as an argument to the fooBar method, we can call methods on that type.
public static <T extends Parent> void fooBar(T t) {
   t.foo();
}
If we hadn't used an upper bounded parameter, we could only call methods from the Object class.
public static <T> void fooBar(T t) {
   t.foo();      // Compiler Error: Cannot find symbol 'foo'

   t.toString(); // this is okay
}
You can specify multiple bounds for a type parameter. Here's what that looks like:
public class MyClass {
}
public interface MyInterface {
}
public static <T extends MyClass & MyInterface> void fooBar(T t) {
}
Notice that when the bound is a class, you specify it before the bounds that are interfaces. Failing to do this results in a compiler error.
public static <T extends MyInterface & MyClass> void fooBar(T t) {
}
Compiler Error: interface expected here
In order to call the fooBar method, we need to pass a type that satisfies all the types listed in the bound. Therefore, neither of the following statements would work.
fooBar(new MyClass());        // Compiler Error
fooBar(new MyInterface(){});  // Compiler Error
However, if we had the following class...
public class Thing extends MyClass implements MyInterface {
}
Then we could do this, since it satisfies all the types listed in the bounds.
fooBar(new Thing());
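The same idea works with types from the JDK. In this sketch (class and method names hypothetical), T must be a Number (the class bound, listed first) and Comparable to itself (the interface bound), so the method can call compareTo from one bound and doubleValue from the other. Integer and Double satisfy both; String does not.

```java
class MultipleBoundsDemo {
   // T must extend Number AND implement Comparable<T>
   static <T extends Number & Comparable<T>> T max(T a, T b) {
      return a.compareTo(b) >= 0 ? a : b; // compareTo comes from Comparable
   }

   static <T extends Number & Comparable<T>> double maxAsDouble(T a, T b) {
      return max(a, b).doubleValue();     // doubleValue comes from Number
   }

   public static void main(String[] args) {
      Integer biggerInt = max(3, 7);
      Double biggerDouble = max(2.5, 1.5);
      System.out.println(biggerInt);          // 7
      System.out.println(biggerDouble);       // 2.5
      System.out.println(maxAsDouble(4, 9));  // 9.0
      // max("a", "b");                       // compiler error: String is not a Number
   }
}
```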
Recall that interfaces can be made generic. So it makes sense that you can specify bounds for the type parameter of a generic interface.
interface MyInterface<T extends MyClass> {
}
When declaring a generic class whose type parameter is bounded...
public class Thing<T extends MyClass> ...
...you must not repeat the bounds in the implements clause; just pass the type parameter along.
public class Wrong<T extends MyClass> implements MyInterface<T extends MyClass> { // Compiler Error
public class Correct<T extends MyClass> implements MyInterface<T> {
Recall that I had earlier said the following wouldn't compile:
Box<Number> box = new Box<Integer>();
If we make the following modification to the Box example, however, we can get it to work:
Box<?> box = new Box<Integer>();
The question mark '?' in the above example is what's known as a wildcard. Here we're saying that we don't know what type of object the box is going to store. And since we don't have that piece of information, the only thing we're allowed to store in the box is null.
box.set(null);

box.set(Integer.valueOf(3)); // Compiler Error:
   // method set in class Box<T> cannot be applied to given types
Wildcards provide a shortcut for writing generic methods. That is, the following...
public static<K extends Object> void fooBar(Box<K> box) {
}
...is the same as...
public static void fooBar(Box<?> box) {
}
...but not...
public static void fooBar(Box<Object> box) {
}
In that last method, we're saying that fooBar only accepts a Box<Object>, whereas the other fooBar methods accept any kind of Box, whatever its type argument.
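Here's a sketch that puts the two kinds of parameter side by side (WildcardParamDemo, describe and describeObjectBox are hypothetical names, and this Box has a constructor for brevity). The Box<?> version accepts a Box of anything, while the Box<Object> version rejects a Box<String>.

```java
class WildcardParamDemo {
   static class Box<T> {
      private T t;
      public Box(T t) { this.t = t; }
      public T get() { return t; }
   }

   // Accepts a Box of anything; the contents can only be treated as an Object
   static String describe(Box<?> box) {
      Object contents = box.get();
      return "Box containing " + contents;
   }

   // Accepts ONLY a Box<Object>
   static String describeObjectBox(Box<Object> box) {
      return "Box<Object> containing " + box.get();
   }

   public static void main(String[] args) {
      Box<String> strBox = new Box<String>("Hello");
      System.out.println(describe(strBox));              // Box containing Hello
      System.out.println(describe(new Box<Integer>(3))); // Box containing 3
      // describeObjectBox(strBox);  // compiler error: Box<String> is not a Box<Object>
      System.out.println(describeObjectBox(new Box<Object>("Hi"))); // Box<Object> containing Hi
   }
}
```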

Let's look at another instance where the wildcard saves us some typing. The Java Tutorials has an example where they have a sum method that takes in a list of any type of number.
public static double sum(List<? extends Number> list) {
They used a bounded wildcard not only to restrict the kinds of lists that the sum method will take in, but also so that each element can be treated as a Number while iterating over the list:
public static double sum(List<? extends Number> list) {
   double result = 0;

   for(Number num : list) {
      result += num.doubleValue();
   }

   return result;
}
Here's the same method, but without the wildcard:
public static<N extends Number> double sum(List<N> list) {
   double result = 0;

   for(Number num : list) {
      result += num.doubleValue();
   }

   return result;
}
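A quick harness for the wildcard version of sum (SumDemo is a made-up wrapper name) shows the same method accepting a List<Integer> and a List<Double>. The commented-out line shows the flip side of ? extends Number: you can read Numbers out of the list, but you can't add to it.

```java
import java.util.Arrays;
import java.util.List;

class SumDemo {
   // The wildcard version from above
   public static double sum(List<? extends Number> list) {
      double result = 0;
      for (Number num : list) {
         result += num.doubleValue();
      }
      // list.add(4); // compiler error: the exact element type is unknown
      return result;
   }

   public static void main(String[] args) {
      List<Integer> ints = Arrays.asList(1, 2, 3);
      List<Double> doubles = Arrays.asList(1.5, 2.5);
      System.out.println(sum(ints));    // 6.0
      System.out.println(sum(doubles)); // 4.0
   }
}
```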
There are a few restrictions on where you can use the wildcard:
  • You cannot use the wildcard as the type parameter of a generic method.
public <T> void good() {
}

public <?> void bad() {  // Compiler Error
}
  • Likewise, you can't declare a generic class as taking a wild type parameter.
public class MyClass<?> { // Compiler Error
}
  • And finally, you cannot use the wildcard when instantiating a generic class.
Box<?> box = new Box<?>(); // Compiler Error
However, you can declare an array of a generic type whose type parameter is the wildcard:
Box<?>[] boxes = new Box<?>[3];
Wildcards also allow you to specify a lower bound for a type argument. Recall that an upper bound restricts the type argument to a specific type or a subtype of that type. A lower bound goes the other way: it restricts the type argument to a specific type or a supertype of that type.
public static void fooBar(Box<? super Number> box) {
}

public static void main(String[] args) {
   fooBar(new Box<Object>());
   fooBar(new Box<Number>());

   fooBar(new Box<Integer>()); // compiler error: Integer is subtype of Number
}
Note: This is the only way to declare this method, because lower bounds only work with wildcards; there is no <T super Number> form for a type parameter. You also can't specify both an upper bound and a lower bound for the same wildcard. You have to use one or the other, but not both.
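Lower bounds are most useful when a method writes into a collection. In this sketch (LowerBoundDemo and addNumbers are hypothetical names), the method adds Integers to any list whose element type is Integer or a supertype of it. The usual rule of thumb, often summarized as "producer extends, consumer super", is that you read from ? extends and write into ? super.

```java
import java.util.ArrayList;
import java.util.List;

class LowerBoundDemo {
   // Writes Integers into any list that can hold them: List<Integer>, List<Number>, List<Object>
   static void addNumbers(List<? super Integer> list, int count) {
      for (int i = 1; i <= count; i++) {
         list.add(i);             // safe: the element type is Integer or a supertype
      }
      // Integer x = list.get(0); // compiler error: elements are only known to be Objects
   }

   public static void main(String[] args) {
      List<Number> numbers = new ArrayList<>();
      List<Object> objects = new ArrayList<>();
      addNumbers(numbers, 3);
      addNumbers(objects, 2);
      System.out.println(numbers); // [1, 2, 3]
      System.out.println(objects); // [1, 2]
      // addNumbers(new ArrayList<Double>(), 1); // compiler error: Double is not a supertype of Integer
   }
}
```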

References
The Java Tutorials: Generics

Jakob Jenkov's article titled "Java Generics Tutorial"

Enrico Crisotomo's article titled "Java Generics Quick Tutorial" posted on April 26, 2011

Java 7 New Features Cookbook by Richard M. Reese and Jennifer L. Reese

Java All-in-One For Dummies by Doug Lowe

Java 7: A Beginner's Tutorial by Budi Kurniawan

Ayushman Jain's article titled "Java 7: Decoding the new Diamond operator with JDT" posted on Saturday, July 30, 2011
