Benki Post #1350

Scott encoding in Java

Assume you have your typical persistent binary tree data type with labeled leaves:

Bᵨ := ρ + (Bᵨ × Bᵨ)

Or, in Haskell notation:

data BinaryTree t = Leaf t | Fork (BinaryTree t) (BinaryTree t)

We can represent it in modern Java in a straight-forward way:

sealed interface BinaryTree<T> {
  record Leaf<T>(T value) implements BinaryTree<T> {}
  record Fork<T>(BinaryTree<T> left, BinaryTree<T> right) implements BinaryTree<T> {}
}

How do you pattern-match on it?

You could wait for pattern matching for switch to exit preview. Then you will be able to write a function that sums up all the leaves in a tree of integers in a straight-forward way:

class BinaryTreeOps {

  static Integer sum(BinaryTree<Integer> b) {
    return switch(b) {
      BinaryTree.Leaf leaf -> leaf.value;
      BinaryTree.Fork fork -> sum(fork.left) + sum(fork.right);
    };
  }
}

If you do not want to wait, here is an alternative. Add a visit method that takes a deconstructor function for each case and implement it by calling the corresponding deconstructor in each case class:

sealed interface BinaryTree<T> {

    <R> R visit(Function<T, R> onLeaf, BiFunction<BinaryTree<T>, BinaryTree<T>, R> onFork);

    record Leaf<T>(T value) implements BinaryTree<T> {

      @Override public <R> R visit(Function<T, R> onLeaf, BiFunction<BinaryTree<T>, BinaryTree<T>, R> onFork) {
        return onLeaf.apply(value);
      }        
    }

    record Fork<T>(BinaryTree<T> left, BinaryTree<T> right) implements BinaryTree<T> {

      @Override public <R> R visit(Function<T, R> onLeaf, BiFunction<BinaryTree<T>, BinaryTree<T>, R> onFork) {
        return onFork.apply(left, right);
      }        
    }
}

Now you can sum up a tree of integers by folding it using the visit method:

class BinaryTreeOps {

  static Integer sum(BinaryTree<Integer> b) {
    return b.visit(
        n -> n, 
        (l, r) -> sum(l) + sum(r));
  }
}

Once you realize that pattern matching is a universal operation and therefore all you ever need out of a data structure (if we ignore performance concerns, that is), it becomes evident that you can get rid of the record middlemen and take visit as the data structure itself:

@FunctionalInterface
interface BinaryTree<T> {

  <R> R visit(Function<T, R> onLeaf, BiFunction<BinaryTree<T>, BinaryTree<T>, R> onFork);

  static <T> BinaryTree<T> ofLeaf(T value) {
    return (onLeaf, onFork) -> onLeaf.apply(value);   // does not compile; see below
  }

  static <T> BinaryTree<T> ofFork(BinaryTree<T> left, BinaryTree<T> right) {
    return (onLeaf, onFork) -> onFork.apply(left, right);   // does not compile; see below
  }
}

No records involved. In fact, notice how there are no data structures in a classical sense whatsoever. Instead, this implementation uses pure closures as data containers.

This would be a valid representation of a binary tree; alas, Java does not permit lambda expressions to implement interface methods with type parameters, so you have to forgo the syntactic sugar and write it out in the old-fashioned way:

interface BinaryTree<T> {

  <R> R visit(Function<T, R> onLeaf, BiFunction<BinaryTree<T>, BinaryTree<T>, R> onFork);

  static <T> BinaryTree<T> ofLeaf(T value) {
    return new BinaryTree<>() {
      @Override
      public <R> R visit(Function<T, R> onLeaf, BiFunction<BinaryTree<T>, BinaryTree<T>, R> onFork) {
        return onLeaf.apply(value);
      }
    };
  }

  static <T> BinaryTree<T> ofFork(BinaryTree<T> left, BinaryTree<T> right) {
    return new BinaryTree<>() {
      @Override
      public <R> R visit(Function<T, R> onLeaf, BiFunction<BinaryTree<T>, BinaryTree<T>, R> onFork) {
        return onFork.apply(left, right);
      }
    };
  }
}

More verbose, but the same thing, and this time it compiles and works just fine.

This encoding of an algebraic data structure is called the Scott encoding of the data structure.

Clearly it is not very Enterprise-ready, but it does demonstrate how you can summon data structures out of nothing in a language that supports lexical closures.