Benki Post #1351

Boehm–Berarducci encoding in Java

Last time we looked at how to apply Scott encoding to algebraic data types in Java. This time we are going to look at Boehm–Berarducci encoding, which is related.

Again we take the well-known persistent binary tree data type with labeled leaves:

Bᵨ := ρ + (Bᵨ × Bᵨ)

Or, in Haskell notation:

data BinaryTree t = Leaf t | Fork (BinaryTree t) (BinaryTree t)

Recall the Scott encoding of this data structure:

interface BinaryTreeScott<T> {
  <R> R visit(Function<T, R> onLeaf, BiFunction<BinaryTreeScott<T>, BinaryTreeScott<T>, R> onFork);
}

From a type-theoretic perspective, there is a problem with this: BinaryTreeScott<T> is defined in terms of itself. Perhaps you want to work in a type system that does not support recursive types.

Encoding

The Boehm–Berarducci encoding comes to the rescue. For our binary tree example it is:

interface BinaryTreeBoehmBerarducci<T> {
  <R> R fold(Function<T, R> foldLeaf, BiFunction<R, R, R> foldFork);
}

Note how the only technical difference is in how the self-type is encoded. In other words, the two encodings coincide on non-recursive types.

The intuitive difference is that while the Scott encoding encodes a data type using its pattern match, Boehm–Berarducci encodes it using its fold.

Implementation

The implementation is straight-forward:

interface BinaryTreeBoehmBerarducci<T> {

  <R> R fold(Function<T, R> foldLeaf, BiFunction<R, R, R> foldFork);

  static <T> BinaryTreeBoehmBerarducci<T> ofLeaf(T value) {
    //return (foldLeaf, foldFork) -> foldLeaf.apply(value);

    return new BinaryTreeBoehmBerarducci<>() {
      @Override
      public <R> R fold(Function<T, R> foldLeaf, BiFunction<R, R, R> foldFork) {
        return foldLeaf.apply(value);
      }
    };
  }

  static <T> BinaryTreeBoehmBerarducci<T> ofFork(BinaryTreeBoehmBerarducci<T> left, BinaryTreeBoehmBerarducci<T> right) {
    //return (foldLeaf, foldFork) -> foldFork.apply(left.fold(foldLeaf, foldFork), right.fold(foldLeaf, foldFork));

    return new BinaryTreeBoehmBerarducci<>() {
      @Override
      public <R> R fold(Function<T, R> foldLeaf, BiFunction<R, R, R> foldFork) {
        return foldFork.apply(left.fold(foldLeaf, foldFork), right.fold(foldLeaf, foldFork));
      }
    };
  }
}

Summing up the leaves is now even easier than it was before, without even requiring explicit recursion:

class BinaryTreeBoehmBerarducciOps {

  static Integer sum(BinaryTreeBoehmBerarducci<Integer> b) {
    return b.fold(
        n -> n,
        (l, r) -> l + r);
  }
}

Pattern matching

But how do you perform a mere pattern match without recurring? This has now become quite tricky to do.

Your first idea might be to convert the Boehm–Berarducci-encoded data type into its Scott encoding:

interface BinaryTreeBoehmBerarducci<T> {

  <R> R fold(Function<T, R> foldLeaf, BiFunction<R, R, R> foldFork);

  static <T> BinaryTreeBoehmBerarducci<T> ofLeaf(T value) { ... }
  static <T> BinaryTreeBoehmBerarducci<T> ofFork(BinaryTreeBoehmBerarducci<T> left, BinaryTreeBoehmBerarducci<T> right) { ... }

  default BinaryTreeScott<T> toScott() {
    return fold(
        BinaryTreeScott::ofLeaf,
        BinaryTreeScott::ofFork);
  }
}

This works, but now we depend on the recursively defined type of BinaryTreeScott<T> again, which is what we wanted to avoid. To get around this limitation, we define a non-recursive helper type that fulfills the same role as BinaryTreeScott<T>, but based on BinaryTreeBoehmBerarducci<T>:

interface Deconstrutor<T> {
  <W> W visit(Function<T, W> onLeaf, BiFunction<BinaryTreeBoehmBerarducci<T>, BinaryTreeBoehmBerarducci<T>, W> onFork);
}

Then we fold our BinaryTreeBoehmBerarducci<T> into a Deconstructor<T>, on which we can call visit to perform our pattern match:

interface BinaryTreeBoehmBerarducci<T> {

  <R> R fold(Function<T, R> foldLeaf, BiFunction<R, R, R> foldFork);

  static <T> BinaryTreeBoehmBerarducci<T> ofLeaf(T value) { ... }
  static <T> BinaryTreeBoehmBerarducci<T> ofFork(BinaryTreeBoehmBerarducci<T> left, BinaryTreeBoehmBerarducci<T> right) { ... }

  default <R> R visit(
      Function<T, R> onLeaf,
      BiFunction<BinaryTreeBoehmBerarducci<T>, BinaryTreeBoehmBerarducci<T>, R> onFork) {

    interface Deconstructor<T> {
      <W> W visit(Function<T, W> onLeaf, BiFunction<BinaryTreeBoehmBerarducci<T>, BinaryTreeBoehmBerarducci<T>, W> onFork);
    }

    return
        this.<Deconstructor<T>>fold(
                v ->
                    new Deconstructor<>() {
                      @Override
                      public <W> W visit(
                          Function<T, W> onLeaf1,
                          BiFunction<BinaryTreeBoehmBerarducci<T>, BinaryTreeBoehmBerarducci<T>, W> onFork1) {
                        return onLeaf1.apply(v);
                      }
                    },

                (left, right) ->
                    new Deconstructor<>() {
                      @Override
                      public <W> W visit(
                          Function<T, W> onLeaf1,
                          BiFunction<BinaryTreeBoehmBerarducci<T>, BinaryTreeBoehmBerarducci<T>, W> onFork1) {
                        return onFork1.apply(
                            left.visit(BinaryTreeBoehmBerarducci::ofLeaf, BinaryTreeBoehmBerarducci::ofFork),
                            right.visit(BinaryTreeBoehmBerarducci::ofLeaf, BinaryTreeBoehmBerarducci::ofFork));
                      }
                    })

            .visit(onLeaf, onFork);
  }
}

BinaryTreeBoehmBerarducci#visit now works as BinaryTreeScott#visit did before.

The only problem is that it is horrendously inefficient as it traverses the whole tree and constructs a complete mirror tree of Deconstructor<T> objects for just a single pattern match.

Optimization: lazy fold

We can remedy the pathological inefficiency by making the fold operation lazy in the recursive argument:

interface BinaryTreeBoehmBerarducciLazy<T> {
  <R> R fold(Function<T, R> foldLeaf, BiFunction<Supplier<R>, Supplier<R>, R> foldFork);
}

The complete code is thus:

@FunctionalInterface
public interface BinaryTreeBoehmBerarducciLazy<T> {

  <R> R fold(Function<T, R> foldLeaf, BiFunction<Supplier<R>, Supplier<R>, R> foldFork);

  static <T> BinaryTreeBoehmBerarducciLazy<T> ofLeaf(T value) {
    return new BinaryTreeBoehmBerarducciLazy<>() {
      @Override
      public <R> R fold(Function<T, R> foldLeaf, BiFunction<Supplier<R>, Supplier<R>, R> foldFork) {
        return foldLeaf.apply(value);
      }
    };
  }

  static <T> BinaryTreeBoehmBerarducciLazy<T> ofFork(BinaryTreeBoehmBerarducciLazy<T> left, BinaryTreeBoehmBerarducciLazy<T> right) {
    return new BinaryTreeBoehmBerarducciLazy<>() {
      @Override
      public <R> R fold(Function<T, R> foldLeaf, BiFunction<Supplier<R>, Supplier<R>, R> foldFork) {
        return foldFork.apply(() -> left.fold(foldLeaf, foldFork), () -> right.fold(foldLeaf, foldFork));
      }
    };
  }

  default <R> R visit(
      Function<T, R> onLeaf,
      BiFunction<BinaryTreeBoehmBerarducciLazy<T>, BinaryTreeBoehmBerarducciLazy<T>, R> onFork) {

    interface Deconstrutor<T> {
      <W> W visit(Function<T, W> onLeaf, BiFunction<BinaryTreeBoehmBerarducciLazy<T>, BinaryTreeBoehmBerarducciLazy<T>, W> onFork);
    }

    return
        this.<Deconstrutor<T>>fold(
                value ->
                    new Deconstrutor<>() {
                      @Override
                      public <W> W visit(
                          Function<T, W> onLeaf1,
                          BiFunction<BinaryTreeBoehmBerarducciLazy<T>, BinaryTreeBoehmBerarducciLazy<T>, W> onFork1) {
                        return onLeaf1.apply(value);
                      }
                    },

                (left, right) ->
                    new Deconstrutor<>() {
                      @Override
                      public <W> W visit(
                          Function<T, W> onLeaf1,
                          BiFunction<BinaryTreeBoehmBerarducciLazy<T>, BinaryTreeBoehmBerarducciLazy<T>, W> onFork1) {
                        return onFork1.apply(
                            left.get().visit(BinaryTreeBoehmBerarducciLazy::ofLeaf, BinaryTreeBoehmBerarducciLazy::ofFork),
                            right.get().visit(BinaryTreeBoehmBerarducciLazy::ofLeaf, BinaryTreeBoehmBerarducciLazy::ofFork));
                      }
                    })

            .visit(onLeaf, onFork);
  }
}