Generators and Streams

Arguably, one of the most elegant ways of representing a recursively enumerable set such as the set of prime numbers in a computer program is as an enumeration, or, as some might call it, a generator. A generator, simply speaking, is an object that repeatedly generates the next value of a set when asked for one by the user.

Generators are a sufficiently useful concept that even mainstream programming languages like Python support them with special syntax:

def squares():
    i = 0
    while True:
        yield i*i
        i += 1

sqs = squares()   # Instantiate generator
print sqs.next()  # => 0
print sqs.next()  # => 1
print sqs.next()  # => 4
print sqs.next()  # => 9

Some less mainstream programming languages, such as Haskell, support the concept of a stream instead, which is basically an infinite list:

squares = [x*x | x <- [0..]]

main = do
  println $ take 4 squares

The main difference between a generator and a stream is that a stream need not be instantiated. Whereas a generator has a state that changes every time the user requests a value, a stream always behaves as the same infinite list. As with all things, this entails both disadvantages and benefits, which I am not going to talk about here. However, one thing that makes streams particularly interesting is that since they are static, an implementation can choose to cache the values that have already been generated, so that subsequent uses of the same stream can simply use the cached values instead of generating them from the start again.

This caching behavior is particularly important when generation is expensive—such as is the case for the set of prime numbers.

Generating Primes

Clearly, it is not particularly hard to generate all prime numbers. Simply provide a primality test and filter the natural numbers:

def is_prime(n):
  for p in primes():
    # Break out of the loop if there is
    # no chance left of finding a factor.
    if p*p > n:
      return True
    # If p is a factor of n, n ain't a
    # prime number.
    if n % p == 0:
      return False

def primes():
  yield 2
  i = 3
  while True:
    if is_prime(i):
      yield i
    i += 1

Pretty simple, right? Note that the primes generator is used in the definition of is_prime. This works because primes is defined to yield the first prime number without calling is_prime at all (otherwise, this would lead to an infinite recursion).

Now, the above code is pretty inefficient since is_prime repeatedly generates the primes up to sqrt(n). This could easily be solved by making primes a stream, so let's try making it one. In order to do this, I shall switch to the Scheme-derived Racket programming language for the rest of the article, since it provides a rich set of features dealing with generators and streams:

;; Some helper definitions.
(define (divides? p n)
  (zero? (remainder n p)))

(define (nats [n 0] [step 1])
  (stream-cons n (nats (+ step n) step)))

;; This is the interesting part.
(define (prime? n)
  (for/and ([p (stop-after primes (λ (p) (>= (* p p) n)))])
    (not (divides? p n))))

(define primes
  (stream-cons 2 (stream-filter prime? (nats 3))))

Again, prime? and primes are mutually recursive, but this time, one and the same stream of primes is reused in every invocation of prime?, making it unnecessary to generate the list of primes up to (sqrt n) every single time.

If the functional style is unfamiliar to you, it would have been quite possible to define primes just as in the Python example above, albeit with a little more syntactic overhead (since we still want to convert the generated sequence into a stream):

(define primes
  (sequence->stream
   (in-generator
    (yield 2)
    (for ([n (in-naturals 3)])
      (when (prime? n)
        (yield n))))))

Measuring Performance

So the above prime generation technique is nice and elegant and everything, but it's also hopelessly inefficient.

> (time (stream-ref primes 20000))
cpu time: 24822 real time: 25316 gc time: 379
224743

Surely we can do better. There are a couple of possible optimizations here. One is to test for primality only those numbers that are +/- 1 mod 6. (Figuring out the reason every prime number greater than 3 has this form is left as an exercise for the reader.) This gives us the following improved primality test:

(define (prime? n)
  (or (= n 2)
      (= n 3)
      (let ([mod6 (modulo n 6)])
        (and (> n 1)
             (or (= mod6 1)
                 (= mod6 5))
             (for/and ([p (stop-after primes (λ (p) (>= (* p p) n)))])
               (not (divides? p n)))))))

This improves performance by a bit, but it still leaves a lot to be desired.

> (time (stream-ref primes 20000))
cpu time: 23223 real time: 23424 gc time: 554
224743

Prime Sieves

The reason for the sub-par performance is that we are still using a trial-division algorithm. A much more efficient way of generating primes is by using a prime sieve algorithm.

A simple prime sieve algorithm works like this (this is known as the Sieve of Eratosthenes): Assume some bound n, below which we need to enumerate all prime numbers. Allocate an array of length n. Initialize the array with ones, marking all numbers greater than 1 as potential prime numbers. Now, for every new prime number that you find (by simply looking at the first number in the list that is not yet marked), mark off all multiples by setting their array cells to zero. You may stop when you have reached sqrt(n).

Since the times of Eratosthenes, mathematicians have discovered more elaborate prime sieve algorithms. Of note in particular is the Sieve of Atkin, which is known to be significantly more efficient than the Sieve of Eratosthenes.

The Problem

Unfortunately, it is not at all clear how to use a traditional prime sieve algorithm to generate an infinite stream of primes. Just like the Sieve of Eratosthenes, modern sieve algorithms assume some upper bound n which primes should be generated below of. The reason is that a sieve works by marking off all multiples of some number, and this marking-off process must end at some point. In addition, some of the efficiency benefits that sieve algorithms provide is the fact that they are able to stop relatively early. For instance, the Sieve of Eratosthenes can terminate as soon as the last prime smaller than sqrt(n) has been processed.

Certainly, you could set some preliminary bound and restart the algorithm whenever the bound is crossed, setting a new bound in the process. There may even be a way of choosing the new bound so as to avoid making the asymptotic runtime significantly worse (there's probably a paper or two in here somewhere); but even so, an approach that recomputes a lot of prime numbers again and again is still dissatisfactory.

The Solution

You would think that there must be some other way—and indeed there is. Let's take a step back for a moment. What is a sieve algorithm actually about? The algorithm steps through the already filtered part of the array, and for each value that it finds, it starts to check off its multiples (or square multiples, or whatever it is the specific algorithm does) up to the end of the array. But wait—why should such a subprocess have to terminate as soon as the end of the array is reached? It might as well sleep until the array becomes larger, at which point it can continue to check off items—after which we can be sure that the enlarged array is also filtered. So if we can somehow make it do that, we have magically solved all of our problems. (Admittedly, this isn't quite true, since the sieve process itself also needs to be suspended whenever it is okay to do so—but that is a negligible issue, since we can just make it stop and continue as well.)

This is indeed feasible. In fact, there are multiple ways of doing something like this. We will take a rather simple and intuitive route: Make every check-off process a coroutine, collect the resulting coroutines into a list, and run them in order every time we resize the array. Each time, the coroutines will stop at the end of the array, waiting for the next iteration to arrive. In pseudo-code,

define sieve coroutine:
  loop:
    for i from 1 to infinity:
      if sieve[i] is 1:
        spawn check-off coroutines for i
      if i >= stopping number (depending on size of array):
        yield to parent
          (and sleep until reentered)

define primes:
  set up sieve
  set coroutines = []
  loop:
    run sieve coroutine,
      saving spawned coroutines into the coroutines list
    run coroutines from coroutines list
    enumerate all primes from sieve
    resize sieve

Implementation in Racket

We implement the above by first defining a facility for creating coroutines:

(define (make-coroutine fn)
  (let ([cont #f]
        [return #f])
    (λ ()
      (let/cc return*
        (set! return return*)
        (if cont
            (cont)
            (fn (λ () (let/cc k
                        (set! cont k)
                        (return)))))))))

Next, we need to implement the main loop. We do this by writing it directly as a stream—though we could also have used a generator, as noted above.

;; Parameters:
;;  sieve-constructior — the function that constructs the sieve coroutine
;;  starting-numbers   — a list of numbers that the sieve does not yield
;;                       but that still need to be in the stream
;;  index->number      — a procedure that maps an array index to the number it represents
;;                       (in the Sieve of Eratosthenes, this is simply the identity function)
(define (sieve->stream sieve-constructor starting-numbers index->number)
  ;; Starting array size: 100'000 items.
  (let* ([sieve      (box (make-vector 100000 #t))]
         [coroutines (box (list))]
         [sieve-coro (sieve-constructor sieve coroutines)])
    (define (iter pos)
      (letrec
          ([len
            (vector-length (unbox sieve))]
           [init
            ;; Run sieve and filter coroutines.
            (λ ()
              (sieve-coro)
              (for ([coro (unbox coroutines)])
                (coro)))]
           [loop
            ;; Yield the generated primes by iterating over the sieve.
            (λ (i)
              (if (< i len)
                  (begin
                    (if (vector-ref (unbox sieve) i)
                        (stream-cons (index->number i)
                                     (loop (add1 i)))
                        (loop (add1 i))))
                  (next-iteration i)))]
           [next-iteration
            ;; Resize array, repeat.
            (λ (newpos)
              (let* ([old-sieve (unbox sieve)]
                     [new-size (inexact->exact
                                (round
                                 (* (vector-length old-sieve)
                                    1.2)))])
                (printf "Resizing prime sieve.  New size: ~A\n" new-size)
                (set-box! sieve (make-vector new-size))
                (vector-fill! (unbox sieve) #t)
                (vector-copy! (unbox sieve) 0 old-sieve)
                (iter newpos)))])
        (init)
        (loop pos)))
    (define (start-iter starting-numbers)
      (match starting-numbers
        [(list)
         (iter 1)]
        [(list* x xs)
         (stream-cons x (start-iter xs))]))
    (start-iter starting-numbers)))

For the sieve algorithm, we can use basically whichever one we want. I have chosen to adapt an algorithm published on StackOverflow.com by one Robert William Hanks (which he says is not his real name). The original version posted on SO is the following:

import numpy
def primesfrom2to(n):
    """ Input n>=6, Returns a array of primes, 2 <= p < n """
    sieve = numpy.ones(n/3 + (n%6==2), dtype=numpy.bool)
    for i in xrange(1,int(n**0.5)/3+1):
        if sieve[i]:
            k=3*i+1|1
            sieve[       k*k/3     ::2*k] = False
            sieve[k*(k-2*(i&1)+4)/3::2*k] = False
    return numpy.r_[2,3,((3*numpy.nonzero(sieve)[0][1:]+1)|1)]

We translate this into a coroutine-based implementation as below. Note that the sieve routine destructively modifies the sieve array. In addition, we use explicit boxes in order to communicate changes in the list of coroutines from the sieve algorithm to the main loop as well as for the array, which is reallocated on every major iteration.

;; A little helper definition.
(define-syntax-rule (while cond body ...)
  (when cond
    (let loop ()
      body ...
      (when cond
        (loop)))))
(define (make-rwh-prime-sieve2 sieve coroutines)
  (make-coroutine
   (λ (yield)
     (let loop ([pos 1])
       (let* ([n (add1 (* (vector-length (unbox sieve)) 3))])
         (for ([i (in-range pos (add1 (quotient (integer-sqrt n) 3)))])
           (set! pos (add1 i))
           (when (vector-ref (unbox sieve) i)
             (let* ([k (bitwise-ior (+ (* 3 i) 1)
                                    1)]
                    [make-filter
                     (λ (starting-pos)
                       (make-coroutine
                        (λ (yield)
                          (let loop ([ii (quotient starting-pos 3)])
                            (while (>= ii (vector-length (unbox sieve)))
                              (yield))
                            (vector-set! (unbox sieve) ii #f)
                            (loop (+ ii (* 2 k)))))))])
               (set-box! coroutines
                         (list* (make-filter (sqr k))
                                (make-filter (* k
                                                (+ k
                                                   (- (* 2 (if (odd? i) 1 0)))
                                                   4)))
                                (unbox coroutines)))))))
       (yield)
       (loop pos)))))

Finally, putting it all together, we finally get a stream of primes:

(define primes
  (sieve->stream make-rwh-prime-sieve2
                 '(2 3)
                 (λ (i)
                   (bitwise-ior (+ (* 3 i) 1) 1))))

What Did We Gain?

So after all this, have we gained anything with regard to performance? Indeed we have:

> (time (stream-ref primes 20000))
cpu time: 387 real time: 391 gc time: 148
224743

In fact, we can now easily generate one order of magnitude more prime numbers than before (note that this does not merely generate the prime numbers up to 100'000—those would be quite a bit fewer!):

> (time (stream-ref primes 100000))
cpu time: 2120 real time: 2322 gc time: 621
1299721

Not too shabby, I'd say, considering that the program has now stored every single prime up to 1'299'721 in a stream of length 100'001. We can verify that this is true by summing over all of them and measuring the time:

> (time (stream-fold + 0 (stream-take primes 100001)))
cpu time: 301 real time: 301 gc time: 0
62261998442

Similarly, generating some more primes reuses the existing stream:

> (time (stream-take (stream-drop primes 100001) 10))
cpu time: 270 real time: 269 gc time: 0
'(1299743
  1299763
  1299791
  1299811
  1299817
  1299821
  1299827
  1299833
  1299841
  1299853)

Conclusion

We can conclude that using prime sieves is an efficient way of enumerating primes. More interestingly, even though they are inherently imperative in nature, heavily leveraging side-effects and mutable arrays, they can be given an elegant functional interface using streams and can, with some work, even be used to enumerate unbounded sequences of prime numbers by making use of filtering coroutines.