-
Notifications
You must be signed in to change notification settings - Fork 212
Sieve of Eratosthenes
Using channels to generate a sequence of prime numbers is an example often seen in presentations about CSP and channels. This turns out to be slightly tricky to implement. This page captures the process of developing a reasonable prime number generator using core.async channels and transducers.
We begin with an example in Clojure(Script) that filters out non-prime values using a naive implementation. Look for the chan-of-primes-naive
function below.
One go-loop
creates a channel of integers beginning with the value of 2
, the initial seed value in our sequence of prime numbers. The other go-loop
populates a channel of prime numbers. Notice how it recursively pipes the values of each preceding channel into the newly created channel. Each channel is thereby fed integers from the preceding channel, but through a filtering transducer only retains those integers that are not multiples of the newly known prime value.
Each time through the loop means that the next prime was found (i.e, it wasn't filtered out by the preceding series of transducers as a multiple of a known prime, therefore it must be the next prime value). When that happens, yet another new channel is created with a new filter for the new prime, and the previous channel is piped into this new channel. The result is a series of piped-together channels, each one further constrained than the previous though a combination of its own filter and the effective sum of all the filters in all the channels hooked together in the series.
(defn chan-of-primes-naive []
(let [ints (chan)
primes (chan)]
(go-loop [n 2]
(>! ints n)
(recur (inc n)))
(go-loop [cur-ch ints]
(let [prime (<! cur-ch)
new-ch (chan 1 (filter #(pos? (mod % prime))))]
(>! primes prime)
(pipe cur-ch new-ch)
(recur new-ch)))
primes))
The previous code works, but notice that it doesn't take advantage of the composability of transducers. To do that, within our main go-loop we can create a new channel of integers that begins with the previous prime number and has a transducer that combines the new prime filter with the previous filter (using the standard comp
function), like this:
(defn chan-of-ints [xform start-n]
(let [ints (chan 1 xform)]
(go-loop [n start-n]
(>! ints n)
(recur (inc n)))
ints))
(defn chan-of-primes-doomed []
(let [primes (chan)]
(go-loop [cur-xf (map identity)
cur-ch (chan-of-ints cur-xf 2)]
(let [prime (<! cur-ch)
new-xf (comp cur-xf (filter #(not= 0 (mod % prime))))
new-ch (chan-of-ints new-xf prime)]
(>! primes prime)
(recur new-xf new-ch)))
primes))
Notice how this version, which recursively composes a new transducer, also avoids the need to pipe values from the previous channel to the new channel. To see the difference this makes in performance you can do some simple timings from a REPL using code like the following:
(defn consume [n ch]
(dorun (repeatedly n #(<!! ch))))
(time (consume 2000 (chan-of-primes-naive)))
(time (consume 2000 (chan-of-primes-doomed)))
Note also that the order of composition of the transducer is significant. If the order is reversed, as in this example line of code, performance is significantly worse:
new-xf (comp (filter #(not= 0 (mod % prime))) cur-xf)
Now, if we lived in a world where computers had infinite stack sizes, then the previous example wouldn't have been "doomed". But we don't, so it is, and you ought to know why.
The reason is simple. Although composition (through the use of comp
) is powerful and useful in most cases, in this one, where we are using a recursive loop, the composed function is growing and growing on the stack until the point that you get an unwelcome stack overflow error. Doomed.
Here is our third and final attempt. (Well, at least we hoped it was our final attempt.) It is neither naive, nor doomed. Which means it is much faster than our first attempt, but won't blow the call stack like our second attempt. And it is still concise and readable.
(defn chan-of-ints [xform start-n]
(let [ints (chan 1 xform)]
(go-loop [n start-n]
(>! ints n)
(recur (inc n)))
ints))
(defn new-prime? [n knowm-primes]
(every? #(not= 0 (mod n %)) knowm-primes))
(defn chan-of-primes []
(let [primes (chan)]
(go-loop [cur-xf (map identity)
cur-ch (chan-of-ints cur-xf 2)
knowns [2]]
(let [prime (<! cur-ch)
knowns (conj knowns prime)
new-xf (filter #(new-prime? % knowns))
new-ch (chan-of-ints new-xf prime)]
(>! primes prime)
(recur new-xf new-ch knowns)))
primes))
The main change here is that we have traded the composability of functions for the conjoinability of data. Specifically, instead of comp
ing the transducer until we run out of room on the call stack, we conj
each newly discovered prime number onto a vector of known primes. We then pass these known primes to our simplified transducer filter function so it can check each integer being fed into the channel against the entire collection of known primes. Once again, simple data trumps all the rest.
And that's how you can generate an infinite sequence of prime numbers using core.async
channels. Keep in mind, this is not a particularly good way to get prime numbers. There are far better ways to do that. There are only two justifications for the existence of this example. The first reason is that everyone else is doing it, so we should as well. A classic justification if ever there was one. The second reason is simply to illustrate an approach to solving a problem using particular tools. So if you have problems to solve where channels and go-loops are your best choice of tools, the hope is that you will learn a thing or two from this not-so-practical example that you can apply to your own, real-world, practical problems.
And if you've read this far, just wait until you see what comes next...
Did we say that was the final version of our code? Ha! In the fresh light of a new day, with a fresh cup of strong coffee, we realize that while our third attempt at solving this problem has some nice characteristics, the code is actually a bit of a mess:
-
the
channel-of-ints
function doesn't really return a channel that merely produces integer values. -
the
new-prime?
predicate function requires state to be passed to it fromchan-of-primes
in a way that just doesn't feel right. -
the combination of functions lacks elegance and looks like something written by someone with years of object-oriented coding experience.
Fortunately, the good folks in the Clojure community suggested that transducers were mighty powerful and a better transducer could be used in other contexts and could improve the code used to generate primes. Turns out they were right. Take a look:
(defn chan-of-ints [start-n]
(let [ints (chan)]
(go-loop [n start-n]
(>! ints n)
(recur (inc n)))
ints))
(defn posmod-sift []
(fn [rf]
(let [seen (volatile! [])]
(fn
([] (rf))
([result] (rf result))
([result input]
(if (every? #(pos? (mod input %)) @seen)
(do (vswap! seen conj input)
(rf result input))
result))))))
(defn chan-of-primes []
(let [ints (chan-of-ints 2)
sieve (posmod-sift)
primes (chan 1 sieve)]
(pipe ints primes)
primes))
Essentially what we've done here is taken all the algorithmic transformation logic (the code that knows how to take a series of integers as input and filter out everything that is a multiple of a previously seen value) and moved it into a custom transducer named posmod-sift
, where it really belonged all along.
And so it seems like we've been on a journey that has been traveled before by none other than Rich Hickey, who described the development of transducers this way:
In working recently on providing algorithmic combinators for core.async, I became more and more convinced of the superiority of reducing function transformers over channel->channel functions for algorithmic transformation. In fact, I think they are a better way to do many things for which we normally create bespoke replicas of map, filter etc.
So when we abstract out the transformation logic into a stateful transducer, what we end up with is a tool that is independent of core.asyn, channels, and even prime numbers.
For example, this independent nature of transducers can be seen in the following example run in a REPL:
(transduce (posmod-sift) conj (range 2 100))
=> [2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97]
And here is a slightly-optimized variation that skips even numbers, since they cannot be primes:
(transduce (posmod-sift) conj (cons 2 (range 3 100 2)))
=> [2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97]
And here is a variation that generates a series of something, we aren't completely sure what:
(transduce (posmod-sift) conj (range 5 100 7))
=> [5 12 19 26 33 47 54 61 68 82 89]