cljs-thread: spawn, in, future, pmap, pcalls, pvalues and =>>

"One step closer to threads on the web"

cljs-thread makes using webworkers take less... work. Eventually, I'd like it to be able to minimize the amount of build tool configuration it takes to spawn workers in Clojurescript too, but that's a longer term goal.

When you spawn a worker, it automatically creates connections to all the other workers, forming a fully connected mesh.

The in macro then abstracts over the message passing infrastructure, with implicit binding conveyance and blocking semantics, allowing you to do work on threads in a manner similar to what you would experience with threads in Clojure and other languages. cljs-thread provides familiar constructs like future, pmap, pcalls and pvalues. For transducing over large sequences in parallel, the =>> thread-last macro is provided.

Getting Started

deps.edn

Place the following in the :deps map of your deps.edn file:

...
net.clojars.john/cljs-thread {:mvn/version "0.1.0-alpha.4"}
...
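
For context, a complete minimal deps.edn might look like the following (a sketch: the :paths entry and the ClojureScript version shown are illustrative, not requirements):

{:paths ["src"]
 :deps  {org.clojure/clojurescript    {:mvn/version "1.11.60"}
         net.clojars.john/cljs-thread {:mvn/version "0.1.0-alpha.4"}}}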

Build Tools

Shadow-cljs

You'll want to put something like this in the :builds section of your shadow-cljs.edn file:

 :builds
 {:repl ; <- just for getting a stable connection for repling, optional
  {:target :browser
   :output-dir "resources/public"
   :modules {:repl {:entries [dashboard.core]
                    :web-worker true}}}
  :sw
  {:target :browser
   :output-dir "resources/public"
   :modules {:sw {:entries [dashboard.core]
                  :web-worker true}}}
  :core
  {:target     :browser
   :output-dir "resources/public"
   :modules
   {:shared {:entries []}
    :screen
    {:init-fn dashboard.screen/init!
     :depends-on #{:shared}}
    :core
    {:init-fn dashboard.core/init!
     :depends-on #{:shared}
     :web-worker true}}}}}

You can get by with less, but if you want a comfortable repling experience then you want your build file to look something like the above. Technically, you can get by with just a single build artifact if you're careful enough in your worker code to never access js/window, js/document, etc. But by having a screen.js artifact, you can more easily separate your "main thread" code from your web worker code. Having a separate service worker artifact (:sw) is fine because it doesn't need your deps - we only use it for getting blocking semantics in web workers. Having a separate :repl artifact is helpful when you're using an IDE that only allows you to select repl connections on a per-build-id basis (such as VS Code's Calva, which I use).

The sub-project in this repo - shadow_dashboard - has an example project with a working build config (similar to the above) that you can use as an example to get started.

To launch the project in Calva, type shift-cmd-p, choose "Start a Project REPL and Connect" and then enable the three build options that come up. When it asks which build you want to connect to, select :repl. You can also connect to :screen and that will be a stable connection as well. For :core, however, in the above configuration, there will be lots of web worker connections pointed to it and you can't control which one ends up as the current connection.

You can choose in Calva which build you are currently connected to by typing shift-cmd-p and choosing "Select CLJS Build Connection".

There are lots of possibilities with build configurations around web workers and eventually there will be an entire wiki article here on just that topic. Please file an issue if you find improved workflows or have questions about how to get things working.

Figwheel

There currently isn't a figwheel build configuration example provided in this repo, but I've had prior versions of this library working on figwheel and I'm hoping to have examples here soon - I just haven't had time. Please submit a PR if you get a great build configuration similar to the one above for shadow.

cljs.main

As with figwheel, a solid set of directions for getting this working with the default cljs.main build tools is forthcoming - PRs welcome!

cljs-thread.core/init!

Eventually, once all the different build tools have robust configurations, I would like to iron out a set of default configurations within cljs-thread such that things Just Work - just like spawning threads on the JVM. For now, you have to provide cljs-thread with the details of your build configuration in code, using cljs-thread.core/init! like so:

(thread/init!
 {:sw-connect-string "/sw.js"
  :repl-connect-string "/repl.js"
  :core-connect-string "/core.js"})

:sw-connect-string defines where your service worker artifact is found, relative to the base directory of your server. Same goes for :repl-connect-string and :core-connect-string. You can also provide a :root-connect-string, :future-connect-string and :injest-connect-string - if you don't, they will default to your :core-connect-string.
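
For example, if your build emitted a dedicated artifact for future workers, you could point to it explicitly (a sketch; the file names are whatever your build actually emits, and any omitted keys fall back to :core-connect-string):

(thread/init!
 {:sw-connect-string     "/sw.js"
  :core-connect-string   "/core.js"
  ;; optional - would default to "/core.js" if omitted:
  :future-connect-string "/future.js"})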

Demo

https://johnmn3.github.io/cljs-thread/

The shadow-dashboard example project contains a standard dashboard demo built on re-frame/mui-v5/comp.el/sync-IndexedDB, with the application logic (reg-subs & reg-event functions) moved into a web worker; only React rendering is handled on the screen thread, allowing for buttery-smooth components backed by large data transformations in the workers.

[Screenshot: the shadow-dashboard demo]

spawn

There are a few ways you can spawn a worker.

No args:

(def s1 (spawn))

This will create a webworker and you'll be able to do something with it afterwards.

Only a body:

(spawn (println :addition (+ 1 2 3)))
;:addition 6

This will create a webworker, run the code in it (presumably for side effects) and then terminate the worker. This is considered an ephemeral worker.

Named worker:

(def s2 (spawn {:id :s2} (println :hi :from thread/id)))
;:hi :from :s2

This creates a webworker named :s2 and you'll be able to do something with s2 afterwards.

Ephemeral deref:

(println :ephemeral :result @(spawn (+ 1 2 3)))
;:ephemeral :result 6

In workers, spawn returns a derefable that blocks and yields the body's return value. On the main/screen thread, the deref instead yields a promise:

(-> @(spawn (+ 1 2 3))
    (.then #(println :ephemeral :result %)))
;:ephemeral :result 6

Note: The deref (@(spawn ...)) forces the result to be resolved for .thening. Choosing not to deref on the main thread, as on workers, implies the evaluation is side-effecting and that we don't care about the result.

In a worker, you can force the return of a promise with (spawn {:promise? true} (+ 1 2 3)) if you'd rather treat it like a promise:

(-> @(js/Promise.all #js [(spawn {:promise? true} 1) (spawn {:promise? true} 2)])
    (.then #(println :res (js->clj %))))
;:res [1 2]

spawn has more features, but they mostly match the features of in and future which we'll go over below. spawn has a startup cost that we don't want to have to pay all the time, so you should use it sparingly.

in

Now that we have some workers, let's do some stuff in them:

(in s1 (println :hi :from :s1))
;:hi :from :s1

We can also make chains of execution across multiple workers:

(in s1
    (println :now :we're :in :s1)
    (in s2
        (println :now :we're :in :s2 :through :s1)))
;:now :we're :in :s1
;:now :we're :in :s2 :through :s1

You can also deref the return value of in:

@(in s1 (+ 1 @(in s2 (+ 2 3))))
;=> 6

Binding conveyance

For most functions, cljs-thread will try to automatically convey local bindings, as well as vars local to the invoking namespace, across workers:

(let [x 3] 
    @(in s1 (+ 1 @(in s2 (+ 2 x)))))
;=> 6

That works for both symbols bound to the local scope of the form and to top level defs of the current namespace. So this will work:

(def x 3)
@(in s1 (+ 1 @(in s2 (+ 2 x))))

Some things however cannot be transmitted. This will not work:

(def y (atom 3))
@(in s1 (+ 1 @(in s2 (+ 2 @y))))

Atoms will not be serialized. That would break the identity semantics that atoms provide. If the current namespace is being shared between both sides of the invocation and you want to reference an atom that lives on the remote side without conveying the local one, you can either:

  • Define it in another namespace, so the local version is not conveyed (it's not a bad idea to define stateful things in a special namespace anyway - see the sketch after this list); or
  • Declare the invocation with :no-globals?, like @(in s1 {:no-globals? true} (+ 1 @(in s2 (+ 2 @y)))). This way, you can have y defined in the same namespace on both ends of the invocation but you'll be explicitly referencing the one on the remote side; or
  • Use an explicit conveyance vector that does not include the local symbol, like @(in s1 [s2] (+ 1 @(in s2 (+ 2 @y)))). Using explicit conveyance vectors disables implicit conveyance altogether.
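
For the first option above, a minimal sketch (the namespace name is illustrative): keep stateful things in a namespace of their own so the local atom is never implicitly conveyed, and reference it by its fully qualified name so the worker resolves its own copy on the remote side:

;; in its own namespace, e.g. my.app.state:
(def y (atom 3))

;; in the invoking namespace:
@(in s1 (+ 1 @(in s2 (+ 2 @my.app.state/y))))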

As mentioned above, you can also explicitly define a conveyance vector:

@(in s1 [x s2] (+ 1 @(in s2 (+ 2 x))))
;=> 6

Here, [x s2] declares that we want to pass x (here defined as 3) through to s2. We don't need to declare it again in s2 because now it is implicitly conveyed as it is in the local scope of the form.

We could also avoid passing s2 by simply referencing it by its :id:

@(in s1 [x] (+ 1 @(in :s2 (+ 2 x))))
;=> 6

However, you can't mix both implicit and explicit binding conveyance:

(let [z 3]
  @(in s1 [x] (+ 1 @(in :s2 (+ x z)))))
;=> nil

Rather, this would work:

(let [z 3]
  @(in s1 [x z] (+ 1 @(in :s2 (+ x z)))))
;=> 7

The explicit conveyance vector is essentially your escape hatch, for when the simple implicit conveyance isn't enough or is too much.

yield

When you want to convert an async javascript function into a synchronous one, yield is especially useful:

(->> @(in s1 (-> (js/fetch "http://api.open-notify.org/iss-now.json")
                 (.then #(.json %))
                 (.then #(yield (js->clj % :keywordize-keys true)))))
     :iss_position
     (println "ISS Position:"))
;ISS Position: {:latitude 44.4403, :longitude 177.0011}

Note: binding conveyance and yield also work with spawn:

(let [x 6]
  @(spawn (yield (+ x 2)) (println :i'm :ephemeral)))
;:i'm :ephemeral
;=> 8

You can also nest spawns:

@(spawn (+ 1 @(spawn (+ 2 3))))
;=> 6

But that will take 10 to 100 times longer, due to worker startup delay, so make sure that your work is truly heavy and ephemeral. With re-frame, react and a few other megabytes of dev-time dependencies loaded in /core.js, that call took me about 1 second to complete - not very fast.

Also note: You can use yield to temporarily prevent the closing of an ephemeral spawn as well:

@(spawn (js/setTimeout
         #(yield (println :finally!) (+ 1 2 3))
         5000))
;:finally!
;=> 6

Where 6 took 5 seconds to return - handy for async tasks in ephemeral workers.

future

You don't have to create new workers though. cljs-thread comes with a thread pool of workers on which you can invoke future. Once invoked, future grabs one of the available workers, does the work on it, and then frees the worker when it's done.

(let [x 2]
  @(future (+ 1 @(future (+ x 3)))))
;=> 6

That took about 20 milliseconds.

Note: A single synchronous future call will cost you around 8 to 10 milliseconds. A single synchronous in call will cost you around 4 to 5 milliseconds, depending on whether it needs to be proxied.

Again, all of these constructs return promises on the main/screen thread:

(-> @(future (-> (js/fetch "http://api.open-notify.org/iss-now.json")
                 (.then #(.json %))
                 (.then #(yield (js->clj % :keywordize-keys true)))))
    (.then #(println "ISS Position:" (:iss_position %))))
;ISS Position: {:latitude 45.3612, :longitude -110.6497}

You wouldn't want to do this for such a lightweight API call, but if you have some large payloads that you need fetched and normalized, it can be convenient to run them in futures to handle them off the main thread.

cljs-thread's blocking semantics are great for achieving synchronous control flow when you need it, but as shown above, they come with the performance cost of waiting on the service worker to proxy results. Therefore, you wouldn't want to use them in very hot loops or for implementing tight algorithms. We can beat single-threaded performance, though, if we're smart about chunking work up into large pieces and fanning it across a pool of workers. You can design your own system for doing that, but cljs-thread comes with a solution for pure functions: =>>. It also comes with a version of pmap. (See the official clojure.core/pmap for more info.)
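
To illustrate the roll-your-own option, here is a minimal sketch of fanning chunks of pure work across the future pool (all names are illustrative; the derefs assume we're in a worker, where @ blocks):

(defn fan-out [heavy-step coll]
  (->> coll
       (partition-all 512)                                   ; chunk the work into large pieces
       (mapv (fn [chunk] (future (mapv heavy-step chunk))))  ; one future per chunk
       (mapcat deref)))                                      ; gather each chunk's results in order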

pmap

pmap lazily consumes one or more collections and maps a function across them in parallel.

(def z inc)
(let [i +]
  (->> [1 2 3 4]
       (pmap (fn [x y] (pr :x x :y y) (z (i x y))) [9 8 7 6])
       (take 2)))
;:x 9 :y 1
;:x 8 :y 2
;=> (11 11)

Taking an example from clojuredocs.org:

;; A function that simulates a long-running process by calling thread/sleep:
(defn long-running-job [n]
    (thread/sleep 1000) ; wait for 1 second
    (+ n 10))

;; Use `doall` to eagerly evaluate `map`, which evaluates lazily by default.

;; With `map`, the total elapsed time is just over 4 seconds:
user=> (time (doall (map long-running-job (range 4))))
"Elapsed time: 4012.500000 msecs"
(10 11 12 13)

;; With `pmap`, the total elapsed time is just over 1 second:
user=> (time (doall (pmap long-running-job (range 4))))
"Elapsed time: 1021.500000 msecs"
(10 11 12 13)

=>>

injest is a library that makes it easier to work with transducers. It provides a x>> macro for Clojure and Clojurescript that converts thread-last macros (->>) into transducer chains. For Clojure, it provides a =>> variant that also parallelizes the transducers across a fork-join pool with r/fold. However, because we've been lacking blocking semantics in the browser, it was unable to provide the same macro to Clojurescript.

cljs-thread provides the auto-transducifying, auto-parallelizing =>> macro that injest was missing.

So, suppose you have some non-trivial work:

(defn flip [n]
  (apply comp (take n (cycle [inc dec]))))

On a single thread, in Chrome, this takes between 16 and 20 seconds (on this computer):

(->> (range)
     (map (flip 100))
     (map (flip 100))
     (map (flip 100))
     (take 1000000)
     (apply +)
     time)

On Safari and Firefox, that will take between 60 and 70 seconds.

Let's try it with =>>:

(=>> (range)
     (map (flip 100))
     (map (flip 100))
     (map (flip 100))
     (take 1000000)
     (apply +)
     time)

On Chrome, that'll take only about 8 to 10 seconds. On Safari it takes about 30 seconds and in Firefox it takes around 20 seconds.

So in Chrome and Safari, you can roughly double your speed and in Firefox you can go three or more times faster.

By changing only one character, we can double or triple our performance, all while leaving the main thread free to render at 60 frames per second. Notice also how it's lazy :)

Note: On the main/screen thread, =>> returns a promise. =>> defaults to a chunk size of 512.
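
For instance, a hedged sketch of consuming that promise on the main/screen thread, mirroring the promise handling shown for future above:

(-> (=>> (range)
         (map (flip 100))
         (take 1000)
         (apply +))
    (.then #(println :sum %)))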

Stepping debugger

The blocking semantics that cljs-thread provides open the door to a lot of things that weren't previously possible in Clojurescript/Javascript and the browser in general. One of these is a stepping debugger in the runtime (outside of the JS console debugger). cljs-thread ships with a simple example of a stepping debugger:

(dbg
 (let [x 1 y 3 z 5]
   (println :starting)
   (dotimes [i z]
     (break (= i y))
     (println :i i))
   (println :done)
   x))
;:starting
;:i 0
;:i 1
;:i 2
;=> :starting-dbg

dbg is a convenience macro for sending a form to a debug worker that is constantly listening for new forms to evaluate for the purpose of debugging. break stops the execution from running beyond a particular location in the code. It also takes an optional form that defines when the break should stop the execution. Upon entering the break, the debugger enters a sub-loop, waiting for forms which can inspect the local variables of the form in the context of the break.

The in? macro allows you to send forms to the break context within the debugger:

(in? z)
;=> 5
(in? i)
;=> 3
(in? [i x y z])
;=> [3 1 3 5]
(in? [z y x])
;=> [5 3 1]
(in? a)
;=> nil

For forms that have symbols that are not locally bound variables in the remote form, you must declare an explicit conveyance vector containing the variables that should be referenced:

(in? [x i] (+ x i))
;=> 4

The in? macro above cannot know ahead of time that the form in the dbg instance hasn't locally re-bound the + symbol. Therefore, for non-simple forms, the conveyance vector is necessary to disambiguate which symbols require resolving in the local context of the remote form and which don't.

By evaluating :in/exit, the running break context exits and execution proceeds either to the next break or to completion.

(in? :in/exit)
;:i 3
;:i 4
;:done
;=> 1

This is just a rudimentary implementation of a stepping debugger. I've added keybindings for usage in Calva.

It would be nice to implement a sub-repl that wrapped repl evaluations in the in? macro until exit. It would also be nice to implement an nREPL middleware for the same thing, transparently filling in the missing bits for CIDER's debugging middleware, such that editors like Emacs and Calva can automatically use their debugging workflows in a Clojurescript context. There's a GitHub issue for this feature in the cider repo and it would be nice to finally unlock this capability for browser development. PRs welcome!

Note: There are a host of other use cases that weren't previously possible that become possible with blocking semantics. Another example might be porting Datascript to IndexedDB using a synchronous set/get interface. If there are any other possibilities that come to mind - things you've always wanted to be able to do but weren't able to due to the lack of blocking semantics in the browser - feel free to drop a request in the issues and we can explore it.

Some history

cljs-thread is derived from tau.alpha, which I released about four years ago. That project evolved towards working with SharedArrayBuffers (SABs). A slightly updated version of tau.alpha is available here: https://gitlab.com/johnmn3/tau and you can see a demo of the benefits of SABs here: https://simultaneous.netlify.app/

At an early point during the development of tau.alpha, I got blocking semantics working with synchronous XHRs and by hacking the response from a shared worker. I eventually abandoned this strategy when I discovered you could get blocking semantics and better performance out of SABs and js/Atomics.

Unfortunately, there was lots of drama around the security of SABs and, years later, they still require very constraining security settings, making their usage impractical for some deployment situations. Compared to using typed arrays in tau.alpha, you'll never get the same worker-to-worker communication performance in cljs-thread - in tau.alpha you're literally using shared memory - but there's no reason these other features shouldn't be available in non-SAB scenarios, so I figured it would make sense to extract those bits out into cljs-thread and build V2 of tau.alpha on top of it. With tau.beta, built on cljs-thread, I'll be implementing SAB-less variants of atoms and agents, with semantics similar to Clojure's. Then I'll implement SAB-based versions that folks can opt in to if desired.
