Making Clojure Even Sweeter

Old-style API Docs on GitHub Pages (codox)

Tupelo Overview

Have you ever wanted to do something simple but clojure.core doesn’t support it? Or, maybe you are wishing for an enhanced version of a standard function. If so, then Tupelo is for you! Tupelo is a library of helper and convenience functions to make working with Clojure simpler, easier, and more bulletproof.

Tupelo Organization

The functionality of the Tupelo library is divided into a number of namespaces, each with a single area of focus. These are:

Tupelo Core - A library of helper functions for core Clojure.

Please see the tupelo.core docs further below.

Tupelo-Forest - A library for searching & manipulating tree-like data structures

Please see the tupelo.forest docs for further information.

Tupelo-Datomic - A library of helper functions for Datomic.

The tupelo-datomic library has been split out into an independent project. Please see the tupelo-datomic project for details.

Tupelo CSV - Functions for using CSV (Comma-Separated Value) files

The standard clojure-csv library has well-tested and useful functions for parsing CSV (Comma-Separated Value) text data, but it does not offer all of the convenience one may wish. Tupelo CSV emphasizes the idiomatic Clojure usage of data, using sequences and maps. Please see the tupelo.csv docs.

Tupelo Parse - Functions to ease text parsing

Please see the tupelo.parse docs.

Tupelo String - Functions to ease string operations

Please see the tupelo.string docs.

Tupelo Schema - Type Definitions

Enables type checking in Clojure code via Plumatic Schema. Please see the source code for definitions, and the James Bond example code for examples of the type-checking in action.

Tupelo Types - A collection of functions for testing object types

Please see the tupelo.types docs.

Tupelo Misc - A grab bag of functions that don’t fit anywhere else (yet!)

Please see the tupelo.misc docs.

tupelo.base64 - Convert to/from base64 encoding.

Please see the tupelo.base64 docs.

tupelo.base64url - Convert to/from base64url encoding.

Please see the tupelo.base64url docs.

Tupelo Y64 - Convert to/from the URL-safe Y64 encoding (Yahoo YUI library).

Please see the tupelo.y64 docs.

Tupelo Core Overview

Have you ever wanted to do something simple but clojure.core doesn’t support it? Or, maybe you are wishing for an enhanced version of a standard function. The goal of tupelo.core is to add support for these convenience features, so that you have a simple way of using either the enhanced version or the original version.

The goal in using tupelo.core is that you can just plop it into any namespace without having to worry about any conflicts with clojure.core functionality. So, the core functions and the added/enhanced functions are both available for use at all times. As such, we normally refer tupelo.core into our namespace as follows:

(ns my.proj
  (:use tupelo.core)
  (:require
    [clojure.string :as str]
    ... ))

Expression Debugging

Have you ever been debugging some code and had trouble printing out intermediate values? For example:

(-> 1
    (inc)       ; want to print value in pipeline after "(inc)" expression
    (* 2))
4

Suppose you want to display the value after the (inc) function. You can’t just insert a (println …) because the return value of nil will break the pipeline structure. Instead, just use spy:

(-> 1
    (inc)
    (spy)       ; print value at this location in pipeline
    (* 2))
; spy => 2      ; output from spy
4               ; return value from the threading pipeline

This tool is named spy since it can display values from inside any threading form without affecting the result of the expression. In this case, spy printed the value 2 resulting from the (inc) expression. Then, the value 2 continued to flow through the following expressions in the pipeline so that the return value of the expression is unchanged.

You can add in a keyword message to label each spy output:

(-> 1
    (inc)
    (spy :after-inc)      ; add a custom keyword message
    (* 2))
; :after-inc => 2          ; spy output is labeled with keyword message
4                         ; return value is unchanged

Note that spy works equally well inside either a "thread-first" or a "thread-last" form (e.g. using -> or ->>), without requiring any changes.

(->> 1
    (inc)
    (spy :after-inc)      ; spy works equally with both  ->  and  ->>  forms
    (* 2))
; :after-inc => 2
4

How does spy accomplish this trick? The answer is that the keyword message is assumed to be the label, since interesting debug values are more likely to be strings, numbers, or collections like vectors & maps (if both args are keywords, an exception is thrown; use some other technique for debugging this use-case). Thus, spy can detect whether it is in a thread-first or thread-last form, and then label the output correctly. A side benefit is that keywords like :after-inc or just :110 are easy to grep for in output log files.
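The label-detection idea can be sketched with plain clojure.core. The function below is a hypothetical stand-in named spy-sketch, not Tupelo's actual implementation; it only illustrates how checking which argument is a keyword lets one function serve both threading styles.

```clojure
;; Hypothetical sketch (not Tupelo's source): a spy-like helper that
;; treats whichever argument is a keyword as the label.
(defn spy-sketch
  ([value]
   (println "spy =>" (pr-str value))
   value)
  ([arg1 arg2]
   (cond
     ;; Two keywords are ambiguous: which one is the label?
     (and (keyword? arg1) (keyword? arg2))
     (throw (IllegalArgumentException. "spy-sketch: both arguments are keywords"))

     ;; (spy-sketch :label value), as produced by ->>
     (keyword? arg1) (do (println arg1 "=>" (pr-str arg2)) arg2)

     ;; (spy-sketch value :label), as produced by ->
     (keyword? arg2) (do (println arg2 "=>" (pr-str arg1)) arg1)

     :else
     (throw (IllegalArgumentException. "spy-sketch: expected one keyword label")))))

(-> 1 inc (spy-sketch :after-inc) (* 2))    ; prints  :after-inc => 2
(->> 1 inc (spy-sketch :after-inc) (* 2))   ; prints  :after-inc => 2
```

Both pipelines return 4, since the helper always hands back the non-keyword argument unchanged.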

As a bonus for debugging, the value is output using (pr-str …) so that numbers and strings are unambiguous in the output:

(-> 30
    (+ 4)
    (spy :dbg)
    (* 10))
; :dbg => 34            ; integer result = 34
340

(-> "3"
    (str "4")
    (spy :dbg)
    (str "0"))
; :dbg => "34"          ; string result = "34"
"340"

Sometimes you may prefer to print out the literal expression instead of a keyword label. In this case, just use spyx (short for "spy expression"):

(it-> 1                 ; tupelo.core/it->
      (spyx (inc it))
      (* 2 it))
; (inc it) => 2     ; the expression is used as the label
4

In other instances, you may wish to use spyxx to display the expression, its type, and its value:

(defn mystery-fn [] (into (sorted-map) {:b 2 :a 1}))
(spyxx (mystery-fn))
;  (mystery-fn) =>  <#clojure.lang.PersistentTreeMap {:a 1, :b 2}>

Non-pure functions (i.e. those with side-effects) are safe to use with spy. Any expression supplied to spy will be evaluated only once.

Sometimes you may just want to save some repetition for a simple printout:

(def answer 42)
(spyx answer)
; answer => 42

To be precise, the function signatures for the spy family are:

(spy <expr>)             ; print value of <expr> without a keyword label
(spy <expr> :kw-label)   ; works with ->
(spy :kw-label <expr>)   ; works with ->>
(spyx  <expr>)           ; prints <expr> and its value
(spyxx <expr>)           ; prints <expr>, its type, and its value

If you are debugging a series of nested function calls, it can often be handy to indent the spy output to help in visualizing the call sequence. Using with-spy-indent will give you just what you want:

(doseq [x [:a :b]]
  (spyx x)
  (with-spy-indent
    (doseq [y (range 3)]
      (spyx y))))
x => :a
  y => 0
  y => 1
  y => 2
x => :b
  y => 0
  y => 1
  y => 2

Literate Threading Macro

We all love to use the threading macros -> and ->> for certain tasks, but they only work if all of the forms should be threaded into the first or last argument.

The built-in threading macro as-> can avoid this problem, but the order of the first expression and the placeholder symbol is arguably backwards from what most users would expect. Also, there is often no obvious name to use for the placeholder symbol. Re-using a good idea from Groovy (also copied by Kotlin), we simply use the symbol it as the placeholder symbol in each expression to represent the value of the previous result.

(it-> 1
      (inc it)                                  ; thread-first or thread-last
      (+ it 3)                                  ; thread-first
      (/ 10 it)                                 ; thread-last
      (str "We need to order " it " items." ))  ; middle of 3 arguments
;=> "We need to order 2 items."
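For comparison, here is the same pipeline written with the built-in as-> from clojure.core. The var name message is just for illustration. Note how the starting value comes before the placeholder symbol, which is the ordering the text above describes as backwards.

```clojure
;; The same pipeline using clojure.core/as->.
;; The initial value (1) precedes the placeholder name (it).
(def message
  (as-> 1 it
    (inc it)
    (+ it 3)
    (/ 10 it)
    (str "We need to order " it " items.")))

message
;=> "We need to order 2 items."
```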

Here is a more complicated example. Note that we can assign into a local let block from the it placeholder value:

(it-> 3
      (spy :initial it)
      (let [x it]
        (inc x))
      (spy it :222)
      (* it 2)
      (spyx it))
; :initial => 3
; :222 => 4
; it => 8
8           ; return value

More examples can be found here.

The it-> macro has a cousin cond-it-> that allows you to thread the updated value through both the conditional and the action expressions:

(let [params {:a 1 :b 1 :c nil :d nil}]
  (cond-it-> params
    (:a it)        (update it :b inc)
    (= (:b it) 2)  (assoc it :c "here")
    (:c it)        (assoc it :d "again")))

;=> {:a 1, :b 2, :c "here", :d "again"}
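To see why threading the value through the tests matters, compare with the built-in cond-> from clojure.core. In cond->, the test expressions are ordinary expressions that do not see the threaded value, so the sketch below (using the hypothetical name plain-result) has to test the original params and gets a different answer.

```clojure
;; clojure.core/cond-> threads the value only through the action forms.
;; The tests here can only inspect the original params map.
(def plain-result
  (let [params {:a 1 :b 1 :c nil :d nil}]
    (cond-> params
      (:a params)        (update :b inc)         ; test passes: :a is 1
      (= (:b params) 2)  (assoc :c "here")       ; test fails: original :b is 1
      (:c params)        (assoc :d "again"))))   ; test fails: original :c is nil

plain-result
;=> {:a 1, :b 2, :c nil, :d nil}
```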

Map Value Lookup

Maps are convenient, especially when keywords are used as functions to look up a value in a map. Unfortunately, attempting to look up a non-existent keyword in a map will return nil. While sometimes convenient, this means that a simple typo in the keyword name will silently return corrupted data (i.e. nil) instead of the desired value.

Instead, use the function grab for keyword/map lookup:

(grab k m)
  "A fail-fast version of keyword/map lookup.  When invoked as (grab :the-key the-map),
   returns the value associated with :the-key as for (clojure.core/get the-map :the-key).
   Throws an Exception if :the-key is not present in the-map."

(def sidekicks {:batman "robin" :clark "lois"})
(grab :batman sidekicks)
;=> "robin"

(grab :spiderman sidekicks)
;=> IllegalArgumentException Key not present in map:
;=>   map : {:batman "robin", :clark "lois"}
;=>   keys: [:spiderman]

The function grab should also be used in place of clojure.core/get. Simply reverse the order of arguments to match the "keyword-first, map-second" convention.
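The fail-fast idea is easy to sketch with clojure.core alone. The function grab-sketch below is a hypothetical illustration, not Tupelo's implementation. It uses contains? rather than a nil check, so a key that is present with a nil value is still accepted.

```clojure
;; Hypothetical sketch of a fail-fast lookup (not Tupelo's source).
(defn grab-sketch
  [k m]
  (if (contains? m k)
    (get m k)
    (throw (IllegalArgumentException.
             (str "Key not present in map: " (pr-str k))))))

(def heroes {:batman "robin" :clark "lois" :loner nil})

(grab-sketch :batman heroes)   ;=> "robin"
(grab-sketch :loner  heroes)   ;=> nil   (key exists, value is nil)
(:spiderman heroes)            ;=> nil   (plain lookup hides the typo)
```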

For looking up values in nested maps, the function fetch-in replaces clojure.core/get-in:

(fetch-in m ks)
  "A fail-fast version of clojure.core/get-in. When invoked as (fetch-in the-map keys-vec),
   returns the value associated with keys-vec as for (clojure.core/get-in the-map keys-vec).
   Throws an Exception if the path keys-vec is not present in the-map."

(def my-map {:a 1 :b {:c 3}})
(fetch-in my-map [:b :c])
3
(fetch-in my-map [:b :z])
;=> IllegalArgumentException Key seq not present in map:
;=>   map : {:b {:c 3}, :a 1}
;=>   keys: [:b :z]
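A nested fail-fast lookup can be sketched as a reduce over the key path. As before, fetch-in-sketch is a hypothetical name and this is only one possible approach, assuming every level of the path must be a map containing the next key.

```clojure
;; Hypothetical sketch of a fail-fast get-in (not Tupelo's source).
(defn fetch-in-sketch
  [m ks]
  (reduce (fn [curr k]
            (if (and (map? curr) (contains? curr k))
              (get curr k)
              (throw (IllegalArgumentException.
                       (str "Key seq not present in map: " (pr-str ks))))))
          m
          ks))

(fetch-in-sketch {:a 1 :b {:c 3}} [:b :c])   ;=> 3
(get-in          {:a 1 :b {:c 3}} [:b :z])   ;=> nil  (silent miss)
```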

Map Dissociation

Clojure has functions assoc & assoc-in, update & update-in, and dissoc. However, there is no function dissoc-in. The Tupelo function dissoc-in provides the desired functionality:

(dissoc-in the-map keys-vec)
  "A sane version of dissoc-in that will not delete intermediate keys.
   When invoked as (dissoc-in the-map [:k1 :k2 :k3... :kZ]), acts like
   (clojure.core/update-in the-map [:k1 :k2 :k3...] dissoc :kZ). That is, only
   the map entry containing the last key :kZ is removed, and all map entries
   higher than kZ in the hierarchy are unaffected."

The unit test shows the functions in action:

(let [my-map {:a { :b { :c "c" }}} ]
  (is (= (dissoc-in my-map []         ) my-map ))
  (is (= (dissoc-in my-map [:a      ] ) {} ))
  (is (= (dissoc-in my-map [:a :b   ] ) {:a {}} ))
  (is (= (dissoc-in my-map [:a :b :c] ) {:a { :b {}}} ))
  (is (= (dissoc-in my-map [:a :x :y] ) {:a { :b { :c "c" }
                                             :x nil }} )))

Note that if nonexistent keys are included in keys-vec, any missing map layers will be constructed as necessary, which is consistent with the behavior of both clojure.core/assoc-in and clojure.core/update-in (note that nil is the value of the final map entry, not the empty map {} as for the other examples).

Note that only the map entry corresponding to the last key kZ is cleared. This differs from the dissoc-in function in the old clojure-contrib library which had the unpredictable behavior of recursively (& silently) deleting all keys in keys-vec corresponding to empty maps.
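The behavior described in the docstring can be reproduced with a few lines of clojure.core. This is a sketch under the hypothetical name dissoc-in-sketch, not Tupelo's source. The empty and single-key paths are handled separately because update-in is not meant to be called with an empty key path.

```clojure
;; Hypothetical sketch of dissoc-in (not Tupelo's source).
(defn dissoc-in-sketch
  [m ks]
  (let [parent-path (butlast ks)   ; every key except the last
        last-key    (last ks)]
    (cond
      (empty? ks)          m                      ; nothing to remove
      (empty? parent-path) (dissoc m last-key)    ; top-level key
      :else                (update-in m parent-path dissoc last-key))))

(dissoc-in-sketch {:a {:b {:c "c"}}} [:a :b :c])   ;=> {:a {:b {}}}
(dissoc-in-sketch {:a {:b {:c "c"}}} [:a :x :y])   ;=> {:a {:b {:c "c"}, :x nil}}
```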

Gluing Together Like Collections

The concat function can sometimes have rather surprising results:

(concat {:a 1} {:b 2} {:c 3} )
;=>   ( [:a 1] [:b 2] [:c 3] )

In this example, the user probably meant to merge the 3 maps into one. Instead, the three maps were mysteriously converted into length-2 vectors, which were then nested inside another sequence.

The conj function can also surprise the user:

(conj [1 2] [3 4] )
;=>   [1 2  [3 4] ]

Here the user probably wanted to get [1 2 3 4] back, but instead got a nested vector by mistake.

Instead of having to wonder if the items to be combined will be merged, nested, or converted into another data type, we provide the glue function to always combine like collections together into a result collection of the same type:

; Glue together like collections:
(is (= (glue [ 1 2] '(3 4) [ 5 6] )       [ 1 2 3 4 5 6 ]  ))   ; all sequential (vectors & lists)
(is (= (glue {:a 1} {:b 2} {:c 3} )       {:a 1 :c 3 :b 2} ))   ; all maps
(is (= (glue #{1 2} #{3 4} #{6 5} )      #{ 1 2 6 5 3 4 }  ))   ; all sets
(is (= (glue "I" " like " \a " nap!" )   "I like a nap!"   ))   ; all text (strings & chars)

; If you want to convert to a sorted set or map, just put an empty one first:
(is (= (glue (sorted-map) {:a 1} {:b 2} {:c 3})   {:a 1 :b 2 :c 3} ))
(is (= (glue (sorted-set) #{1 2} #{3 4} #{6 5})  #{ 1 2 3 4 5 6  } ))

An Exception will be thrown if the collections to be 'glued' are not all of the same type. The allowable input types are:

all sequential: any mix of lists & vectors (vector result)
all maps (sorted or not)
all sets (sorted or not)
all text: any mix of strings & characters (string result)
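The "like collections only" rule can be sketched as a dispatch on the kind of every argument. The function glue-sketch is hypothetical and only approximates the behavior listed above using clojure.core; Tupelo's own implementation may differ.

```clojure
;; Hypothetical sketch of the glue rules (not Tupelo's source).
(defn glue-sketch
  [& colls]
  (cond
    ;; lists and vectors: always produce a vector
    (every? sequential? colls) (vec (apply concat colls))
    ;; maps and sets: pour everything into the first collection,
    ;; so a leading (sorted-map) or (sorted-set) fixes the result type
    (every? map? colls)        (reduce into colls)
    (every? set? colls)        (reduce into colls)
    ;; strings and chars: produce a string
    (every? #(or (string? %) (char? %)) colls) (apply str colls)
    :else (throw (IllegalArgumentException.
                   "glue-sketch: arguments must all be the same kind of collection"))))

(glue-sketch [1 2] '(3 4) [5 6])          ;=> [1 2 3 4 5 6]
(glue-sketch {:a 1} {:b 2} {:c 3})        ;=> {:a 1, :b 2, :c 3}
(glue-sketch "I" " like " \a " nap!")     ;=> "I like a nap!"
```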

Adding Values to the Beginning or End of a Sequence

Clojure has the cons, conj, and concat functions, but it is not obvious how they should be used to add a new value to the beginning or end of a vector or list:

; Add to the end
> (concat [1 2] 3)    ;=> IllegalArgumentException
> (cons   [1 2] 3)    ;=> IllegalArgumentException
> (conj   [1 2] 3)    ;=> [1 2 3]
> (conj   [1 2] 3 4)  ;=> [1 2 3 4]
> (conj  '(1 2) 3)    ;=> (3 1 2)       ; oops
> (conj  '(1 2) 3 4)  ;=> (4 3 1 2)     ; oops

; Add to the beginning
> (conj     1  [2 3] ) ;=> ClassCastException
> (concat   1  [2 3] ) ;=> IllegalArgumentException
> (cons     1  [2 3] ) ;=> (1 2 3)
> (cons   1 2  [3 4] ) ;=> ArityException
> (cons     1 '(2 3) ) ;=> (1 2 3)
> (cons   1 2 '(3 4) ) ;=> ArityException

Do you know what conj does when you pass it nil instead of a sequence? It silently replaces it with an empty list: (conj nil 5) ⇒ (5). This can cause you to accumulate items in reverse order if you aren’t aware of the default behavior:

(-> nil
  (conj 1)
  (conj 2)
  (conj 3))
;=> (3 2 1)

These failures are irritating and unproductive, and the error messages don’t make it obvious what went wrong. Instead, use the simple prepend and append functions to add new elements to the beginning or end of a sequence, respectively:

(append [1 2] 3  )   ;=> [1 2 3  ]
(append [1 2] 3 4)   ;=> [1 2 3 4]

(prepend   3 [2 1])  ;=> [  3 2 1]
(prepend 4 3 [2 1])  ;=> [4 3 2 1]

Both prepend and append always return a vector result.
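A minimal sketch of the same idea using clojure.core is shown below. The names append-sketch and prepend-sketch are hypothetical, and the argument check is an assumption about what a fail-fast version might do.

```clojure
;; Hypothetical sketches (not Tupelo's source). Both always return a vector.
(defn append-sketch
  [coll & elems]
  (when-not (sequential? coll)
    (throw (IllegalArgumentException. "append-sketch: first arg must be sequential")))
  (into (vec coll) elems))

(defn prepend-sketch
  [& args]
  (let [elems (butlast args)   ; the new leading elements
        coll  (last args)]     ; the existing sequence comes last
    (when-not (sequential? coll)
      (throw (IllegalArgumentException. "prepend-sketch: last arg must be sequential")))
    (into (vec elems) coll)))

(append-sketch '(1 2) 3 4)     ;=> [1 2 3 4]   (works for lists too)
(prepend-sketch 4 3 [2 1])     ;=> [4 3 2 1]
```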

Combining Scalars and Vectors

Suppose we have a mixture of scalars & vectors (or lists) that we want to combine into a single vector. We want a function ??? to give us the following result:

(???  1 2 3 [4 5 6] 7 8 9)  =>  [1 2 3 4 5 6 7 8 9]

Clojure doesn’t have a function for this. Instead we need to wrap all of the scalars into vectors and then use glue or concat:

; can wrap individually or in groups
(glue [1   2   3] [4 5 6] [7   8   9])  =>  [1 2 3 4 5 6 7 8 9]   ; could also use concat
(glue [1] [2] [3] [4 5 6] [7] [8] [9])  =>  [1 2 3 4 5 6 7 8 9]   ; could also use concat

It may be inconvenient to always wrap the scalar values into vectors just to combine them with an occasional vector value. Instead, it might be more convenient to unwrap the vector values, then combine the result with other scalars. We can do that with the ->vector and unwrap functions:

(->vector 1 2 3 4 5 6 7 8 9)             =>  [1 2 3 4 5 6 7 8 9]
(->vector 1 (unwrap [2 3 4 5 6 7 8]) 9)  =>  [1 2 3 4 5 6 7 8 9]

It will also work recursively for nested unwrap calls:

(->vector 1 (unwrap [2 3 (unwrap [4 5 6]) 7 8]) 9)  =>  [1 2 3 4 5 6 7 8 9]
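One way such a pair of functions could work is to have unwrap return a marker object that the vector builder recognizes and splices in. The sketch below uses hypothetical names (Unwrapped, unwrap-sketch, ->vector-sketch) and is only an assumption about a possible design, not a description of Tupelo's internals.

```clojure
;; Hypothetical sketch (not Tupelo's source): a marker record tells the
;; vector builder which arguments should be spliced instead of nested.
(defrecord Unwrapped [items])

(defn unwrap-sketch
  [coll]
  (->Unwrapped (vec coll)))

(defn ->vector-sketch
  [& args]
  (reduce (fn [acc arg]
            (if (instance? Unwrapped arg)
              ;; splice recursively, so nested markers are flattened too
              (into acc (apply ->vector-sketch (:items arg)))
              (conj acc arg)))
          []
          args))

(->vector-sketch 1 (unwrap-sketch [2 3 (unwrap-sketch [4 5 6]) 7 8]) 9)
;=> [1 2 3 4 5 6 7 8 9]
```

Ordinary vectors are left alone, so (->vector-sketch 1 [2 3]) still yields [1 [2 3]].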

Removing Values from a Sequence

Suppose you want to remove an element from a sequence. Did you know that Clojure has no equivalent to Java’s List.remove(int index) function? Well, now it does:

(s/defn drop-at :- ts/List
  "Removes an element from a collection at the specified index."
  [coll     :- ts/List
   index    :- s/Int]
  ...)

(is (= [  1 2] (drop-at (range 3) 0)))
(is (= [0   2] (drop-at (range 3) 1)))
(is (= [0 1  ] (drop-at (range 3) 2)))

Unlike the raw take and drop functions on which it is based, drop-at will throw an exception for invalid values of index.
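The difference from raw take and drop can be seen in a small sketch. The function drop-at-sketch is a hypothetical clojure.core approximation, with the bounds check written out explicitly.

```clojure
;; Hypothetical sketch (not Tupelo's source): take/drop plus a bounds check.
(defn drop-at-sketch
  [coll index]
  (when-not (and (integer? index) (< -1 index (count coll)))
    (throw (IllegalArgumentException. (str "Invalid index: " index))))
  (vec (concat (take index coll)
               (drop (inc index) coll))))

(drop-at-sketch (range 3) 1)                       ;=> [0 2]

;; Raw take and drop never complain about a bad index:
(concat (take 10 (range 3)) (drop 11 (range 3)))   ;=> (0 1 2)
```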

Inserting Values into a Sequence

Suppose you want to insert an element into a sequence. Tupelo has you covered here as well:

(s/defn insert-at :- ts/List
  "Inserts an element into a collection at the specified index."
  [coll     :- ts/List
   index    :- s/Int
   elem     :- s/Any]
  ...)

(is (= [9 0 1] (insert-at [0 1] 0 9)))
(is (= [0 9 1] (insert-at [0 1] 1 9)))
(is (= [0 1 9] (insert-at [0 1] 2 9)))

As with assoc, you are allowed to insert the new element into the first empty slot after all existing elements, but no further. insert-at will throw an exception for invalid values of index.
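The "first empty slot, but no further" rule amounts to allowing any index from 0 up to and including the collection's length. A hypothetical clojure.core sketch, named insert-at-sketch for illustration:

```clojure
;; Hypothetical sketch (not Tupelo's source).
;; Valid indexes run from 0 to (count coll) inclusive.
(defn insert-at-sketch
  [coll index elem]
  (when-not (and (integer? index) (<= 0 index (count coll)))
    (throw (IllegalArgumentException. (str "Invalid index: " index))))
  (vec (concat (take index coll)
               [elem]
               (drop index coll))))

(insert-at-sketch [0 1] 2 9)   ;=> [0 1 9]   (first empty slot is allowed)
```

An index of 3 for the two-element vector above would throw, since it would leave a gap.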

Replacing Values in a Sequence

And, of course, you can also replace an element in a sequence:

(s/defn replace-at :- ts/List
  "Replaces an element in a collection at the specified index."
  [coll     :- ts/List
   index    :- s/Int
   elem     :- s/Any]
   ...)

(is (= [9 1 2] (replace-at (range 3) 0 9)))
(is (= [0 9 2] (replace-at (range 3) 1 9)))
(is (= [0 1 9] (replace-at (range 3) 2 9)))

As with drop-at, replace-at will throw an exception for invalid values of index.

Convenience in Testing Seq’s

Clojure has an empty? function to indicate if a collection has zero elements or is nil (i.e. not present). However, Clojure has no corresponding not-empty? function, and people have written to the mailing list wondering where it is. Well, now it is available:

(not-empty? coll)
 "For any collection, returns true if coll contains any items;
  otherwise returns false. Equivalent to (not (empty? coll))."

The unit test shows it in action:

(is (= (map not-empty? ["1"   [1]   '(1)  {:1 1}  #{1} ] )
                       [true  true  true  true    true ]  ))
(is (= (map not-empty? [""     []      '()    {}     #{}    nil   ] )
                       [false  false   false  false  false  false ] ))

(is (= (keep-if not-empty?  ["1" [1] '(1) {:1 1} #{1} ] )
                            ["1" [1] '(1) {:1 1} #{1} ] ))
(is (= (drop-if not-empty?  [""  []  '()  {}     #{}  nil] )
                            [""  []  '()  {}     #{}  nil] ))

Just to confuse things, Clojure does have the similarly named functions empty and not-empty. Be sure to avoid these two functions for predicate tests.
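The reason to avoid them as predicates is that neither returns a boolean. The sketch below shows their actual return values next to a hypothetical predicate, not-empty?-sketch, built from clojure.core.

```clojure
;; clojure.core/not-empty returns the collection itself or nil;
;; clojure.core/empty returns an empty collection of the same type.
(not-empty [1 2])   ;=> [1 2]
(not-empty [])      ;=> nil
(empty [1 2])       ;=> []

;; Hypothetical predicate sketch (not Tupelo's source): always true or false.
(defn not-empty?-sketch
  [coll]
  (boolean (seq coll)))

(not-empty?-sketch [1 2])   ;=> true
(not-empty?-sketch nil)     ;=> false
```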

A similar, but more complicated, situation exists in the case of not-any?. Clojure has the not-any? function to indicate if a predicate is false for all items in a collection. However, there has never been a corresponding any? function such that

  (= (not-any?  pred coll)
     (not (any? pred coll)))

for any predicate and collection. The situation has become more confusing as of Clojure 1.9.0-alpha10 since a completely unrelated function any? has been added in support of clojure.spec. The new any? function is defined as:

(defn any?
  "Returns true given any argument."
  [x] true)

So the new any? function is a semantic mismatch to the not-any? function and is completely unrelated to testing a collection using a predicate.

The Tupelo library attempts to resolve this confusing situation by providing both positive and negative versions of the collection test with a name which does not conflict with either any? or not-any? in clojure.core:

(has-some? pred coll)
  "For any predicate pred & collection coll, returns true if (pred x) is logical true for at least one x in
   coll; otherwise returns false.  Like clojure.core/some, but returns only true or false."

(has-none? pred coll)
  "For any predicate pred & collection coll, returns false if (pred x) is logical true for at least one x in
   coll; otherwise returns true.  Equivalent to clojure.core/not-any?, and is the inverse of has-some?."

The unit test shows these functions in action:

(is (= true   (has-some? odd? [1 2 3] ) ))
(is (= false  (has-some? odd? [2 4 6] ) ))
(is (= false  (has-some? odd? []      ) ))

(is (= false  (has-none? odd? [1 2 3] ) ))
(is (= true   (has-none? odd? [2 4 6] ) ))
(is (= true   (has-none? odd? []      ) ))
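Going by the docstrings above, both functions can be approximated in terms of clojure.core/some. The names has-some?-sketch and has-none?-sketch are hypothetical; the point of the sketch is the boolean coercion, since some returns whatever logical-true value the predicate produced.

```clojure
;; Hypothetical sketches (not Tupelo's source).
(defn has-some?-sketch
  [pred coll]
  (boolean (some pred coll)))

(defn has-none?-sketch
  [pred coll]
  (not (has-some?-sketch pred coll)))

;; clojure.core/some returns the predicate's result, not a boolean:
(some #{2} [1 2 3])               ;=> 2
(some odd? [2 4 6])               ;=> nil

(has-some?-sketch #{2} [1 2 3])   ;=> true
(has-some?-sketch odd? [2 4 6])   ;=> false
```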

Searching for entries in Collections, Maps, and Sets

Sometimes we want an easy way to find out if an item is in a collection. The Tupelo library supplies three convenient functions for this purpose: contains-elem?, contains-key?, and contains-val?.

The most generic function is contains-elem?, which is intended for vectors or any other clojure seq:

(testing "vecs"
  (let [coll (range 3)]
    (isnt (contains-elem? coll -1))
    (is   (contains-elem? coll  0))
    (is   (contains-elem? coll  1))
    (is   (contains-elem? coll  2))
    (isnt (contains-elem? coll  3))
    (isnt (contains-elem? coll  nil)))

  (let [coll [ 1 :two "three" \4]]
    (isnt (contains-elem? coll  :no-way))
    (isnt (contains-elem? coll  nil))
    (is   (contains-elem? coll  1))
    (is   (contains-elem? coll  :two))
    (is   (contains-elem? coll  "three"))
    (is   (contains-elem? coll  \4)))

  (let [coll [:yes nil 3]]
    (isnt (contains-elem? coll  :no-way))
    (is   (contains-elem? coll  :yes))
    (is   (contains-elem? coll  nil))))

Here we see that for an integer range or a mixed vector, contains-elem? works as expected for both existing and nonexistent elements in the collection. For maps, we can also search for any key-value pair (expressed as a length-2 vector):

(testing "maps"
   (let [coll {1 :two "three" \4}]
     (isnt (contains-elem? coll nil ))
     (isnt (contains-elem? coll [1 :no-way] ))
     (is   (contains-elem? coll [1 :two]))
     (is   (contains-elem? coll ["three" \4])))
   (let [coll {1 nil "three" \4}]
     (isnt (contains-elem? coll [nil 1] ))
     (is   (contains-elem? coll [1 nil] )))
   (let [coll {nil 2 "three" \4}]
     (isnt (contains-elem? coll [1 nil] ))
     (is   (contains-elem? coll [nil 2] ))))

It is also straightforward to search a set:

(testing "sets"
  (let [coll #{1 :two "three" \4}]
    (isnt (contains-elem? coll  :no-way))
    (is   (contains-elem? coll  1))
    (is   (contains-elem? coll  :two))
    (is   (contains-elem? coll  "three"))
    (is   (contains-elem? coll  \4)))

  (let [coll #{:yes nil}]
    (isnt (contains-elem? coll  :no-way))
    (is   (contains-elem? coll  :yes))
    (is   (contains-elem? coll  nil))))

For maps & sets, it is simpler (& more efficient) to use contains-key? to find a map entry or a set element:

(deftest t-contains-key?
  (is   (contains-key?  {:a 1 :b 2} :a))
  (is   (contains-key?  {:a 1 :b 2} :b))
  (isnt (contains-key?  {:a 1 :b 2} :x))
  (isnt (contains-key?  {:a 1 :b 2} :c))
  (isnt (contains-key?  {:a 1 :b 2}  1))
  (isnt (contains-key?  {:a 1 :b 2}  2))

  (is   (contains-key?  {:a 1 nil   2} nil))
  (isnt (contains-key?  {:a 1 :b  nil} nil))
  (isnt (contains-key?  {:a 1 :b    2} nil))

  (is   (contains-key? #{:a 1 :b 2} :a))
  (is   (contains-key? #{:a 1 :b 2} :b))
  (is   (contains-key? #{:a 1 :b 2}  1))
  (is   (contains-key? #{:a 1 :b 2}  2))
  (isnt (contains-key? #{:a 1 :b 2} :x))
  (isnt (contains-key? #{:a 1 :b 2} :c))

  (is   (contains-key? #{:a 5 nil   "hello"} nil))
  (isnt (contains-key? #{:a 5 :doh! "hello"} nil))

  (throws? (contains-key? [:a 1 :b 2] :a))
  (throws? (contains-key? [:a 1 :b 2]  1)))

And, for maps, you can also search for values with contains-val?:

(deftest t-contains-val?
  (is   (contains-val? {:a 1 :b 2} 1))
  (is   (contains-val? {:a 1 :b 2} 2))
  (isnt (contains-val? {:a 1 :b 2} 0))
  (isnt (contains-val? {:a 1 :b 2} 3))
  (isnt (contains-val? {:a 1 :b 2} :a))
  (isnt (contains-val? {:a 1 :b 2} :b))

  (is   (contains-val? {:a 1 :b nil} nil))
  (isnt (contains-val? {:a 1 nil  2} nil))
  (isnt (contains-val? {:a 1 :b   2} nil))

  (throws? (contains-val?  [:a 1 :b 2] 1))
  (throws? (contains-val? #{:a 1 :b 2} 1)))

As seen in the tests, each of these functions works correctly when searching for nil values.
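For background, the built-in clojure.core/contains? tests for keys, which for a vector means indexes rather than elements. The sketch below shows that behavior, followed by a hypothetical element search (contains-elem?-sketch) that compares with = so that nil can be found. It is an illustration only, not Tupelo's implementation.

```clojure
;; clojure.core/contains? checks keys; for vectors the keys are indexes.
(contains? [:a :b] 0)     ;=> true   (index 0 exists)
(contains? [:a :b] :a)    ;=> false  (:a is an element, not an index)

;; Hypothetical sketch (not Tupelo's source): a linear search using =,
;; which also finds nil elements.
(defn contains-elem?-sketch
  [coll elem]
  (boolean (some #(= elem %) coll)))

(contains-elem?-sketch [:yes nil 3] nil)        ;=> true
(contains-elem?-sketch {1 :two} [1 :two])       ;=> true  (map entry as a pair)
```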

Focus on Vectors

Clojure’s seq abstraction (and lazy seq’s) is very useful, but sometimes you just want everything to stay in a nice, eager, random-access vector. Here is an eager (non-lazy) version of for which always returns results in a vector:

(is= (forv [x (range 4)] (* x x))
       [0 1 4 9] )
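Conceptually, this is the built-in for wrapped in vec. A hypothetical macro sketch, named forv-sketch and not taken from Tupelo's source, makes the idea concrete:

```clojure
;; Hypothetical sketch (not Tupelo's source): an eager `for` that
;; realizes the whole lazy sequence into a vector.
(defmacro forv-sketch
  [bindings & body]
  `(vec (for ~bindings (do ~@body))))

(forv-sketch [x (range 4)] (* x x))
;=> [0 1 4 9]
```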

Simplified Lazy Sequence Generation

Clojure training materials seem to vary somewhat in the recommended form for the generation of a lazy sequence. This is further complicated by the legacy function lazy-cat which can easily cause an out-of-memory error (please see this post). A simpler form is possible using the tupelo.core/lazy-cons macro. An example of this form in use is:

(defn lazy-countdown [n]
  (when (<= 0 n)
    (lazy-cons n (lazy-countdown (dec n)))))

(deftest t-all
  (is= (lazy-countdown  5) [5 4 3 2 1 0] )
  (is= (lazy-countdown  1) [1 0] )
  (is= (lazy-countdown  0) [0] )
  (is= (lazy-countdown -1) nil ))

The new macro lazy-cons accepts the output value as the first arg, and a recursive function call as the second arg. The recursive call will have delayed-execution and will not be invoked until it is required. The (when <condition>) form returns nil to signal the termination of the lazy sequence.

Implementation note:

The canonical structure of when and lazy-cons shown above is not required, but is probably the simplest of multiple possible choices. The new form of (lazy-cons val (recursive-call…)) is nothing but a simplification of the original clojure.core form (lazy-seq (cons val (recursive-call…))) which reduces typing and the possibility of errors.

Please note that tupelo.core/lazy-cons bears no relation to the historical lazy-cons which was briefly considered for clojure.core circa 2008.
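Since the implementation note describes the macro as shorthand for (lazy-seq (cons val (recursive-call…))), that expansion can be written out as a sketch. The names lazy-cons-sketch and countdown-sketch are hypothetical, and the actual Tupelo macro may differ in detail.

```clojure
;; Hypothetical sketch of the expansion described above (not Tupelo's source).
(defmacro lazy-cons-sketch
  [head tail-expr]
  `(lazy-seq (cons ~head ~tail-expr)))

(defn countdown-sketch
  [n]
  (when (<= 0 n)
    ;; the recursive call is not evaluated until the tail is requested
    (lazy-cons-sketch n (countdown-sketch (dec n)))))

(countdown-sketch 5)     ;=> (5 4 3 2 1 0)
(countdown-sketch -1)    ;=> nil
```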

Generator Functions for Lazy Sequences (a la Python)

One of the nice features of Python is the ability to use Generator Functions. These allow a function to "yield" a result from anywhere in the code, which is placed in a lazy output buffer for consumption by the calling function. The generator function is "paused" until the output value is consumed, then resumes execution where it left off with all local state preserved. This ability is especially handy when you have nested loops or other structures that make it inconvenient to return a result as the last expression in a function.

(defn concat-gen    ; concat a list of collections
  [& collections]
  (lazy-gen
    (doseq [curr-coll collections]
      (doseq [item curr-coll]
        (yield item)))))

(defn concat-gen-pair
  [& collections]
  (lazy-gen
    (doseq [curr-coll collections]
      (doseq [item curr-coll]
        (yield-all [item item])))))

(def c1 [1 2 3])
(def c2 [4 5 6])
(def c3 [7 8 9])

(is= [1 2 3 4 5 6 7 8 9]                            (concat-gen       c1 c2 c3))
(is= [1 1  2 2  3 3  4 4  5 5  6 6  7 7  8 8  9 9]  (concat-gen-pair  c1 c2 c3))

lazy-gen uses a core.async channel to buffer output, with a default buffer size of 32 (controlled by the dynamic var lazy-gen-buffer-size). Result values passed to yield generate a lazy sequence that is the result of the (lazy-gen …) macro. The closely-related function yield-all inserts the elements of a collection onto the output stream instead of just a single value. Besides doseq, lazy-gen is also very handy for generating a lazy seq within a loop-recur expression.

Validating Intermediate Results

Within a processing chain, it is often desirable to verify that an intermediate value is within an expected range or of an expected type. The built-in assert function cannot be used for this purpose since it returns nil, and the Plumatic Schema validate can only perform a limited amount of type testing. Tupelo's (validate …) function performs arbitrary validation, throwing an exception if a non-truthy result is returned:

(validate tstfn tstval)
 "Used to validate intermediate results. Returns tstval if the result of
  (tstfn tstval) is truthy.  Otherwise, throws IllegalStateException."

(is (= 3    (validate pos?        3    )))
(is (= 3.14 (validate number?     3.14 )))
(is (= 3.14 (validate #(< 3 % 4)  3.14 )))

A closely related function is verify. It is like validate but accepts an expression instead of a predicate/value pair. Upon success, the expression value is returned; otherwise an exception is thrown:

(throws? (verify (= 1 2)))
(is= 333 (verify (* 3 111)))
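Because validate returns the tested value, it can sit in the middle of a pipeline. The function validate-sketch below is a hypothetical clojure.core rendering of the docstring, shown only to illustrate that property.

```clojure
;; Hypothetical sketch based on the docstring above (not Tupelo's source).
(defn validate-sketch
  [tstfn tstval]
  (if (tstfn tstval)
    tstval   ; pass the value through unchanged
    (throw (IllegalStateException.
             (str "Validation failed for value: " (pr-str tstval))))))

;; The value keeps flowing through a thread-last pipeline:
(->> 3
     (inc)
     (validate-sketch pos?)
     (* 2))
;=> 8
```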

Convenient Wild-Card Matches

Sometimes in testing, we want to verify that a key-value pair is present in a map, but we don’t know or care what the value is. For example, Datomic returns maps containing the key :db/id, but the associated value is unpredictable. Tupelo provides the (matches? …) expression to make these tests a snap:

(matches? pattern & values)

(matches? { :a 1 :b _       }
          { :a 1 :b 99      }
          { :a 1 :b [1 2 3] }
          { :a 1 :b nil     } )   ;=> true
(matches? [1 _ 3] [1 2 3] )       ;=> true

Note that a wildcard can match either a primitive or a composite value. It works for both maps and vectors. The only restriction is that the wildcard symbol _ (underscore) cannot be used as a key in the pattern-map (it can be used anywhere in a vector-pattern).

Fast & Simple Wild-Card Matches

Sometimes using core.match is overkill. For some patterns & values it can run very slowly or even create a stack overflow exception. For most cases, all you really need is a simple wildcard match.

The wild-match? function returns true if a pattern is matched by one or more values. The special keyword :* (colon-star) in the pattern serves as a wildcard value. Note that a wildcard can match either a primitive or a composite value.

Usage:

(wild-match? pattern & values)

Samples:

(wild-match?  {:a :* :b 2}
              {:a 1  :b 2})         ;=> true

(wild-match?  [1 :* 3]
              [1 2  3]
              [1 9  3] )            ;=> true

(wild-match?  {:a :*       :b 2}
              {:a [1 2 3]  :b 2})   ;=> true
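A simple recursive walk is enough to implement this kind of matching. The sketch below uses hypothetical names (wild-match-1? and wild-match?-sketch) and assumes that every supplied value must match the pattern and that maps must have exactly the same keys. Both are assumptions for illustration; Tupelo's real function may behave differently in edge cases.

```clojure
;; Hypothetical sketch (not Tupelo's source).
(defn wild-match-1?
  "Matches a single value against a pattern, treating :* as a wildcard."
  [pattern value]
  (cond
    (= :* pattern) true

    (and (map? pattern) (map? value))
    (and (= (set (keys pattern)) (set (keys value)))
         (every? (fn [[k p]] (wild-match-1? p (get value k))) pattern))

    (and (sequential? pattern) (sequential? value))
    (and (= (count pattern) (count value))
         (every? true? (map wild-match-1? pattern value)))

    :else (= pattern value)))

(defn wild-match?-sketch
  [pattern & values]
  (every? #(wild-match-1? pattern %) values))

(wild-match?-sketch [1 :* 3] [1 2 3] [1 9 3])            ;=> true
(wild-match?-sketch {:a :* :b 2} {:a [1 2 3] :b 2})      ;=> true
(wild-match?-sketch {:a :* :b 2} {:a 1 :b 99})           ;=> false
```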

Map Entries (Key-Value pairs)

Sometimes you want to extract the keys & values from a map for manipulation or extension before building up another map (especially useful for manipulating default function args). Here is a very handy function for that:

(keyvals m)
 "For any map m, returns the keys & values of m as a vector,
  suitable for reconstructing via (apply hash-map (keyvals m))."

(keyvals {:a 1 :b 2})
;=> [:b 2 :a 1]
(apply hash-map (keyvals {:a 1 :b 2}))
;=> {:b 2, :a 1}
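A function with this contract is a one-liner in clojure.core. The sketch below uses the hypothetical name keyvals-sketch. The order of the pairs follows the map's own iteration order, so only the round trip is guaranteed.

```clojure
;; Hypothetical sketch (not Tupelo's source): flatten a map into
;; a vector of alternating keys and values.
(defn keyvals-sketch
  [m]
  (reduce-kv conj [] m))   ; calls (conj acc k v) for each entry

;; Extend the flattened form, then rebuild a map:
(apply hash-map (conj (keyvals-sketch {:a 1 :b 2}) :c 3))
;=> a map equal to {:a 1, :b 2, :c 3}
```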

Default Value in Case of Exception

Sometimes you know an operation may result in an Exception, and you would like to have the Exception converted into a default value. That is when you need:

(with-exception-default default-val & body)
 "Evaluates body & returns its result.  In the event of an exception the
  specified default value is returned instead of the exception."

(with-exception-default 0
  (Long/parseLong "12xy3"))
;=> 0
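The underlying pattern is a try/catch that swaps the exception for a value. A hypothetical macro sketch follows (with-exception-default-sketch is not Tupelo's source, and catching Exception rather than Throwable is an assumption).

```clojure
;; Hypothetical sketch (not Tupelo's source).
(defmacro with-exception-default-sketch
  [default-val & body]
  `(try
     ~@body
     (catch Exception e#   ; assumption: only Exception subclasses are caught
       ~default-val)))

(with-exception-default-sketch 0
  (Long/parseLong "12xy3"))
;=> 0

(with-exception-default-sketch 0
  (Long/parseLong "123"))
;=> 123
```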

This feature is put to good use in tupelo.parse, where you will find functions that work like this:

(parse-long "123")                  ; throws if parse error
;=> 123
(parse-long "1xy23" :default 666)   ; returns default val if parse error
;=> 666

Floating Point Number Comparison

Everyone knows that you shouldn’t compare floating-point numbers (e.g. float, double, etc) for equality since roundoff errors can prevent a precise match between logically equivalent results. However, it has always been awkward to regenerate "approx-equals" code by hand every time a new project requires it. Here we have a simple function that compares two floating-point values (cast to double) for relative equality by specifying either the number of significant digits that must match or the maximum error tolerance allowed:

(rel= val1 val2 & opts)
 "Returns true if 2 double-precision numbers are relatively equal, else false.
  Relative equality is specified as either (1) the N most significant digits are
  equal, or (2) the absolute difference is less than a tolerance value.  Input
  values are coerced to double before comparison."

An extract from the unit tests illustrates the use of rel=

(is      (rel=   123450000   123456789 :digits 4 ))       ; .12345 * 10^9
(is (not (rel=   123450000   123456789 :digits 6 )))
(is      (rel= 0.123450000 0.123456789 :digits 4 ))       ; .12345 * 1
(is (not (rel= 0.123450000 0.123456789 :digits 6 )))

(is      (rel= 1 1.001 :tol 0.01 ))                       ; :tol value is absolute error
(is (not (rel= 1 1.001 :tol 0.0001 )))

Note that, for the :digits variant, 'equality' is truly relative, since only the N most significant digits of each value must match.
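
To show how the two modes differ, here is a rough sketch of one way such a comparison could be computed. It is an illustration under my own assumptions (the :digits test compares the difference, scaled by the larger magnitude, against 10^-N), not Tupelo's actual implementation, and rel=-sketch is a hypothetical name.

```clojure
;; Hypothetical sketch of a relative-equality test.
(defn rel=-sketch
  "Returns true if val1 and val2 are approximately equal, using either
   :tol (absolute error) or :digits (relative error of 10^-digits)."
  [val1 val2 & {:keys [digits tol]}]
  (let [v1    (double val1)
        v2    (double val2)
        delta (Math/abs (- v1 v2))]
    (cond
      (zero? delta) true                    ; identical values always match
      tol           (< delta tol)           ; absolute error
      digits        (let [scale (max (Math/abs v1) (Math/abs v2))]
                      ;; error relative to the larger magnitude
                      (< (/ delta scale) (Math/pow 10 (- digits))))
      :else         (throw (IllegalArgumentException.
                             "Must supply either :digits or :tol")))))

(rel=-sketch 123450000 123456789 :digits 4)   ;=> true
(rel=-sketch 123450000 123456789 :digits 6)   ;=> false
(rel=-sketch 1 1.001 :tol 0.01)               ;=> true
```

Because the :digits test divides by the magnitude of the inputs, it gives the same answer for 123450000 vs 123456789 as for 0.123450000 vs 0.123456789.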

String Operations

Be sure to see the dedicated functions in the tupelo.string namespace!

Suppose you have a bunch of nested results and you just want to convert everything into a single string. In that case, strcat is for you:

(is (= (strcat "I " [ \h \a nil \v [\e \space (byte-array [97])
                      [ nil 32 "complicated" (Math/pow 2 5) '( "str" nil "ing") ]]] )
       "I have a complicated string" ))

Note that any nil values map to the empty string as with clojure.core/str.

Sometimes, you may wish to clip a string to a maximum length for ease of display. In that case, use clip-str:

(is (= "abc"             (clip-str  3 "abcdefg")))
(is (= "{:a 1, :"        (clip-str  8 (sorted-map :a 1 :b 2) )))
(is (= "{:a 1, :b 2}"    (clip-str 99 (sorted-map :a 1 :b 2) )))

Notice that clip-str will accept any argument type (map, sequence, etc), and convert it into a string for you. Also, the clip length is only an upper bound; shorter strings are returned unchanged.
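
The behavior is simple enough to sketch in a few lines of clojure.core. This is just an illustration (clip-str-sketch is a hypothetical name), not the library's source:

```clojure
;; Hypothetical sketch: convert to a string, then keep at most n characters.
(defn clip-str-sketch
  "Converts x to a string and returns at most the first n characters."
  [n x]
  (let [s (str x)]
    (subs s 0 (min n (count s)))))

(clip-str-sketch 3 "abcdefg")                ;=> "abc"
(clip-str-sketch 8 (sorted-map :a 1 :b 2))   ;=> "{:a 1, :"
```

The call to min is what makes the clip length an upper bound; without it, subs would throw for strings shorter than n.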

Keeping & Dropping Elements of a Sequence

When processing sequences of data, we often need to extract a sequence of desired data, or, conversely, remove all of the undesired elements. Have you ever been left wondering which of these two forms is correct?

(let [result (filter even? (range 10)) ]
  (assert (or (= result [ 1 3 5 7 9 ] )     ; is it "remove bad" (falsey)
              (= result [ 0 2 4 6 8 ] ))))  ; or    "keep good"  (truthy) ???

I normally think of filters as removing bad things. Air filters remove dust. Coffee filters keep coffee grounds out of my cup. A noise filter in my stereo removes contaminating frequencies from my music. However, filter in Clojure is written in reverse, so that it keeps items identified by the predicate. Wouldn’t it be nicer (and much less ambiguous) if you could just write the following?

(is (= [0 2 4 6 8]  (keep-if even? (range 10))
                    (drop-if odd?  (range 10))))

It seems to me that keep-if and drop-if are much more natural names. Of course, these are just thin wrappers around the built-in clojure.core functions, but I think they remove ambiguity, make the code easier to read, and make the intent more obvious.

Keeping & Dropping Elements from a Map or Set

The two functions keep-if and drop-if can be used equally well to retain or remove elements from a Clojure map or set. The semantics for sets look the same as for a sequence (vector or list). The predicate can be any 1-arg function:

(keep-if even? #{1 2 3 4 5} )
;=> #{4 2}
(drop-if even? #{1 2 3 4 5} )
;=> #{1 3 5}

Notice that the functions recognized the input collection as a set, and returned a set as the result. Very convenient.

For maps, each element is a MapEntry, which contains both a key and value. keep-if and drop-if understand maps, and will destructure each MapEntry. Thus, the predicate function can be any 2-arg function:

(def mm {10  0,   20 0
         11  1,   21 1
         12  2,   22 2
         13  3,   23 3} )

(is (= (keep-if   (fn [k v] (odd?  v))  mm)
       (drop-if   (fn [k v] (even? v))  mm)
        {11  1,   21 1
         13  3,   23 3} ))

(is (= (keep-if  (fn [k v] (< k 19))  mm)
       (drop-if  (fn [k v] (> k 19))  mm)
        {10  0
         11  1
         12  2
         13  3} ))

As with sets, the functions recognized that a map was supplied, accepted a 2-arg predicate function, and returned a map to the user.

Both keep-if and drop-if will throw an Exception if the supplied predicate function has the wrong arity, or if the supplied collection is not sequential (vector or list), a map, or a set.
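
The type-aware behavior described above can be sketched by dispatching on the kind of collection. Please treat this as an illustration only: keep-if-sketch and drop-if-sketch are hypothetical names, and returning a vector for sequential input is my own assumption rather than a statement about Tupelo's source.

```clojure
;; Hypothetical sketch of collection-aware keep/drop functions.
(defn keep-if-sketch
  "Keeps the elements of coll for which pred is truthy. Sequential input uses
   a 1-arg pred, sets use a 1-arg pred, and maps use a 2-arg (key, value) pred."
  [pred coll]
  (cond
    (sequential? coll) (filterv pred coll)
    (map? coll)        (into {} (filter (fn [[k v]] (pred k v)) coll))
    (set? coll)        (into #{} (filter pred coll))
    :else              (throw (IllegalArgumentException.
                                (str "Unsupported collection type: " (type coll))))))

(defn drop-if-sketch
  "Drops the elements of coll for which pred is truthy."
  [pred coll]
  (keep-if-sketch (complement pred) coll))

(keep-if-sketch even? (range 10))                     ;=> [0 2 4 6 8]
(drop-if-sketch even? #{1 2 3 4 5})                   ;=> #{1 3 5}
(keep-if-sketch (fn [k v] (odd? v)) {10 0, 11 1})     ;=> {11 1}
```

In this sketch a predicate of the wrong arity fails with Clojure's usual ArityException when it is first called.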

Extracting Only Values

The pervasive use of seq’s in Clojure means that scalar values often appear wrapped in a vector or some other sequence type. As a result, one often sees code like (first some-var) and it is not always clear that the code is simply "unwrapping" a scalar value, since there could well be remaining values in the sequence. Indeed, for a length-1 sequence it would be equally valid to use (last some-var) since first=last if there is only one item in the list.

To clarify that we are simply unwrapping a single value from the sequence, we may use the function only:

(only seq-arg)
 "Ensures that a sequence is of length=1, and returns the only value present.
  Throws an exception if the length of the sequence is not one.  Note that,
  for a length-1 sequence S, (first S), (last S) and (only S) are equivalent."
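
A function like this takes only a few lines. The following is a sketch of the idea using clojure.core (only-sketch is a hypothetical name, and the exception type is my own choice):

```clojure
;; Hypothetical sketch: return the single element, or throw if length is not 1.
(defn only-sketch
  "Returns the sole element of coll; throws if coll does not have exactly one."
  [coll]
  (let [items (seq coll)]
    (if (and items (nil? (next items)))
      (first items)
      (throw (IllegalArgumentException.
               "Expected a sequence of length 1")))))

(only-sketch [42])    ;=> 42
;; (only-sketch [])      ; throws IllegalArgumentException
;; (only-sketch [1 2])   ; throws IllegalArgumentException
```

Checking (next items) instead of calling count avoids walking a long or infinite lazy sequence just to reject it.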

Getting Past Second Base

Clojure has the functions first and second, but requires the use of nth for any subsequent position. Sometimes it is handy to have a quick way to grab the 3rd item from a sequential collection. Tupelo provides the third function to fill this void:

(is= nil (third [       ]))
(is= nil (third [1      ]))
(is= nil (third [1 2    ]))
(is= 3   (third [1 2 3  ]))
(is= 3   (third [1 2 3 4]))
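
If you are wondering how this differs from plain nth, here is a small sketch (third-sketch is a hypothetical name, not the library's source). The 2-arg form of nth throws for a short vector, while chaining next and first simply yields nil:

```clojure
;; Hypothetical sketch: the third item, or nil if there are fewer than three.
(defn third-sketch
  [coll]
  (first (next (next coll))))

(third-sketch [1 2 3 4])   ;=> 3
(third-sketch [1 2])       ;=> nil
;; (nth [1 2] 2)           ; throws IndexOutOfBoundsException
```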

The Truth Is Not Ambiguous

Clojure marries the worlds of Java and Lisp. Unfortunately, these two worlds have different ideas of truth, so Clojure accepts both false and nil as false. Sometimes, however, you want to coerce logical values into literal true or false values, so we provide a simple way to do that:

(truthy? arg)
 "Returns true if arg is logical true (neither nil nor false);
  otherwise returns false."

(falsey? arg)
 "Returns true if arg is logical false (either nil or false);
  otherwise returns false. Equivalent to (not (truthy? arg))."

Since truthy? and falsey? are functions (instead of special forms or macros), we can pass them as an argument to filter or to any other higher-order function:

(def data [true :a 'my-symbol 1 "hello" \x false nil])
(filter truthy? data)
;=> (true :a my-symbol 1 "hello" \x)
(filter falsey? data)
;=> (false nil)

(is (every? truthy? [true :a 'my-symbol 1 "hello" \x] ))
(is (every? falsey? [false nil] ))

(let [count-if (comp count keep-if) ]
  (let [num-true    (count-if truthy? data)   ; <= better than (count-if boolean data)
        num-false   (count-if falsey? data) ] ; <= better than (count-if not     data)
    (is (and  (= 6 num-true)
              (= 2 num-false) ))))
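
Both predicates are easy to sketch in plain Clojure. The names truthy?-sketch and falsey?-sketch are hypothetical, made up for this illustration. The example also shows a point that often surprises newcomers from other languages: zero, the empty string, and empty collections all count as logical true.

```clojure
;; Hypothetical sketches: coerce any value to a literal true or false.
(defn truthy?-sketch [arg] (if arg true false))
(defn falsey?-sketch [arg] (if arg false true))

(map truthy?-sketch [0 "" [] nil false])
;=> (true true true false false)
```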

Keeping It Simple with `not-nil?`

Clojure has the built-in function some to return the first truthy value from a sequence argument. It also has the poorly named function some? which returns the value true if a scalar argument satisfies (not (nil? arg)). It is easy to confuse some and some?, not only in their return type but also in the argument they accept (sequence or scalar). In keeping with the style for other basic test functions, we provide the function not-nil? as the opposite of nil?.

The unit tests show how not-nil? leads to a more natural code syntax:

(let [data [true :a 'my-symbol 1 "hello" \x false nil] ]
  (let [notties   (keep-if not-nil? data)
        nillies   (drop-if not-nil? data) ]
    (is (and  (= notties [true :a 'my-symbol 1 "hello" \x false] )
              (= nillies [nil] )))
    (is (every?   not-nil? notties))        ; the 'not' can be used
    (is (not-any?     nil? notties)))       ;   in either first or 2nd position

  (let [count-if (comp count keep-if) ]
    (let [num-valid-1     (count-if some?    data)    ; awkward phrasing, doesn't feel natural
          num-valid-2     (count-if not-nil? data)    ; matches intent much better
          num-nil         (count-if nil?     data) ]  ; intent is plain
      (is (and (= 7 num-valid-1 num-valid-2 )
               (= 1 num-nil))))))
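
The difference between some and some? is easiest to see side by side. In this sketch, not-nil?-sketch is a hypothetical stand-in written with clojure.core; the other calls are the standard built-ins.

```clojure
;; Hypothetical stand-in for not-nil?.
(defn not-nil?-sketch [arg] (not (nil? arg)))

;; some takes a predicate and a sequence, and returns the first truthy result.
(some even? [1 2 3])       ;=> true
(some #{:b} [:a :b :c])    ;=> :b

;; some? takes a single value, and returns true for anything except nil.
(some? false)              ;=> true
(some? nil)                ;=> false

(not-nil?-sketch false)    ;=> true
(not-nil?-sketch nil)      ;=> false
```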

Identifying Sequences

Update 2016-6-13: Now included in clojure.core 1.9.0-alpha5!

In some situations, a function may need to verify that an argument is seqable, that is, will a call to (seq some-arg) succeed? If so, some-arg may be interpreted as a sequence of values. Before version 1.9, Clojure didn’t have a built-in function for this (please note that seqable? is different from seq?), but we can copy a solution from the old clojure.contrib.core/seqable:

(is (seqable?   "abc"))
(is (seqable?   {1 2 3 4} ))
(is (seqable?  #{1 2 3} ))
(is (seqable?  '(1 2 3) ))
(is (seqable?   [1 2 3] ))
(is (seqable?   (byte-array [1 2] )))

(is (not (seqable?  1 )))
(is (not (seqable? \a )))
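
Since seqable? and seq? are so easy to mix up, here is a short comparison. It assumes Clojure 1.9 or later, where seqable? is part of clojure.core (on 1.8 you would need the Tupelo version instead).

```clojure
;; Assumes Clojure 1.9+, where clojure.core/seqable? exists.

;; seq? asks "is this already a seq?"
(seq? [1 2 3])          ;=> false
(seq? "abc")            ;=> false
(seq? (seq [1 2 3]))    ;=> true

;; seqable? asks "can I call seq on this?"
(seqable? [1 2 3])      ;=> true
(seqable? "abc")        ;=> true
(seqable? nil)          ;=> true
(seqable? 1)            ;=> false
```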

Change Log

Please see the ChangeLog for details.

Other useful libraries

There are several other libraries that provide useful value-added functionality to clojure.core:

Medley
Plumatic Plumbing
Such Wow
The Clojure Toolbox - For a comprehensive list of Clojure libraries

Requirements

Clojure 1.8.0
Java 1.8

License

Distributed under the Eclipse Public License, the same as Clojure.

Development Environment

Developed using IntelliJ IDEA with the Cursive Clojure plugin.

YourKit supports open source projects with its full-featured Java Profiler. YourKit, LLC is the creator of YourKit Java Profiler and YourKit .NET Profiler, innovative and intelligent tools for profiling Java and .NET applications.

ToDo List (#todo)

types
schema (& schema-datomic)
re-work csv
kill y64?
Update all NS docstrings
zipcode distance testing
lein plugin
make CLJS compatible
more docs for other namespaces
add more test.check
add spy-let, spy-defn, spy-validate, etc
blog posts
