A Model of Interceptors

I’ve been gearing up for what I think will be a fun and fruitful project. I wanted to review some of the fundamental abstractions we use in Clojure for building web applications, and see if I can’t find something super robust for building on top of.

The “classical” model is Ring, which defines Adapters, Middleware, and Handlers. Briefly, Handlers are functions from request to response. Middleware are higher-order functions that take a Handler and return a new Handler. They’re Handler transformers. Finally, Adapters are used to insert the Ring model into a “host”. For instance, the Java Servlet system could be a host. A Servlet Adapter will convert the Servlet request to a Ring request, and also convert the Ring response to a Servlet response.
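As a quick sketch of those shapes (the handler and wrap-user middleware here are made-up names, just for illustration):

```clojure
;; A Handler: request map in, response map out.
(defn handler [req]
  {:status 200 :body (str "hello " (:user req))})

;; Middleware: take a Handler, return a new Handler.
;; wrap-user is a hypothetical middleware that adds a :user key.
(defn wrap-user [h]
  (fn [req]
    (h (assoc req :user "world"))))

(def app (wrap-user handler))

(app {}) ;; => {:status 200, :body "hello world"}
```

An Adapter would sit outside app, converting the host's native request into the map app expects and the returned map back into a host response.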

The Ring model is pretty good, but it has the problem that it makes it hard to do asynchronous servers. Besides Adapters, the only things that handle requests are the Handlers. Those are just functions, so they consume a whole thread. It’s one thread per request, which doesn’t scale well and doesn’t let you do asynchronous operations.

Enter Pedestal. Pedestal introduced the idea of Interceptors. With Ring Middleware, you could transform the request before the Handler got it, and/or transform the response after the Handler returned. Interceptors define the transform-on-the-way-in (:enter) and transform-on-the-way-out (:leave) as two separate functions, reifying the two uses of Middleware into one distinct object. If you add core.async to the mix, you can get asynchrony. Pedestal also adds an :error function for the third use case of Middleware, which was wrapping Handlers in try-catch to transform Exceptions into responses.

Pedestal has machinery for running a request through an Interceptor chain. This machinery uses a stack and a queue for holding the :leave and :enter functions. The stack and the queue are held in a Context map, and that is what gets passed through each Interceptor. Each Interceptor returns a modified Context, which could modify the HTTP Request or Response, or it could modify the stack and queue. This lets Interceptors add other Interceptors to the end of the chain.

One thing I miss about Ring when using Pedestal is the cleanliness of the composition model. Handlers are general-purpose functions, and Middleware, which just transform the Handler passed as their first argument, compose easily with a nice thread-first macro. In the end, you’ve got one big function to call.

In Pedestal, they say “everything is an Interceptor”. But it’s not quite true. While a lot of the functionality of your service is written as Interceptors, the Interceptor chain itself is not an Interceptor. You also have the ability to add Interceptors to the chain from within an Interceptor. This makes it hard to predict what’s going to happen when an Interceptor is called. Since it’s adding Interceptors to the end, but the Interceptor itself doesn’t know what’s after it, there’s room for lots of trouble.

What I do like about Interceptors is that they separate out the three main uses of Middleware (transform request, transform response, and handle Exceptions). They reify that concept and eliminate the bit of boilerplate every Middleware has to implement (the wrap/call).

A first pass

Let’s simplify and formalize the concept of Interceptors.

First, an Interceptor is made of three functions: :enter, :leave, and :error. Their types are as follows.

;; :enter :: Request  -> Request
;; :leave :: Response -> Response
;; :error :: Request  -> Exception -> Response

All of them are optional. Let’s leave aside the :error function for now, and deal only with :enter and :leave. We’ll add :error back later. That means we have two functions. Both are of the form “take a thing, return it transformed”.

Now, given two of these, how do we compose them? It’s obvious that we can use function composition.

(comp enter2 enter1) ;; call enter1, then pass the return to enter2

That’s a really good sign. We can make a function that uses comp to compose two interceptors:

(defn chain [i1 i2]
  {:enter (comp (:enter i2 identity) 
                (:enter i1 identity))
   :leave (comp (:leave i1 identity)
                (:leave i2 identity))})

I’m calling it chain because that’s the language from Pedestal. Notice that the :leave functions are composed in the opposite order from the :enter functions. We notice that chain is a monoid because it just uses comp, which is a monoid. That’s sweet! People talk about monads all the time, but monoids are where it’s at.

And we definitely want chain to be a monoid. I imagine Interceptors being assembled in different places, then finally being assembled into one big interceptor. Monoids give us that property.

Also notice that I’ve put the Interceptors in the order we expect to see them in. They’re kind of backwards in function composition.

Now we need a function that will run this chain. It simply threads the value through the enter and leave functions.

(defn run [interceptor value]
  (let [{:keys [enter leave]} interceptor]
    (-> value
        enter
        leave)))
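Here's a quick REPL check of those two definitions working together (both are repeated inline so the snippet stands alone):

```clojure
(defn chain [i1 i2]
  {:enter (comp (:enter i2 identity)
                (:enter i1 identity))
   :leave (comp (:leave i1 identity)
                (:leave i2 identity))})

(defn run [interceptor value]
  (let [{:keys [enter leave]} interceptor]
    (-> value
        enter
        leave)))

(def i1 {:enter inc})                  ;; first on the way in, last on the way out
(def i2 {:enter #(* 2 %) :leave inc})  ;; last on the way in, first on the way out

(run (chain i1 i2) 3) ;; => 9, i.e. (inc (* 2 (inc 3)))
```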

Let’s build a test to ensure that we maintain this associative property. (Monoids are associative).

First, we’ll need a generator for some simple functions.

(def gen-integer-fn
  (gen/elements
   [identity
    (constantly 0)
    (constantly 1)
    inc
    dec]))

Nothing too fancy, because we’ll be generating some long Interceptor chains.

Now to generate Interceptors themselves.

(def gen-interceptor
  (gen/let [enter gen-integer-fn
            leave gen-integer-fn]
    {:enter enter
     :leave leave}))

And the test:

(defspec chain-associative
  1000
  (prop/for-all [a gen-interceptor
                 b gen-interceptor
                 c gen-interceptor
                 x gen/int]
    (let [x (mod x 100)
          i1 (chain a (chain b c))
          i2 (chain (chain a b) c)]
      (= (run i1 x) (run i2 x)))))

(Note: all of this testing is done using clojure.test.check.)

Converting to Vectors

I simplified the model of Interceptors to get a first approximation. The function composition version really feels nice to me. However, it’s missing a lot of stuff. I did try to add error handling and asynchrony to the function composition model, but it was really complicated and I wasn’t sure how to add some features. I won’t go over that here. It was an interesting exercise but ultimately I think it’s a dead end.

What I decided to do instead was to keep the Interceptors separate and make the run function a little smarter. Instead of composing the functions, the run function loops through the Interceptors, first from beginning to end running the :enter functions, then back again to the beginning running the :leave functions.

I still need chain to be a monoid. I shouldn’t have to change the test. There’s a nice existing monoid that will serve our purpose well. It’s Clojure’s very own Vector. If we keep the Interceptors in a Vector, we can define chain in terms of Vector concatenation, which is also a monoid.

Let’s go the full monty and define chain using Clojure’s monoid pattern. But first, I want to define an Interceptor a little differently. Now, an Interceptor is a Vector of zero or more maps. Each map can have :enter and :leave functions. We’ll get to :error in a bit.

Because we’re now defining Interceptors as a Vector, I will define a function normalize that will convert different stuff to that format.

(defn normalize [i]
  (cond
    (nil? i)
    []

    (= {} i)
    []

    (seq? i)
    (vec i)

    (vector? i)
    i

    (fn? i)
    [{:enter i}]

    (map? i)
    [i]))

That should be pretty straightforward. The notable things are that the empty Vector is the identity of chain, and that functions alone are converted into :enter functions.

Now chain has the full monoid definition, which includes all arities.

(defn chain
  ([]
   [])
  ([i1]
   (normalize i1))
  ([i1 i2]
   (let [i1 (normalize i1)
         i2 (normalize i2)]
     (into i1 i2)))
  ([i1 i2 & is]
   (apply chain (chain i1 i2) is)))

I’m not sure that’s the most performant possible version, but I’m not concerned about performance here. You build the chain once and run it many times. run will need to be performant.

run is an interesting situation now. If I were in a different Lisp, I’d have tail call elimination, and I’d just jump between functions in tail position with almost zero cost.

But Clojure doesn’t have tail call elimination. So I’ll use two functions, one for calling :enter functions and one for calling :leave functions. While it’s moving in one direction, it can just recur in a tight loop. Once it reaches the end of the :enter side, it defers to the other function that handles the :leave functions.

(declare run-leave) ;; run-enter calls run-leave before it's defined

(defn run-enter [i v idx]
  (if (contains? i idx)
    (let [e (:enter (get i idx) identity)]
      (recur i (e v) (inc idx)))
    (run-leave i v (dec idx))))

(defn run-leave [i v idx]
  (if (contains? i idx)
    (let [l (:leave (get i idx) identity)]
      (recur i (l v) (dec idx)))
    v))

These look pretty similar to each other. The difference is that run-enter defers to run-leave when it runs off the end. And run-leave returns the value when it runs off the beginning.

Now, run just defers to run-enter:

(defn run [i v]
  (run-enter i v 0))
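A quick sanity check of the vector-based version (definitions repeated here so the snippet stands alone):

```clojure
(declare run-leave)

(defn run-enter [i v idx]
  (if (contains? i idx)
    (let [e (:enter (get i idx) identity)]
      (recur i (e v) (inc idx)))
    (run-leave i v (dec idx))))

(defn run-leave [i v idx]
  (if (contains? i idx)
    (let [l (:leave (get i idx) identity)]
      (recur i (l v) (dec idx)))
    v))

(defn run [i v]
  (run-enter i v 0))

;; :enter side: (inc 3) => 4, (* 2 4) => 8
;; :leave side: (inc 8) => 9, then identity
(run [{:enter inc} {:enter #(* 2 %) :leave inc}] 3) ;; => 9
```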

Rerun the tests and they pass.

Adding error handling

One of the important use cases of Middleware is to wrap a handler in a try/catch and handle errors. We want to enable that with our Interceptors model.

We augment the Interceptor definition. Now it’s a Vector of zero-or-more maps, where each map can have :enter, :leave, and :error functions (all of them optional).

Errors act differently depending on whether you’re on the :enter or :leave side. On the :enter side, if an :enter function throws a Throwable (or returns a Throwable), we immediately begin the :leave side with the Throwable as the value.

(defn run-enter [i v idx]
  (if (contains? i idx)
    (let [e (:enter (get i idx) identity)
          v' (try
               (e v)
               (catch Throwable t
                 t))]
      (if (instance? Throwable v')
        (run-leave i v' idx)
        (recur i v' (inc idx))))
    (run-leave i v (dec idx))))

When we’re leaving, if we have a Throwable, we call the :error function. Otherwise, we call the :leave function. That means that at any point, the :error function can return a non-Throwable value and it will go back to the :leave functions.

(defn run-leave [i v idx]
  (if (contains? i idx)
    (let [status (if (instance? Throwable v) :error :leave)
          l (get-in i [idx status] identity)
          v' (try
               (l v)
               (catch Throwable t
                 t))]
      (recur i v' (dec idx)))
    v))

The tests run, but we aren’t testing the error path. We need an :enter function to throw. This test sets up a chain where the last Interceptor in the chain throws an error. The first Interceptor has an :error function that returns its own error. We can test that we get that error out at the end.

(defn constantly-error [_]
  (throw (ex-info "Error" {})))
  
(defspec chain-errors
  1000
  (prop/for-all [i (gen/vector gen-interceptor)
                 x gen/int]
    (let [t (ex-info "My error" {})
          i (chain {:error (constantly t)}
                   i
                   {:enter constantly-error})
          x (mod x 100)]
      (identical? t (run i x)))))

We also want to test what happens when we don’t catch the error.

(defspec chain-error-not-caught
  1000
  (prop/for-all [i (gen/vector gen-interceptor)
                 x gen/int]
    (let [i (chain {:enter constantly-error} i)
          x (mod x 100)]
      (instance? Throwable (run i x)))))

We could really test this a lot, but I’ll skip over that.

Adding async

Another thing that Pedestal has is asynchrony. If your Interceptors return a core.async channel, Pedestal’s machinery will consider that an async Interceptor.

I would rather use Manifold, which is a cool abstraction for asynchronous values. That means that in my Interceptors, if your function returns something Manifold understands as a deferrable thing, the Interceptor chain will be considered an async chain. This change is not too difficult:

(defn run-leave [i v idx]
  (if (contains? i idx)
    (let [status (if (instance? Throwable v) :error :leave)
          l (get-in i [idx status] identity)
          v' (try
               (l v)
               (catch Throwable t
                 t))]
      (if (d/deferrable? v')
        (-> (d/->deferred v')
            (d/chain #(run-leave i % (dec idx)))
            (d/catch #(run-leave i % (dec idx))))
        (recur i v' (dec idx))))
    v))

(defn run-enter [i v idx]
  (if (contains? i idx)
    (let [e (:enter (get i idx) identity)
          v' (try
               (e v)
               (catch Throwable t
                 t))]
      (cond
        (instance? Throwable v')
        (run-leave i v' idx)

        (d/deferrable? v')
        (-> (d/->deferred v')
            (d/chain #(run-enter i % (inc idx)))
            (d/catch #(run-leave i % idx)))

        :else
        (recur i v' (inc idx))))
    (run-leave i v (dec idx))))

We can test this by making sure async chains return the same value as non-async chains.

;; defer-val (a small helper, assumed here) wraps a plain value
;; in an already-realized deferred
(defn defer-val [v]
  (d/success-deferred v))

(defn make-async [i]
  (-> i
      (update :enter #(comp defer-val %))
      (update :leave #(comp defer-val %))))

(defspec chain-async
  1000
  (prop/for-all [i (gen/not-empty (gen/vector gen-interceptor))
                 x gen/int]
    (let [ai (mapv make-async i)
          x (mod x 100)]
      (= (run i x) @(run ai x)))))

(defspec chain-async-one
  1000
  (prop/for-all [i (gen/not-empty (gen/vector gen-interceptor))
                 idx gen/int
                 x gen/int]
    (let [idx (mod idx (count i))
          ai (update i idx make-async)
          x (mod x 100)]
      (= (run i x) @(run ai x)))))
      
(defn constantly-error-async [_]
  (let [d (d/deferred)]
    (d/error! d (ex-info "Async error." {:async true}))
    d))
      
(defspec chain-error-async
  1000
  (prop/for-all [i (gen/vector gen-interceptor)]
    (let [e constantly-error-async]
      (:async (ex-data @(run (chain i e) 0))))))

Adding early return

A final thing that Middleware could do was to not call the handler at all. It could do something different instead. I’m calling this “early return”. What I want is some way for an :enter function to immediately begin the :leave phase instead of continuing to the next :enter function.

We’ll take a page from the Clojure playbook and use reduced. reduced is used in reduce for early returns. It signals that the computation is over. reduce checks for it, then unpacks the reduced value. We’ll do the same. Only :enter functions can do this.
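If you haven't seen it before, here's reduced doing its usual job in clojure.core/reduce:

```clojure
;; reduced signals that reduce should stop and unwrap the value.
;; Without the early return, reducing over the infinite (range)
;; would never terminate.
(reduce (fn [acc x]
          (if (>= acc 10)
            (reduced acc)
            (+ acc x)))
        0
        (range))
;; => 10
```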

(defn run-enter [i v idx]
  (if (contains? i idx)
    (let [e (:enter (get i idx) identity)
          v' (try
               (e v)
               (catch Throwable t
                 t))]
      (cond
        (instance? Throwable v')
        (run-leave i v' idx)

        (d/deferrable? v')
        (-> (d/->deferred v')
            (d/chain #(run-enter i % (inc idx)))
            (d/catch #(run-leave i % idx)))

        (reduced? v')
        (run-leave i (unreduced v') idx)

        :else
        (recur i v' (inc idx))))
    (run-leave i v (dec idx))))

We can test this by saying that a chain of just a should be equivalent to a chain where a early-returns before b runs. It’s clearer in code.

(defn make-reduced [i]
  (update i :enter #(comp reduced %)))

(defspec chain-early-exit
  1000
  (prop/for-all [a gen-interceptor
                 b gen-interceptor
                 x gen/int]
    (let [x (mod x 100)
          i1 (chain a)
          i2 (chain (make-reduced a) b)]
      (= (run i1 x) (run i2 x)))))

That’s it! Check out the code here.

Conclusions

I’m happy with where this wound up. I’ve got a nice, concise implementation of Interceptors, ready for use in async web servers–though it’s not specific to the web. I’ve also got a nice suite of tests. The model is not quite as formal as I’d like. Some of the tests seem a little too specific. I’d love for them to be a few simple properties that guarantee the behaviors I’m looking for. Right now they feel ad hoc. I probably spent too much time on this. But I really enjoyed it! I’m looking forward to hearing what you think about it.

The post A Model of Interceptors appeared first on LispCast.

Permalink

Rails: The Struggle Is Real

I don’t know how many hours I’ve lost to searching for what should be the easiest thing but instead winds up with me scratching my head because I’m forced to do it a crazy complicated way because some Danish dude thought it was a good idea. Don’t get it twisted yo, Rails has some positive things going for it, but making sense to me is not one of them. Let me give you some concrete examples of some WTF moments I’ve dealt with repeatedly.

Custom form builder helpers

This is when you want to wrap up a few elements into something nice and neat like

<%= f.my_tag :user_id, @user.id %>

The code to make something like that was such a WTF moment for me that I had to physically take a step back from my overpriced macbook pro and take 3 solid, slow breaths before I could get my wits about me and dive into content_tag hell.

ActiveRecord syntax

I used to be a fan of ActiveRecord, I always thought it had that special something that made interacting with yucky mysql, postgres or sqlite palatable and sane. That is until I realized that it’s hurting me more than it’s helping me. I honestly don’t even know where to start. Wait… it’s coming to me, ok here we go, let’s start with something that should be easy, getting collections out of related objects:

current_user.todos.includes(:tags).where(completed: true).order(created_at: :desc).limit(30)

I mean, it seems like this does what I want, I want some todos and their associated tags based on the current logged in user, I expect this:

[
  {
    todo: {
      name: "todo #1",
      tags: [{name: "tag"}]
    }
  }
]

But this doesn’t happen, everything is wrapped up in an object and to get plain jane data out you have to introduce yourself to strange words like serializing and sometimes even gasp, marshaling. Honestly though, being a data marshal sounds like I’m about to have a wild west shoot out in the middle of town: it’s me, my keyboard and the gang of four.

ERB

This isn’t even related to custom form builder helpers, this is just straight up nonsense. If I had a nickel for every time I closed an ERB tag due to editor malfunction I sure as heck wouldn’t have to work anymore because I’d be a rich son of a gun. The idea of learning a whole other syntax on top of ruby and on top of rails’ conventions? I’m just trying to make a fricken website here not send a man to the moon, how many different things do I need to learn here?!

AR Callback hell

How do you get function reuse on a given event? Like for example when an object is created? Well you type after_create and then you call a method, sweet. Done. Of course what no one bothered to tell me is that the reason I even need these hooks in the first place is because we aren’t just calling regular functions to begin with, we’re calling a complex chain of methods so deep, it makes the Mariana Trench look like a kiddy pool. What’s the alternative you ask, how can we call a function every time a new database record is made? Well, it’s called a function, and you could just call that function wherever you want to save that thing. You don’t need a callback, you don’t need to learn another thing and get lost in debates about whether it’s after_commit on: :create or after_create. You can just call a function.

Criticism is the sincerest form of make a new framework

Which is what I did, I made yet another full stack framework, except this one isn’t in ruby, it’s in clojure. It also doesn’t have a lot of stuff, it doesn’t have a separate templating language, it doesn’t have an ORM, it doesn’t have AR callbacks or special content_tag methods, it uses functions and that’s pretty much it. If you made it this far you’ve got to be interested, check out the quickstart and let me know what you think!

Originally posted on medium


PurelyFunctional.tv Newsletter 296: Design, Success, and Conj Speakers

Issue 296 – October 15, 2018 · Archives · Subscribe

Hi Clojurers,

Why is functional programming becoming more popular? I don’t know if we’ll ever tease out the real cause, but we can speculate.

Is it because the number of software developers in general is increasing, so of course the number of people pre-disposed to FP are also increasing?

Or is it because people are finally past the honeymoon phase of the last paradigm?

What about the “rise” of multicore?

Or, as I currently believe, is it that the web has forced us to think about distributed systems? Functional programmers might not have solutions to the irreducible problems of distributed systems, but we have been asking these questions for a while.

What do you think?

Rock on!
Eric Normand <eric@purelyfunctional.tv>

PS Want to get this in your email? Subscribe!


Hire Eric to work in your team

I have spots open for client work. I prefer full time engagements. If you or someone you know are interested, please reply to this email. We can get on a call to talk about your needs.


Clojure/conj Speakers Announced

Clojure/Conj 2018 is right around the corner and the speakers have been announced.


What is Success? YouTube

Evan Czaplicki, the creator of Elm, talks about what success means for a programming language, including how to build better abstractions on top of the web APIs and how to encourage a helpful community. Evan seems to be facing a lot of the same criticisms that Clojure faces. Go figure.


Towards a Theory of Conceptual Design for Software Paper

A nice paper on software design.


Clojure in Seattle: World Singles Networks

JUXT interviewed Sean Corfield about how World Singles uses Clojure.


Stu Halloway on Apropos Oct 18

We invited Stu Halloway to discuss the work he has been doing with exception reporting in Clojure. And he said yes! You can catch him live on October 18. We record the show live on YouTube. Subscribe and click on the bell icon to be notified when we go live. When you are there live, you can ask us questions. We record every two weeks on Thursdays.


Clojure 1.10.0-beta3

Clojure 1.10 is getting close to a release. Cognitect releases alphas, then betas, then release candidates. They ask that we test them against our codebases to find any problems before the release is published. If you’re using Java 8 or higher, you can run Clojure 1.10.

1.10 includes better error reporting, prepl alpha, and more. This is also the first release to require Java 8 or higher. Test out the latest pre-release on your code (you don’t have to run it in production) and report any problems.


Clojure Collections Currently recording

I’ve finished recording the lessons that are directly about the collections themselves. It was fun doing research for the less-frequently used collections. Each lesson goes over:

  • Literal syntax
  • Constructor functions
  • Evaluation semantics
  • Function call semantics
  • The patterns for usage

Here are the new lessons this week: Set, List, Queue, Sorted Map, and Sorted Set.

The course is already over five hours long. There are still a couple hours left to record.

The post PurelyFunctional.tv Newsletter 296: Design, Success, and Conj Speakers appeared first on PurelyFunctional.tv.


Clojure.spec Beginner's FAQ

Clojure.spec has been available (in alpha) for some time, and there are great talks and resources like the rationale and official guide. This post supplements those with some common questions and issues I’ve seen.

These are mostly things I’ve experienced myself or seen on Stack Overflow, Clojurians Slack, r/Clojure or Twitter. The community is extremely helpful so reach out if you have questions — it’s not unusual to get an answer straight from a Clojure maintainer. And while this post follows question & answer format, my answers are extremely non-authoritative. Please report any issues you find!

Usage

Q: How should I integrate clojure.spec into my project?

A: However you like!

Clojure.spec usage is totally à la carte. You can use only the features you want to whatever extent you want. This is great for consumers, but a lack of prescribed patterns doesn’t help newcomers wondering how to “best” leverage spec in their work.

Here are some things I’ve used spec for:

  • Validating inputs to service endpoints, and using spec metadata to return helpful details for invalid inputs.
  • Generating sample inputs to service endpoints for API documentation.
  • Writing specs for important functions, and instrumenting them during development/testing as a kind of run-time type checker.
  • Using specs to get corresponding test.check generators “for free”, using the generated data for other purposes.
  • Using a s/or spec as a kind of discriminated union to dispatch based on different types of inputs.
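For example, the last use case might look something like this (::lookup and find-user are made-up names for illustration):

```clojure
(require '[clojure.spec.alpha :as s])

;; An s/or spec acts like a discriminated union: conform tags the
;; value with the branch it matched.
(s/def ::lookup (s/or :id int? :name string?))

(defn find-user [x]
  (let [[tag v] (s/conform ::lookup x)]
    (case tag
      :id   {:by :id :value v}
      :name {:by :name :value v})))

(find-user 42)    ;; => {:by :id, :value 42}
(find-user "ann") ;; => {:by :name, :value "ann"}
```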

Organization

Q: Should I put specs in the same namespace as my code, or in a separate namespace?

A: It’s up to you.

You may find having specs alongside relevant code is helpful. You can just as easily put specs for data in separate namespaces, and specs for functions next to their defn counterparts — and you can put the spec before or after the defn.

One consideration is whether you’re using qualified keywords in s/keys map specs. I think a potential benefit of having data specs in the same namespace as your regular code is that it’s natural to inherit that namespace for namespaced map specs:

(in-ns 'customer)
(s/def ::id int?)
(s/def ::name string?)
(s/def ::contact (s/keys :req [::id ::name]))
(defn print [{::keys [id name]}] (prn id name))
(print #:customer{:id 1 :name "Jane" :some.other/id "FhI-1"})

Instrumentation & Testing

instrument can be used to assert arguments to spec’d function invocations. See my post on function specs for more detailed examples. The 0-arity instrument will instrument every loaded, instrumentable function. There’s a cost associated with instrumentation, and you may want instrument to only affect particular functions:

(clojure.spec.test.alpha/instrument [`foo `bar])

And there’s an unstrument function for removing instrumentation.

Q: When and where should I call instrument?

A: It depends.

Your instrumentable functions must be loaded before you can instrument them. If you have per-profile entry points to your program e.g. -main, -dev-main, you might choose to call instrument in the dev entry point. If your project uses Leiningen, you could use :injections to call instrument on a per-profile basis:

:injections [(require 'lib.core) ;; all instrumented fns should be loaded here
             (require 'clojure.spec.test.alpha)
             (clojure.spec.test.alpha/instrument)]

Or you may only want to instrument functions during test runs, maybe as a clojure.test fixture or in the namespace itself.

Q: Should I use instrument in production builds?

A: Probably not.

From the guide:

Instrumentation is likely to be useful at both development time and during testing to discover errors in calling code. It is not recommended to use instrumentation in production due to the overhead involved with checking args specs.

Q: Why doesn’t instrument check my function return value?

A: It’s not meant to.

This is a common point of confusion. If you instrument a function spec’d with fdef or fspec, only its :args spec will be checked during each invocation. Why is that? I think one rationale might be that functional programs are generally a composition of functions where each output becomes an input to another function. If you have spec’d the :args and :ret of related functions (f (g x)), you’d be redundantly checking the same specs for the :ret of g and :args of f. For larger examples this could become very costly.
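A small illustration (plus and its spec are hypothetical): under instrument, only the :args side is guarded.

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.spec.test.alpha :as st])

(defn plus [a b]
  (+ a b))

(s/fdef plus
  :args (s/cat :a int? :b int?)
  :ret int?)

(st/instrument `plus)

(plus 1 2) ;; => 3 — the :ret spec is never consulted on invocation
;; (plus 1 :x) would throw, because the :args spec is violated
```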

Q: So why do fdef/fspec accept :ret and :fn specs too?

A: They’re used to check the function.

checking a function involves generating many random inputs (from the :args spec) for the function, invoking the function with each input, and checking that the return value conforms to the :ret spec, and if :fn spec is defined that the relationship between the input and output values are satisfied. This is commonly called generative or property-based testing and spec relies on test.check for this.

You can use check as part of a test suite, for example with clojure.test:

(deftest foo-test
  (is (= 1 (-> (st/check `foo)
               (st/summarize-results)
               :check-passed))))

Sequences & Regex Specs

Q: What’s this odd nesting behavior with regex specs?

A: Nested regex specs compose to describe a single sequence.

Or as stated in the rationale:

These nest arbitrarily to form complex expressions.

I think this is much easier to understand through examples, which can be found in the official guide.

It’s easy to forget the s/& spec which is useful for adding additional predicates to regex specs, where using s/and would destroy the regex nesting behavior. Consider these two conforming examples where the only difference is s/& vs. s/and:

(s/conform
  (s/+ (s/alt :n number?
              :s (s/and (s/+ string?)
                        #(every? (complement empty?) %))))
  [1 ["x" "a"] 2 ["y"] ["z"]])

(s/conform
  (s/+ (s/alt :n number?
              :s (s/& (s/+ string?)
                      #(every? (complement empty?) %))))
  [1 "x" "a" 2 "y" "z"])

Q: How can I escape the regex spec nesting behavior?

A: Wrap the regex spec with s/spec.

This is also described in the official guide above, but here’s another example:

(s/conform
  (s/+ (s/alt :n number? :s (s/spec (s/* string?))))
  [1 2 3 ["x" "y" "z"]])
=> [[:n 1] [:n 2] [:n 3] [:s ["x" "y" "z"]]]

But this example might be better specified with s/coll-of or s/every:

(s/+ (s/alt :n number? :s (s/coll-of string?)))
(s/+ (s/alt :n number? :s (s/every string?)))

Map Specs

Q: How can I assign different specs to the same key in different maps?

A: That’s not supported (for qualified keys).

Spec encourages use of qualified map keys. If you control the shape of your map you should consider using qualified keys:

{:customer/name "Frieda" :company/name "Acme"}

Of course you may not have this luxury when working with external data sources where the same key may have different meanings at different paths:

{:name "Frieda" :company {:name "Acme"}}

In this case there’s a workaround if you’re using unqualified keys in your map specs. You can use qualified keys to name your specs, but create s/keys specs with unqualified keys e.g. :req-un and :opt-un:

(s/def :customer/name string?)
(s/def :company/name (s/nilable string?))
(s/def ::company (s/keys :req-un [:company/name]))
(s/def ::customer (s/keys :req-un [:customer/name ::company]))
(s/valid? ::customer {:name "Taylor" :company {:name nil}}) => true

Q: How can I make a s/keys spec that disallows extra keys?

A: You should reconsider that.

Map specs are meant to be open instead of closed. Adding data to a map spec should not make the map invalid. There are some cases where you might really want this, and of course it’s possible.
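To see the open-map behavior concretely (::a and ::m are made-up spec names):

```clojure
(require '[clojure.spec.alpha :as s])

(s/def ::a int?)
(s/def ::m (s/keys :req-un [::a]))

(s/valid? ::m {:a 1 :extra "fine"}) ;; => true — extra keys are allowed
(s/valid? ::m {:extra "fine"})      ;; => false — required key missing
```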

Generation

Q: Why does generation fail with Couldn’t satisfy such-that predicate after 100 tries?

A: It might be the default generator for a s/and spec.

Generators are created from specs automagically, but for s/and specs the generator is based solely on the first spec inside the s/and. This spec conforms string palindromes:

(s/def ::palindrome
  (s/and string? #(= (clojure.string/reverse %) %)))

But you may get an error message when trying to exercise its generator (or possibly indirectly if the spec is involved in a check’d function):

(gen/sample (s/gen ::palindrome))
ExceptionInfo Couldn't satisfy such-that predicate after 100 tries.  clojure.core/ex-info (core.clj:4739)

This is because the generator for ::palindrome is actually a generator for (s/gen string?), which will just generate random strings that are very unlikely to be palindromes, and after 100 random tries it gives up. In these cases you can provide your own generator with s/with-gen:

(s/def ::palindrome
  (s/with-gen
    (s/and string? #(= (clojure.string/reverse %) %))
    ;; use fmap to create palindromes of the generated strings
    #(gen/fmap (fn [s] (apply str s (rest (reverse s))))
               (s/gen string?))))
(gen/sample (s/gen ::palindrome))
=> ("" "" "e" "PA6AP" "OUdTdUO" "k" "N" "0" "1T353T1" "D4V4D")

Some spec functions also take an optional overrides map of custom generators without needing to associate them with the spec definitions directly.
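For example, s/exercise accepts such an overrides map, keyed by spec name, mapping to no-arg functions that return generators. A sketch reusing the ::palindrome spec from above:

```clojure
;; override the ::palindrome generator for this call only,
;; leaving the registered spec untouched
(s/exercise ::palindrome 5
  {::palindrome #(gen/fmap (fn [s] (apply str s (rest (reverse s))))
                           (s/gen string?))})
```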

Q: How can I generate data with parent/child relationships?

A: Use test.check functions like fmap, bind, or let macro.

As in the previous example you can use fmap to create a new generator that alters the results of a generator, so for recursive structures you can make any modifications there. For more complex scenarios — perhaps involving multiple generators — test.check provides a let macro that’s syntactically similar to clojure.core let. (Note that clojure.spec.gen.alpha only aliases some of test.check’s functionality; you’ll need to require test.check namespaces directly for its let macro.) The RHS of the bindings are generators and the LHS names are bound to the generated values, and it all expands to fmap and bind calls.

Consider an example of a recursive map spec:

(s/def ::id uuid?)
(s/def ::children (s/* ::my-map))
(s/def ::parent-id ::id)
(s/def ::my-map
  (s/keys :req-un [::id]
          :opt-un [::parent-id ::children]))
(gen/sample (s/gen ::my-map))

We need a custom generator to ensure the :parent-id for child maps is accurate, using fmap again:

(defn set-parent-ids [m]
  (clojure.walk/prewalk
    (fn [v]
      (if (map? v)
        (update v
          :children #(map (fn [c] (assoc c :parent-id (:id v))) %))
        v))
    m))
(gen/sample
  (gen/fmap set-parent-ids (s/gen ::my-map)))

Also take a look at test.check’s recursive-gen for another approach.

Another example might be non-recursive relationships between values, like arguments to a function. Consider writing a function spec for clojure.core/subs:

(s/def ::subs-args (s/cat :str string? :i int?))
(s/fdef clojure.core/subs :args ::subs-args)
(s/exercise-fn `subs)
StringIndexOutOfBoundsException String index out of range: -1  java.lang.String.substring (String.java:1927)

This is because the generators for the strings and integer indices are independent and unrelated:

(gen/sample (s/gen (s/cat :str string? :i int?)))
=> (("" 0) ("" 0) ("" -1) ("C" 3) ("c9gD" 1) ("" 6) ("cfo2s7" -7) ("3fRj30" -2) ("W" 0) ("cEzS" -15))

If we want to ensure that the index argument refers to a position in the string input, we can use a custom generator:

(s/def ::subs-args
  (s/with-gen
    (s/cat :str string? :i int?)
    ;; note: gen/let here comes from clojure.test.check.generators,
    ;; not clojure.spec.gen.alpha (see the earlier caveat)
    #(gen/let [str (s/gen string?)
               index (gen/large-integer* {:min 0 :max (count str)})]
       [str index])))
(st/summarize-results (st/check `subs))
=> {:total 1, :check-passed 1}

Q: Why is my recursive/sequential spec slow to generate or check?

A: The recursion may be too deep, or sequence spec generators may be unbounded.

Spec defines a dynamic binding *recursion-limit* that puts a “soft” limit on recursive generation depth. You may want to decrease this limit in cases where generated structures are unwieldy.
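As a sketch (reusing the recursive ::my-map spec from earlier), the limit can be lowered with binding; the default is 4:

```clojure
;; shallower recursion makes generated trees much smaller
(binding [s/*recursion-limit* 1]
  (gen/sample (s/gen ::my-map)))
```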

The other type of sizing/growth to be concerned about relates to sequences generated by specs like s/every, s/coll-of. The :gen-max option will limit the length of generated sequences:

(gen/sample (s/gen (s/coll-of string? :gen-max 2)))
=> (["" ""] [] ["1D" ""] ["I"] [] ["Ne4" "y6i"] ["93" "oe"] ["4wUue7"] [] [])

Specs as Data

The current version of spec doesn’t make it very easy to create or inspect specs programmatically. This is reportedly being worked on for a future release.

Q: How can I get the keys from a s/keys map spec?

A: Use s/form or s/describe on the spec.

s/form (and its abbreviated sibling s/describe) will return the original form of the spec:

(s/def ::my-map
  (s/keys :req-un [::id]
          :opt-un [::parent-id ::children]))
(->> (s/get-spec ::my-map)
     (s/describe)      ;; get abbrev. form of original spec
     (rest)            ;; drop `keys` symbol
     (apply hash-map)) ;; put kwargs into map
=> {:req-un [:sandbox/id]
    :opt-un [:sandbox/parent-id :sandbox/children]}

Q: How can I get the :args, :ret, or :fn specs from a function spec?

A: Same as the previous answer, for any type of spec.

And if for some reason you also needed to recreate one of those specs, you could use eval:

(s/fdef foo :args (s/tuple number?) :ret number?)
(eval (->> (s/get-spec `foo)
           (s/form) ;; get non-abbrev. form
           (rest)
           (apply hash-map)
           :args))  ;; get :args spec
(gen/sample (s/gen *1))
=> ([2.0] [-1.5] [0] [-2] [-1.5] [-1.5] [14] [1] [##-Inf] [-5.875])

Off-label Usage

Q: How can I use spec to parse strings?

A: Spec is not intended for string parsing.

Specs can act as parsers in that they describe a syntax, and conforming valid inputs produces tagged outputs that can be treated as a syntax tree, so... why not use that on strings of characters? Because you'll have an easier time using a purpose-built parser for string inputs, and spec's regex specs are more limited than typical regular expression libraries.
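For the curious, spec's regex ops do operate on sequences of characters, which illustrates both the appeal and the clumsiness. A sketch with a toy grammar (all spec names hypothetical):

```clojure
;; a toy "grammar": one or more digits followed by optional letters
(s/def ::digits (s/+ #(Character/isDigit %)))
(s/def ::letters (s/* #(Character/isLetter %)))

(s/conform (s/cat :num ::digits :suffix ::letters) (seq "42ab"))
;; => {:num [\4 \2], :suffix [\a \b]}
```

You get a tagged tree back, but for anything nontrivial a real parser library will serve you better.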

Q: I can use s/conformer to transform data with specs. Is that a good idea?

A: Not really.

Cognitect’s Alex Miller says:

spec is not designed to provide arbitrary data transformation (Clojure’s core library can be used for that). It is possible to achieve this using s/conformer but it is not recommended to use that feature for arbitrary transformations like this (it is better suited to things like building custom specs).

There’s also a JIRA with discussion on the topic. The official advice is to do this separately from spec, and there’s a library to assist with that.

I think some simple coercion isn’t terrible in limited, internal use cases e.g. conforming date strings to date values at your API boundary.
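A sketch of that limited use case, conforming date strings at a boundary (this is my own illustration, not official spec guidance):

```clojure
;; coerce ISO date strings to LocalDate values at an API boundary
(s/def ::date
  (s/conformer
    (fn [s]
      (if (string? s)
        (try (java.time.LocalDate/parse s)
             (catch Exception _ ::s/invalid))
        ::s/invalid))))

(s/conform ::date "2018-05-01") ;; => a java.time.LocalDate value
(s/conform ::date "not-a-date") ;; => :clojure.spec.alpha/invalid
```

Note that such a spec has no useful generator, and s/unform won't round-trip unless you also supply an unformer.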

Permalink

Clojure libraries I recommend

Every now and then I see people asking what Clojure and ClojureScript libraries they should be using. In this post, I’ll share my list of libraries for building full-stack Clojure/ClojureScript web applications. I’ve been building this kind of application for a while now and I believe you can rely on these libraries. Here’s what I’m looking for: Robustness. I don’t want to debug buggy libraries – I write enough bugs of my own.

Permalink

Clojure Gotchas: Surrogate Pairs

tl;dr: both Java and JavaScript have trouble dealing with unicode characters from Supplementary Planes, like emoji 😱💣.

Today I started working on the next feature of lambdaisland/uri, URI normalization. I worked test-first, you’ll get to see how that went in the next Lambda Island episode.

One of the design goals for this library is to have 100% parity between Clojure and ClojureScript. Learn once, use anywhere. The code is all written in .cljc files, so it can be treated as either Clojure or ClojureScript. Only where necessary do I use a small number of reader conditionals.

#?(:clj
   (defmethod print-method URI [this writer]
     (.write writer "#")
     (.write writer (str edn-tag))
     (.write writer " ")
     (.write writer (prn-str (.toString this))))

   :cljs
   (extend-type URI
     IPrintWithWriter
     (-pr-writer [this writer _opts]
       (write-all writer "#" (str edn-tag) " " (prn-str (.toString this))))))

Example of a reader conditional

For this feature however I’m digging quite deeply into the innards of strings, in order to do percent-encoding and decoding. Once you get into hairy stuff like text encodings the platform differences become quite apparent. Instead of trying to smooth over the differences with reader conditionals, I decided to create two files platform.clj and platform.cljs. They define the exact same functions, but one does it for Clojure, the other for ClojureScript. Now from my main namespace I require lambdaisland.uri.platform, and it will pull in the right one depending on the target that is being compiled for.

(ns lambdaisland.uri.normalize
  (:require [clojure.string :as str]
            ;; this loads either platform.clj, or platform.cljs
            [lambdaisland.uri.platform :refer [string->byte-seq
                                               byte-seq->string
                                               hex->byte 
                                               byte->hex
                                               char-code-at]]))

The first challenge I ran into was that I needed to turn a string into a UTF-8 byte array, so that those bytes can be percent encoded. In Clojure that’s relatively easy. In ClojureScript the Google Closure library came to the rescue.

;; Clojure
(defn string->byte-seq [s]
  (.getBytes s "UTF8"))

(defn byte-seq->string [arr]
  (String. (byte-array arr) "UTF8"))


;; ClojureScript
(require '[goog.crypt :as c])

(defn string->byte-seq [s]
  (c/stringToUtf8ByteArray s))

(defn byte-seq->string [arr]
  (c/utf8ByteArrayToString (apply array arr)))

To detect which characters need to be percent-encoded I’m using some regular expressions. Things seemed to be going well, but when re-running my tests on ClojureScript I started getting some weird results.

;; Clojure
(re-seq #"." "🥀")
;;=> ("🥀")

;; ClojureScript
(re-seq #"." "🥀")
;;=> ("�" "�")

This, gentle folks, is the wonder of surrogate pairs. So how does this happen?

Sadly I don’t have time to give you a complete primer on Unicode and its historical mistakes, but to give you the short version…

JavaScript was created at a time when people still assumed Unicode would never have more than 65536 characters, and so its strings use two bytes to represent one character, always. This is known as the UCS-2 encoding.

Unicode has grown a lot since then, and now has many codepoints above 65535, outside the Basic Multilingual Plane. These include many old scripts, less common CJK characters (aka Hanzi or Kanji), many special symbols, and last but not least, emoji!

So they needed a way to represent these extra characters, but they also didn’t want to change all those systems using UCS-2 too much, so UTF-16 was born. In UTF-16 the first 65536 codepoints are still encoded the same as in UCS-2, with two bytes, but the ones higher up are encoded with 4 bytes using some special tricks involving some gaps in the Unicode space. In other words, these characters take up the width of two characters in a JavaScript string. These two characters are known as a “surrogate pair”, the first one being the “high surrogate”, and the other one the “low surrogate”.
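The encoding trick itself is simple arithmetic. A sketch in Clojure for U+1F940 (🥀):

```clojure
;; codepoints above U+FFFF are offset by 0x10000, then split into
;; a 10-bit high half (+ 0xD800) and a 10-bit low half (+ 0xDC00)
(let [cp     0x1F940
      offset (- cp 0x10000)
      high   (+ 0xD800 (bit-shift-right offset 10))
      low    (+ 0xDC00 (bit-and offset 0x3FF))]
  [(format "%04X" high) (format "%04X" low)])
;; => ["D83E" "DD40"]
```

Those two values, D83E and DD40, are exactly the two "characters" a JavaScript or Java string reports for this single emoji.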

So this is what JavaScript strings do now, but the rest of the language never got the memo. Regular expressions, string operations like .substr and .slice all still happily assume it’s 1995, and so they’ll cut surrogate pairs in half without blinking.

ClojureScript builds on those semantics, so you are liable to all the same mess.

(seq "🚩")
;;=> ("�" "�")

I managed to work around this by first implementing char-seq, a way of looping over the actual characters of a string.

(defn char-code-at [str pos]
  #?(:clj (.charAt str pos)
     :cljs (.charCodeAt str pos)))

(defn char-seq
  "Return a seq of the characters in a string, making sure not to split up
  UCS-2 (or is it UTF-16?) surrogate pairs. Because JavaScript. And Java."
  ([str]
   (char-seq str 0))
  ([str offset]
   (if (>= offset (count str))
     ()
     (let [code (char-code-at str offset)
           width (if (<= 0xD800 (int code) 0xDBFF) 2 1)] ; detect "high surrogate"
       (cons (subs str offset (+ offset width))
             (char-seq str (+ offset width)))))))

I imagine this snippet might come in handy for some. Notice how it’s basically identical for Clojure and ClojureScript. This is because Java suffers from the same problem. The only difference is that there, some of the language got the memo: regular expressions correctly work on characters, but things like substring or .charAt are essentially broken.

Hopefully ClojureScript will eventually fix some of this mess, for instance by having a seq over a string return the real characters, but for performance reasons it’s likely they will want to stick closely to JavaScript semantics, so I wouldn’t count too much on this happening.

In the meanwhile what we can do is document the things you need to watch out for, and write cross-platform libraries like lambdaisland/uri that smooth over the differences. 👍

Permalink

BeakerX and Python for Data Visualization

Jupyter Notebooks provide data engineers with a formidable tool for extracting insights from troves of data on the fly. Typically, Pythonistas use the notebooks for quickly compiling code, testing and debugging algorithms, and scaling program executions. A robust Javascript kernel (here) is also now available for the notebooks as IJavascript, though even notebook Javascript still adheres to the single-assignment constraint for variable declarations. As Jupyter has ascended alongside Python installation packages, its notebooks have rapidly become the default playground for Python learning and data science, and data visualization is becoming a more accessible window into large-scale datasets and predictive mappings.

BeakerX is an open-source project initiated by Two Sigma, an investment management firm that foregrounds machine-learning and distributed computing for their investment decisions. One of Two Sigma’s many open-source projects, BeakerX is by far the most robust and community-contributed, with a multifaceted objective to expand the Jupyter ecosystem.

While the project (really a grouping of Jupyter extensions) at first glance appears merely to augment Jupyter Notebooks’ plotting and generalized data-visualization inventory, it also adds polyglot support to the Notebooks: with BeakerX installed, a user may initiate a notebook in JVM languages and others (Groovy, Scala, Clojure, Kotlin, Java, SQL, and, of course, Python). Or, as can be seen below, a mixture of Python and Javascript (via Jupyter’s magics and autotranslation) combines to create D3.js interactive visualizations from Python-manipulated datasets:

Python and Javascript in the same notebook — for a future post?

Which, sooner or later, gives us:

https://medium.com/media/fb68f40dbc95df29d834c4c653e2bb4b/href

However, the visualization and interactivity functionality of BeakerX is what the project was made for, and we’ll be walking through an interactive example to show its ease of use. (And, although many of the examples in BeakerX’s official binder are written in Groovy, we’ll be focusing on Python here.)

Installation

First, be sure PATH is correctly declared (this applies to macOS installations). Python and Anaconda on macOS may not execute as expected at times, especially if your machine houses more than one version of Python (e.g., Python 2.7 and 3.6); Pandas may be in one location while the executable Python version is in another (in which case you can alias the necessary version via the terminal). I’ve found that the following ensures things are pointing in the right direction before attempting conda install, the preferred installation method for BeakerX.

~ $ export PATH=~/anaconda3/bin:$PATH

Once that’s handled, run the following lines, one after another, and they should execute and install the relevant libraries without any issues:

~ $ conda create -y -n beakerx 'python>=3'
~ $ source activate beakerx
~ $ conda config --env --add pinned_packages 'openjdk>8.0.121'
~ $ conda install -y -c conda-forge ipywidgets beakerx

(NB. BeakerX works with Python versions > 3.4.)

From there, initialize your Jupyter Notebook at localhost:8888 from either the Anaconda Navigator or your terminal,

~ $ jupyter notebook
Take note of the available JVM languages, in addition to Python 3 and SQL

Once initialized, I like to run the following code to check everything’s in order:

And we’re ready to go…

The Project

The best way to get acquainted with BeakerX functionality is to parse the official binder and start with a familiar data-visualization technique, like time-series outputs, heatmaps, or histograms, all of which are probably familiar to data scientists through Seaborn’s matplotlib-based library or the Bokeh library. We’re going to focus on a smaller part of the Python-based Output Container example, here.

We’ll be using Python and Pandas to create an interactive scatterplot from about 300 rows of data, and it’ll ultimately look like this:

https://medium.com/media/679f4eebb615df65f2fea6e2ec3d1ffa/href

The data is BeakerX’s own, found at its GitHub repo, here. There’s a minimized version of the data for easier testing purposes, but I’ll be using the complete version, which only runs 314 rows long, so that a fuller plot is generated.

(One thing to briefly note: Bokeh’s interactivity is in itself robust and fulfilling, and bqplot — via Bloomberg — likewise presents standardized 2D visualization options. But BeakerX more suitably integrates with the Spark cluster-computing framework, and a brief tutorial shows the ease with which TensorFlow and neural network integration is achieved with BeakerX at the helm.)

Nonetheless, we’ll be focusing on a simple scatterplot, as mentioned, for introductory purposes. So first, we have to read the data, via Pandas, and print the first 5 lines to check if everything’s in order:

Did we import Pandas?

The astute reader will notice we only ran import beakerx (or, alternatively, from beakerx import *). Yet, we’re able to run Pandas without importing…

Looking at the source code reveals that BeakerX incorporates Pandas into its build, which makes pd.DataFrame, read_csv, and other Pandas calls available without an explicit import.

Without knowing the Pandas library, it might be easy to overlook a lot of the shorthand code allowed by BeakerX when Python-based Notebooks are generating visualizations or handling data. Thus, it helps to know what pd references, even though read_csv is fairly self-explanatory. (Just make sure the .csv is in a reachable directory!) BeakerX, in this context, relies on Pandas, but it also augments Pandas’ base methods with even more spreadsheet-like action. As can be seen below, greater flexibility is imported with BeakerX’s handling of the Pandas library, for (much) easier visualization purposes. This time, running df.head(20) for the first 20 rows of data:

https://medium.com/media/c6cb05ba2a31a1b52b039cf89260a312/href

Next, we’ll handle aesthetics:

And the code to handle the plotting:

Familiarity with Pandas certainly helps when it comes to plotting like this, but it’s a safe bet that if you’ve used Jupyter Notebooks, you’ve done the fundamentals when it comes to Pandas, its dataframes, and some kind of plotting. IPython was designed to handle such things with ease. But for a brief rundown…

We set our dataframe as df, and where we call plot.add to construct the plot, we pass columns of the dataframe as the x and y arguments to Points and Line. Thus, df.y1 is the y1 column in the df dataframe, df.y30 is the y30 column, and so on. We also set the setShowLegend boolean to True for illustrative purposes: hovering over the items in the legend causes the respective data points to illuminate or hide, depending on the chosen action.

Ultimately, this yields a neat, interactive (see .gif above) plot in just a few lines of Python, thanks to BeakerX:

(For the full notebook, see this Gist, or the official binder here.)

Conclusion

With polyglot functionality and notebook magics built in, BeakerX is a powerhouse of efficiency and workflow, and it will be great to see more development toward neural network processing, especially via TensorFlow or distributed-computing frameworks. Such advances will allow more novice programmers to easily and intuitively understand the complex inner workings of AI programming.

But in my opinion, what makes it most exciting is that it’s open source, one of many Two Sigma initiatives at work to further the data science community. A number of active issues are available for everyone to dig into and get to work on.


BeakerX and Python for Data Visualization was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.

Permalink

Copyright © 2009, Planet Clojure. No rights reserved.
Planet Clojure is maintained by Baishamapayan Ghose.
Clojure and the Clojure logo are Copyright © 2008-2009, Rich Hickey.
Theme by Brajeshwar.