XT16 - Sam Aaron: Communicative Programming

At JUXT we are big fans of Sam Aaron. I recall sitting on the back of his push bike as he pedalled me around the streets of Cambridge at high speeds so that I could catch my train, with a brief interlude of crashing in front of pack of enthused tourists.

Sam is bags of fun and excitement, as witnessed by the crowd at XT16 gathered to hear him talk about Communicative Programming.

In this day and age of raging debates about type systems, and frequent puzzlements about what Monads are, it's refreshing to hear Sam prioritise the lowering of cognitive load when it comes to accessing programming languages and our environments.

Code should be accessible to a five year old. Sam, who makes music by live coding at nightclubs, knows only too well the importance utilising a clear and simple programming language.

As he says:

Programming languages aren't important, ideas are important.

Enjoy his entertaining and philosophical discussion.

Permalink

Defn Podcast: Episode 18

Defn Podcast: Episode 18

Last week I joined Ray and Vijay on the Defn Podcast. We talked about Clojure as well as my latest project; crepl. I had a great time talking with them. You can listen to the episode below. Enjoy!

https://defn.audio/2017/02/20/episode-18-gijs-stuurman/ https://twitter.com/DefnPodcast Read more

Permalink

Problems with the JVM

Problems with the JVM

How Clojure deals with limitations of the JVM

Well, my last few articles might make you think I'm all rosy on the JVM. It's got all sorts of investment in performance, cross-platform compatibility, and backwards compatibility. It has industry adoption and a giant ecosystem of libraries and tooling. But there's still stuff that was just a mistake.

Null pointers

Null pointers are completely unnecessary. They muddy up the whole mess. You have to do null checks before method calls. Their inventor calls them "My Billion Dollar Mistake". Who hasn't been bitten by a NullPointerException?

Java has null pointers. And so does the JVM. Clojure would be a better language without null pointers (in my opinion), but being on the JVM, you're going to get one sooner or later. So Clojure has made nil a common, first-class value. That means you deal with it all the time. I think I use nil way more in Clojure than I ever did in Java. And it's less dangerous, since Clojure uses extensive nil punning.

For instance, in Clojure nil has a type (which is nil), while Java's null has a weird type without a name. nil is a seq and an empty associative collection. It's not perfect, but nils aren't as big of a problem as nulls are in Java.

Tail call elimination

Tail call elimination (TCE) (also incorrectly called tail call optimization) is a standard technique when compiling functions, but the JVM does not do it.

Tail calls are function calls that are the last thing a function will ever do. For instance, in this function:

(defn math [x y z]
  (+ x (* y z)))

The last thing the function will do is add, so that call to + is a tail call. The multiplication isn't a tail call because there's more work to do.

Because it's the last thing the function will do, you don't really need to remember to come back to the function after the addition is calculated. You can skip that and go straight to where the call to math would return. Typically, programming languages will use a stack to store the return address. You can save a lot of memory in the stack by eliminating these unnecessary return addresses. Moreover, you save so much that you eliminate a whole class of stack overflow errors. Languages that use a lot of recursion take advantage of TCE. Recursive definitions are difficult or impossible without it.

It's one of the central techniques Guy Steele and Gerald Sussman talk about in the Lambda the Ultimate papers. Guy Steele has been campaigning for tail call elimination since Java was originally released. Here is a succint and recent plea based on an older essay.

Rich Hickey has commented a bit about the lack of tail calls when asked in presentations. I can't really find the link, but the gist is that he considers function/method calling behavior to be part of the platform. He's not going to hack something in. And once the JVM and JavaScript support tail call elimination, Clojure will get it, too.

However, Clojure has made minor practical compromises to eliminate what would be a problem in a language focused on recursion. First of all, the Clojure compiler will set locals to null just before doing a tail call (called locals clearing). This lets the garbage collector get started on memory that otherwise would have a reference from the stack.

Another compromise is the recur form. It is known as explicit tail recursion. A big reason to have TCE is to allow for tail self-recursion, that is, a function calling itself from a tail position. The recur form does exactly that. It's only allowed in the tail position (the last function called from a function) and it does self-recursion without adding to the stack. It's an interesting compromise and it solves one of the biggest uses of TCE. However, pervasive TCE is still important and un-addressed.

When there is a lot of mutual tail calls (meaning functions that call each other in tail position), Clojure has another feature called trampolines. The solution is clever: instead of doing a tail call, just return a function that would do the tail call. Ugh. It's hard to explain in English, easy in code. Let's say you had this setup:

(defn even? [x]
  (if (zero? x)
    true
    (odd? (dec x))))

(defn odd? [x]
  (even? (dec x)))

Alright, this is totally contrived, and it only works for positive integers. But just notice that odd? and even? are in tail positions. If I call this with a big number (even? 100000000), I get a stack overflow. Each call to even? or odd? adds another frame to the stack. I can't use recur because they aren't tail calls. I have to use a trampoline.

(defn even? [x]
  (if (zero? x)
    true
    (fn [] (odd? (dec x)))))

(defn odd? [x]
  (fn [] (even? (dec x))))

Now I can just call (trampoline (even? 100000000)) and it works just fine. It's slower than TCE, but it doesn't take all of the memory of non-TCE function calls.

All trampoline does is check the value you pass it. If it's a function, it calls it and loops with the return value. If it's not a function, it returns it. So it loops until it doesn't have a function any more. It's quite a simple thing. Check out the source.

Notice also that, if you know it, you could use the # syntax to turn those tail calls into functions, which makes it look a little nicer:

(defn even? [x]
  (if (zero? x)
    true
    #(odd? (dec x))))

(defn odd? [x]
  #(even? (dec x)))

The last thing I want to mention about tail call elimination with Clojure is that someone did try to add TCE to the Clojure compiler by doing source code transformations. I don't recommend you use it, but it might be interesting to check out.

Startup times

Are JVM startup times really slow? This article has gone long, so I'll deal with this next time.

Meanwhile, if you learned something about Clojure and the JVM and want to learn more, check out JVM Fundamentals for Clojure. It's a five-hour course covering lots of JVM stuff that I use every day when programming in Clojure. It is not a complete JVM course, of course, just the important stuff for using Clojure. Buy it today or buy a membership.

Get this lesson and ten more in your inbox. Sign up below.

Permalink

crepl: Link to examples

crepl: Link to examples

crepl is a collaborative editor where you can write and run ClojureScript together. You can now create links to new crepl pads that will be loaded with the code from an external source. Once you are in a crepl you can invite others to join you by sending them the url in the browser navigation bar. For instance try out these examples:

Read more

Permalink

JShell: The Java Shell and the Read-Eval-Print Loop

Let's talk about JShell. We can explore it with the JDK 9 Early Access Release. As of now, the general availability of JDK9 is scheduled for 27 July, 2017, and the JShell feature was proposed as part of JEP 222. The motivation behind it is to provide interactive command line tools to quickly explore the features of Java.

From what I've seen, it is a very useful tool to get a glimpse of Java features very quickly, which is especially useful for new learners. Already, Java is incorporating functional programming features from Scala. Consider the move to a REPL (Read-Eval-Print Loop) interactive shell for Java, just like Scala, Ruby, JavaScript, Haskell, Clojure, and Python.

Permalink

Where Should Tests Live?

I came across a design question when working on elm-test recently:

What if you could write tests in the same file as the code they're testing?

There are a few testing systems that support this:

  • clojure.spec (to vastly oversimplify) lets you define tests inline in the same file as your business logic.
  • Rust allows putting tests in the same file as the code being tested, although the documentation recommends putting them in a separate file (without elaboration as to why).
  • Doctests (such as in Python, Elixir, Go, and Haskell) allow writing tests in the same file—theoretically for documentation purposes, although potentially for testing purposes instead.
  • EUnit in Erlang runs all functions in a given module that end in _test as tests, including business logic modules.
  • Racket supports inline tests.
  • Pyret supports "testing blocks" which can live in any source file.
  • Test::Inline adds support for this to Perl.

I've never used any of these, but I'm wondering how the experience would be in practice. Is it amazing? Unexciting? An outright bad idea? What are the pros and cons compared to putting tests in a separate file?

If you've tried writing tests like this before, what did you think?

Permalink

Bayadera = Bayes + Clojure + GPU (slides for my Bob Konferenz 2017 talk)

Hi Clojurians! I wished I had time to write a few posts since EuroClojure, but I got carried away by some other stuff (there are a few cool things coming up in the next version of Neanderthal, and some Clojure CUDA stuff after that!). The good news is that I have more interesting themes to write about soon!

For now, I hope that the slides for my upcoming talk at Bob Konferenz, that takes place on 24th of February in Berlin will give you an interesting preview of Bayadera, a Clojure library for high performance Bayesian Data Analysis.

The slides are available here, just open them in the web browser. You can also read the abstract of the talk here.

Please note that there are two directions to browse: down/up, and left/right. Down/up arrows move you through slides, while left/up arrows move you through sections. Normally, you'd press down-down-down until there are no more slides, and then right to cross to the next section. ESC gives you the bird's view of the structure.

Permalink

Java Generational Garbage Collection

Java Generational Garbage Collection

How the JVM makes Clojure's data structures possible.

Persistent data structures require really good garbage collection. Lisp has always had persistent data structures. The cons list is persistent because it can share structure. When Clojure came out, it featured immutable persistent data structures. And not just the list. It also has vectors, maps, and sets.

Because they're immutable, once an object is instantiated, it is never changed. That means Clojure has to use a "copy-on-write" discipline. Instead of modifying the object, you make a modified copy. That means you've got lots of garbage. You're making copies all the time.

Luckily, because the data structures are persistent, it means that they share a lot of their structure. The amount of garbage memory per modification can be quite low relative to the amount of memory that can be reused.

But it's still a lot of garbage. Every seq allocates. Every time you modify a vector or a map, it allocates. Garbage is the name of the game in Clojure. That's why garbage collection was invented. A lot of research has gone into garbage collection, particularly for reducing the amount of time the GC has to pause.

One of the coolest ways to reduce that pause time is to consider the age of the objects. Most objects that are old (they were instantiated a while ago) tend to stick around. So why look at them frequently to check? It also turns out that most objects are discarded very soon after being used. So you get rid of most of your garbage very quickly. This type of memory management is called Generational Garbage Collection.

Generational GC is one of the reasons Clojure can be so fast. Clojure creates so much "ephemeral" garbage, it's important to be able to allocate and collect it quickly. Java's memory management has been tuned and worked on for years. I tried to find numbers to quantify how much effort has been put into it, but I couldn't. I think it's safe to say that it's in the millions of dollars. Allocation and collection are down to a handful of instructions per object.

Clojure's data structures exercise the JVM's fast GC. While using them is never going to be as fast as pure Java, the JVM does allow Clojure to be practical. Of course, the memory usage still needs to be configured and managed, but the JVM allows us to program at a much higher level and to take advantage of shared-memory parallelism.

If you're not experienced in the JVM, all of those details can be overwhelming. People have told me flat out that they would love to write Clojure but they're afraid of the JVM. This is a shame, because the JVM is what enables Clojure. I wanted to create something that would help people feel comfortable with the JVM without spending years gaining experience.

So I created a five-hour course called JVM Fundamentals for Clojure. It won't turn you into a JVM expert, but it will give you an in-depth tour. It teaches you all of those things that I do day-to-day that I've seen people get stuck with. How do you deal with out of memory errors? How do you navigate this huge standard library? All of that stuff is covered.

You can get the course by buying either a watch online or a watch-online-and-download version. Bot give you lifetime access. Another way to get it is to buy a membership. That way, you get the JVM course and more than 30 hours of other material on everything from web development to macros.

Of course, not everything is rosy on the JVM. Besides the complexity, there are a few things that are not so great that actually make it a less-than-ideal host. We're going to get into some of those next time.

Get this lesson and ten more in your inbox. Sign up below.

Permalink

Literate [Clojure] Programming: Publishing Documentation In Multiple Formats

This blog post is the sixth, and the last, of a series of blog posts about Literate [Clojure] Programming in Org-mode where I explain how I develop my [Clojure] applications using literate programming concepts and principles.

This last post introduce a tool that leverage the side effect of having all the codes neatly discussed: it gives the possibility to automatically generate all different kinds of project documentation with a single key-binding. The documentation that we will generate are:

  1. Human readable HTML general project documentation
  2. Programming API documentation
  3. Book style complete project documentation in PDF

This series of blog posts about literate [Clojure] programming in Org-mode is composed of the following articles:

  1. Configuring Emacs for Org-mode
  2. Project folder structure
  3. Anatomy of a Org-mode file
  4. Tangling all project files
  5. Publishing documentation in multiple formats (this post)
  6. Unit Testing

Documentation, Documentation, Documentation!

Not all documentation is equal and depending on what you are looking for, different kinds and different formats of documentation may work better depending on a task. If I want to read in depth the architecture of a project, if I want to understand the underlying assumptions and decisions that lead to a certain design, if I want to read text with figures about data structures I would prefer reading article like documentation in form of a HTML page or a PDF document.

However, if I am programing something using a certain API, what I will be looking for is the documentation of the API with maybe some supporting material as required. I will be looking to understand how a specific function works, how its parameters are meant to be used, etc. I may also simply be looking for a consolidated list of available function calls. That kind of documentation is quite different than the previous one.

The beauty of Literate Programming is that as a software developed, I don’t have to write each of these types of documentation independently. I have everything I need in my literate programming files (in this case, in my Org-mode files) to generate all kinds of different purposes documentation.

Note that everything that is discussed in this series of blog posts has been applied to the org-mode-tests-utils project. I would strongly suggest you to browse that project along with its folder structure to see how this works in a real [tiny] project.

Now, let’s see how this can be accomplished.

publish.org

What we want to do to generate all the different kinds of documentation is simply having to open a Org-mode file, and to click C-c C-v t which will run all the code blocks in the file that will generate all the documentation we want. That file is called publish.org and is located in the /org/ root folder.

This file is split into three general sections:

  1. Generate HTML Documentation
  2. Generate API Documentation
  3. Generate PDF Documentation

Each of these sections contains the ELisp code blocks that are used to generate the different kind of documentation.

Generate HTML Documentation

What the HTML documentation generator does is to search and find [recursively] all the .org files that exists in the /org/ folder of your project. Then it will generate a sitemap.org file if it is not already existing that it will use to generate the HTML documentation.

Themes

Different themes and styles can be defined for the generated HTML pages. To enable a theme, you simply have to select the proper :html-head setting.

The main themes come from the org-html-themes extension. The theme currently being used is called readtheorg.

Publishing Options

A series of settings can be configured to create the documentation the way you want. Here are the main settings:

  • Settings
    • :base-directory
      • The base directory is the current directory which is the [project]/org/ directory where all the Org files are defined
    • :recursive
      • We specify that we want Org-mode to generate HTML files for each Org file in all children folder (recursively)
    • :publishing-directory
      • We specify where we want to publish the HTML documentation from the Org files
    • :publishing-function
      • We specify that we want to publish everything in HTML
    • :section-numbers
      • We don’t want any kind of section numbers generated by Org-mode
    • :with-toc
      • We want to include a table of content for each generated documentation file
    • :auto-sitemap
      • We want to generate a sitemap automatically. The file is named sitemap.html
    • :html-head
      • We want to specify a style sheet that will be used by each generated HTML file. It should be located in doc/html/css/

Additional settings and configurations are available from these two web pages:

Publish

To publish in HTML, you simply have to run the following code blocks:

(defun org-publish-org-sitemap-includes (project &optional sitemap-filename)
  "Create a sitemap of pages in set defined by PROJECT.
Optionally set the filename of the sitemap with SITEMAP-FILENAME.
Default for SITEMAP-FILENAME is `sitemap.org'."
  (let* ((project-plist (cdr project))
         (dir (file-name-as-directory
               (plist-get project-plist :base-directory)))
         (localdir (file-name-directory dir))
         (exclude-regexp (plist-get project-plist :exclude))
         (files (nreverse
                 (org-publish-get-base-files project exclude-regexp)))
         (sitemap-filename (concat dir (or sitemap-filename "sitemap.org")))
         (sitemap-title (or (plist-get project-plist :sitemap-title)
                            (concat "Sitemap for project " (car project))))
         (sitemap-sans-extension
          (plist-get project-plist :sitemap-sans-extension))
         (visiting (find-buffer-visiting sitemap-filename))
         file sitemap-buffer)
    (with-current-buffer
        (let ((org-inhibit-startup t))
          (setq sitemap-buffer
                (or visiting (find-file sitemap-filename))))
      (erase-buffer)
      (insert (concat "#+TITLE: " sitemap-title "\n\n"))
      (while (setq file (pop files))
        (let ((link (file-relative-name file dir))
              (oldlocal localdir))
          (when sitemap-sans-extension
            (setq link (file-name-sans-extension link)))
          ;; sitemap shouldn't list itself
          (unless (equal (file-truename sitemap-filename)
                         (file-truename file))     
            (let ((entry
                   (org-publish-format-file-entry
                    org-publish-sitemap-file-entry-format file project-plist)))
              (insert (concat "* " entry "\n"
                              "#+INCLUDE: " link "\n"))))))
      (save-buffer))
    (or visiting (kill-buffer sitemap-buffer))))
(setq org-publish-project-alist
      '(("org-mode-clj-tests-utils--doc-html"
         :base-directory "."
         :publishing-directory "../doc/html"
         :publishing-function org-html-publish-to-html
         :section-numbers nil
         :recursive t
         :exclude "fulldoc\\.org\\|project\\.org\\|tangle\\-all\\.org\\|setup\\.org\\|publish\\.org"
         :with-toc t
         :auto-sitemap t
         :sitemap-function org-publish-org-sitemap-includes

                                        ; ReadTheOrg Theme
         :html-head "<link rel=\"stylesheet\" type=\"text/css\" href=\"themes/styles/readtheorg/css/htmlize.css\"/>
<link rel=\"stylesheet\" type=\"text/css\" href=\"themes/styles/readtheorg/css/readtheorg.css\"/>
<script src=\"https://ajax.googleapis.com/ajax/libs/jquery/2.1.3/jquery.min.js\"></script>
<script src=\"https://maxcdn.bootstrapcdn.com/bootstrap/3.3.4/js/bootstrap.min.js\"></script>
<script type=\"text/javascript\" src=\"themes/styles/lib/js/jquery.stickytableheaders.js\"></script>
<script type=\"text/javascript\" src=\"themes/styles/readtheorg/js/readtheorg.js\"></script>")))
(setq org-publish-use-timestamps-flag nil)
(setq org-export-html-style-include-scripts nil
      org-export-html-style-include-default nil)

(org-publish-all)

Once everything is properly configured, (org-publish-all) is run and the HTML documentation get generated in the /doc/html/ folder. You can see what the generated HTML documentation looks like here.

Generate API Documentation

Programming API documentation is also easy to generate. In this case, what we are doing is to use the tangled code to generate the API documentation. Here we are using Clojure’s Codox API code generator to generate the documentation, but we could have used any other such libraries to do the same.

To generate the API documentation, we simply have to run this Clojure code block and use Codox’s API to generate the full documentation.

(use 'codox.main)

(generate-docs {:output-path "doc/api"})

The generated documentation will appear in the /doc/api/ folder. You can see what the generated API documentation looks like here.

Generate PDF Documentation

Finally, we can leverage Org-mode’s internal (org-publish-current-project) internal function to generate a PDF version of each Org-mode file within the /org/ folder (recursively). We leverage the org-latex-publish-to-pdf publishing function to generate the files in PDF.

(setq org-publish-project-alist
      '(("org-mode-clj-tests-utils--doc-pdf"
         :base-directory "."
         :publishing-directory "../doc/pdf"
         :publishing-function org-latex-publish-to-pdf
         :recursive t
         :section-numbers nil
         :with-toc t
         :auto-sitemap t)))
(org-publish-current-project)

The generated PDF documentation will appear in the /doc/pdf/ folder. You can see what the generated API documentation looks like here.

Conclusion

This is what conclude my series of six blog posts about how I do Literate Programming [in Clojure using Org-mode]. As I mentioned in another article, the problem of writing readable code which is well commented, well documented and well tested is that ideally we would have to focus on all these aspects at the same time, but given the development environments used by most people, it is not possible. You will plan an aspect of your program and write the code. Then if you are really lucky and you will find (or take) the time to write some documentation and create some unit tests. The problem is that each of these tasks are siloed: they are performed in isolation with 4 different states of minds, at 4 different times and hopefully within 4 weeks. The worse happens when you start fixing bugs or improving the code: comments, documentation and unit tests will often remain unchanged and lagging behind.

This is what Literate Programming is for me: a way to perform all these tasks at once, with the same state of mind, at the same time. This is a process to put in place, a new way to work. The problem is to put in place that process, that way to work, that enables you do to all this at once. I hope I have been able to put in place and explains a Literate Programming process that can work for you and shows the benefits of doing so.

I have the feeling that it will become more and more important to write readable code and to write about the thought process that lead to that written code. Much of the code we are writing in these days is code that manipulates and transform data, code that implement machine learning workflows and such. The kind of code that greatly benefit to be readable by many people other than the ones that write the code.

Permalink

Copyright © 2009, Planet Clojure. No rights reserved.
Planet Clojure is maintained by Baishamapayan Ghose.
Clojure and the Clojure logo are Copyright © 2008-2009, Rich Hickey.
Theme by Brajeshwar.