Full Stack Engineer

Ladder | Palo Alto, CA
$105,000 - $150,000

Want to work on something that matters? Ladder is the smart, modern way to insure your life. We started Ladder to fundamentally redesign life insurance to better serve consumers. Properly designed, life insurance is an amazing product for families and communities.

Transforming an industry requires the best people. At Ladder, we are building something big. We love our work. The most fundamental ingredient? The people on our team.

Our team is small but strong, with folks from Dropbox, Google, Harvard, and Stanford. We're a growing team of innovators going after a huge market. We're committed to building something important through innovative technology and beautiful design, alongside world-class financial and insurance expertise.

What You'll Do:

  • You'll be the foundation of our engineering team and help us reimagine how life insurance is done
  • Build full stack features
  • Build internal tooling / 3rd party integrations that accelerate our engineering output
  • Help create an office environment conducive to creativity and growth

Day to Day:
- Whiteboard architecture with other engineers
- Have solid chunks of "maker time" to get stuff done
- Triage / root cause / fix problems that arise in production
- Wear a product hat and help figure out which problems we need to solve and which solutions actually address them
- Interface easily with design, growth, and backend partners

Must haves:
- Ability to execute independently on projects with lots of ambiguity
- Ability to work in small teams with product, design, and risk to solve problems
- Internalize our company goals and think like we do
- Willingness / flexibility to plug other unknown engineering holes that will arise
- Be a selfless team member and help unblock other team members
- Be willing to become an industry expert in our vertical (life insurance)

Nice to Haves:
- Familiarity with Docker, Kubernetes, Terraform, Bash, Linux, AWS
- React, GraphQL, Relay, Clojure, JVM, Datomic experience
- Knowledge of HTML / DOM / CSS / Browser APIs

The State of ClojureScript Compilation in Lumo

Lumo has shipped with experimental support for compiling ClojureScript projects entirely without the JVM since the beginning of 2017. Starting with the newly released version, the Lumo build API has been greatly enhanced and is much more stable! Read on for a rundown of the state of ClojureScript compilation in Lumo.

I recently gave the first public talk about Lumo at ClojuTRE in Finland (video). Meeting people who use Lumo daily, whether in their day jobs or simply to play with Clojure(Script), always does a really good job of keeping me motivated to continue working on Lumo.

Current state of affairs

The current Lumo build API is mostly a prototype that I put together to demonstrate that we could have a JVM-less ClojureScript compiler. Many features are missing, and it can only compile very simple projects.

Today!

Since ClojuTRE, I’ve been hard at work, and today I’m proud to announce that Lumo’s build API is, with one exception (see footnote 1), at feature parity with the ClojureScript JVM compiler.

Most notably, it also features the ability to process JavaScript modules, including those from NPM (in a node_modules installation).

I encourage you to update Lumo to the newly released 1.8.0-beta2 version and try out the revamped build API. Feedback is most welcome!

The road ahead

The key to unlocking feature parity with the JVM ClojureScript compiler has been the JavaScript port of the Google Closure Compiler. Without it, neither JS module processing nor sophisticated optimizations would be possible in Lumo. However, that is also where the last hurdle to truly achieving feature parity with ClojureScript on the JVM lies: the ability to perform code splitting and dynamic chunk loading through Google Closure modules.

In the next few months, we’ll be working hard to iron out some internal details, as well as researching the possibility of adding code splitting support to the JavaScript port of the Google Closure Compiler.

Thanks for reading!


P.S.: Lumo is built on my personal time, without the backing of a big corporation. Its development and long-term sustainability rely on financial support from the community. If you or your company are using Lumo, please consider supporting the project on its OpenCollective page. We would like to thank JUXT for their recent sponsorship of Lumo.


1 At the time of this writing, the JavaScript port of the Google Closure Compiler can't yet handle Google Closure Modules (Code Splitting).
2 The beta version can be installed normally from NPM. If you're using Homebrew, this beta version can be installed with brew install --devel lumo.

Global Mutable State

One of the biggest problems in software is global mutable state. It makes your code difficult to work with, and once you go down that road, it keeps getting worse. Reducing the amount of global mutable state in your program is one of the best ways to improve the quality of your code, regardless of whether it's procedural or functional.

Definition

Global mutable state has three words, and each is important:

Global means that it's accessible from any other point in your code. This ties all of your code together. You have to reason about the whole program instead of reasoning about a small part, because any other part can touch it.

Mutable means that it can be changed. You'll usually see that anyone who can read the value can also change it. Two reads right next to each other in the code might return different values. Or, worse, the data structures they return themselves are changing, even after a read.

State is harder to define. But it basically means that the value depends on the history of the program. How far into the history? Well, in the worst case (namely, global mutable state), it means the entire history. You have to know everything about how the program was executed, including how threads were interleaved.

When you combine global, mutable, and state, you get a big mess. When people say "it's hard to reason about", what they really mean is "it's got bugs and you can't tell by reading the code".

The nice thing is that you can systematically remove those same three aspects, and you can remove them more or less separately. I like to say that it's possible to program functionally in any language, even the most procedural languages out there. One way to do that is to reduce global mutable state as close to zero as you can.

Identifying Global Mutable State

Some telltale signs: multiple variables in the global scope (in Clojure: multiple atoms at the top level of a namespace), and reads and writes to those globals with no clear pattern (or reading from the globals multiple times in a small piece of code, where the variable could have changed value between reads).
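To make the "changed between reads" point concrete, here is a small JavaScript sketch (ours, not the author's). JavaScript has no shared-memory threads in this setting, so the interleaving comes from the event loop instead: each call reads a global, yields at an `await`, and then writes back a value based on its stale read. The names `counter`, `incrementSlowly`, and `runTwoIncrements` are hypothetical.

```javascript
// A module-level ("global") mutable counter.
let counter = 0;

// Reads the global, yields to the event loop, then writes back a value
// computed from the earlier read, which may be stale by then.
async function incrementSlowly() {
  const seen = counter;        // read
  await Promise.resolve();     // other code may run here
  counter = seen + 1;          // write based on the earlier read
}

// Starts two increments concurrently and reports the final value.
async function runTwoIncrements() {
  counter = 0;
  await Promise.all([incrementSlowly(), incrementSlowly()]);
  return counter;
}

runTwoIncrements().then((n) => console.log(n)); // logs 1, not 2
```

Both calls read `0` before either one writes, so one update is lost. Nothing in either function looks wrong on its own; the bug only appears when you reason about the whole program's history.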

Cleaning up

It's actually hard to get rid of global mutable state once it's in there. Its usage will spread if it's not tied down. Global mutable state is so useful that it can actually be used for many different purposes. After a while, it's hard to see what the usage patterns are and how you would go about replacing them. But we can tackle each of the naughty aspects in turn.

1) Does the variable need to be global?

Maybe you can rework the code so that an object is passed into functions instead of being a global variable. That would mean you can create a new instance each time you run the code, which at least guarantees that it starts from a known value each time and that the mutation is contained within each execution.

In other words, turn global variables into local variables. The best is local to the function doing the mutation (or smaller scope, if possible). Next best is an instance variable on a local object.

It's very tempting to use globals because they're an easy way for different parts of the code to work together. Here's an example:

var file;                            // the dreaded global variables
var recordCount;

function readFile() {
  file = openFile("input.txt");      // global mutation here
}

function countRecords() {
  recordCount = 0;
  for (var c of file.lines()) {      // global read
    recordCount++;                   // global mutation here
  }
}

function generateOutput() {
  for (var c of file.lines()) {
    console.log(c + "," + recordCount);
  }
}

function processFile() {
  readFile();                        // these lines have to be in this order
  countRecords();
  generateOutput();
}

Let's try to make the variables less global using the technique above.

// got rid of the globals
function readFile(state) {                // functions now take the state
  state.file = openFile("input.txt");
}

function countRecords(state) {            // see, the state is now an argument
  var x = 0;                              // use a local here, instead of storing
  for (var c of state.file.lines()) {     //   intermediate values in the global
    x++;
  }
  state.recordCount = x;                  // then assign the state once
}

function generateOutput(state) {          // state as argument, again
  for (var c of state.file.lines()) {
    console.log(c + "," + state.recordCount);
  }
}

function processFile() {
  var state = {};                         // the state is now local (still mutable)
  readFile(state);                       
  countRecords(state);                   
  generateOutput(state);
}

The biggest transformation we did was to pass a state object to each of the functions. It is no longer global. Each time we run processFile, we create a new instance. We start from a known initial state and we know we won't have any contention for that object.

The other transformation we did was to rely more on local variables for accumulating intermediate values. This may seem trivial, but it means that at no point does our state object contain inconsistent data. It either does not contain the data or it's correct.
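Here is a small sketch (ours, not the post's) of why the local accumulator matters. `makeFlakyFile` is a hypothetical stand-in for `openFile` whose `lines()` fails partway through. The version that increments `state.recordCount` directly leaves a wrong count behind, while the local-accumulator version leaves the field unset.

```javascript
// Hypothetical stand-in for openFile: lines() is a generator that throws
// when it reaches line index `failAt`, simulating a read error mid-file.
function makeFlakyFile(lines, failAt) {
  return {
    *lines() {
      for (let i = 0; i < lines.length; i++) {
        if (i === failAt) throw new Error("read error at line " + i);
        yield lines[i];
      }
    },
  };
}

// Mutates the shared state on every iteration.
function countRecordsInPlace(state) {
  state.recordCount = 0;
  for (const _line of state.file.lines()) {
    state.recordCount++;          // state is half-updated if the loop throws
  }
}

// Accumulates in a local and assigns once, as in the refactored countRecords.
function countRecordsLocal(state) {
  let x = 0;
  for (const _line of state.file.lines()) {
    x++;
  }
  state.recordCount = x;          // only reached if every line was read
}

for (const count of [countRecordsInPlace, countRecordsLocal]) {
  const state = { file: makeFlakyFile(["a", "b", "c"], 2) };
  try {
    count(state);
  } catch (e) {
    // the simulated read error is expected here
  }
  console.log(count.name, state.recordCount);
}
// countRecordsInPlace 2
// countRecordsLocal undefined
```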

2) Does it need to be mutable?

Are there functions that read from but don't write to the variable? They could be changed to take the current value as an argument. Reducing the amount of code that relies on those particular variables is a good thing.

In other words, do as much work as possible using only the arguments and return values of your functions. Isolate the mutation of the variable to a small portion of your code.

Let's apply this technique to code we just modified.

function readFile() {
  return openFile("input.txt");     // instead of mutating state,
}                                   //    just return the value

function countRecords(file) {       // take just the state you need as arguments
  var x = 0;
  for (var c of file.lines()) {
    x++;
  }
  return x;                         // return the value you calculate
}

function generateOutput(file, recordCount) { // take the two values you need
  for (var c of file.lines()) {              //     as arguments
    console.log(c + "," + recordCount);
  }
}

function processFile() {
  var file = readFile();     // then use local variables
                             //    (initialized but never mutated)
  var recordCount = countRecords(file);
  generateOutput(file, recordCount);
}

We've translated code that wrote to a mutable argument into code that merely returns the value it calculates. Then we use local variables to hold the return values for later. Notice how readFile is doing so little work now (it's just a function call) that we may want to remove it and just call openFile directly. That is up to you to decide, but it's one of the things I notice a lot when removing mutation: functions become trivial to read and write, and often they are so trivial you will want to inline them.

function countRecords(file) {
  var x = 0;
  for (var c of file.lines()) {
    x++;
  }
  return x;
}

function generateOutput(file, recordCount) {
  for (var c of file.lines()) {
    console.log(c + "," + recordCount);
  }
}

function processFile() {
  var file = openFile("input.txt"); // we can just inline this one-liner
  var recordCount = countRecords(file);
  generateOutput(file, recordCount);
}

3) Does it need to be state?

Can the algorithms be reworked so that their natural inputs and outputs (arguments and return values) are used instead of writing to a location? For instance, maybe you're using the variable to count stuff. Instead of the function adding to a variable, maybe it could just return the total count instead.

Programs need state. But do we need to rely on the state to get the right answer? And does our state need to depend on the whole history of the program?
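A minimal sketch of the difference, with hypothetical names: the first function's answer depends on every earlier call, while the second's depends only on its argument.

```javascript
// History-dependent: a module-level running total changes on every call,
// so the same input gives different answers depending on what ran before.
let totalSeen = 0;
function countAndAccumulate(lines) {
  totalSeen += lines.length;
  return totalSeen;
}

// History-free: the result depends only on the argument.
function countLines(lines) {
  return lines.length;
}

console.log(countAndAccumulate(["a", "b"])); // 2
console.log(countAndAccumulate(["a", "b"])); // 4
console.log(countLines(["a", "b"]));         // 2
console.log(countLines(["a", "b"]));         // 2
```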

Let's go through step by step in our code, removing state.

function countRecords(file) {
  var x = 0;                    // here's our state
  for (var c of file.lines()) {
    x++;                        // it changes each time through the loop
  }
  return x;
}

The variable x is state. Its value depends on how many times the loop body has executed. Usually, this kind of counting loop is unnecessary because the standard library can already count a collection.

function countRecords(file) {
  return file.lines().length;    // we prefer not having to deal with the state
}

Wow! There's no state now. And in fact, it's so short we can just inline it. It's called once in processFile, so let's inline it there.

function processFile() {
  var file = openFile("input.txt");
  var recordCount = file.lines().length; // inline the one-liner (optional)
  generateOutput(file, recordCount);
}

That's better. But we still have state. It's not terribly much, but let's continue with the exercise. Notice how we rely on the state of recordCount to pass to generateOutput. What's to guarantee that the count we provide isn't different from the count of file? One possible direction to go is to move the recordCount calculation into generateOutput. Why should generateOutput trust someone else when it could just calculate it itself?

function generateOutput(file) { // eliminate an argument that needed to be kept in sync
  var recordCount = file.lines().length; // calculate it ourselves
  for (var c of file.lines()) {
    console.log(c + "," + recordCount);
  }
}

function processFile() {  // now our process is two steps
  var file = openFile("input.txt");
  generateOutput(file);
}

And now we don't need that little local variable called file.

function processFile() {
  generateOutput(openFile("input.txt")); // it can be written as one step
}
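To try the final version end to end, here is a runnable JavaScript sketch. The post never defines `openFile`, so the in-memory `FILES` map and this `openFile` are assumptions. `formatOutput` is one further split (ours, not the author's) that separates building the rows from printing them, which also makes the output easy to test.

```javascript
// Hypothetical in-memory "filesystem" standing in for the post's openFile.
const FILES = { "input.txt": "alpha\nbeta\ngamma" };

function openFile(name) {
  const text = FILES[name];
  if (text === undefined) throw new Error("no such file: " + name);
  return { lines: () => text.split("\n") };  // lines() returns an array
}

// Pure: builds each output row from the file alone; the count is derived
// here, so it cannot drift out of sync with the lines it describes.
function formatOutput(file) {
  const lines = file.lines();
  const recordCount = lines.length;
  return lines.map((line) => line + "," + recordCount);
}

// The only side effect: printing.
function generateOutput(file) {
  for (const row of formatOutput(file)) {
    console.log(row);
  }
}

function processFile() {
  generateOutput(openFile("input.txt"));
}

processFile();
// alpha,3
// beta,3
// gamma,3
```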

Conclusion

I've taken this simple example to an extreme. And, yes, this was a trivial example. But my experience with real world code tells me that you see the same kind of improvements when you remove global mutable state in real systems. The code becomes easier to reason about (because you're reasoning locally). It becomes easier to refactor. It becomes easier to eliminate code.

Reducing global mutable state is one of the hallmarks of functional programming. But it's also just good coding. You can (and should) do this kind of refactoring in any programming language or paradigm. If you're interested in going deeper with functional programming, I recommend the PurelyFunctional.tv Newsletter. It's a weekly email about Functional Programming, with a focus on Clojure. I'll also send you some great information about learning Clojure.

A Guide to Transitioning from GUI Editors to Vim

Until recently, I had done all of my coding in IDEs and GUI text editors. From Notepad++ to Visual Studio and Xcode, I felt like my text-editing toolset was more than adequate for the work I needed to do. I knew about Vim and Emacs, but they both seemed like esoteric, rocket-science editors that only became relevant when a Git merge dropped me into one to write a commit message.

Beginning with my first project here at Atomic, however, Vim has become an indispensable tool for developing on remote machines and quickly editing miscellaneous files. In fact, it’s slowly but surely becoming my editor of choice. This is a guide for those who are transitioning to Vim from GUI editors and need to know the bare minimum before steadily becoming an expert. Hopefully, after reading this article, you will be able to use Vim almost as effectively as Sublime or VS Code.

Before We Begin

  • Keep in mind that Vim has different modes. The relevant ones here are Normal mode, which lets you move around the file and enter commands, and Insert mode, for actually adding text and backspacing.
  • As you’re learning Vim, pay close attention to the mnemonics associated with different commands. They aren’t always obvious (for example, `y` typically means “yank,” which equates to “copy”), but they will reduce the learning curve significantly.
  • Don’t forget that this is a guide to using Vim like a GUI editor. That’s not quite the same thing as using Vim as effectively as possible. Over time, try to break your usual habits in favor of more Vim-specific ones. For example, you can usually prefix a command with a number to execute it as a batch, instead of simply repeating the command over and over.
  • Throughout this article, I’ll be referencing macOS keys, like `cmd` (⌘) and `opt` (⌥). Translating them to Windows keys should be fairly trivial.

Movement

Ideally, all of the action in Vim takes place on the keyboard. Using your mouse or trackpad to move the cursor is strongly discouraged, and for good reason: Vim provides a plethora of commands for navigating a file efficiently.

In a GUI editor, most navigation occurs via (1) the arrow keys, (2) simple shortcuts like `cmd ⇦` and `opt ⇨`, and (3) scrolling/clicking with the mouse or trackpad. It’s easy to achieve the same movements in Vim:

  • The arrow keys are always available for use, whether in Normal or Insert mode.
  • While in Normal mode, use `b` to navigate to the beginning of the word to the left of the cursor, or `e` to navigate to the end of the word to the right of the cursor.
  • Use `0` to move to the beginning of the line, and `$` to move to the end of the line.
  • Since you really should avoid the mouse/trackpad, scrolling through a file can be accomplished pretty effectively with `ctrl u` and `ctrl d`, which move the cursor half a page up and down the file respectively. For full-page jumps, use `ctrl b` and `ctrl f`.
  • To jump down or up by a certain number (#) of lines, use `#j` or `#k` respectively. Or, jump directly to any line with `#G` (one of my favorite shortcuts).

Inserting Text

Unlike GUI editors, you can’t simply start typing in Vim to insert text. First, you have to get into Insert mode. There are several convenient ways to do that:

  • `i` is the simplest – it puts you into Insert mode without moving the cursor.
  • `o` first opens a new line beneath the current one, then starts Insert mode.
  • `O` works like `o`, but the new line is opened above the current line.
  • `A` moves the cursor to the end of the line before starting Insert mode.

It may take a while to get used to working with Vim’s different modes. To make it easier, get back to Normal mode (by pressing `esc`) any time you are not actively inserting text. Otherwise, you’ll inevitably start trying to enter commands before realizing you’ve simply inserted them into the file.

Deleting Text

Here are some common ways to delete text in a GUI editor, along with the corresponding commands in Vim. Keep in mind that there are always multiple ways to accomplish the same thing, and that these are just what I consider the most straightforward options.

  • Simple deletion: in Normal mode, `x` deletes the character under the cursor, and `X` deletes the character before it (the closest match to backspacing).
  • Deleting backwards, one word at a time (like `opt dlt`): Use `db`. Instead of typing two characters each time when deleting multiple words in a row, use `.` to repeat the deletion command.
  • Deleting to the beginning of the line (like `cmd dlt`): Use `d0`. In general, typing `d` and then some movement command will delete everything between the present location and the movement.
  • Deleting a line: In GUI editors, I usually move to the end of the line (`cmd ⇨`) and then delete backwards (`cmd dlt`). In Vim, it’s as simple as `dd`.
  • Highlighting a bunch of text and then backspacing: Vim also has a Visual mode, which is analogous to highlighting in GUI editors. Use `v` to get into Visual mode, then use any Movement commands (or the mouse if you love bad habits!) to highlight what you need to. Then just enter `d` to delete it all.

Finding and Replacing

Vim has a number of commands for finding and replacing strings. Here are the most essential ones:

  • Instead of `cmd f` to find a string, type `/some string` then hit enter to go to the next occurrence (if any) of “some string,” and then `n` to jump through subsequent occurrences.
  • To search upwards through the file, begin with `?` instead of `/`.
  • To replace all occurrences of a string with another, use this form: `:%s/original/replacement/g`. Of course, there are all kinds of options for replacing more effectively.

Unfortunately, it’s a bit trickier to rename a method, variable, etc. across multiple files. To do that, I would recommend using the command-line tool sed, and having good unit tests in place to quickly confirm that you didn’t break anything. This is also a good opportunity to appreciate the power of IDEs.

Working Across Files

Two nice features of GUIs are seeing/tabbing between open files and navigating through directories. While you may miss out on the visual layouts of these features, here are the closest approximations in Vim (plus some other common commands):

  • To list all open buffers (files), use `:ls`.
  • To go to an open buffer, use `:b filename`. You’ll find autocomplete via tab to be useful here.
  • Visualizing a directory is not so easy. My strategy in this scenario is to put Vim into the background with `ctrl z`, and then search around the directory within the terminal before foregrounding Vim with `fg`.
  • Open up a new buffer with `:e filename`.
  • To save the current buffer, `:w`. To close, `:q`. And to save and close in one fell swoop, `:wq`.

See this nice Stack Overflow answer for some other techniques.

Going Forward

Once you can use Vim as easily as Sublime, don’t stop! There’s a whole world of commands, utilities, and customizations to learn. Here are a few resources to keep the momentum going.

Good luck and happy editing!

The post A Guide to Transitioning from GUI Editors to Vim appeared first on Atomic Spin.

DevOpsDays India 2017 Conference Notes

This was originally published in my personal blog.

DevOpsDays is a series of technical conferences covering topics of software development, IT infrastructure operations, and the intersection between them. Topics often include automation, testing, security, and organizational culture.

It is a place where people and companies come together and share their experiences on how they handled different challenges related to DevOps.

DevOpsDays India 2017 took place in Bangalore on September 16 and 17. It was my first time attending the DevOpsDays India conference, and I took a lot of notes. Here is the digitised version of my notes on the different talks, as I wrote them down or remember them.

Day 1

Keynote by Nathen Harvey

Nathen Harvey (@NathenHarvey) is the VP of Community Development at Chef Software. If you want to learn Chef, head over to learn.chef.io for a nice series of tutorials and courses.

  • We engineers are
    • Not tied down to legacy and technical debt.
    • Not just spinning up cloud servers.
    • Not just building systems.

  • Your primary job is to "Delight your customers".

  • Build ➡ Scale ➡ Service your customers.

  • Code / Chef / Ansible / whatever you use, always put it in a VCS.

  • Writing Tests (good) vs Nagios Alerts (bad).
    • Writing any test cases is good.
    • Writing NO test cases is bad.
    • Test cases will evolve as you evolve, so it doesn't matter if they are not comprehensive.

  • Always do Static Analysis
    • cookstyle - checks Ruby code against a style guide.
    • foodcritic - checks Chef cookbooks against a style guide.

  • Integration Testing
    • Start a Virtual Machine.
    • Get your code onto it.
    • Test.

  • Use the Four-Eyes Rule
    • Always have four eyes look at the code before pushing to production.
    • Code review is an important part of the testing process.

  • Source Repo ➡ Artifact ➡ Artifact Repo.

  • Information Security
    • 81% of IT professionals think the infosec team inhibits the speed of deployment to production.
    • 77% of infosec professionals think so too.

  • For collaboration, make things visible.
    • Make it clear where our constraints & bottlenecks are.

  • Software Success Metrics:
    1. Speed
    2. Efficiency
    3. Risk

  • Try to Detect Failures Early on.

  • Continuous Improvement
    • Learn Actively.
    • Share information - metrics/reports.
    • Align incentives & objectives of all teams.

  • Post Mortem - Always ask these 2 questions:
    • How could we have detected this sooner?
    • How could we avoid it in the future?

  • Know your customer.

  • Make work visible.

  • Measure Progress.

  • Participate in the community.

My important takeaway from this talk is "Delight your Customer". The only reason you are being hired to do the job you do is to solve your customer's problems.

As engineers/developers we are able to create something new. But whatever we create, if it doesn't solve a customer's pain point, it is of no use.

Managing your Kubernetes cluster using BOSH

Ronak Banka (@ronakbanka) works at Rakuten.

Principles at Rakuten:

  1. Always improve. Always Advance.
  2. Passionately professional.
  3. Hypothesise -> Practice -> Validate -> Shikumika.
  4. Maximise Customer Satisfaction.
  5. Speed! Speed! Speed!

Pain points of growing automation tools:

  • High availability
  • Scaling
  • Self healing infra
  • Rolling upgrades
  • Vendor lock-in

Using BOSH, release engineering, deployment, lifecycle management, etc., becomes easier.

Moving 65000 Microsofties to Devops on the Public Cloud

Moving to One Engineering System

By Sam Guckenheimer (@SamGuckenheimer) (Visual Studio Team Services)

  • At Microsoft before 2011, each team had its own engineering system and its own codebase, with no dependencies among teams.
  • Satya Nadella introduced One Engineering System. More open.
  • Productivity is more important than any other feature.
  • PR ➡ Code Review ➡ Test cases ➡ Security Test ➡ Merge to master ➡ Continuous Integration.
  • Runs 60237 tests in 6:39 minutes.
  • There is a limit of 8 min for running test cases. Previously was 10 min.
  • 12-hour limit for code review. Otherwise the PR expires and the dev has to submit a new PR/code review.
  • Git Virtual File System
    • Linux Kernel 0.6GB
    • VSTS 3GB
    • Windows 270GB
    • Overall 300x improvement from cloning to committing.

  • Metrics to track
    • Live Site Health
    • Velocity
    • Engineering
    • Usage

  • Publish Root Cause Analysis (RCA) in a blog for more serious issues - more transparency.

  • Assigning to teams
    • Each team lead has 2 mins to pitch their team and why people should join it.
    • Developers choose 3 teams they are interested in working on, in order of priority.
    • Match developers to teams based on choice and availability.
    • > 90% match rate.

  • Sprints
    • Don't think more than 3 sprints ahead.
    • too much uncertainty and stale features.
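The talk did not describe how the matching works. Purely as a hypothetical illustration, here is a round-based greedy matcher: every developer is first tried against their first choice, then the still-unplaced ones against their second, then their third, subject to each team's open seats. `matchDevelopers`, the developer names, and the team names are all invented for the example.

```javascript
// Hypothetical round-based greedy matcher. preferences maps each developer to
// up to three team names in priority order; capacity maps team -> open seats.
function matchDevelopers(preferences, capacity) {
  const remaining = new Map(Object.entries(capacity)); // team -> open seats
  const assignment = new Map();                        // developer -> team
  for (let round = 0; round < 3; round++) {
    for (const [dev, choices] of Object.entries(preferences)) {
      if (assignment.has(dev)) continue;               // placed in an earlier round
      const team = choices[round];
      if ((remaining.get(team) || 0) > 0) {            // unknown or full teams are skipped
        assignment.set(dev, team);
        remaining.set(team, remaining.get(team) - 1);
      }
    }
  }
  const total = Object.keys(preferences).length;
  return { assignment, matchRate: total === 0 ? 1 : assignment.size / total };
}

const result = matchDevelopers(
  {
    ana: ["search", "infra", "ui"],
    bo: ["search", "ui", "infra"],
    cy: ["search", "infra", "ui"],
    di: ["ui", "search", "infra"]
  },
  { search: 1, infra: 1, ui: 1 }
);
// ana -> search, di -> ui, cy -> infra; bo is left unmatched
console.log(result.matchRate); // 0.75
```

Processing choices round by round means nobody's second choice displaces someone else's first choice, which is one simple way to honor "choice and availability".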


Metrics to track

My takeaway: Even a huge organization like Microsoft can be made to follow best practices. There is hope for the small startups that do stuff ad hoc.

Devops at scale is a Hard problem

By Kishore Jalleda (@KishoreJalleda), who worked at IMVU and Zynga and is currently a Senior Director at Yahoo!

  • Democratize Innovation
    • Do not have pockets of innovation, e.g. an Innovation Lab division.
    • You can't time and control innovation within a specific team.

  • No permission required to ship features to 1% of users.

  • Move fast. Have very high velocity.
    • But privacy & security are non-negotiable.

  • You wrote it; you own it.

  • You wrote it; you run it.

  • Alerting
    • Alert ➡ team a ➡ team b ➡ team c ➡ dev.
    • Alert ➡ dev. (much better)
    • Ideal: <2 alerts/shift, so that RCAs are possible.
    • Saying No is hard, but powerful.
    • Logs are better than Emails for alerts.

  • Commits go to production:
    • With human involvement.
    • Without human involvement. (better)
    • Needs TDD.
      • Takes time. A lot of opposition. Not everyone buys in.
      • Have very very high velocity
      • Test coverage will never be 100%, but that's OK, just as there will always be bugs in software.



  • Automation
    • Don’t reward bad behaviour.
    • Don’t allow developers to NOT automate their stuff. Restarting a server every day at 3am is bad behaviour.
    • Automation allows you to do “Higher value work”.
      • What do you want to do? Restart servers? or write production code?

    • Stop saying things that don’t matter.



  • Public cloud vs private data centers:
    • Go for hybrid approach, buy or build based on time and use case
    • If you do things that are not aligned to the rest of the company : "break the rules but break them in broad daylight".
    • Build new apps with cloud in mind.

  • Incentivise teams to automate.

  • Reward good behaviour.

  • Move fast to stay relevant.

  • Don’t aim for 100% compliance.

  • It is the process that matters.
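Shipping to "1% of users" implies some way of deciding which users are in that 1%. One common approach (an assumption here, not something the speaker described) is to hash a stable user ID into 100 buckets and enable the feature for buckets below the rollout percentage. This sketch uses the FNV-1a 32-bit hash over UTF-16 code units; `fnv1a`, `bucketOf`, and `isEnabled` are hypothetical names.

```javascript
// FNV-1a 32-bit hash (simple and non-cryptographic) over UTF-16 code units.
function fnv1a(str) {
  let hash = 0x811c9dc5;                  // FNV offset basis
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);   // multiply by the FNV prime, mod 2^32
  }
  return hash >>> 0;                      // as an unsigned 32-bit integer
}

// Deterministically places each user in one of 100 buckets per feature.
// Salting with the feature name keeps different rollouts independent.
function bucketOf(userId, feature) {
  return fnv1a(feature + ":" + userId) % 100;
}

// A user sees the feature if their bucket falls under the rollout percentage.
function isEnabled(userId, feature, percent) {
  return bucketOf(userId, feature) < percent;
}

// Same user + feature always gives the same answer.
console.log(isEnabled("user-42", "new-checkout", 1));
```

Because the bucket depends only on the user and the feature name, a user stays consistently in or out across requests, and raising the percentage from 1 to 5 only adds users; nobody who already had the feature loses it.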

My takeaway: You can't get 100% compliance from day 1. Getting everyone on board to follow automation, test cases, code review, CI/CD is hard. Eventually people (if they are smart) will get there.

Also, the main incentive to automate your stuff is that you can go ahead and work on much bigger and better challenges. Solving problems for the next 1 Billion users will be more interesting than restarting servers any day.

Lessons Learned from building a serverless distributed & batch data architecture

Raj Rohit (@data__wizard)

  • Always return from your Lambda functions. Otherwise the invocation counts as an error and AWS retries the function.
  • Don't use servers to monitor serverless.
  • Serverless distributed system ➡ has to be self-healing.
    • When processing across hundreds of servers, debugging a lost file is hard.

  • have proper load balancing.

  • Don’t try micro optimizations with distributed systems.

  • Check out Distributed Tracing:

My takeaway: Tools for distributed tracing look interesting. I do want to begin using them so that it will be easier to debug the different microservices.
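On the talk's first point (always return from Lambda functions), here is a minimal sketch of a Node.js-style async handler. The event shape (`{ files: [...] }`) and the per-file work in `handleOneFile` are hypothetical; in an actual Lambda deployment the function would also be exported, for example as `exports.handler`.

```javascript
// Hypothetical per-file work; a real function would read and transform the file.
function handleOneFile(name) {
  return name.trim().length > 0;
}

// Async handler: resolving with a value reports success. Throwing, or running
// past the configured timeout, reports an error, and Lambda retries failed
// asynchronous invocations.
async function handler(event) {
  const files = Array.isArray(event && event.files) ? event.files : [];
  let processed = 0;
  for (const name of files) {
    if (handleOneFile(name)) processed++;
  }
  return { processed };   // always return, even when there was nothing to do
}

handler({ files: ["a.csv", "b.csv"] }).then((result) => console.log(result));
```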

Day 1 ended with a few lightning talks and then a workshop on Kubernetes.

Day 2

Building Thinking System - Machine Learning Dockerized

By Abhinav Shroff - (@abhinavshroff)

  • Building Applications Smarter - with data.
  • ML Systems require feedback - This requires the right Devops Strategy and tooling.
  • Language
    • Python
    • R

  • Devtools
    • Eclipse
    • Jupyter
    • RStudio

  • Why Containerize?
    • Tuning
    • Bundled
    • Self-contained
    • Clustering
    • Defined Interface
    • Cloud Microservice

  • Container Orchestration Options
    • Docker Swarm
    • Kubernetes
    • Apache Mesos
    • (or)
    • Container Deployment Cloud (Oracle Developer Cloud Service)

  • Microservices
    • Jupyter Kernel Gateway
    • Create a REST service in your Jupyter notebook.


Machine Learning Systems Flow
Reference Architecture for Machine Learning in Cloud
DevOps process for Machine Learning System

Mario Star Power Your Infrastructure: Infrastructure testing a la Devops with Inspec

By Hannah Madely (@DJPajamaPantz) & Victoria Jeffrey (@vickkoala) from Chef Software.

As I am not a Chef/related software kind of guy (Ansible FTW), I didn't pay too much attention to this talk. But they had the best slides in the entire conference. Go Mario Go.

  • Inspec Test runner over
    • SSH
    • WinRM
    • Docker

  • Inspec Shell

  • Shorter feedback loop is very important

  • Ready-to-use profiles are available on
    • Github
    • Supermarket

Reliability at Scale

By Praveen Shukla (@_praveenshukla) from GoJek Engineering

Reliability translates to business profits, but pitching this to business people is hard. This is the story of how the GoJek Engineering team made their systems reliable at scale.

  • 100 instances to 8000 instances
  • 1 Million (Internal) Transactions per second
  • Reliability
    • Uptime - 4 nines
    • MTBF (mean time between failures) - uptime / number of breakdowns
    • Annualized failure rate (AFR)
    • QoS
    • SLA - 2ms response vs 2s response, which is reliable?

  • To define reliability, define failure first.

  • Failure: Systems operating outside specified parameters

  • Failure according to Business: Users are complaining

  • 2015
    • 4 products
    • 10 microservices
    • 100 instances
    • 50+ tech people
    • velocity vs stability

  • 2017
    • 18 products
    • 250+ microservices
    • 8000+ instances
    • 3 data centers
    • AWS, GCE, Own datacenter
    • 350+ tech people

  • Issues

  • CI/CD
    • Jenkins
    • Pipeline access management.
    • Custom deployments - have to go to the DevOps/SRE team.
    • DSL repository management: code and CI live in two different repos.
    • No branch based deployment

  • Configuration Management
    • Create a Cookbook for each microservice.
    • 350+ cookbooks.

  • Alerting & Monitoring
    • Alerts getting lost.
    • Not getting alerts to the right person.
    • too many pagers to too many people.
    • who is responsible to take action on an alert?

  • Solutions

  • CI/CD
    • Jenkins ➡ Gitlab CI.
    • bitbucket ➡ Gitlab.
    • CI is now a YAML file and is part of the source code.
    • branch & tag based deployments.

  • Configuration Management
    • Services are written in 4 languages
      • ruby
      • golang
      • clojure
      • jruby

    • Create a master cookbook for each language

    • 4 cookbooks instead of 350



  • Alerting & Monitoring
    • Smart Alert Router
      • Each product belongs to 1 Group
      • Each group has many microservices
      • Each microservice runs on multiple servers
      • Each member belongs to multiple groups
      • When an alert happens on a server, the corresponding group alone gets the alert

    • Configuring an alert
      • create a YAML file
      • push it to repo
      • CI configures the alert from the yaml file



  • Reliability and dependency
    • 1 service: R:99%
    • 1 ⬅ 3 services: R: 97%
    • 1 ⬅ 3 ⬅ 3 services: R:88%
    • Fail Fast.
    • Use circuit breakers.
    • 99.99% uptime for 30 microservices gives only 99.7% uptime for the entire system.
    • 0.3% of 1 billion requests ➡ 3 million failed requests
    • 2+ hours of downtime every month

  • Queueing Delay
    • Scaling up can’t solve it every time
    • A DDoS will kill it
    • Instead throttle your system

  • Reliability is an iterative process
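The dependency numbers above follow from multiplying availabilities. Assuming a request needs every service in the chain to be up and that failures are independent (both simplifying assumptions, not stated in the talk), the combined availability is the product of the individual ones. A minimal sketch in Clojure; the function names are hypothetical:

```clojure
(defn serial-availability
  "Availability of a request that needs all n services, each up with
  probability r. Assumes independent failures."
  [r n]
  (Math/pow (double r) (double n)))

(defn failed-requests
  "Expected number of failed requests out of total at the given availability."
  [availability total]
  (* (- 1.0 availability) total))

(defn monthly-downtime-hours
  "Expected downtime in a 30-day month at the given availability."
  [availability]
  (* (- 1.0 availability) 30 24))

(serial-availability 0.99 3)     ; approx. 0.9703, i.e. ~97%
(serial-availability 0.9999 30)  ; approx. 0.9970, i.e. ~99.7%
(failed-requests 0.997 1e9)      ; approx. 3.0E6 failed requests
(monthly-downtime-hours 0.997)   ; approx. 2.16 hours
```

The 2.16 hours is where the "2+ hours of downtime every month" figure comes from: 0.3% of a 720-hour month.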


Smart Alerts Router Architecture
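The routing rule from the Smart Alert Router bullets (an alert on a server goes only to the group that owns it) can be sketched as a lookup over plain data. The talk did not show GoJek's implementation; all names here (groups, service-of-server, route-alert, the member names and IPs) are hypothetical:

```clojure
;; Hypothetical ownership data: each group owns some microservices and has
;; members; a member (here "ravi") may belong to several groups.
(def groups
  {:payments {:services #{:wallet :billing}
              :members  #{"asha" "ravi"}}
   :rides    {:services #{:dispatch}
              :members  #{"ravi" "meera"}}})

;; Each microservice runs on multiple servers.
(def service-of-server
  {"10.0.0.1" :wallet
   "10.0.0.2" :wallet
   "10.0.1.7" :dispatch})

(defn route-alert
  "Returns the set of members to notify for an alert raised on `server`:
  only the members of the group owning that server's microservice."
  [groups service-of-server server]
  (let [service (service-of-server server)]
    (->> (vals groups)
         (filter #(contains? (:services %) service))
         (mapcat :members)
         set)))

(route-alert groups service-of-server "10.0.0.2") ; => #{"asha" "ravi"}
```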

Move fast with Stable build Infrastructure

By Sanchit Bahal (@sanchit_bahal) from Thoughtworks

Context:

Thoughtworks was asked to build a mobile app for an airline baggage system.

They usually use Git + GoCD for CI/CD, but the client didn’t want a public cloud, so the build machines were in-house: a mix of Macs and Linux VMs.

They began experiencing long wait times for builds, sometimes ~1.5 days.

This delayed feedback to developers: by the time a build result was available, the developer had already moved on. Builds ended up mostly red, and eventually developers stopped looking at them.

Journey:

  • Automate the provisioning
    • Using Ansible.
    • Installation of Xcode, Android SDK.
    • use local file server for heavy downloads.

  • Pre-baked golden image
    • OSX + Ansible ➡ image.
    • Use DeployStudio.
    • New machine takes 30 mins instead of 2 days.

  • Homogeneous Build agents
    • all machines ➡ all build types.
    • better load balanced.
    • better work allocation.
    • resilience.

  • 1 Build agent per machine
    • Simplified set up.
    • easier allocation of work load.

  • Emulator
    • Genymotion ➡ Android Emulator.
    • Genymotion was getting stuck sometimes; it was non-deterministic.
    • Spin up & spin down an Android Emulator for every test suite.
    • Run in headless mode.
    • Save on licensing costs: $30000/year for 80 people.
    • Android emulators let each run start from a clean slate and end in a clean slate.

  • Devops Analytics
    • Measure Improvements.
    • Continuous Monitoring.
    • Actionable Insights.
    • Telegraf + InfluxDB + Grafana.

  • Dashboard
    • Build wait time - time to assign a build agent to a scheduled job.
    • Deploy ready build - time from commit to build ready for QA.
    • build machines health - load, CPU, memory, disk, etc.

  • Outcomes
    • Machine provisioning time: 2 days ➡ < 30 mins.
    • Build wait time: 6 hours ➡ < 10 secs.
    • Stable, robust, resilient build infrastructure.
    • faster feedback cycle ➡ more commits/day.
    • better developer productivity.
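The two dashboard timings defined above are differences between timestamps, averaged over jobs. A minimal sketch, assuming hypothetical job records with :committed-at, :scheduled-at, :assigned-at, and :qa-ready-at keys (these names are invented; the talk's dashboard was built on Telegraf + InfluxDB + Grafana, not Clojure):

```clojure
;; Hypothetical job records; timestamps in epoch milliseconds.
(def jobs
  [{:committed-at 0    :scheduled-at 1000 :assigned-at 4000 :qa-ready-at 600000}
   {:committed-at 5000 :scheduled-at 6000 :assigned-at 8000 :qa-ready-at 500000}])

;; Build wait time: job scheduled -> build agent assigned.
(defn build-wait-ms [job]
  (- (:assigned-at job) (:scheduled-at job)))

;; Deploy-ready build: commit -> build ready for QA.
(defn deploy-ready-ms [job]
  (- (:qa-ready-at job) (:committed-at job)))

(defn mean [xs]
  (/ (reduce + xs) (count xs)))

(mean (map build-wait-ms jobs))   ; => 2500
(mean (map deploy-ready-ms jobs)) ; => 547500
```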

My takeaway: Developer productivity is very important, and a quicker feedback loop lets you commit code faster.

Prometheus 2.0

This was a lightning talk by Goutham (@putadent), a GSoC student who was working on Prometheus. He talked about the lower memory usage and other improvements in v2.0.

It is still in beta, but since Prometheus pulls metrics, you can run both 1.x and 2.0 against your servers and switch versions once 2.0 becomes stable.

Prometheus has been on my radar for quite some time and I think I should play around with it to see how it works and where I can plug it in.

Finishing Keynote: Sailing through your careers with the community

By Neependra (@neependra) - CloudYuga

Community! Community! Community!

  • What can community do to your career
    • Satisfaction.
    • Get visibility.
    • Get the job you love.
    • Start something of your own.

  • What stops us from being part of community
    • Time commitment.
    • Company culture.
    • FEAR.
    • Comfort zone.

  • Build a brand for yourself.

That was the end of the DevOpsDays 2017 Conference. Overall, most of the talks were pretty good, and I had at least one key takeaway from each talk. The talks also showed that a lot of people have struggled to bring the DevOps mindset of automation, TDD, code reviews, etc. to their companies.


PurelyFunctional.tv Newsletter 243: Specialization, McLuhan, Lumo

Issue 243 – September 18, 2017

Hi Clojurtaters,

Clojure SYNC update:

I’m filling up the speaker lineup quickly; I’ll be announcing more speakers this week on the mailing list, so please sign up.

Thanks to everyone who helped out, the Diversity Statement is much stronger than it was.

Oh, and I forgot to announce the end of the Early Bird ticket price last week. To be fair, I’m extending the $50 discount until Sept 22. Buy tickets, please.

You may have noticed that I have not published new videos in a few weeks. I mentioned a bit ago that I’d be working mostly on improving existing courses for a little bit. Well, I’ve been doing that. I wanted it to produce more visible output for you, but it has been mostly behind the scenes. Besides the extended notes and complete GitHub repo for the Web Dev in Clojure course, I haven’t released anything yet. That gives me some anxiety because I want you to know what’s going on.

So what have I been working on? In short, an overhaul of my first-ever Clojure course, Introduction to Clojure. The original course was ~1.5 hours, divided into three videos. I’ve completely rewritten all of the notes (now 100 pages!), reworked some of the flow of it, and started re-recording it. I’ve learned a lot about how to make these courses since I made this one over 4 years ago. The material from just the original, first 34-minute lesson is now expanded to over 1.5 hours just by itself! I’ve been busy getting all of this together. I will be publishing it as it comes out. Look for it this week 🙂

Finally, I’m speaking at Clojure/conj next month. I need to practice my talk. See you there?

Please enjoy the issue.

Rock on!
Eric Normand <eric@purelyfunctional.tv>

PS Want to get this in your email? Subscribe!


Phrase GitHub

A new library for creating human-readable messages from specs, specifically for form validation. It lets you easily pattern match on validation errors to give very specific, context-sensitive error messages for invalid input in a form. Having great validation messages is an important part of form usability.


Specialization vs Collaboration with Aria Stewart Podcast

I’ve been really down on “Agile” recently and this podcast discussion gave me a new mental model for explaining what I’ve experienced. So many times, the dev team is compartmentalized or chooses to compartmentalize itself from the rest of the company. They create an API for giving the developers work (ticket tracker) and expect to be treated like a black box. The discussion in this podcast at least points in a direction for healing the rift.


On Abstraction YouTube

Zach Tellman speaks about two things in this talk. The first is what he’s found trying to research the meaning of the word abstraction in software. And the second is the problem with decomplecting everything down to little bits: will they form a cohesive whole? Can the code be flexible to respond to changing demands? I can’t wait for him to speak at Clojure SYNC. He’s researching his book full-time now.


Build a Better Monster YouTube

I can’t resist a Maciej Ceglowski talk. This one is about the uncontrolled collection of personal data.


Clojure-Flavoured Devops YouTube

Jon Pither presents build tools written in ClojureScript. Mach is a replacement for Make and Roll wraps Terraform.


Bootstrapping a Standalone ClojureScript Environment YouTube

Antonio Monteiro has created a Node.js-based ClojureScript environment, called Lumo, that you can use to script from the command-line. This talk goes into some of the cool parts of Lumo.


Have Your Cookie and Eat it Too YouTube

Josef Svenningsson explains stream fusion, using Haskell, for Clojurists. Stream fusion lets the compiler optimize away the creation of intermediate lists like when you chain maps and filters together.
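The talk is in Haskell, where GHC performs the fusion as a compiler optimization. Clojure does not fuse chained sequence functions automatically, but transducers (available since Clojure 1.7) give a comparable effect explicitly: the map and filter steps are composed into one reducing function, so no intermediate sequence is built. A small illustration of that analogy, not an example from the talk:

```clojure
;; Chained sequence functions: map and filter each produce an
;; intermediate lazy sequence before into collects the result.
(->> (range 10) (map inc) (filter even?) (into []))
;; => [2 4 6 8 10]

;; Transducer version: (comp (map inc) (filter even?)) becomes a single
;; transformation applied element by element, with no intermediate seqs.
(into [] (comp (map inc) (filter even?)) (range 10))
;; => [2 4 6 8 10]
```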


The Father Of Mobile Computing Is Not Impressed Fast Company

Brian Merchant interviewed Alan Kay for his book The One Device.


Become — How I went from selling food in the street to working for top firms in tech Medium

Alvaro Videla tells the story of how he learned to code.


Functional Data Modeling

ClojureScript is a language designed for working with data. In this post, we’ll look at how to use the builtin collection library as well as a few design patterns to model a domain. ClojureScript places a strong emphasis on relying on generic collection types and the standard functions that operate on them rather than creating highly specialized functions that only work on a single type of object. The object-oriented approach, which most mainstream languages encourage, is to create objects that encapsulate both the data and behaviour of a specific type of “thing”. The practice that ClojureScript encourages, however, is to separate functions and data. Data is pure information, and functions are pure transformations of data.

Functions and Data


Modeling a Domain

Say that we are creating an analytics app. Before we get started, we want to model the type of objects that we will be working with. If we were using a statically typed language, we would probably start by writing type definitions. Even if we were working in JavaScript, we would likely define “classes” for the objects that we will be working with. As we define these objects, we would have to think about both the data that they contain and the operations that they support. For example, if we have a User and a ClickEvent, we might need the operation, User.prototype.clickEvent().

Users and Actions

Our analytics domain deals with users and their actions

With ClojureScript, we will consider data and functions separately. This approach ends up being flexible, as we will see that most of the operations that we want to perform on the data are simple and re-usable. In fact, it is common to find that the exact operation that you need is already part of the standard library. Ultimately, the combination of the concision of code and the richness of the standard library means that we write fewer lines of code than we would in JavaScript, which leads to more robust and maintainable applications.

Domain Modeling with Maps and Vectors

We are now quite familiar with maps and vectors, as well as some of the collection and sequence operations that can be used on them. Now we can put them into practice in a real domain: an analytics dashboard. The main concepts that we need to model are user, session, pageview, and event, and the relationships between these models are as follows:

  • A user has one or more sessions
  • A session has one or more pageviews and may belong to a user or be anonymous
  • A pageview has zero or more events

We now know enough to create some sample data. Let’s start at the “bottom” with the simplest models and work our way up to the higher-level models. Since an event does not depend on any other model, it is a good place to start.

Modeling Events

An event is some action that the user performs while interacting with a web page. It could be a click, scroll, field entry, etc. Different events may have different properties associated with them, but they all have at least a type and a timestamp.

Modeling an Event

(def my-event {:type :click               ;;  <1>
               :timestamp 1464362801602
               :location [1015 433]       ;;  <2>
               :target "#some-elem"})
  1. Every event will have :type and :timestamp entries
  2. The remaining entries will be specific to the event type

When we think of data types like event in ClojureScript, we usually create at least a mental schema of the data type. To enforce a schema, we could use clojure.spec or plumatic/schema, but for now we will just enforce the “shape” of our data structures by convention. That is, we will ensure that whenever we create an event, we create it with a timestamp and a type. In fact, it is a common practice to define one or more functions for constructing the new data types that we create. Here is an example of how we might do this with events:

Using a Constructor Function

cljs.user=> (defn event [type]
              {:type type
               :timestamp (.now js/Date)})
#'cljs.user/event

cljs.user=> (event :click)
{:type :click, :timestamp 1464610050488}

This function simply abstracts the process of creating a new object that follows the convention that we have established for events. We should also create a constructor function for click events specifically:

cljs.user=> (defn click [location target]
              (merge (event :click)
                     {:location location, :target target}))
#'cljs.user/click

cljs.user=> (click [644 831] "#somewhere")
{:type :click,
 :timestamp 1464610282324,
 :location [644 831],
 :target "#somewhere"}

The only thing about this code that might be unfamiliar is the use of the merge function. It takes any number of maps and returns a new map containing the entries of all of them, added from left to right; when the same key appears in more than one map, the value from the later map wins. You can think of it as conj-ing every entry from each subsequent map onto the first.
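The convention of "every event has a :type and a :timestamp" can also be checked mechanically with clojure.spec, one of the two libraries mentioned earlier. A minimal sketch, assuming Clojure 1.9+ (in ClojureScript the namespace is cljs.spec.alpha instead); the spec names ::type, ::timestamp, and ::event are illustrative:

```clojure
(require '[clojure.spec.alpha :as s])

;; Every event needs a keyword :type and an integer :timestamp (epoch ms).
(s/def ::type keyword?)
(s/def ::timestamp int?)

;; :req-un matches the unqualified keys :type and :timestamp in plain maps;
;; extra keys such as :location or :target are allowed.
(s/def ::event (s/keys :req-un [::type ::timestamp]))

(s/valid? ::event {:type :click :timestamp 1464362801602 :target "#some-elem"}) ; => true
(s/valid? ::event {:type :click})                                              ; => false
```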

Modeling Pageviews

With events done, we can now model pageviews. We will go ahead and define a constructor for pageviews:

Modeling a Pageview

cljs.user=> (defn pageview
              ([url] (pageview url (.now js/Date) [])) ;; <1>
              ([url loaded] (pageview url loaded []))
              ([url loaded events]
                {:url url
                 :loaded loaded
                 :events events}))

cljs.user=> (pageview "some.example.com/url")          ;; <2>
{:url "some.example.com/url",
 :loaded 1464612010514,
 :events []}

cljs.user=> (pageview "http://www.example.com"         ;; <3>
                      1464611888074
                      [(click [100 200] ".logo")])
{:url "http://www.example.com",
 :loaded 1464611888074,
 :events [{:type :click,
           :timestamp 1464611951519,
           :location [100 200],
           :target ".logo"}]}
  1. Define pageview with 3 arities
  2. pageview can be called with just a URL
  3. …or with a URL, loaded timestamp, and vector of events

Just like we did with events, we created a constructor to manage the details of assembling a map that fits our definition of what a Pageview is. One different aspect of this code is that we are using a multiple-arity function as the constructor and providing default values for the loaded and events values when they are not supplied. This is a common pattern in ClojureScript for dealing with default values for arguments.

Modeling Sessions

Moving up the hierarchy of our data model, we now come to the Session. Remember that a session represents one or more consecutive pageviews from the same user. If a user leaves the site and comes back later, we would create a new session. So the session needs to have a collection of pageviews as well as identifying information about the user’s browser, location, etc.

cljs.user=> (defn session
              ([start is-active? ip user-agent] (session start is-active? ip user-agent []))
              ([start is-active? ip user-agent pageviews]
                {:start start
                 :is-active? is-active?
                 :ip ip
                 :user-agent user-agent
                 :pageviews pageviews}))

cljs.user=> (session 1464613203797 true "192.168.10.4" "Some UA")
{:start 1464613203797, :is-active? true, :ip "192.168.10.4", :user-agent "Some UA", :pageviews []}

There is nothing new here. We are simply enriching our domain with more types that we will be able to use in an analytics application. The only piece that remains is the User, which I will leave as an exercise for you.
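If you want to compare notes on the exercise, one possible User constructor follows the same multi-arity pattern and uses the keys from the sample data (:id, :name, :sessions). The names user and user-name are just illustrative:

```clojure
(defn user
  ;; Default to no sessions, like pageview and session do for their children.
  ([id user-name] (user id user-name []))
  ([id user-name sessions]
   {:id id
    :name user-name
    :sessions sessions}))

(user 123 "John Anon")
;; => {:id 123, :name "John Anon", :sessions []}
```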

We now have a fairly complete domain defined for our analytics application. Next, we’ll explore how we can interact with it using primarily functions from ClojureScript’s standard library. Below is a sample of what some complete data from our domain looks like at this point. It will be helpful to reference this data as we move on.

Sample data for an analytics domain

;; User
{:id 123
 :name "John Anon"
 :sessions [

   ;; Session
   {:start 1464379781618
    :is-active? true
    :ip "127.0.0.1"
    :user-agent "some-user-agent"
    :pageviews [

      ;; Pageview
      {:url "some-url"
       :loaded 1464379918936
       :events [

         ;; Event
         {:type :scroll
          :location [403 812]
          :distance 312
          :timestamp 1464380102036}

         ;; Event
         {:type :click
          :location [644 112]
          :target "a.link.about"
          :timestamp 1464380117760}]}]}]}
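Because the whole domain is plain maps and vectors, core functions can already query it. A small sketch over a copy of the sample data (sample-user and all-events are hypothetical names):

```clojure
(def sample-user
  {:id 123
   :name "John Anon"
   :sessions [{:start 1464379781618
               :is-active? true
               :ip "127.0.0.1"
               :user-agent "some-user-agent"
               :pageviews [{:url "some-url"
                            :loaded 1464379918936
                            :events [{:type :scroll :location [403 812]
                                      :distance 312 :timestamp 1464380102036}
                                     {:type :click :location [644 112]
                                      :target "a.link.about" :timestamp 1464380117760}]}]}]})

;; Reach into the nested structure by path (vector indices work as keys).
(get-in sample-user [:sessions 0 :pageviews 0 :events 1 :target])
;; => "a.link.about"

;; Flatten every event across all sessions and pageviews.
(defn all-events [user]
  (->> (:sessions user)
       (mapcat :pageviews)
       (mapcat :events)))

;; Count events by type.
(frequencies (map :type (all-events sample-user)))
;; => {:scroll 1, :click 1}
```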


What's nice about Clojure numerical computing with new Neanderthal 0.16.0

I've spent some quality time with my Emacs sipping some CIDER, and it is a good moment to introduce release 0.16.0 of Neanderthal, the linear algebra and numerical computing library for Clojure. The time spent over the summer refactoring the foundations for 0.15.0 is paying dividends now. It has been much easier for me to add many new features and polish the old ones. And the best news is that I expect this to keep paying off in the upcoming releases.

Where to get it

I guess it is not difficult to Google it, but here is the Neanderthal homepage.

What's new

After adding a wide choice of specialized matrix structures to the last release (triangular, symmetric, banded, packed, etc.), I've decided to round up the roster with a few more sparse types:

  • Diagonal matrix (GD)
  • Tridiagonal matrix (GT)
  • Diagonally dominant tridiagonal (DT)
  • Symmetric tridiagonal (ST)

What's so great about them, you might ask? Remember that numerical linear algebra is highly demanding when it comes to computing resources. Knowing about the special structure of your data, and having the actual tools in your programming language of choice (Clojure, of course) to use that knowledge, enables you to make impossible tasks possible, and slow tasks fast.
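To make the "slow tasks fast" point concrete: a dense LU solve costs on the order of \(n^3\) operations, while a tridiagonal system can be solved in \(O(n)\). The sketch below is the classic Thomas algorithm in plain Clojure, shown only to illustrate why structure pays off; it is not Neanderthal's API, and the name thomas-solve is hypothetical. It does no pivoting, so it assumes a well-behaved matrix, such as a diagonally dominant one:

```clojure
(defn thomas-solve
  "Solves a tridiagonal system in O(n) time. `lower` and `upper` hold the
  n-1 sub- and super-diagonal entries; `diag` and `rhs` hold n entries.
  No pivoting: assumes a well-behaved (e.g. diagonally dominant) matrix."
  [lower diag upper rhs]
  (let [n (count diag)
        ;; Forward sweep: eliminate the sub-diagonal, keeping the modified
        ;; super-diagonal (c') and right-hand side (d') coefficients.
        [c' d'] (loop [i 1
                       c' [(if (> n 1) (/ (double (upper 0)) (diag 0)) 0.0)]
                       d' [(/ (double (rhs 0)) (diag 0))]]
                  (if (= i n)
                    [c' d']
                    (let [m (- (diag i) (* (lower (dec i)) (c' (dec i))))]
                      (recur (inc i)
                             (conj c' (if (< i (dec n)) (/ (upper i) m) 0.0))
                             (conj d' (/ (- (rhs i) (* (lower (dec i)) (d' (dec i)))) m))))))]
    ;; Back substitution: x[n-1] = d'[n-1], x[i] = d'[i] - c'[i] * x[i+1].
    (loop [i (- n 2)
           xs (list (d' (dec n)))]
      (if (neg? i)
        (vec xs)
        (recur (dec i) (conj xs (- (d' i) (* (c' i) (first xs)))))))))

;; [4 1 0; 1 4 1; 0 1 4] x = [6 12 14] has the solution x = [1 2 3];
;; the result is approximately [1.0 2.0 3.0].
(thomas-solve [1 1] [4 4 4] [1 1] [6 12 14])
```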

That's not all

Recently Neanderthal entered territory that few, if any, libraries on the JVM and similar platforms have claimed. NumPy and similar libraries are in wide use and are quite good, but you won't find such a breadth of specialized matrices there. And, like in some infomercial, that's not all. Not only does Neanderthal offer a broad selection of matrix structures, it also offers deep support for specialized polymorphic operations on those specialized matrices. It would be bragging if I said that Neanderthal comes with more high-quality implementations of more operations than other performant Clojure/Java libraries even have an API for. But it's not bragging if it's reality.

There's even more

There are many things that Neanderthal now supports, but there's more to be added. Having pretty much covered the data structures themselves, it's now much easier for me to add more advanced operations, which you've probably already seen if you explored the new stuff in the previous release.

Don't forget that there's also support for GPU computing, and even there Neanderthal goes beyond what other libraries do: it supports not only Nvidia with CUDA, but also AMD, Intel, and Nvidia with OpenCL!

There will be even more

Depending on the demand, next in the waiting line are two more major things that are not there (yet):

  • Unstructured sparse matrices (structured matrices are already supported!).
  • Tensors (which is a much better thing than "NDArray").

If you need either of these two, please share your views on how you intend to use them. That might help me make them better.

This stuff seems hard to learn…

Although many programmers are (thanks to the rising AI/ML hype) interested in the stuff that libraries like Neanderthal provide, most of them are frustrated by the apparent high barrier to entry due to the heavy reliance on math-heavy theory.

I'll be direct: despite what various blogs say, you can't cheat here. A long time ago, there was a popular line of computer books titled "Learn X in 21 days". Then it became popular to give them titles like "Learn Y in 21 hours". Nowadays, the thing in vogue is "A 5-minute blog post about Z". You can learn to blindly call some API quickly, but I bet that you cannot learn to use any of the libraries in this field by reading two or five blog posts. You have to learn at least the basics of linear algebra, and the more advanced parts as needed, to be able to enter the ML/DL/DA field and do things effectively.

There's good news though: Neanderthal can help you tremendously on this path. Its API is designed to do as many things as possible automatically, while still giving you full control to achieve maximum performance. It's also designed to have a clear and logical correspondence to the math-y theoretical stuff from the textbooks.

And, I have a series of blog posts on this blog that show you how to connect the dots from the textbooks to the top-performance code. I already have several more written and queued to be published, and even more are currently brewing.

This requires lots of time, so, you can help me by sharing more of what you do in this area. Write some beginner's tutorials so I could concentrate on writing about more advanced stuff instead. Tell people about how you use Clojure for high performance stuff, so I can spend more time on adding features. Do more comparisons with other tools, so more users get to know about our great platform that (almost) nobody knows about. Hey, find some bugs, and write tests that demonstrate them, so I can spend more time fixing those, instead of hunting them myself.

By the way, did you know that Neanderthal now has 3773 hand-written tests? Yes, I wrote and rewrote those many times, by hand; that's one of the reasons why Neanderthal's API is so ergonomic, polished, and comprehensive. These tests are also a good way to learn about specific functionality of Neanderthal when the need arises. Don't miss them.

So, I have no illusion that this will make everyone an AI expert, but I am sure that it makes Clojure a fairly good platform for many bright people to start the journey. Including you, of course!


Clojure Numerics, Part 3 - Special Linear Systems and Cholesky Factorization

In the last article we learned to solve general linear systems, assuming that the matrix of coefficients is square, dense, and unstructured. We also saw how computing the solution is much faster and easier when we know that the matrix is triangular. These are pretty general assumptions, so we are able to solve any well-defined system. We now explore how additional knowledge about the system can be applied to make it faster. The properties that we are looking for are symmetry, definiteness, and bandedness.

Before I continue, a few reminders:

The namespaces we'll use:

(require '[uncomplicate.neanderthal
           [native :refer [dge dsy dsb dsp]]
           [linalg :refer [trf trs]]])

Symmetry

Recall from the last article that a general square system is solved by first doing LU factorization to decompose the system into two triangular factors (L and U), and then solving those two triangular systems (\(Ly=b\), then \(Ux=y\)). To fight the inherent instability of floating-point computations (a non-perfect approximation of "ideal" real numbers), the algorithm does pivoting and row interchanges. This is a burden not only because of the additional data (pivots) to carry around, but also because it requires additional memory reads and writes (remember that memory access is more expensive than mere FLOPS). That's why we would like to use computational shortcuts that do less pivoting, or no pivoting at all.

One such shortcut is that if \(A\) is symmetric and has an LU factorization, \(U\) is a row scaling of \(L^T\). More precisely, \(A=LDL^T\) where \(D\) is a diagonal matrix. Since the matrix is symmetric, \(A=UDU^T\), too. From a theoretical perspective, it might not make much difference, but in implementation it is important. If the symmetric matrix data is stored in the lower triangle, that triangle will be (re)used for storing the factorization. Likewise for an upper symmetric matrix. Those two are equal but are obviously not the same (= vs identical? in Clojure).
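A 2×2 case makes the claim concrete. Assuming \(a\neq 0\), a symmetric matrix factors as:

\[
\begin{pmatrix} a & b \\ b & c \end{pmatrix}
=
\underbrace{\begin{pmatrix} 1 & 0 \\ b/a & 1 \end{pmatrix}}_{L}
\underbrace{\begin{pmatrix} a & 0 \\ 0 & c - b^2/a \end{pmatrix}}_{D}
\underbrace{\begin{pmatrix} 1 & b/a \\ 0 & 1 \end{pmatrix}}_{L^T}
\]

Multiplying \(D\) into \(L^T\) gives the LU factor \(U=DL^T=\begin{pmatrix} a & b \\ 0 & c-b^2/a \end{pmatrix}\), which is exactly \(L^T\) with its rows scaled by the diagonal of \(D\).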

It is simpler to do with Neanderthal than it looks from the previous description. Practically, nothing is required of you except creating a symmetric matrix. When you call the usual factorization and solver functions, this will be taken care of automatically.

(let [a (dsy 3 [3
                5 3
                -2 2 0]
             {:layout :row :uplo :lower})
      fact (trf a)
      b (dge 3 2 [-1 4
                  0.5 0
                  2 -1]
             {:layout :row})]
  [a fact (trs fact b)])
'(#RealUploMatrix(double  type:sy  mxn:3x3  layout:row  offset:0)
   ▤       ↓       ↓       ↓       ┓
   →       3.00    *       *
   →       5.00    3.00    *
   →      -2.00    2.00    0.00
   ┗                               ┛
 #uncomplicate.neanderthal.internal.common.LUFactorization(:lu #RealUploMatrix(double  type:sy  mxn:3x3  layout:row  offset:0)
   ▤       ↓       ↓       ↓       ┓
   →       3.00    *       *
   →       5.00    3.00    *
   →       1.00   -1.00    4.00
   ┗                               ┛
  :ipiv #IntegerBlockVector(int  n:3  offset: 0  stride:1)(-2 -2 3)  :master true  :fresh #atom(true 0x6a306829)) #RealGEMatrix(double  mxn:3x2  layout:row  offset:0)
   ▤       ↓       ↓       ┓
   →      -0.53    0.50
   →       0.47    0.00
   →       0.88   -1.25
   ┗                       ┛
)

As you can see, it's all the same as for general dense matrices, with Neanderthal taking care to preserve the optimized symmetrical structure of :lu.

Positive definite systems and Cholesky factorization

The previous shortcut is cute, but nothing to write home about. Fortunately, there is a much better optimization available for a special subset of symmetric matrices - those that are positive definite.

A matrix \(A\in{R^{n\times{n}}}\) is positive definite if \(x^TAx>0\) for all nonzero \(x\in{R^n}\), positive semidefinite if \(x^TAx\geq{0}\) for all \(x\in{R^n}\), and indefinite if there are \(x_1,x_2\in{R^n}\) such that \((x_1^TAx_1)(x_2^TAx_2)<0\). Huh? So, is my system positive definite? How would I know that?

Before I show you that, let me tell you that symmetric positive definite matrices are handy because a special factorization is available for them - Cholesky factorization - which preserves symmetry and definiteness. Now, it turns out that discovering whether a matrix is positive definite is not easy. That's why I will not try to explain here how to do that, nor would it help you in practical work. The important (and fortunate) thing is that you don't have to care; Neanderthal will determine that automatically, and return the Cholesky factorization if possible. If not, it will return the \(LDL^T\) (or \(UDU^T\))!

Let's see how to do this in Clojure:

(let [a (dsy 3 [1
                1 2
                1 2 3]
             {:layout :row :uplo :lower})
      fact (trf a)
      b (dge 3 2 [-1 4
                  0.5 0
                  2 -1]
             {:layout :row})]
  [a fact (trs fact b)])
'(#RealUploMatrix(double  type:sy  mxn:3x3  layout:row  offset:0)
   ▤       ↓       ↓       ↓       ┓
   →       1.00    *       *
   →       1.00    2.00    *
   →       1.00    2.00    3.00
   ┗                               ┛
 #uncomplicate.neanderthal.internal.common.PivotlessLUFactorization(:lu #RealUploMatrix(double  type:sy  mxn:3x3  layout:row  offset:0)
   ▤       ↓       ↓       ↓       ┓
   →       1.00    *       *
   →       1.00    1.00    *
   →       1.00    1.00    1.00
   ┗                               ┛
  :master true  :fresh #atom(true 0x8cf263e)) #RealGEMatrix(double  mxn:3x2  layout:row  offset:0)
   ▤       ↓       ↓       ┓
   →      -2.50    8.00
   →       0.00   -3.00
   →       1.50   -1.00
   ┗                       ┛
)

Notice how the code is virtually the same as in the previous example. The only thing that is different is the data. In this example, Neanderthal could do the Cholesky factorization instead of the more expensive LU with symmetric pivoting. It then adapted the solver to use the available factorization for solving the linear equation, and everything happened automatically.

In fact, Cholesky factorization is a variant of LU, just like \(LDL^T\) is. The difference is that L and U in Cholesky are \(G\) and \(G^T\), so \(A=GG^T\). Notice: L is G, U is the transpose of G, and there is no need for the D in the middle. Also, no pivoting is necessary, which makes Cholesky quite nice: compact and efficient.
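To see what \(A=GG^T\) means in code, here is a minimal textbook Cholesky factorization in plain Clojure. It is a sketch for illustration, not how Neanderthal implements it; the name cholesky and the vector-of-rows representation are assumptions. It also shows one common practical test for positive definiteness: attempt the factorization and give up as soon as a pivot is not positive. On the matrix from the example above, it reproduces the lower triangle of ones that Neanderthal printed:

```clojure
(defn cholesky
  "Textbook Cholesky factorization of a symmetric matrix `a`, given as a
  vector of row vectors (only the lower triangle is read). Returns the
  lower-triangular G with A = G G^T, or nil if A is not positive definite."
  [a]
  (let [n (count a)]
    (loop [j 0
           g (vec (repeat n (vec (repeat n 0.0))))]
      (if (= j n)
        g
        (let [;; Sum over k < j of G[i,k] * G[j,k], using finished columns.
              partial-dot (fn [i]
                            (reduce + 0.0 (map #(* (get-in g [i %]) (get-in g [j %]))
                                               (range j))))
              pivot (- (get-in a [j j]) (partial-dot j))]
          (if (<= pivot 0.0)
            nil ; a non-positive pivot: A is not positive definite
            (let [g-jj (Math/sqrt pivot)
                  ;; Fill column j below the diagonal.
                  g-next (reduce (fn [acc i]
                                   (assoc-in acc [i j]
                                             (/ (- (get-in a [i j]) (partial-dot i)) g-jj)))
                                 (assoc-in g [j j] g-jj)
                                 (range (inc j) n))]
              (recur (inc j) g-next))))))))

(cholesky [[1 1 1] [1 2 2] [1 2 3]])
;; => [[1.0 0.0 0.0] [1.0 1.0 0.0] [1.0 1.0 1.0]]

(cholesky [[3 5 -2] [5 3 2] [-2 2 0]])
;; => nil (the first example's matrix is not positive definite)
```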

There are more subtle controls related to this in Neanderthal; look at the treasure trove of API documentation and tests. Writing these blog posts takes time and energy, and now I don't feel like taking too much time delving more into details related to this. :)

Of course, sv is also available, as well as destructive variants of the functions I've shown, and with them, too, you can rely on Neanderthal to select the appropriate algorithm automatically. I didn't show those here because they are used in virtually the same way as in the examples from the previous articles. Let it be a nice, easy exercise to try them on your own.

Banded Systems

Are there more optimizations? Sure! One of them is for banded systems. A matrix is banded when all of its non-zero elements are concentrated in a narrow (relative to the whole matrix) band around the diagonal. Of course, the implementation does not care how narrow: even a completely dense matrix could be stored as banded, in the sense that the band covers the whole matrix. However, for the banded implementation to be faster rather than much slower, the band should not be too obese. For example, if the matrix is in \(R^{100\times{100}}\), a band 5 diagonals wide is obviously exploitable, while a band 50 diagonals wide probably is not (but test that yourself for your use cases).
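The definition can be checked mechanically. Here is a minimal sketch, assuming a dense matrix given as a vector of row vectors (`bandwidths` and `band-element-count` are hypothetical helpers, not Neanderthal functions): the first finds how far below and above the main diagonal non-zeros actually reach, and the second counts the entries a band format keeps for an n×n matrix.

```clojure
(defn bandwidths
  "Returns [kl ku] for a dense matrix a (vector of row vectors):
  the farthest sub-diagonal (kl) and super-diagonal (ku) that still
  contain a non-zero entry."
  [a]
  (reduce (fn [[kl ku] [i j]]
            [(max kl (- i j)) (max ku (- j i))])
          [0 0]
          (for [i (range (count a))
                j (range (count (nth a i)))
                :when (not (zero? (get-in a [i j])))]
            [i j])))

(defn band-element-count
  "Number of entries on diagonals -kl..ku of an n x n matrix,
  i.e. what a band format keeps instead of all n*n entries."
  [n kl ku]
  (reduce + (for [d (range (- kl) (inc ku))]
              (- n (Math/abs (long d))))))
```

For the 100×100 example with a band 5 diagonals wide (2 below and 2 above the main diagonal), `(band-element-count 100 2 2)` returns 494, compared to 10000 entries for dense storage. Concrete band formats such as LAPACK's may also keep a few unused corner slots, and storage is only part of the story: whether a wide band beats dense code is something to measure.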

Now, the cool thing is that the stuff I've shown you for the general, triangular, and symmetric matrices also works with banded matrices:

  • The most desirable case is the triangular banded matrix, since it does not need to be factorized at all.
  • Then, if the matrix is symmetric banded, Neanderthal offers Cholesky if possible, with the LU fallback.
  • And, for general banded matrices, it does the banded LU.
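Why is the triangular banded case the cheapest? A triangular system is solved directly by substitution, and with a band only the entries inside it take part. Here is a minimal sketch of banded forward substitution; for readability it assumes a dense vector of row vectors (a real band format would store only the diagonals), and `banded-lower-solve` is a hypothetical name.

```clojure
(defn banded-lower-solve
  "Solves L x = b where L is lower triangular with at most k non-zero
  sub-diagonals. L is given densely for clarity, but only entries inside
  the band are read, so the work is O(n*k) instead of O(n^2)."
  [l k b]
  (reduce (fn [x i]
            ;; only columns max(0, i-k) .. i-1 of row i can be non-zero
            (let [s (reduce + 0.0 (for [j (range (max 0 (- i k)) i)]
                                    (* (get-in l [i j]) (nth x j))))]
              (conj x (/ (- (double (nth b i)) s)
                         (double (get-in l [i i]))))))
          []
          (range (count b))))
```

Each row touches at most k + 1 entries, and there is no factorization step in front of the solve.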

Best of all, it's all automatic (I'm only showing the symmetric case here):

(let [a (dsb 9 3 [1.0 1.0 1.0 1.0
                  2.0 2.0 2.0 1.0
                  3.0 3.0 2.0 1.0
                  4.0 3.0 2.0 1.0
                  4.0 3.0 2.0 1.0
                  4.0 3.0 2.0 1.0
                  4.0 3.0 2.0
                  4.0 3.0
                  4.0])
      fact (trf a)
      b (dge 9 3 [4.0  0.0  1.0
                  8.0  0.0  1.0
                  12.0  0.0  0.0
                  16.0  0.0  1.0
                  16.0  0.0  0.0
                  16.0  0.0 -1.0
                  15.0  1.0  0.0
                  13.0  1.0 -2.0
                  10.0  2.0 -3.0]
             {:layout :row})]
  [a fact (trs fact b)])
'(#RealBandedMatrix(double  type:sb  mxn:9x9  layout:column  offset:0)
   ▥       ↓       ↓       ↓       ↓       ↓       ─
   ↘       1.00    2.00    3.00    4.00    4.00    ⋯
   ↘       1.00    2.00    3.00    3.00    3.00
   ↘       1.00    2.00    2.00    2.00    2.00
   ↘       1.00    1.00    1.00    1.00    1.00
   ┗                                               ┛
 #uncomplicate.neanderthal.internal.common.PivotlessLUFactorization(:lu #RealBandedMatrix(double  type:sb  mxn:9x9  layout:column  offset:0)
   ▥       ↓       ↓       ↓       ↓       ↓       ─
   ↘       1.00    1.00    1.00    1.00    1.00    ⋯
   ↘       1.00    1.00    1.00    1.00    1.00
   ↘       1.00    1.00    1.00    1.00    1.00
   ↘       1.00    1.00    1.00    1.00    1.00
   ┗                                               ┛
  :master true  :fresh #atom(true 0x39b524ea)) #RealGEMatrix(double  mxn:9x3  layout:column  offset:0)
   ▥       ↓       ↓       ↓       ┓
   →       1.00    1.00    1.00
   →       1.00   -1.00    0.00
   →       ⁙       ⁙       ⁙
   →       1.00   -1.00    0.00
   →       1.00    1.00   -1.00
   ┗                               ┛
)

I created a banded symmetric matrix a with dimensions \(9\times 9\). The data is given column by column: each line of the literal holds a column's diagonal entry, followed by the entries below it that fall inside the band. The band consists of the main diagonal and 3 sub-diagonals. When it comes to storage, this means that instead of storing all 81 elements, only the \(9+8+7+6=30\) elements inside the band are stored (even though the band is not particularly thin). When Neanderthal prints the matrix, it prints the diagonals horizontally (to avoid printing a bunch of zero entries). To avoid confusion, notice how Neanderthal prints the ↘ symbol for the printed rows and ↓ for columns, to indicate that (in the case of column-major matrices) diagonals are printed horizontally and columns vertically. Also note that this particular system is positive definite, so we get a nice Cholesky factorization, which preserves the band!
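Both claims in that paragraph can be checked by hand with a small pure-Clojure sketch. The helper names are hypothetical and the matrices are plain vectors of row vectors, not Neanderthal's internal storage. `sb-columns->dense` expands band columns laid out like the dsb data above, and `banded-cholesky` computes \(A = GG^T\) while only ever writing entries inside the k lower diagonals. It assumes A is symmetric positive definite and does no checking.

```clojure
(defn sb-columns->dense
  "Builds a dense symmetric n x n matrix from lower band columns:
  column j holds [a(j,j) a(j+1,j) ... a(j+k,j)], truncated at the
  bottom edge of the matrix (like the data given to dsb above)."
  [n cols]
  (reduce (fn [a [j col]]
            (reduce (fn [a [d v]]
                      (-> a
                          (assoc-in [(+ j d) j] (double v))   ; below the diagonal
                          (assoc-in [j (+ j d)] (double v)))) ; mirrored above it
                    a
                    (map-indexed vector col)))
          (vec (repeat n (vec (repeat n 0.0))))
          (map-indexed vector cols)))

(defn banded-cholesky
  "Cholesky factor G (A = G * G^T) of a symmetric positive definite A with
  k sub-diagonals. Only entries inside the band are computed; everything
  outside stays 0.0, which is why the factor keeps the band."
  [a k]
  (let [n (count a)]
    (reduce
     (fn [g j]
       (let [;; sum of G(i,m) * G(j,m) over the columns m < j inside the band
             dot (fn [g i]
                   (reduce + 0.0 (for [m (range (max 0 (- i k)) j)]
                                   (* (get-in g [i m]) (get-in g [j m])))))
             gjj (Math/sqrt (- (get-in a [j j]) (dot g j)))
             g (assoc-in g [j j] gjj)]
         ;; rows j+1 .. j+k of column j are the only ones inside the band
         (reduce (fn [g i]
                   (assoc-in g [i j] (/ (- (get-in a [i j]) (dot g i)) gjj)))
                 g
                 (range (inc j) (min n (+ j k 1))))))
     (vec (repeat n (vec (repeat n 0.0))))
     (range n))))
```

Fed the nine columns from the dsb call, the band holds 30 numbers, and the resulting G has ones exactly on the main diagonal and the 3 sub-diagonals and zeros everywhere else, consistent with the factor Neanderthal printed.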

Packed matrices

As a bonus, let me mention that Neanderthal supports packed dense storage, which can come in handy when memory is scarce. If we work with dense symmetric or triangular matrices that cannot be compressed with a band because they do not have many zero elements, we can still save almost half the space by storing only the lower or the upper half, and not storing the other part, which is never accessed.

(let [a (dsp 3 [1
                1 2
                1 2 3]
             {:layout :row :uplo :lower})
      fact (trf a)
      b (dge 3 2 [-1 4
                  0.5 0
                  2 -1]
             {:layout :row})]
  [a fact (trs fact b)])
'(#RealPackedMatrix(double  type:sp  mxn:3x3  layout:row  offset:0)
   ▤       ↓       ↓       ↓       ┓
   →       1.00    .       .
   →       1.00    2.00    .
   →       1.00    2.00    3.00
   ┗                               ┛
 #uncomplicate.neanderthal.internal.common.PivotlessLUFactorization(:lu #RealPackedMatrix(double  type:sp  mxn:3x3  layout:row  offset:0)
   ▤       ↓       ↓       ↓       ┓
   →       1.00    .       .
   →       1.00    1.00    .
   →       1.00    1.00    1.00
   ┗                               ┛
  :master true  :fresh #atom(true 0x65376e65)) #RealGEMatrix(double  mxn:3x2  layout:row  offset:0)
   ▤       ↓       ↓       ┓
   →      -2.50    8.00
   →       0.00   -3.00
   →       1.50   -1.00
   ┗                       ┛
)

Hey, this example is virtually the same as when we used the dense symmetric matrix! That's right: Neanderthal can sort these kinds of things out without bothering you! Now, how cool is that? I don't know, but it is certainly very useful…

Use packed storage with caution, though: it saves only half the space, while many important operations, such as matrix multiplication, are noticeably slower than with straight dense triangular or symmetric matrices. Some operations can be faster, so YMMV: experiment with the use cases you are interested in.
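The savings come from simple index arithmetic. Here is a minimal sketch for the row-major lower layout used in the dsp example, with hypothetical helper names; it mirrors how that example's data is laid out, not necessarily every detail of Neanderthal's internals. Row i starts after 1 + 2 + ... + i = i(i+1)/2 stored entries, and the upper half of a symmetric matrix is read by mirroring.

```clojure
(defn packed-lower-index
  "Index of entry (i, j), j <= i, in a row-major lower packed array:
  rows 0..i-1 occupy i(i+1)/2 slots before row i starts."
  [i j]
  (+ (quot (* i (inc i)) 2) j))

(defn packed-get
  "Reads entry (i, j) of a symmetric matrix stored as row-major lower
  packed data; the upper half is never stored, so it is mirrored."
  [ap i j]
  (if (>= i j)
    (nth ap (packed-lower-index i j))
    (nth ap (packed-lower-index j i))))

(defn packed->dense
  "Expands packed data back into a dense vector of row vectors."
  [n ap]
  (vec (for [i (range n)]
         (vec (for [j (range n)] (packed-get ap i j))))))
```

`(packed->dense 3 [1 1 2 1 2 3])` rebuilds `[[1 1 1] [1 2 2] [1 2 3]]` from 6 stored numbers instead of 9. In general an n×n matrix needs n(n+1)/2 slots, which is why the savings approach, but never quite reach, one half.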

(Tri)diagonal storage

Everything described so far can be compacted and exploited even further. As an exercise, please look at diagonal (gd), tridiagonal (gt), diagonally dominant tridiagonal (dt), and symmetric tridiagonal (st) matrices, and try them out with linear solvers. Yes, they are also supported…
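As a taste of why tridiagonal systems are so cheap, here is a minimal sketch of the classic Thomas algorithm (forward elimination plus back substitution, no pivoting) in pure Clojure. It is a stand-in for illustration, not necessarily what Neanderthal or LAPACK run: skipping pivoting is only safe under assumptions such as diagonal dominance, the property the dt type is named after. The name `thomas-solve` and its argument order are hypothetical.

```clojure
(defn thomas-solve
  "Solves a tridiagonal system given its sub-diagonal dl (n-1 entries),
  diagonal d (n entries), super-diagonal du (n-1 entries) and right-hand
  side b, without pivoting (assumes e.g. diagonal dominance). O(n)."
  [dl d du b]
  (let [n (count d)
        ;; forward sweep: eliminate the sub-diagonal, keeping the modified
        ;; super-diagonal (cp) and right-hand side (bp)
        [cp bp] (reduce (fn [[cp bp] i]
                          (let [l (nth dl (dec i))
                                m (- (nth d i) (* l (peek cp)))]
                            [(conj cp (if (< i (dec n)) (/ (nth du i) m) 0.0))
                             (conj bp (/ (- (nth b i) (* l (peek bp))) m))]))
                        [[(if (> n 1) (/ (nth du 0) (double (nth d 0))) 0.0)]
                         [(/ (nth b 0) (double (nth d 0)))]]
                        (range 1 n))]
    ;; back substitution, prepending onto a list so each step is O(1)
    (vec (reduce (fn [x i]
                   (conj x (- (nth bp i) (* (nth cp i) (first x)))))
                 (list (peek bp))
                 (range (- n 2) -1 -1)))))
```

Only the three diagonals are stored (3n - 2 numbers), and the solve takes O(n) operations. For example, `(thomas-solve [1 1] [4 4 4] [1 1] [6 12 14])` recovers `[1 2 3]` up to floating-point rounding.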

What to expect next

We've taken a look at solving dense, banded, and packed systems described with general rectangular matrices, symmetric matrices, and triangular matrices. Now we can quickly and easily solve almost any kind of linear system that has a unique solution. Hey, but what if our system has fewer equations than unknowns, or too many equations? Stay tuned, since I'll discuss these topics soon. Somewhere in between, I'll probably squeeze in a post explaining the details of those storage and matrix types, and how to read the symbols in printed matrices (although I think they are intuitive enough that you've probably already picked up the most important details).

Until then, create a learning playground project, include Neanderthal, and have a happy hacking day!


Copyright © 2009, Planet Clojure. No rights reserved.
Planet Clojure is maintained by Baishamapayan Ghose.
Clojure and the Clojure logo are Copyright © 2008-2009, Rich Hickey.
Theme by Brajeshwar.