Signals in Elm

12 February 2016

(Disclaimer: This post was written about Elm 0.16. Signals, the mechanism described in this post, have since been deprecated. The concepts in this post may still help understand how the Elm Architecture works internally, but the actual code has changed significantly.)

I have rewritten a webapp from React/Redux to Elm over the last few weeks and am really enjoying the language so far. Elm is a purely functional language that compiles to JavaScript and was built specifically for creating web UIs. It is inspired by several functional programming languages, especially Haskell and OCaml. I have participated in the Elm Google group quite a bit lately and noticed that even though the Elm docs are really good, some concepts are a bit hard to understand and to tell apart from each other. I am therefore starting a mini-series of posts about different concepts in Elm. This first one is about Signals - and why you don’t see them much in many smaller Elm programs even though they are always there.

These blog posts assume that you already know a little bit about Elm, e.g. you have gone through the great primer “Road to Elm” by Lambdacat and then studied the Elm Architecture Tutorials a little. OTOH, if you already use Tasks and Ports extensively, you will find most of this a bit boring :). If you already know about Signals, you may want to jump ahead to the next post about tasks and effects or the one after that about ports.

What are Signals?

Elm uses a very nice unidirectional architecture for the flow of:

displayed website ➞ user actions ➞ new application state ➞ displayed website

All the code you write in a typical Elm program comes together in just two pure functions: update and view. Update takes a user action and the previous application state and creates a new application state; view takes an application state and renders it into a virtual DOM representation (which then gets efficiently displayed by batching diffs to the real DOM, à la React). For more background on unidirectional UIs in general, see André Staltz’ excellent blog post Unidirectional User Interface Architectures.
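
To make this concrete, here is a minimal sketch of these two functions for a counter (the Action and Model types are illustrative, not from any particular app, and event wiring is omitted for now - it comes up below):

import Html exposing (Html, text)

-- an illustrative action and model for a simple counter
type Action = Increment | Decrement

type alias Model = Int

-- update: turn one action plus the previous state into the new state
update : Action -> Model -> Model
update action model =
  case action of
    Increment -> model + 1
    Decrement -> model - 1

-- view: render the current state (real views also receive an
-- address to send actions to, as explained further down)
view : Model -> Html
view model =
  text (toString model)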

One of the key concepts in Elm is that of a Signal. A Signal is a value that can change over time. One of the conceptually simplest signals is the current time - every second the signal “fires” with the new time value (seconds passed since epoch or whatever). Another example could be the coordinates of the mouse cursor in the current window. When the mouse is still, no values are fired by the Signal - but whenever the user moves the mouse, new values are sent. (This is actually one of the examples at elm-lang.org/examples). Signals are a bit similar to EventEmitters or Observables in JavaScript - the key difference is that they exist for the entire lifetime of the program and always have an initial value.
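
Here is roughly what the mouse position example looks like in Elm 0.16 (a small sketch using the core Mouse module, where Mouse.position is a Signal (Int, Int)):

import Graphics.Element exposing (Element, show)
import Mouse

-- whenever Mouse.position fires new coordinates, map pushes them
-- through show, so the displayed Element updates with the mouse
main : Signal Element
main =
  Signal.map show Mouse.position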

Signals don’t provide any access to their history - they only provide the current value. But even a simple counter example needs to track the number of clicks that have happened so far. Since Elm is a pure language with no mutable state, how do you keep track of the current click-count? We’ll come back to that question, but first let’s look at how StartApp.Simple, which you probably know from the Elm Architecture Tutorials, works.

StartApp hides a bit of wiring from you, but I think it helps to understand how StartApp.start does its magic. What the start function does is hook up a Mailbox (something where messages go when you send them to its address, e.g. in an onClick handler) so that each message leads to a new Html emitted to main. This is the heart of the unidirectional UI approach. The user clicks a button, which leads to a message being sent to the mailbox. The mailbox has a Signal that fires whenever a message is sent to it. This Signal is of your application’s Action type, i.e. it fires Action values. Eventually this leads to a full cycle through your update/view functions and thus to a new version of your virtual DOM.
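
The sending side looks roughly like this (a hedged sketch with an illustrative Increment action; onClick from Html.Events takes the address that the message should be sent to):

import Html exposing (Html, button, text)
import Html.Events exposing (onClick)
import Signal exposing (Address)

type Action = Increment

-- every click sends Increment to the given address; the Mailbox
-- behind that address then fires Increment on its Signal
view : Address Action -> Int -> Html
view address count =
  button [ onClick address Increment ] [ text (toString count) ]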

Let’s look at the intermediary types more closely: we start with a Signal of Actions. This then gets turned into a Signal of Models (so every time a new Action is fired, this action value is run through update together with the last model state to get the new model state). This finally gets turned into a Signal of Html (whenever the Signal of Model fires, we run the new model through the view function to arrive at a new Html to display). This is then handed to main so that the Elm runtime can display it for us.

Note that when I wrote about the update function, I said “together with the last model state”. This brings us back to our question from above - how can you work with history or state in Elm? The answer lies in the function Signal.foldp (“fold from the past”). If you aren’t familiar with folds yet, they are another name for reduce functions (as in map/reduce). Logically, it reduces all the values from the entire history with a given function and returns the result - in our case it starts from the initial Model and reduces all the Actions ever sent to our program to arrive at the current Model. (The implementation of course just caches the last value and works from there.)
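
A tiny complete example of foldp (a sketch that counts mouse clicks instead of button clicks, so that no mailbox wiring is needed; Mouse.clicks is a Signal () from the core Mouse module):

import Graphics.Element exposing (Element, show)
import Mouse

-- fold over all clicks "from the past": start at 0 and add 1 for
-- every () value that Mouse.clicks fires
clickCount : Signal Int
clickCount =
  Signal.foldp (\_ total -> total + 1) 0 Mouse.clicks

main : Signal Element
main =
  Signal.map show clickCount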

At this point, if you want to really dive into it, let’s look at how StartApp.Simple is actually implemented (I added comments and type annotations to every named value):

start : Config model action -> Signal Html
start config =
  let
    -- create the main Mailbox. It is of type "Mailbox (Maybe action)" and is
    -- initialized with an "empty" value of Nothing
    actions : Mailbox (Maybe action)
    actions =
      Signal.mailbox Nothing

    -- here the address is set up. Since the Mailbox is of Maybe action,
    -- everything that is sent to address is "wrapped" in the Just type
    -- constructor and forwarded to the Mailbox
    address : Address action
    address =
      Signal.forwardTo actions.address Just

    -- This local version of update just wraps config.update to
    -- take care of the Maybe part of the action that will be
    -- processed (so that the update function the user provides
    -- can simply operate as action -> model -> model)
    update : Maybe action -> model -> model
    update maybeAction model =
      case maybeAction of
        Just action ->
            config.update action model

        Nothing ->
            Debug.crash "This should never happen."

    -- set up a Signal of new models that foldp-s over
    -- the actions signal. This is the central piece
    -- that makes the Elm architecture work the way it does.
    -- The update function combines one action and the
    -- old model state into the new model state; the
    -- Signal that triggers it all is the Mailbox's Signal
    -- we set up at the top
    model : Signal model
    model =
      Signal.foldp update config.model actions.signal
  in
    -- Finally, map over it with the view function. This
    -- turns the Signal of Models into a Signal of Htmls
    -- that can be rendered
    Signal.map (config.view address) model

StartApp.Simple is quite clever in how it uses a Signal under the hood, but as a user who just wants to write an interactive web app you never need to deal with Signals directly and can just supply update and view. It all works fine until you need to message back and forth with native JavaScript. For that you will need to use Ports, and understanding Signals first will be very helpful. The other thing StartApp.Simple does not let you do is perform long-running operations, e.g. sending XHR requests.

The next post in the series deals with Tasks and Effects, while the final post is all about ports. I hope you enjoyed the article and if you have any feedback please let me know!

Slides for my talk about Elm at Berlin.js

11 February 2016

Last month I had the opportunity to talk about Elm at the very friendly Berlin.js JavaScript user group. If you don’t know about Elm, it’s a pretty new, small language designed for creating websites. The language is purely functional and modelled on Haskell and OCaml. Have a look at the slides if you are interested and follow the links on the third-to-last slide for more information! (If you use the arrow keys for navigating, don’t miss the slides that extend downwards - or just use the spacebar to see all of them.)

Modelling a possible Gene Drive in Mosquitoes

06 February 2016

A while ago I told a fauna-nerd about CRISPR/Cas9 and the potential it has - for genetic modification as well as for diagnostics and the like. I have previously posted articles about CRISPR on Ich ahne Zusammenhänge, a blog I co-author. This time I got into a little argument with her about the efficiency of the approach; it led me down an interesting path, and I wanted to write about it in more detail. Before I start, a little disclaimer: I am by no means an expert on any of this, so please take everything below with a grain of salt. If you notice any errors or have suggestions, I am always happy about feedback!

First, some context: with traditional techniques it was already possible to create a single mosquito that is unable to carry the parasite that causes malaria. But since the parasite has no detrimental effect on the mosquito, there would be no evolutionary pressure to preferentially select this particular version of the gene (aka this allele) of our “mutant” mosquito. Because there are so many “wild type” mosquitoes, you would need to release more mutants than there are wild-type individuals to have a decent chance of getting rid of the wild-type trait, and it is unrealistic to breed that many mutant mosquitoes.

The really interesting thing about CRISPR is that the tool that performs the genetic modification can itself be encoded as DNA. Thus, if you create a mosquito egg cell that has the instructions to build CRISPR/Cas9 with the right targeting and replacement sequence, this cell will modify its own genome (both copies of the target chromosomes), and all the cells that divide from it will carry the same gene-altering code. This also means that, if such a mutant mosquito were to mate with a wild-type mosquito, the usual laws of Mendelian inheritance would no longer apply - instead of the usual 50:50 chance of passing on the mutant allele or the wild-type one, almost 100% of the children would get the mutant allele (in practice the mechanism seems to break down every now and then, and initial results suggest that the effectiveness may be around 95-99%). This concept is called a Gene Drive, and as far as I know CRISPR is the only general-purpose technique we know of that makes such a thing possible.

The argument that started this article was over the number of generations required for the mutant allele to become expressed in more than half of a local population if you only start with one mosquito. My guess was that this should happen within 20 generations, while she thought it would take significantly more.

I tried to google this but found no good estimates for a very low number of initial mosquitoes. The best I could find were estimates that started with 10% of the population.

So I started thinking about how to arrive at a ballpark estimate. I quickly realized that my single mosquito would very likely die before it was able to mate. The German saying “Die sterben wie die Fliegen” (they die like flies) may have something to it after all. But if you increase the number to a still very low 200 initial mutant individuals, the likelihood of at least a few successful matings starts to be reasonable.

I ended up modeling the whole idea in a spreadsheet. I know almost nothing about modeling insect populations or about the genetic particulars of actual CRISPR methods as they are applied, so all of this is very, very rough. But the results are still interesting. I wanted to see how the numbers work out for really big “local” populations. I have no idea if these numbers are anywhere close to reasonable, and there seems to be remarkably little data on the number of mosquitoes in a given area. So I came up with 4 billion mosquitoes, which I wildly guessed might be the number that live on a small island in the middle of the wet season in a mosquito-infested area (so that you can meaningfully talk about a single “population” that is not constantly mixing with mosquitoes from the surroundings).

My modeling has several dramatic weaknesses: e.g. it assumes perfectly random matings without regard to geographic proximity, which is clearly unrealistic. Another is that the numeric factors for successful matings and for the fraction of eggs that develop into successfully reproducing adults were tweaked to arrive at stable populations, whereas real mosquito populations surely vary wildly in size depending on external factors.

I originally wanted to use Guesstimate to enter ranges of plausible values and have it estimate the final ranges via Monte Carlo simulation (drawing random numbers from given distributions for every ranged variable and calculating the spreadsheet formulas with those), but Guesstimate is still too cumbersome for repetitive calculations, so I had to do it in vanilla Google Docs with single precise values for the individual factors.

The result for those particular numbers with my guessed parameters was that it would take about 26 generations for those 200 initial mutants to spread their malaria-resistance allele until it is expressed in the majority of the population (and to pretty much totally replace the wild-type mosquitoes two generations later).
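
To get a feel for why the result lands in that ballpark, consider a much cruder model than my spreadsheet: with random mating, no fitness cost, and a drive conversion efficiency e, the drive allele frequency q follows the recurrence q' = q + e·q·(1 − q), i.e. it roughly doubles every generation while it is still rare. Here is a hedged Elm sketch of just this recurrence (the parameters are assumptions; this is not my spreadsheet model, which tracks absolute population counts):

import Graphics.Element exposing (Element, show)

-- count generations until more than half of all individuals carry
-- the drive allele, i.e. until the carrier fraction 1 - (1 - q)^2
-- exceeds 0.5; e is the assumed conversion efficiency
generationsToMajority : Float -> Float -> Int
generationsToMajority e q0 =
  let
    step q n =
      if 1 - (1 - q) ^ 2 > 0.5 then
        n
      else
        step (q + e * q * (1 - q)) (n + 1)
  in
    step q0 0

-- 200 homozygous mutants among 4 billion mosquitoes give an initial
-- allele frequency of 400 / 8e9; this evaluates to a value in the
-- low twenties, the same ballpark as the spreadsheet's 26
main : Element
main =
  show (generationsToMajority 0.97 (400 / 8.0e9))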

When I was done with this, I was pretty appalled at how hard it is to audit spreadsheets. No wonder Reinhart/Rogoff made a really embarrassing Excel error back in 2010… Since I have recently started to work with the programming languages Haskell and Elm, I wondered if it would be easier for a non-programmer to understand and verify Elm source code than to verify Excel formulas. I was also looking for something that can be written and run on the web without installing anything. So I wrote the same model again in Elm and published it as a gist on GitHub. To run it, copy the code and paste it into elm-lang.org/try. The results are slightly different because I made the Elm code a little more precise (it doesn’t allow for fractions of a mosquito to mate ;) ).

There are a number of other routes to try. My requirements are that it has to work on the web without installing anything (so that others can easily play with different parameters) and that it should be easy to audit. In my humble opinion, this second requirement excludes vanilla JavaScript - a language that, among other things, allows you to redefine its “undefined” value is just too hard to audit.

PureScript would have been an interesting choice since at one point it had a self-hosted compiler that was compiled to JS, but it was so much slower than the native one that they gave up on it. So like the Elm version, they now have a Try PureScript website with a compiler running on a hosted machine.

The next thing I actually want to try is Wolfram Alpha, which has a slightly nastier syntax but brings many high-level functions that might allow me to write a version with Monte Carlo sampling from probability distributions for all parameters in an understandable way without writing too much code. I am also considering writing the Monte Carlo simulation in Elm, but since “Try Elm” doesn’t let you use any additional libraries, I’d have to write a lot of supporting code to be able to perform the simulation.

I’ll blog again about how well Wolfram Alpha performs if it turns out to be a viable alternative. If you have any comments on my methods or ideas for other good ways to simulate this, please let me know in the comments or on Twitter!

Self-documenting data manipulation with R Markdown

12 December 2015

The company I worked for over the last few years provides a lot of data cleaning and data manipulation services, mostly with proprietary tools that another developer and I created. One of the things I introduced before I left was a bridge between the proprietary datasets used inside that company and the R project. My main motivation was to enable self-documenting workflows via R Markdown, and in this blog post I want to talk about the advantages of this approach.

R Markdown is a syntax definition and a set of packages for R that make it very straightforward to write normal text documents with embedded R code. When the text files are compiled, the R code is executed and the results are embedded in the compiled document. These results can be the textual output of R functions (like summary(), which describes some important properties of a data set) or even graphics.

As the name suggests, R Markdown uses the Markdown syntax for formatting text, so you write something between double stars to make it bold, etc. Markdown is pretty neat in that it is easy to read as plain text but also easily compiled to HTML to be viewed with actual formatting in a browser.

It’s probably easier to understand with an example, so here is a simplified version of what this looks like:

This is a sample r-markdown script that plots Age vs Income as a 
Hexbin plot. This text here is the natural language part that can 
use markdown to format the text, e.g. to make things **bold**.

```{r Income vs Age - Hexbin}
# The backticks in the line above started an R code block. 
# This is a comment inside the R block. We now load the hexbin 
# library and plot the data2008 dataset (the code for loading 
# the dataset was omitted here)

library(hexbin)
bin <- hexbin(data2008[, 1], data2008[, 2], xbins = 50, xlab = "Alter", ylab = "Einkommen")
plot(bin)
```

And this is what the compiled HTML looks like (embedded here as a screenshot):

The great thing about inlining R code in a Markdown document this way is that you can create a new workflow that is much more maintainable, because the focus shifts to documenting the intention. Instead of focusing on writing R code to get a job done and then documenting it a little with some comments or in a separate document, the analyst starts the work by describing, in plain text, what it is she wants to do. She then embeds the code that does the transformation, and can even generate graphs that show the data before and after.

This idea of documenting changes by embedding graphs was my original trigger for writing the bridge code. I had implemented the weighting code in our proprietary tool, but the textual output describing the changes in the weights was a bit terse. It was clear that a graphical representation would be easier to understand quickly, but introducing a rich graph library into our proprietary DSL would have been a major undertaking. By making it fast and easy to get our data sets into R and back out again, we quickly got a way to create graphs, and it enabled the self-documenting workflow described above.

Another big plus is that since all transformations are described in natural language as well as in code, auditing data manipulations becomes a lot easier and quicker. I can thus wholeheartedly recommend this workflow to everyone who works with data for a living.

Laws - the source code of society

25 May 2014

Today the citizens of the EU will elect a new parliament, and this seemed a good opportunity to write down some of my thoughts on lawmaking. As the title suggests, I think that law texts are very similar to source code. Of course, source code is a lot stricter insofar as it defines exactly what the computer will do. Laws, on the other hand, describe general rules for behaviour as well as the punishments for violating them - ultimately, though, both are expressed as text. Yet where programmers have developed sophisticated tools to work with source code, laws are still developed in bizarre workflows that necessitate a huge support staff. In this post I want to describe one set of tools programmers use to work on texts; how I think they could be useful for lawmaking; and what our society would gain if our lawmakers adopted them.

When non-programmers write, they often realize that it would be beneficial to save old versions of their texts. This leads to document file names with numbers attached (“Important text 23.doc”) and then the infamous “final”, “final final”, “really final” etcetera progression. Programmers instead rely on a set of tools known as Distributed Version Control Systems (DVCS). The most famous of these is probably Git, which is used in many open source efforts. What these tools do is manage the history of the text documents registered with them and allow easy sharing and merging of changes.

In practice, after changing a couple of lines in one or more documents, these changes are recorded as one “Changeset”. These changesets can be displayed as a timeline, and one can go back to the state of the documents at any point in their history.

A sample of the timeline of several changes in a DVCS

This in itself is already clearly useful, but what really makes DVCS magnificent tools is the ability to manage not just a simple linear progression of changes but different “Branches”. This allows several people to make changes to the text, share their changesets, and let the system automatically combine their changes into a new version.

Because changes from many people across many branches can become confusing, there are some great tools to visualize these changes, as well as complex workflows that allow others to review and authorize changes via their cryptographic signatures.

So how would these tools be useful in creating law texts? The main benefit would be to clearly document which politician introduced which line into a law, and which edits they made. Others in the working group could create branches with their favoured wording, and these could then be combined into the final version that is voted on by parliament.

One very useful tool is the blame tool (sometimes also called praise or annotate), which displays a document in such a way that each line is tagged with the person who last changed it. I think that it could be quite revealing to see who changed what in our laws, a process that at present would be very time consuming.

The website Abgeordnetenwatch already tracks the voting behaviour of the members of the German parliament, and it would be a great extension of this effort if the genesis of all law texts were plain for everyone to see as well. Bundesgit already puts the final German law texts into a Git repository - but because these are only the final texts, and the repository is updated by a single person instead of the real authors, the real power of a DVCS can’t be used. For tools like blame to work, all the small changes during the draft stage in the parliament’s working groups would have to be made inside a DVCS by their respective authors.

I am sure that there are many more potential improvements to the process of lawmaking that would be possible if lawmakers used DVCS tools. But the main and immediate advantage would be an increase in transparency, which in the end is what democracy is all about. Laws are the source code of our societies. Let’s make sure they are made with the best tools available.