Modelling a possible Gene Drive in Mosquitoes

A while ago I told a fauna-nerd about CRISPR/Cas9 and its potential, both for genetic modification and for diagnostics and similar applications. I have previously written about CRISPR on Ich ahne Zusammenhänge, a blog I co-author, but this time I got into a little argument with her about how efficient it would be. That argument led me down an interesting path, and I wanted to write about it in more detail. Before I start, a little disclaimer: I am by no means an expert on any of this, so please take everything below with a grain of salt. If you notice any errors or have suggestions, I am always happy to get feedback!

First, some context: with traditional techniques it was already possible to create a single mosquito that is unable to carry the parasite that causes malaria. But since the parasite has no detrimental effect on the mosquito, there would be no evolutionary pressure that preferentially selects this particular version of the gene (i.e. this allele) carried by our “mutant” mosquito. Because there are so many “wild type” mosquitoes, you would need to release enough mutants to outnumber them to have a decent chance of getting rid of the wild type trait, and it is unrealistic to breed that many mutant mosquitoes.

The really interesting thing about CRISPR is that the tool that performs the genetic modification can itself be encoded in DNA. So if you create a mosquito egg cell that carries the instructions to build CRISPR/Cas9 with the right targeting and replacement sequences, this cell will modify its own genome (both copies of the target chromosome), and every cell descended from it will carry the same gene-altering code. This also means that if such a mutant mosquito mates with a wild type mosquito, the usual laws of Mendelian inheritance no longer apply: instead of the usual 50:50 chance of getting either the mutant allele or the wild type one, almost 100% of the offspring get the mutant allele. (In practice the mechanism seems to break down every now and then, and initial results suggest that its effectiveness may be around 95-99%.) This concept is called a gene drive, and as far as I know, CRISPR is the only general-purpose technique that makes such a thing possible.
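To make the difference from Mendelian inheritance concrete, here is a minimal Python sketch (my own illustration, not taken from any gene drive study) of how the mutant allele frequency changes from one generation to the next. It assumes an idealized, randomly mating population with no fitness differences, and it models the drive as the probability `drive_efficiency` that a heterozygous parent passes on the mutant allele: 0.5 is ordinary Mendelian inheritance, and 0.95 is the low end of the range quoted above. The function names are hypothetical.

```python
def next_allele_frequency(p: float, drive_efficiency: float) -> float:
    """Mutant allele frequency in the next generation (idealized model).

    Assumes random mating (Hardy-Weinberg genotype frequencies) and no
    fitness differences between genotypes.
    """
    q = 1.0 - p
    # Mutant homozygotes (p^2) always pass on the mutant allele,
    # heterozygotes (2pq) pass it on with probability drive_efficiency,
    # wild type homozygotes (q^2) never do.
    return p * p + 2.0 * p * q * drive_efficiency


def simulate(p0: float, drive_efficiency: float, generations: int) -> list[float]:
    """Allele frequencies for generations 0..generations."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_allele_frequency(freqs[-1], drive_efficiency))
    return freqs


if __name__ == "__main__":
    p0 = 0.001  # hypothetical starting frequency
    for d in (0.5, 0.95):  # Mendelian inheritance vs. a 95% efficient drive
        print(d, [round(f, 4) for f in simulate(p0, d, 10)])
```

With `drive_efficiency = 0.5` the frequency stays put (p² + pq = p), which is exactly the problem described in the previous paragraph: without a drive or selection, a rare allele stays rare.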

The argument that started this article was about the number of generations it would take for the mutant allele to be expressed in more than half of a local population if you start with only one mosquito. My guess was that this should happen within 20 generations, while she thought it would take significantly more.
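Assuming random mating (Hardy-Weinberg proportions) and that one copy of the mutant allele is enough for resistance to be expressed (my assumption; it depends on the actual construct), “expressed in more than half of the population” translates into a threshold on the mutant allele frequency p, with q = 1 - p for the wild type allele:

```latex
\begin{aligned}
\text{fraction expressing} &= p^2 + 2pq = 1 - q^2 \\
1 - q^2 > \tfrac{1}{2} &\iff q < \sqrt{\tfrac{1}{2}} \approx 0.707 \\
&\iff p > 1 - \sqrt{\tfrac{1}{2}} \approx 0.293
\end{aligned}
```

So a bit under a third of all copies of the gene need to be the mutant allele.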

I tried to google this but found no good estimates for a very small number of initial mosquitoes. The best I could find were estimates that started with 10% of the population.

So I started thinking about how to arrive at a ballpark estimate. I quickly realized that my single mosquito would very likely die before it could mate. The German saying “Die sterben wie die Fliegen” (they die like flies) may have something to it after all. But if you increase the number to a still very low 200 initial mutant individuals, the chance of at least a few successful matings becomes reasonable.
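This intuition can be made a bit more concrete with a small binomial calculation in Python. It assumes that each released mutant independently has the same probability of surviving long enough to mate successfully; the value of 2% used below is made up purely for illustration and comes neither from my spreadsheet nor from any data.

```python
from math import comb


def prob_at_least(k: int, n: int, p_mating: float) -> float:
    """P(at least k of n released mutants mate successfully).

    Treats each mutant as an independent Bernoulli trial with success
    probability p_mating (a simplifying assumption).
    """
    # 1 - P(fewer than k successes)
    return 1.0 - sum(
        comb(n, i) * p_mating**i * (1.0 - p_mating) ** (n - i) for i in range(k)
    )


P_MATING = 0.02  # hypothetical chance that one mutant survives to mate

for n in (1, 200):
    print(
        f"n={n}: expected matings={n * P_MATING:.1f}, "
        f"P(>=1)={prob_at_least(1, n, P_MATING):.3f}, "
        f"P(>=3)={prob_at_least(3, n, P_MATING):.3f}"
    )
```

With that hypothetical 2%, a single mosquito fails 98% of the time, whereas among 200 mutants you would expect about 4 successful matings, and at least one success becomes very likely.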

I ended up modeling the whole idea in a spreadsheet. I know almost nothing about either modeling insect populations or the genetic particulars of CRISPR methods as they are actually applied, so all of this is very, very rough. But the results are still interesting. I wanted to see how the numbers work out for really big “local” populations. There seems to be remarkably little data on the number of mosquitoes in a given area, so I have no idea whether my figures are anywhere close to reasonable. I settled on 4 billion mosquitoes, which I wildly guessed might be the number living on a small island in the middle of the wet season in a mosquito-infested area. (An island, so that you can meaningfully talk about a single “population” that is not constantly mixing with mosquitoes from the surroundings.)

My model has several dramatic weaknesses. For example, it assumes perfectly random mating without regard to geographic proximity, which is clearly unrealistic. Another is that the numeric factors for successful matings and for the fraction of eggs that develop into successfully reproducing adults were tweaked to produce stable populations, whereas real mosquito populations surely vary wildly in size depending on external factors.

I originally wanted to use Guesstimate to enter ranges of plausible values and have it estimate the resulting ranges via Monte Carlo simulation (drawing random numbers from a given distribution for every ranged variable and evaluating the spreadsheet formulas with them). But Guesstimate is still too cumbersome for repetitive calculations, so I had to use a plain Google Docs spreadsheet with a single precise value for each factor.
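Here is a rough Python sketch of the kind of Monte Carlo run I had in mind, applied to an idealized model (random mating, no mortality, homozygous release) rather than to my spreadsheet. The ranges are assumptions: drive efficiency is drawn uniformly from the 95-99% range mentioned earlier, and the population size is drawn log-uniformly between 1 and 10 billion, which is just as much of a wild guess as my 4 billion. Only Python's standard `random` and `statistics` modules are used.

```python
import random
import statistics


def next_allele_frequency(p: float, drive_efficiency: float) -> float:
    """Idealized next-generation mutant allele frequency (random mating)."""
    q = 1.0 - p
    return p * p + 2.0 * p * q * drive_efficiency


def generations_to_majority(mutants, population, drive_efficiency, max_generations=1000):
    """First generation in which > 50% carry the allele, or None."""
    p = mutants / population
    for generation in range(1, max_generations + 1):
        p = next_allele_frequency(p, drive_efficiency)
        if 1.0 - (1.0 - p) ** 2 > 0.5:
            return generation
    return None


def monte_carlo(samples: int = 10_000, mutants: int = 200, seed: int = 1) -> list[int]:
    rng = random.Random(seed)  # seeded so that runs are reproducible
    results = []
    for _ in range(samples):
        d = rng.uniform(0.95, 0.99)            # assumed drive efficiency range
        population = 10 ** rng.uniform(9, 10)  # assumed: 1 to 10 billion, log-uniform
        results.append(generations_to_majority(mutants, population, d))
    return results


results = monte_carlo()
pct = statistics.quantiles(results, n=20)  # 19 cut points: 5%, 10%, ..., 95%
print("5th:", pct[0], "median:", statistics.median(results), "95th:", pct[-1])
```

Instead of a single number, this produces a distribution of generation counts, summarized here by its 5th, 50th and 95th percentiles.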

With my guessed parameters, the result was that it would take about 26 generations for those 200 initial mutants to spread their malaria-resistance allele until it is expressed in the majority of the population, and only about 2 more generations for it to almost completely replace the wild type mosquitoes.

When I was done with this, I was pretty appalled at how hard it is to audit spreadsheets. No wonder Reinhart and Rogoff made a really embarrassing Excel error back in 2010. Having recently started working with the programming languages Haskell and Elm, I wondered whether it would be easier for a non-programmer to understand and verify source code in Elm than to verify Excel formulas. I was also looking for something that can be written and run on the web without installing anything. So I wrote the same model again in Elm and published it as a gist on GitHub. To run it, copy the code and paste it into elm-lang.org/try. The results are slightly different because I made the Elm code a little more precise (it doesn’t allow fractions of a mosquito to mate ;) ).
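To illustrate what “no fractions of a mosquito” can change, here is a toy Python example; it is not the actual Elm code from the gist. It computes the same made-up growth step once with real numbers and once with whole mosquitoes, rounding down with `math.floor`. Both the rounding rule and the parameter values (40% survival to mating, 3 surviving mutant offspring per mating) are hypothetical.

```python
import math


def next_mutant_count(mutants, survival_to_mating, offspring_per_mating, whole_mosquitoes):
    """One toy generation step for the number of mutant mosquitoes."""
    maters = mutants * survival_to_mating
    if whole_mosquitoes:
        maters = math.floor(maters)  # only whole mosquitoes can mate
    return maters * offspring_per_mating


def run(start, generations, whole_mosquitoes, survival=0.4, offspring=3):
    """Mutant counts for generations 0..generations (hypothetical parameters)."""
    counts = [start]
    for _ in range(generations):
        counts.append(next_mutant_count(counts[-1], survival, offspring, whole_mosquitoes))
    return counts


print(run(1, 5, whole_mosquitoes=False))   # fractional mosquitoes allowed
print(run(1, 5, whole_mosquitoes=True))    # whole mosquitoes only
print(run(200, 5, whole_mosquitoes=True))
```

Starting from a single mutant, the fractional version keeps growing from 0.4 of a mating mosquito, while the whole-mosquito version dies out in the first generation, which matches the intuition that one released mosquito is not enough. With 200 mutants the rounding barely matters.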

There are a number of other routes to try. My requirements are that the tool has to work on the web without installing anything (so that others can easily play with different parameters) and that the code should be easy to audit. In my humble opinion, the second requirement rules out vanilla JavaScript: a language that, among other things, allows you to redefine its “undefined” value is just too hard to audit.

PureScript would have been an interesting choice, since at one point it had a self-hosted compiler that was compiled to JavaScript, but it was so much slower than the native one that the developers gave up on it. So, like Elm, it now has a Try PureScript website with the compiler running on a hosted machine.

The next thing I actually want to try is Wolfram Alpha, which has a slightly nastier syntax but brings many high-level functions. These might let me write a version that does Monte Carlo sampling from probability distributions for all parameters in an understandable way, without writing too much code. I am also considering writing a Monte Carlo simulation in Elm, but since “Try Elm” doesn’t let you use any additional libraries, I’d have to write a lot of supporting code to be able to perform the simulation.

I’ll blog again about how well Wolfram Alpha performed if it turns out to be a viable alternative. If you have any comments on my methods or ideas for other good ways to simulate this, please let me know in the comments or on Twitter!