# Easy derivation of Kimura's approximation for the probability of fixation of a mutation

We are searching data for your request:

Forums and discussions:
Manuals and reference books:
Data from registers:
Wait the end of the search in all databases.
Upon completion, a link will appear to access the found materials.

Kimura's approximation for the probability of fixation of a mutation under selection finds recurrent use in population genetics models till date. I am trying to understand the mathematical basis of this equation but none of the textbooks or online resources that I have checked, provide an easy derivation of this approximation but rather simply cite Kimura's 1962 paper.

\$\$P_ ext{fix} approx frac{1-e^{-4Nsp} }{1-e^{-4Ns}} qquad (1)\$\$

So, I was reading the original paper but the provided derivation doesn't appear clear to me.

## Details

Kimura starts with definition of probability of change in allele frequency as:

\$\$u(p,t+delta t) = int f(p+delta p; delta t) u(p+delta p,t) d(delta p) qquad (2)\$\$

where (quoted exactly)

• \$u(p,t)\$ is the probability that an allele will be fixed in a time interval \$t\$ given that it's initial frequency is \$p\$.
• \$f(p+delta p; delta t)\$ is the probability density of the change from \$p\$ to \$p+delta p\$

Then he uses Taylor series approximation to obtain an equation of this form:

\$\$frac{partial u(p,t)}{partial t}= frac{V}{2}frac{partial ^2u}{partial p^2}+Mfrac{partial u}{partial p} qquad (3)\$\$

He defines \$M\$ and \$V\$ as mean and variance of change of \$p\$ per generation. These are formally defined as:

\$\$M=lim_{delta t o 0} frac{1}{delta t}int (delta p). f(p+delta p; delta t). d(delta p)\$\$

\$\$V=lim_{delta t o 0} frac{1}{delta t}int (delta p)^2. f(p+delta p; delta t). d(delta p)\$\$

(\$V\$ actually should be just the second moment as per the mathematical definition and not variance)

Then he solves equation 3 at steady state with boundary conditions \$u(0,t)=0\$ and \$u(1,t)=1\$ to obtain this:

\$\$u(p)=frac{displaystyleint_0 ^p G(x) dx}{displaystyleint_0 ^1 G(x) dx} qquad (4)\$\$

where:

\$\$G(x)=expleft(-int frac{2M}{V}dx ight)\$\$

I understood the derivation till this point.

Then he just puts:

\$\$M=sx(1-x)\$\$ \$\$V=x(1-x)/2N\$\$

and obtains equation 1.

## In short

Is there an easy derivation for equation 1?
If not, can someone explain me how M and V were approximated as above?

Presumably you've solved this, but, in case not, it's because the PDE is a Kolmogorov backward equation, so the first and second order coefficients are the mean and variance of the underlying stochastic process being modeled.

In detail, consider a stochastic differential equation (which has a solution given by an Ito diffusion process): $$dp_t = mu(p_t,t) dt + sigma(p_t,t) dW_t$$ then the following system holds (under some conditions): $$-frac{partial}{partial t} u(p,t) = mu(p,t)frac{partial}{partial p}u(p,t) + frac{1}{2}sigma^2(p,t)frac{partial^2}{partial p^2} u(p,t)$$ where $$u$$ is the density of $$p$$ at $$t$$.

Note that the drift (infinitesimal mean) $$M=mu(p,t)$$ and diffusion coefficient (infinitesimal variance) $$V=sigma^2(p,t)$$ are as in the paper (except for the negative sign, which I assume is ignorable since he mostly seems to care only about the case when $$partial_t uapprox 0$$ anyway). Indeed, they are equivalently written: egin{align} mu(p,t) &= lim_{delta t ightarrow 0}frac{1}{delta t} mathbb{E}left[ p_{t + delta t} - p_t mid p_t=p ight] =: M sigma^2(p,t) &= lim_{delta t ightarrow 0}frac{1}{delta t} mathbb{E}left[ (p_{t + delta t} - p_t)^2 mid p_t=p ight] =: V end{align} as Kimura writes.

Note that a useful approximation of the transition density is given by: $$mathbb{P}[p_{t+delta t} mid p_t] approx mathcal{N}(p_{t+delta t}mid p_t + mu(p_t,t),delta t, sigma^2(p_t,t) ,delta t) ag{TD}$$

Ok, so everything above is just basic stochastic processes theory. Thus, if we have a stochastic model for the population dynamics, we can derive values for $$M$$ and $$V$$ from it (by computing its moments), and those will carry over to the backward Kolmogorov equation, upon which Kimura's work rests.

Here is where my ignorance of population dynamics shows. However, since Kimura mentions Fisher and Wright, I looked up the Wright-Fisher model. It seems like Kimura is using the diffusion process approximation of the Wright-Fisher model. This appears to be a well-studied and storied model that I cannot fully describe here; instead, I found the work by Tataru et al, Statistical Inference in the Wright-Fisher Model Using Allele Frequency Data to be an excellent description of it, though I do not pretend to understand much of it.

What's important, however, is that the change in genes (transition density) can be described by a binomial distribution. This can be approximated by a normal distribution: $$mathbb{P}[p_{t+delta t} mid p_t] approx mathcal{N}(p_{t+delta t}mid p_t + a(p_t) delta t,, p_t (1-p_t) delta t )$$ using the standard approximation to the binomial. This then gives us a forward Kolomogorov equation (not backward) written: $$frac{partial}{partial t} u = - frac{partial}{partial p}left[a(p_t) u(p_t) ight] + frac{1}{2}frac{partial^2}{partial p^2} left[ p_t(1-p_t)u(p_t) ight]$$ This basically implies that $$V=p(1-p)$$.

( I noticed that another way to prove this is to notice that the Wright-Fisher approximate diffusion (without any selections, etc… so $$aequiv 0$$) has an infinitesimal generator given by: $$mathfrak{G} f(p) = p(1-p)partial_{tt} f(p) / 2$$. This immediately implies $$V=p(1-p)$$. But may be less straightforward to understand. )

Confusingly, however, the paper has changed timescales (variables), so that $$delta t leftarrow Delta t / (2N)$$, and then set $$delta t$$ to $$1$$ (probably so that they would not have to write $$2N$$ everywhere). If we undo this transform, we get $$mathbb{P}[p_{t+delta t} mid p_t] approx mathcal{N}(p_{t+delta t}mid p_t + a(p_t) delta t,, p_t (1-p_t) delta t/(2N) )$$ If you compare this to our approximate transition density above (equation (TD)), you will see that this implies: $$sigma^2 = V = p(1-p)/[2N]$$ as desired.

Now, what is the infinitesimal mean, i.e. $$a$$ or $$M$$? Clearly this depends on the selection model, since it controls how the "environment" deterministically affects the process. Kimura describes this as a "constant selection advantage" with coefficient $$s$$. The Tataru paper notes that the diffusion approximation to Wright-Fisher under genetic drift, mutation, and selection is given by: $$a(p) = - u p + xi(1-p) + 2N au p(1-p)[h-(1-2h)p]$$ If we (1) ignore mutation by setting $$u=xi=0$$, (2) remove allelic dominance effects by setting $$h=1/2$$, and (3) define $$s:= N au$$, we get: $$a(p) = s p(1-p) =: M$$ which of course we see by noting $$M=a(p)$$ matches $$mu$$ in the equation (TD) above. (Note that the $$2N$$ transform also occurred here, but it was hidden inside $$s$$).

Thus, we have derived where Kimura's $$M$$ and $$V$$ come from, albeit probably not in the simplest possible manner.

All that remains is to derive the (steady-state) equation for $$u$$. I guess I'll do it for completeness.

Ignoring the steady-state subscripts, we get: egin{align} G(x) &= expleft( -int frac{2M}{V} dx ight) = expleft( -int 4sN dx ight) = expleft( -4sNx ight) [0.15cm] u(p) &= frac{displaystyle int_0^p G(x) dx}{displaystyle int_0^1 G(x) dx} = frac{displaystyle frac{1}{4Ns}left[ expleft( -4sNx ight) ight]_0^p}{displaystyle frac{1}{4Ns}left[ expleft( -4sNx ight) ight]_0^1} = frac{displaystyle -left[ expleft( -4sNp ight) - 1 ight]}{displaystyle -left[ expleft( -4sN ight) -1 ight]} &= frac{1 - exp(-4Nsp)}{1 - exp(-4Ns)} end{align} as required.

Apologies for any errors. (I am neither a population dynamics modeler nor a mathematician, so please point out any problems).