Theorycrafting 101: Markov Chains

A few weeks ago, I was asked on Twitter whether I could provide the proof of the “proc buff uptime formula.” For those that aren’t aware, there’s a popular formula for calculating the uptime $U$ of a proc-based buff effect given the buff duration $D$, proc chance $p$, and proc trigger interval $\Delta t$. That formula is:

$\large U = 1-(1-p)^N,$

where $N=D/\Delta t$ is the number of proc triggers that occur during the duration of the buff.

In situations where $N$ becomes very large (and $p$ small), it is common to approximate this with Poisson statistics, giving

$\large U \approx 1-e^{-pN}.$
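As a quick sanity check, here's a short Python snippet comparing the exact formula and the Poisson approximation. The numbers ($p=0.15$, $\Delta t=3$ s, $D=15$ s, so $N=5$) are purely illustrative, not tied to any particular trinket:

```python
import math

# Illustrative values (not any specific trinket): 15% proc chance,
# 3-second trigger interval, 15-second buff duration.
p = 0.15
D, dt = 15.0, 3.0
N = int(D / dt)  # 5 proc triggers per buff duration

U_exact = 1 - (1 - p) ** N        # U = 1 - (1-p)^N
U_poisson = 1 - math.exp(-p * N)  # Poisson approximation

print(round(U_exact, 4), round(U_poisson, 4))
```

Even at a modest $N=5$ the two agree to within about three percentage points, and the agreement improves as $N$ grows and $p$ shrinks.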

I had assumed that the derivation for this was easily available. After all, the result is widely used in theorycrafting circles. But some googling didn’t turn up a formal proof. The closest I found was a rough summary of the procedure in, of all places, a wowhead comment from Wrath of the Lich King.

Since the derivation is not overly complicated, I wrote it up formally in the appendix of this pdf. However, it seems like it would be convenient to have this written up in an easily-linkable form, so I’m going to reprise the derivation here.

Assumptions

First, we need to be clear about what it is we’re modeling. We’re considering a proc-based buff, such as what’s granted by many trinkets or enchants. We’ll assume that proc triggers occur at equal intervals of $\Delta t$, with a proc chance per trigger of $p$. Consequently, the probability of a trigger not generating a proc is $q=1-p$. A proc grants a buff of duration $D$, and a proc that occurs during an existing buff simply refreshes the buff to its full duration. We’re assuming that the granted buff does not itself modify $p$ for now; at the end of the post I’ll demonstrate how the derivation is modified in that case.

Markov Chains

We start by assuming the system can be modeled by a time-homogeneous Markov chain. This means that the process is random and memoryless, or that the time-evolution of the system depends only on the current state (i.e. no hysteresis effects). It should be noted that these are all very good assumptions, as the proc chance usually has no dependence on prior or future states of the system.

Since we’ve divided time up into equal blocks of size $\Delta t$, and the effect has duration $D$, this uniquely defines the states of the system. We have $n=1+D/\Delta t$ possible states, each one having a specific duration remaining on the granted buff. For state $x_i$, the time remaining is $d=D-(i-1)\Delta t$. For example, if the buff duration is $D=15$ seconds, then we have states:

$\large x_1 = \text{15 sec remaining on buff}$
$\large x_2 = 15-\Delta t\text{ sec remaining on buff}$
$\large x_3 = 15-2\Delta t\text{ sec remaining on buff}$
$\large \vdots$
$\large x_{n-1} = \Delta t\text{ sec remaining on buff}$
$\large x_n = \text{0 sec remaining on buff}$

In a Markovian system, the probability of a transition between states $x_i$ and $x_j$ is independent of previous states of the system, and is therefore constant. In other words, it only matters that you are in state $x_i$, not how you got there. Consequently, if you find yourself in state $x_i$ you have a unique and fixed probability of transitioning to any other state $x_j$. Let us define the probability of transitioning from state $j$ to state $i$ as $P_{ij}$. We can then write the $n \times n$ transition matrix describing this Markov process:

$\mathbf{P} = \left[ \begin{array}{cccc}
P_{11} & P_{12} & \cdots & P_{1n} \\
P_{21} & P_{22} & \cdots & P_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
P_{n1} & P_{n2} & \cdots & P_{nn}
\end{array} \right]\large$

To find the steady-state solutions, one simply has to solve the equation

$\large \mathbf{P}\mathbf{X}=\mathbf{X} \qquad \text{(1)}$

where $\mathbf{X}$ is the vector of possible states $(x_1, x_2,…,x_n)$.

Solving the system

There are a number of ways to solve this system. For example one could recognize that for an arbitrary starting vector $\mathbf{b}$, the steady state solution is $\mathbf{X=S= P}^\infty {\bf b}$. Since $\mathbf{b}$ is arbitrary, if a steady state solution exists for matrix $\mathbf{P}$ the matrix $\mathbf{P}^\infty$ must be a rank-one matrix whose columns are the steady-state distribution $\mathbf{S}$. Or in other words,

$\large \displaystyle \lim_{k \rightarrow \infty} {\bf P}^k = {\bf S1}$

where $\mathbf{1}$ is the $1\times n$ row vector $[1,1,1, …, 1]$. So one could numerically compute the solution by simply calculating $\mathbf{P}^k$ for very large $k$. This technique is particularly useful for very large state spaces.
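To make that concrete, here's a minimal Python sketch of the power-method approach, using the proc-model transition matrix we'll write down explicitly in the next section. The values $p=0.3$ and $n=6$ are illustrative:

```python
# Power-method sketch: build the transition matrix for the proc model and
# raise it to a large power; every column of P^k converges to the
# steady-state distribution. Illustrative values p = 0.3, n = 6.
p, q = 0.3, 0.7
n = 6

# P[i][j] = probability of transitioning from state j+1 to state i+1
P = [[0.0] * n for _ in range(n)]
for j in range(n):
    P[0][j] = p                # any state can proc back to x_1
for j in range(n - 1):
    P[j + 1][j] = q            # no proc: buff ticks down one state
P[n - 1][n - 1] = q            # no buff and no proc: stay in x_n

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

Pk = P
for _ in range(400):           # P^401 is effectively P^infinity here
    Pk = matmul(Pk, P)

steady = [row[0] for row in Pk]   # any column gives the same answer
print([round(x, 4) for x in steady])
```

The printed vector matches the analytic steady-state solution $(p, pq, pq^2, \ldots, q^{n-1})$ that we derive below.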

However, if the matrix is fairly sparse, it is often possible to find an analytic solution. This happens to be the case for the simple proc effect we’re considering. For any choice of initial state $x_i$, there is a probability $p$ that the system transitions to state $x_1$ – i.e., that the effect procs and the duration of the buff is set to $D$. The only other transition out of state $x_i$ is to state $x_{i+1}$, meaning that the effect did not proc and the duration of the buff has been reduced by $\Delta t$. From this, we can define a subset of the transition probabilities $P_{ij}$:

$\large P_{1j} = p$
$\large P_{j+1,j} = q \qquad (1 \leq j \leq n-1)$

(remember that $P_{ij}$ represents the probability of transitioning from state $j$ to state $i$). There is only one other nonzero transition, which is the possibility that the system is in state $x_n$ and remains in state $x_n$ (meaning the buff was not active and a proc did not occur, leaving the state unchanged). This, of course, also happens with probability $q$, such that $P_{nn}=q$. All other elements of the transition matrix are zero, giving it the following form:

$\large \mathbf{P} = \left[ \begin{array}{ccccccc}
p & p & p & \cdots & p & p & p\\
q & 0 & 0 & \cdots & 0 & 0 & 0\\
0 & q & 0 & \cdots & 0 & 0 & 0 \\
0 & 0 & q & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots\\
0 & 0 & 0 & \cdots & q & 0 & 0 \\
0 & 0 & 0 & \cdots & 0 & q & q
\end{array} \right]$

For a more concrete example, let’s assume that $\Delta t=3$ seconds and $D=15$ seconds, which yields the following $6 \times 6$ matrix $\mathbf{P}_6$:

$\large \mathbf{P}_6 = \left[ \begin{array}{cccccc}
p & p & p & p & p & p\\
q & 0 & 0 & 0 & 0 & 0\\
0 & q & 0 & 0 & 0 & 0 \\
0 & 0 & q & 0 & 0 & 0 \\
0 & 0 & 0 & q & 0 & 0 \\
0 & 0 & 0 & 0 & q & q
\end{array} \right]$

To solve equation (1), which is a simple eigenvector equation with an eigenvalue $\lambda=1$, we subtract $\mathbf{X}$ from both sides:

$\large \left ( \mathbf{P}-\mathbf{I}_n \right ) \mathbf{X}=0$

where $\mathbf{I}_n$ is the identity matrix of dimension $n$. This gives us the following matrix:

$\large \mathbf{P}-\mathbf{I}_n =
\begin{bmatrix}
p-1 & p & p & \cdots & p & p & p \\
q & -1 & 0 & \cdots & 0 & 0 & 0\\
0 & q & -1 & \cdots & 0 & 0 & 0 \\
0 & 0 & q & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & q & -1 & 0 \\
0 & 0 & 0 & \cdots & 0 & q & q-1
\end{bmatrix}$

or, for our special case of $\Delta t=3$ s, we have:

$\large \mathbf{P}_6-\mathbf{I}_6 =
\begin{bmatrix}
p-1 & p & p & p & p & p\\
q & -1 & 0 & 0 & 0 & 0\\
0 & q & -1 & 0 & 0 & 0 \\
0 & 0 & q & -1 & 0 & 0 \\
0 & 0 & 0 & q & -1 & 0 \\
0 & 0 & 0 & 0 & q & q-1
\end{bmatrix}$

For this equation to have solutions other than the trivial solution $\mathbf{X}=0$, the matrix $\mathbf{P}-\mathbf{I}_n$ must have a nullity greater than zero (i.e. at least one row must be linearly dependent upon the others). It is a trivial exercise to show that this is indeed the case for this matrix.

Since this matrix is very sparse, we can simply solve the system of equations given to us by $\left ( \bf{P}-\bf{I}_n \right ) \bf{X}=0$ to find the steady-state solutions:

$\large 0 = (p-1)x_1 + px_2 + px_3 + \cdots + px_n$
$\large 0 = qx_1 - x_2$
$\large 0 = qx_2 - x_3$
$\large \vdots$
$\large 0 = qx_{n-2} - x_{n-1}$
$\large 0 = qx_{n-1} + (q-1)x_n$

The first equation can be re-written as:

$\displaystyle \large x_1 = p\sum_{i=1}^{n} x_i = p$

where we have invoked the normalization condition of the vector $\mathbf{X}$ (the probabilities of being in any given state must sum to one).

From here it is quite simple to find the rest of the steady-state probabilities. We plug $x_1=p$ into $qx_1=x_2$ to find $x_2=pq$, plug that into $x_3=qx_2$ to find $x_3=pq^2$, and so on to get $x_{n-1}=pq^{n-2}$. The last equation deviates from the pattern slightly, and plugging in our expression for $x_{n-1}$ gives us

$\large \begin{align}
x_n &= \left ( \frac{q}{1-q}\right ) x_{n-1} \\
&= \left ( \frac{q}{p} \right ) x_{n-1} \\
&= \left ( \frac{q}{p} \right ) pq^{n-2} \\
&= q^{n-1}
\end{align}$

Thus, our complete steady-state solution of the system is

$\large \mathbf{X} = (x_1, x_2, …, x_n) = (p, pq, pq^2, pq^3, …, pq^{n-2}, q^{n-1})$
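If you'd rather not redo the algebra, it's easy to verify numerically that this vector really is a fixed point of $\mathbf{P}$. A minimal sketch, with illustrative values $p=0.3$ and $n=6$:

```python
# Verification sketch (illustrative values p = 0.3, n = 6) that the vector
# (p, pq, pq^2, ..., pq^{n-2}, q^{n-1}) satisfies PX = X for the transition
# matrix P given above.
p = 0.3
q = 1 - p
n = 6

X = [p * q ** i for i in range(n - 1)] + [q ** (n - 1)]

# Rebuild P: p across the first row, q on the subdiagonal, q in the corner.
P = [[0.0] * n for _ in range(n)]
for j in range(n):
    P[0][j] = p
for j in range(n - 1):
    P[j + 1][j] = q
P[n - 1][n - 1] = q

PX = [sum(P[i][j] * X[j] for j in range(n)) for i in range(n)]
assert all(abs(PX[i] - X[i]) < 1e-12 for i in range(n))   # PX = X
assert abs(sum(X) - 1.0) < 1e-12                          # normalized
```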

Note that when the system is in any of the states $x_1$ to $x_{n-1}$, the buff is active. It is only while in state $x_n$ that the buff is not active. Thus, the downtime of the buff is equal to the probability that we are in state $x_n$, or $q^{n-1}$. We also note here that we have $N=n-1$ chances to refresh the buff during the duration $D$, so we could equivalently write this downtime as $q^N$. The uptime is formally the sum of the probabilities corresponding to the other states:

$\displaystyle \large U=\sum_{i=1}^{n-1}x_i$

But invoking the normalization condition $\displaystyle \sum_{i=1}^n x_i = 1$ once more, we can simplify this to

$\large U=1-x_n = 1-q^N$

which is the often-quoted expression we were after.
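The result is also easy to check by brute force. This Monte Carlo sketch (hypothetical values $p=0.25$, $N=5$) simulates the proc process directly and measures the fraction of trigger intervals during which the buff is active:

```python
import random

# Monte Carlo sanity check of U = 1 - q^N, with hypothetical values
# p = 0.25 and N = 5 (e.g. D = 15 s, dt = 3 s). We simulate a long chain
# of proc triggers and measure the fraction of intervals with the buff up.
random.seed(42)
p, N = 0.25, 5

remaining = 0        # trigger intervals left on the current buff
active = 0
trials = 1_000_000
for _ in range(trials):
    if random.random() < p:
        remaining = N        # proc: refresh the buff to full duration
    if remaining > 0:
        active += 1          # buff is up during this interval
        remaining -= 1

U_sim = active / trials
U_exact = 1 - (1 - p) ** N
print(round(U_sim, 4), round(U_exact, 4))
```

With a million triggers, the simulated uptime lands within a fraction of a percent of $1-q^N$.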

We were conveniently vague in our definition of the parameter $\Delta t$. In general, it is chosen according to the proc trigger of interest, such that $N$ is clearly defined. In cases where proc triggers are unevenly spaced (such as is common with RPPM effects), we can still get good agreement by choosing a $\Delta t$ that is small compared to the mean interval between proc events, with $p$ adjusted accordingly.
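To back up that claim, here's a small simulation (all parameter values hypothetical) with exponentially distributed trigger spacing, compared against the Poisson prediction $U \approx 1-e^{-pD/\tau}$, where $\tau$ is the mean spacing:

```python
import math
import random

random.seed(1)
# Hypothetical parameters: mean trigger spacing tau = 1 s, proc chance
# p = 0.1 per trigger, buff duration D = 15 s.
tau, p, D = 1.0, 0.1, 15.0

t, t_end = 0.0, 1_000_000.0
buff_until = 0.0   # time at which the current buff expires
uptime = 0.0
while t < t_end:
    step = random.expovariate(1.0 / tau)               # uneven spacing
    uptime += max(0.0, min(buff_until, t + step) - t)  # buff time in gap
    t += step
    if random.random() < p:
        buff_until = t + D                             # proc refreshes buff

U_sim = uptime / t
U_poisson = 1 - math.exp(-p * D / tau)
print(round(U_sim, 3), round(U_poisson, 3))
```

Even though the individual trigger intervals vary wildly, the measured uptime agrees closely with the Poisson formula, since the procs still form a (thinned) Poisson process.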

Procs that modify $p$

We stated earlier that we were making the assumption that the buff being granted by the effect did not modify the proc chance $p$. However, there are a number of cases where that does happen, and it would be nice to have an expression to model those uptimes. To do that, we start with the same system as before, with base proc chance $p$, buff duration $D$, and trigger interval $\Delta t$. However, in this case we allow the buff to modify the proc chance, raising it to $p+dp$ while the buff is active.

The problem proceeds as before, with the same state space definition $\mathbf{X}$. However, the transition matrix must be adjusted slightly. It now has the form:

$\mathbf{P} = \left[ \begin{array}{ccccccc}
p+dp & p+dp & p+dp & \cdots & p+dp & p+dp & p\\
q-dp & 0 & 0 & \cdots & 0 & 0 & 0\\
0 & q-dp & 0 & \cdots & 0 & 0 & 0 \\
0 & 0 & q-dp & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots\\
0 & 0 & 0 & \cdots & q-dp & 0 & 0 \\
0 & 0 & 0 & \cdots & 0 & q-dp & q
\end{array} \right]$

which accurately represents the increase in proc rate for all states with an active buff. We again solve

$\mathbf{PX}=\mathbf{X}$

which gives us the following system of equations:

$\large x_1 = (p+dp)x_1 + (p+dp)x_2 + (p+dp)x_3 + \cdots + (p+dp)x_{n-1} + px_n$
$\large x_2 = (q-dp)x_1$
$\large x_3 = (q-dp)x_2$
$\large ~~ \vdots$
$\large x_{n-1} = (q-dp)x_{n-2}$
$\large \displaystyle x_n = \frac{q-dp}{1-q}x_{n-1}$

The first and last equations are each unique, while the middle equations all have the form $x_j=(q-dp)x_{j-1}$ for $2\leq j\leq (n-1)$. Taking into account all $n-2$ of these equations, we can write

$\large x_{n-1}=(q-dp)^{n-2}x_1$

and substitution into the last equation (recalling that $1-q=p$ and $N=n-1$) reveals that

$\displaystyle \large x_n=\frac{(q-dp)^{n-1}}{1-q}x_1 = \frac{(q-dp)^N}{p}x_1 $

The first equation can be written more simply in summation form, and then simplified with state normalization rules:

$\large \displaystyle x_1 = p\sum_{i=1}^{n}x_i + dp\sum_{i=1}^{n-1}x_i$
$\large x_1 = p + dp(1-x_n)$
$\large x_1 = p + dp - dp\,x_n$

We then substitute our expression for $x_n$ and perform some simple algebra to get $x_1$:

$\large \displaystyle x_1 = p+dp-\frac{dp(q-dp)^N}{p}x_1$

$\large \displaystyle x_1 \left [ 1+\frac{dp(q-dp)^N}{p} \right ] = p+dp$

$\large \displaystyle x_1 =\frac{p(p+dp)}{p+dp(q-dp)^N}$

and therefore the expression for $x_n$ is

$\large \displaystyle x_n = \frac{(q-dp)^N}{p}x_1$

$\large \displaystyle x_n = \frac{(q-dp)^N}{p}\cdot\frac{p(p+dp)}{p+dp(q-dp)^N}$

$\large \displaystyle x_n = \frac{(p+dp)(q-dp)^N}{p+dp(q-dp)^N}$

It can be shown that, when performing the calculation this way, the steady-state solution vector is automatically normalized. We leave this as an exercise for the reader, because we’re evil like that (or check the pdf version). However, once we’re satisfied that $\mathbf{X}$ is normalized, our value for $x_n$ is the appropriate steady-state probability we need.

The uptime is $U=1-x_n$ as before, which is simply

$\large \displaystyle U=1-\frac{(p+dp)(q-dp)^N}{p+dp(q-dp)^N}$

It should be noted that as $dp\to 0$, $U\to 1-q^N$ as before.
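Before moving on, here's a quick numerical check of this expression via the same power-method trick as before. All values are illustrative ($p=0.2$, $dp=0.1$, $n=6$, so $N=5$):

```python
# Power-method check of the modified-uptime formula, with illustrative
# values p = 0.2, dp = 0.1, n = 6 (so N = 5).
p, dp, n = 0.2, 0.1, 6
q = 1 - p
N = n - 1

# Transition matrix: proc chance p + dp while the buff is up (states
# x_1 .. x_{n-1}), base chance p while it is down (state x_n).
P = [[0.0] * n for _ in range(n)]
for j in range(n - 1):
    P[0][j] = p + dp
    P[j + 1][j] = q - dp
P[0][n - 1] = p
P[n - 1][n - 1] = q

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

Pk = P
for _ in range(400):
    Pk = matmul(Pk, P)

U_numeric = 1 - Pk[n - 1][0]   # 1 - x_n from the steady state
U_formula = 1 - (p + dp) * (q - dp) ** N / (p + dp * (q - dp) ** N)
print(round(U_numeric, 6), round(U_formula, 6))
```

The two values agree essentially to machine precision, which also confirms (numerically, at least) that the steady-state vector comes out properly normalized.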

This entry was posted in Theck's Pounding Headaches, Theorycrafting. Bookmark the permalink.

17 Responses to Theorycrafting 101: Markov Chains

  1. Helistar says:

    Even before reading it: good stuff (as usual from your blog :).

  2. Wrathblood says:

    Thanks for putting this up, Theck. I was just trying to teach myself how to do parts of that a few weeks ago and it looks like I was, as I suspected, doing it wrong.

  3. Tanek says:

    TL;DR: Yes Theck can indeed prove the “proc buff uptime formula.” In order to do so he uses a lot of symbols and numbers that you probably don’t know or understand. Just believe him :)

  4. Weebey says:

    Theck says

    “We start by assuming the system can be modeled by a time-homogeneous Markov chain. This means that the process is random and memoryless, or that the time-evolution of the system depends only on the current state (i.e. no hysteresis effects). It should be noted that these are all very good assumptions, as the proc chance usually has no dependence on prior or future states of the system.”

    Which raises an issue that has been bugging me for a while: is the Markovian assumption actually valid? Obviously for traditional PPM procs with no ICD it is, but of course more procs are shifting to RPPM. The only official source I have seen on RPPM that goes into any sort of detail is the original announcement, which contains the following:

    “It can proc from any damage/healing event. It keeps track of the last time it had a chance to proc for that enchant.
    It calculates the difference in time since the last chance to proc. It uses that time to determine the chance for that event to trigger a proc.
    For example, if you have 22% Haste, it was 1.4sec since the last chance to proc, and you’ve got Windsong, then the chance to proc is 2(ppm) * 1.22(haste) * 1.4(time since last chance) / 60 (sec per min) = 5.693%.
    The ‘time since the last chance to proc’ is capped at 10sec, so that your first attack of a fight isn’t a guaranteed proc.”

    From http://us.battle.net/wow/en/forum/topic/6893549789#1

    Now, you can obviously modify the problem to render it Markovian–just make it so that your state space tracks not only the remaining time on the buff, but also the time since your last proc. Since the time since the last proc is capped, one still gets a finite Markov process, but the solution is going to be a bit different.

    It is also possible, of course, that the post I quoted from is incorrect as to the functioning of RPPM procs on live.

    • Duncan says:

      I have absolutely no clue as to what’s being said here (at best, I was a B+ Calc student), but I did see this tweet from Ghostcrawler:
      – Greg Street ‏@Ghostcrawler
      We also added some protection to the ToT RPPM trinkets to avoid streaks of bad luck and buffed some to boot. Details soon.

      Not sure if that’s even relevant or important, but I thought to pass it along.

    • Theck says:

      It’s still reasonably valid for RPPM as long as you have a large number of events (i.e. that you’re in the Poissonian regime). The proc chance on any given event fluctuates from event to event based on the time between events, of course, which means it’s not strictly Markovian anymore. But as long as the time between events is small enough that the proc chance per event is also small, it’s reasonable to make the assumptions I have in this post – primarily that we can use a fixed, average time between events to model the effect.

      Where it starts to fall apart is when you get large variances in the time interval between proc triggers. For example, if you have a 15-second gap and then a flurry of events, followed by another large gap and a flurry of events, and so on. Generally this isn’t how it works in combat, so by and large these approximations should be valid for all combat situations in WoW.

      Another thing that would throw it off is if the proc chance depended on the time since the last *proc* rather than the last trigger event – that definitely violates the Markov approximation. However, that’s not how RPPM effects work in WoW, so we don’t need to worry about that.

  5. Weebey says:

    Oh, silly me. I didn’t read carefully, and thought that the time factor was referring to the last proc (hence acting as a kind of forced mean-reversion factor), rather than the last proc trigger event. You are obviously correct that with the combat model considered here (proc triggers occurring at regular intervals) the process is Markovian.

  6. Weebey says:

    Amusingly, it appears that they have just hotfixed in changes to make at least some RPPM procs track time since your last proc, in addition to the last proc trigger.

    • Theck says:

      Yeah, I saw that. Hamlet and I briefly discussed how that changes things on twitter. In short, it certainly breaks the Markov approximation, but it shouldn’t actually cause a large deviation in the average results. The model outlined in this post is thus a “first-order” approximation, and to get more accurate results we’d include the new mechanism as a perturbation. Standard perturbation theory should work to give us the second-order contribution (and higher-order contributions, if desired).

  7. sarosnar says:

    Hi Theck. It’s been like 3 months that I’ve followed your blog. You do an incredibly great job. Thx :)
    I play war and in my free time I like to do some TC on my class.

    Sorry for my wording, English is not my native tongue :p

    Currently I’m trying to simulate the average stats that the 5.2 str trinkets give based on a variable haste value.
    This post gave me a lot of ideas for making those models :)

    I just managed to get the average str that the “Primordius Talisman of Rage” gives. I used a 5n transition matrix where each state had a time remaining and a proc value (1 proc, 2 procs, 3 procs, …).
    With some math that looks like yours I did it.

    Now I want to model the average crit that the “Gaze of the Twins” trinket gives. It looks easier because this trinket has only 3 possible proc states, which gives a 3n transition matrix… BUT when we get a crit proc it changes the probability of changing state.
    p0 = p = probability with 0 procs
    p1 = p + dp = probability with 1 proc
    p2 = p + 2dp = probability with 2 procs
    p3 = p + 3dp = probability with 3 procs

    I really can’t find a solution to solve the system because of all those dp, 2dp, 3dp I can’t factor… Do you have any clue for me?

    • Theck says:

      Yeah, when you start to extend this to stackable effects, it gets ugly quickly. I started working through the problem for an arbitrary stack size (so a (M*n)x(M*n) matrix), but had to set it aside to work on some other things and haven’t found time to pick it back up again.

      The procedure you’re using is basically correct though. The form will be a 3n+1 x 3n+1 matrix comprised of 9 individual nxn sub-matrices describing the transitions within a given stack level and the “0-stack” column. Each sub-matrix has the same form as the P matrix in this post – (p+N*dp) along the first row and (q-N*dp) along the sub-diagonal. The last column would be (p+(N-1)dp,0,0, …, 0, 0, q-(N-1)*dp).

      I’m not sure the system will be easy to solve, but it’s the right approach. One possible simplification is to condense the n states of each sub-matrix into 1 state, such that a buff that stacks to 5 would be described by a 6×6 matrix. In that case the elements of the transition matrix need to be the solutions to the nxn system we’re condensing into a single state.

  8. Frobes says:

    Is there anyone who does warlock theorycrafting like you do for Pallies?

    • Surutcra says:

      Any specific request ?

      Most specific questions (like whether that piece of gear is better than this one, or how much DPS that set bonus is worth) you can figure out using simcraft and an APL suitable to your issue. In general it can be pretty difficult to come up with an analytical answer to a very specific problem because of how complex the “dps system” is, so simulation ends up being a better approach than analytical modeling.

      However I’m not against the idea of doing some math to answer some more general questions (I spent quite some time modelling the RPPM system during the 5.2 PTR, for example).
