Vengeance, With A Vengeance

The developers have been hinting that a major info dump is coming soon™, and that probably includes some more detail about how Vengeance will work in Warlords of Draenor. If you’re a long-time reader of the blog, you probably know that we’ve been pretty hard on Vengeance several times in the past. But with a new expansion, there’s new hope for an implementation that actually works well.

The Good

One of the things we do know about the new version of Vengeance is that it won’t affect our DPS output. They’re finally severing the connection between damage intake and damage output. After years of complaining about the pitfalls and frustrations of that mechanic, I’m considering this a moral victory.

More importantly, it means that for the first time in a long time I’m really enthusiastic about Vengeance. If you recall, most of the objections that Meloree and I have made about Vengeance over the years have centered around the damage output component. We’ve pointed out the backwards logic of encouraging tanks to take more damage to increase their DPS, the huge discrepancy between damage output in solo play and raids, the feeling of uselessness while off-tanking, the frustration of having little to no control over your DPS output (and thus no way to properly evaluate it), and the way it encourages cheesy tricks like one-tanking and /sit-tanking to game the mechanic. All of that is going away, hopefully for good.

This change is probably the one thing I’m looking forward to most in Warlords. Partly out of a feeling of vindication, but mostly just because of functionality. I can’t wait to put out respectable damage in solo and small group content without having to switch to Retribution.

Lessons To Be Learned

Blizzard frequently talks about how they iterate on mechanics, applying the lessons they learn from previous incarnations to improve new versions. I think that severing the DPS connection is obviously one of those cases. But I don’t think that’s all that the devs can stand to learn from the 5.x implementation of Vengeance.

To illustrate that thought, I want to show you an excerpt from one of my recent heroic Thok logs. In particular, I want to consider a period in the last phase of the encounter immediately after I taunt Thok.

Here’s the attack power graph for that section of the fight:

Attack Power plot for a portion of the 25H Thok encounter.


I start at 300k after taunting and rise to over 600k at the peak. This is pretty normal for the last few bosses of the tier, though of course earlier bosses don’t hit as hard. But remember, I have around 40k-50k attack power out of combat. That means that as much as 90% of my DPS is coming from Vengeance, rather than my gear, and thus not directly under my control.

But the part that’s really eye-opening is the healing graphs. Let’s switch to the healing view and filter the log for Eternal Flame. As a point of nomenclature, I use “Word of Glory” to refer to the base heal and “Eternal Flame” to refer to the heal-over-time (HoT) portion to keep them straight. The log, of course, uses the same name for both. But nonetheless, let’s look at the plot of healing done per second:

HPS output of Eternal Flame during a portion of the 25H Thok encounter.


This plot, which includes overhealing, suggests that I’m producing about 150k-250k HPS just with Eternal Flame.  And those two spikes are the Word of Glory heals, which are obviously really huge. Let’s see exactly how huge:

Event view for Eternal Flame for this portion of the 25H Thok encounter.


The base Word of Glory heals are 1M (at ~400k Veng) and 1.5M (over 600k Veng) when I refresh Eternal Flame. The Eternal Flame ticks generated by those casts are ~260k and ~400k, respectively, occurring every ~1.8 seconds.

Note that I have a little over 1 million hit points. That means the base WoG heal is basically a Lay on Hands, limited only by Bastion of Glory ramp-up time. It also means that the HoT, which is providing 140k-220k HPS all by itself, is capable of healing me to full every four to six seconds at high Vengeance.

And that’s just Eternal Flame. If you filter the log for Seal of Insight you’ll see that it is also healing for 100k-140k per tick, producing another 100k-200k HPS. Combined, these two effects heal for ~250k to 420k every second. That means I’m essentially healing to full every 2-4 seconds just from these two passive sources.
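As a quick sanity check on that arithmetic, here is a minimal Python sketch. The tick sizes, tick interval, and health pool are the approximate values read off the log above; the function names are just illustrative helpers, not anything from the game or the log tool.

```python
# Approximate figures quoted from the log above.
MAX_HEALTH = 1_000_000      # "a little over 1 million hit points"
EF_TICK_INTERVAL = 1.8      # seconds between Eternal Flame ticks


def hps_from_ticks(tick_size, interval):
    """Average healing per second for a periodic heal."""
    return tick_size / interval


def seconds_to_full(hps, max_health=MAX_HEALTH):
    """Time to heal from zero to max_health at a constant healing rate."""
    return max_health / hps


ef_low = hps_from_ticks(260_000, EF_TICK_INTERVAL)    # ~144k HPS
ef_high = hps_from_ticks(400_000, EF_TICK_INTERVAL)   # ~222k HPS
print(f"Eternal Flame HoT: {ef_low / 1e3:.0f}k to {ef_high / 1e3:.0f}k HPS")

# Combined Eternal Flame + Seal of Insight range quoted in the text.
for combined_hps in (250_000, 420_000):
    print(f"{combined_hps / 1e3:.0f}k HPS -> full health in "
          f"{seconds_to_full(combined_hps):.1f} s")
```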

You might note that I’m including overhealing here, but if you’ve read any of my survivability posts over the last year or so you should already realize that it’s a mistake to immediately discount overhealing. Because that overhealing isn’t overhealing when you’re in a dangerous situation, like during a damage spike. The fact that it overheals when you’re safe is irrelevant if it saves your ass when your ass actually needs saving.

It also has implications beyond just spikes. With enough avoidance and mitigation, I’m producing enough healing to keep myself alive without healers against this boss. This is an effect we’ve seen in Simulationcraft and discussed before. But it’s also happening in-game, on fights like Thok and Siegecrafter Blackfuse. On more than one attempt, I’ve been able to tank each of these bosses for well over a minute after everyone else had died, just with my own self-healing. Other paladins I’ve talked to have been able to do the same (one even boasts a 3-minute solo on Thok until he hit enrage).

Playing The Blame Game

Now, as far as I can tell, other classes aren’t capable of this degree of self-sufficiency. So it’s not clear that this problem is all Vengeance’s fault. But it’s definitely one of several contributing factors. And it underlies one of the lessons we can learn from 5.x Vengeance: I think it is far too generous.

See, Vengeance increases with the boss’s raw damage throughout an expansion, and even within a tier. So on early bosses you might only have 200k Vengeance, while later bosses will give you upwards of 500k. And of course, those later bosses do more damage than the earlier bosses do – which is why they give more Vengeance in the first place.

But as the expansion goes on, your mitigation and avoidance keep increasing. So while the 500k Vengeance boss gives you twice as much Vengeance as a 250k boss, your gear upgrades between the time you first encounter those bosses mean that you have more mitigation and avoidance. So you don’t actually take twice as much damage from that later boss, because you’re avoiding and mitigating more of it.

In addition, our self-healing grows rather generously with attack power thanks to Eternal Flame, Bastion of Glory, and Seal of Insight. So those later bosses are giving us (say) twice as much attack power, and thus roughly twice as much healing throughput, without dishing out twice as much damage taken.

To give a more quantitative bent to that thought, consider the ratio of self-healing done to damage taken:

$$ R = \frac{{\rm SH}}{\text{DT}} \propto \frac{{\rm AP}}{(1-A)(1-M)\text{RD}} \propto \frac{k{\rm RD}}{(1-A)(1-M)\text{RD}}$$

Self-healing ${\rm SH}$ is proportional to attack power, which is proportional to some constant $k$ times the boss’s raw damage $\text{RD}$. Our damage taken is also proportional to the boss’s raw damage, but with additional factors $(1-A)$ and $(1-M)$ to account for our avoidance $A$ and average mitigation $M$ (I’m lumping armor, Shield of the Righteous, and blocking all together here).

Note that if this ratio is below one, then we take more damage than we can heal up. But if it goes above one, we’re healing for more damage than we take. In other words, a ratio of $R=1$ is the self-sufficiency limit, above which we can take care of ourselves (at least up until the boss is capable of one-shotting us).

It should be pretty clear what happens over the course of an expansion. As the expansion goes on, $A$ and $M$ increase,  $(1-A)$ and $(1-M)$ decrease, and the ratio gets larger. At the beginning of an expansion, we may be able to heal for 30% to 50% of our damage taken at best. But by the end of the expansion, when we’re pushing ~75% Shield of the Righteous uptime, ~35% avoidance, ~35% block, and 60% mitigation from armor, we’re able to push this ratio significantly above one. Which is why we can solo-tank Thok or Siegecrafter until their stacking debuff effects let them one-shot us.
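To make the trend concrete, here is a small Python sketch of the ratio above. The constant `k` lumps the Vengeance conversion together with all of the healing coefficients, and both its value and the two gear levels are made-up illustrative numbers, not measurements; only the direction of the trend is meant to carry over.

```python
def self_heal_ratio(k, avoidance, mitigation):
    """R = SH / DT up to a constant: k / ((1 - A) * (1 - M)).

    The boss's raw damage cancels out, so only k, A, and M matter.
    """
    if not (0 <= avoidance < 1 and 0 <= mitigation < 1):
        raise ValueError("avoidance and mitigation must be in [0, 1)")
    return k / ((1 - avoidance) * (1 - mitigation))


K = 0.15  # hypothetical lumped constant, chosen only for illustration

# Hypothetical gear levels: (label, avoidance A, average mitigation M).
gear_levels = [
    ("early expansion", 0.20, 0.55),
    ("late expansion", 0.35, 0.80),
]

for label, a, m in gear_levels:
    r = self_heal_ratio(K, a, m)
    status = "self-sufficient" if r > 1 else "needs healers"
    print(f"{label}: R = {r:.2f} ({status})")
```

With `k` held fixed, the only thing that changes between the two rows is the gear, which is exactly why the ratio creeps upward over an expansion.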

Again, to illustrate that thought, let’s look at my damage taken for this same period:

Damage taken for the same period of the 25H Thok encounter.


If you total that up, you get about 12.6 million damage during this 54-second period, or about 233k damage taken per second. Now look at the healing table:

Self-healing from all sources for the same portion of the 25H Thok encounter.


Even if we only consider Seal of Insight and the HoT portion of Eternal Flame, that’s 17.1 million healing. So from passive sources alone, our ratio is $R=1.36$. In other words, I’m passively healing for 36% more damage than I’m actually taking. And that’s ignoring the set bonus and my two Word of Glory casts, which would bring the total up to 21.1 million healing and a ratio of $R=1.67$.
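For anyone who wants to check the arithmetic, the ratios follow directly from the totals quoted above. This is just a worked calculation in Python using those log totals; the variable names are my own.

```python
# Totals quoted from the log for the 54-second window.
DAMAGE_TAKEN = 12.6e6       # total damage taken
DURATION_S = 54             # length of the window in seconds
PASSIVE_HEALING = 17.1e6    # Seal of Insight + Eternal Flame HoT
TOTAL_HEALING = 21.1e6      # adds the set bonus and two Word of Glory casts

dtps = DAMAGE_TAKEN / DURATION_S
r_passive = PASSIVE_HEALING / DAMAGE_TAKEN
r_total = TOTAL_HEALING / DAMAGE_TAKEN

print(f"Damage taken per second: {dtps / 1e3:.0f}k")
print(f"Passive-only ratio R = {r_passive:.2f}")
print(f"All self-healing ratio R = {r_total:.2f}")
```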

The point in all of this is that our self-healing scales far too well with attack power, and thus with Vengeance.  As we get more “tanky” with more gear, we actually get more Vengeance than we need to compensate for our damage intake. As a tank, I think this is a problem because I don’t believe that tanks should ever be self-sufficient. The bulk of our healing should come from external sources to keep the tank–healer leg of the tank-healer-DPS interaction trinity alive. It’s one thing to have a lot of control over your survivability (which we do, thanks to active mitigation). It’s another thing entirely to be able to be your own healer when other classes can’t.

I don’t think the developers are ignorant of this fact, either. They have reduced the conversion percent $(k)$ several times over the course of the expansion to compensate for this effect, but it simply hasn’t been effective enough. Or at least, not for us. I think they’ve probably kept all of the other tanks in line with these reductions, but somehow we slipped through the cracks (more on that in a bit).

There are some ways to ensure this doesn’t happen, or at least to prevent the need to change $k$ several times per expansion. The issue here is that the ratio $k/[(1-A)(1-M)]$ grows as $A$ and $M$ grow. So the logical solution is to let $k$ vary the same way. The simplest way to do that is to use actual damage taken to determine Vengeance rather than raw damage. That introduces a factor of $(1-A)(1-M)$ in the numerator, which automatically corrects for variations.

Note that I argued strongly against doing this in the past, which may seem inconsistent. But the earlier versions of Vengeance gave us extra damage output for taking more damage. We’ve definitely seen antics this expansion that legitimized that concern. But if that’s no longer possible then we don’t have to worry about tanks feeling encouraged to stand in the fire to produce more DPS. As long as the damage-taken-to-Vengeance conversion is sane (i.e. even remotely balanced), we’ll get less self-healing back than the extra damage we take, so there wouldn’t be an advantage to taking more damage.

But while simple, this solution has its problems. For one thing, it would be awful for avoidance tanks, because it would make Vengeance really spiky. It would penalize you for avoiding attacks, which is bad if avoided attacks are something we should ostensibly be happy about. And while it may not matter as much once dodge and parry ratings don’t show up on gear, it’s still an odd quirk we’d like to avoid. Worse yet, it punishes you for using your active mitigation, which we definitely want to avoid.

An alternative that causes fewer issues is to keep the current Vengeance implementation, but use “estimated post-mitigation damage” rather than raw damage. And what I mean by that is that we make $k$ proportional to $(1-A)(1-M)$, so that Vengeance scales with $(1-A)(1-M){\rm RD}$ rather than with ${\rm RD}$ alone. In other words, every attack you receive gives you Vengeance whether you avoid it or not, just like it does now, but the amount of Vengeance is artificially reduced based on your character sheet avoidance and mitigation.

This is tricky, insofar as it still has the negative interaction with avoidance, but it’s a weaker and more smoothed-out effect. To make it work, they would probably also have to exclude active mitigation sources from $M$, which means it would be primarily armor, spec-based mitigation, and possibly blocking. Excluding active mitigation means there would still be some creep in the ratio over the course of the expansion, but a judicious choice of $k$ would ensure that it keeps the ratio at sensible levels.
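Here is a rough Python sketch comparing the two schemes. It assumes, purely for illustration, that passive and active mitigation stack multiplicatively, so $(1-M) = (1-M_p)(1-M_a)$; the split into `passive_mit` and `active_mit`, the function names, and all of the numbers are my own hypothetical choices rather than anything the developers have described.

```python
def ratio_raw(k, avoidance, passive_mit, active_mit):
    """Self-healing / damage taken when Vengeance uses raw boss damage."""
    damage_scale = (1 - avoidance) * (1 - passive_mit) * (1 - active_mit)
    return k / damage_scale


def ratio_estimated(k, avoidance, passive_mit, active_mit):
    """Same ratio when Vengeance uses estimated post-mitigation damage.

    Vengeance is scaled by character-sheet avoidance and passive
    mitigation only; active mitigation is deliberately excluded.
    """
    vengeance_scale = k * (1 - avoidance) * (1 - passive_mit)
    damage_scale = (1 - avoidance) * (1 - passive_mit) * (1 - active_mit)
    return vengeance_scale / damage_scale


# Hypothetical gear progression: (avoidance, passive mitigation).
# Active mitigation is held at 30% in both rows for simplicity.
for a, mp in [(0.20, 0.40), (0.35, 0.60)]:
    raw = ratio_raw(0.2, a, mp, 0.30)
    est = ratio_estimated(0.5, a, mp, 0.30)
    print(f"A={a:.2f}, Mp={mp:.2f}: raw R = {raw:.2f}, estimated R = {est:.2f}")
```

Under these assumptions the raw-damage ratio climbs with gear, while the estimated-damage ratio reduces to $k/(1-M_a)$, so the only remaining creep comes from active mitigation.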

Maybe the simplest version is to just keep Vengeance as it is (minus the damage component, obviously), but slash $k$ significantly enough to keep the ratio low even in the highest-Vengeance cases. This also weakens Vengeance a lot, but that may not be a bad thing. Before Vengeance existed, there was a real sense of fear tanking a harder-hitting boss because your defenses didn’t immediately scale up to meet it. A weaker version of Vengeance would bring some of that feeling back. The downside, of course, is that WoG might lose ground compared to SotR in that system, making one or the other the better choice on a boss-to-boss basis.

But rather than dwell too long on ways to “fix” Vengeance, especially in the absence of information about how it will be calculated in WoD, I want to take this discussion in a different direction. For a moment, let’s look at the big picture. What if we’re the only class having these odd scaling issues? In fact, this isn’t much of a “what if” because I think this is actually the case. So if the problem is us, then maybe the solution isn’t to tweak Vengeance, but to tweak us. But how?

Back to Basics

From the logs we’ve analyzed above, it’s clear that a large portion of the problem is the massive effect that Vengeance has on Seal of Insight and Eternal Flame. Sure, I think it’s overpowered to be able to fire off a 1-million-point WoG every 20 seconds – Lay on Hands has a 5+ minute cooldown for a reason, after all – but the bulk of our self-sufficiency comes from these two passive healing sources. So the question becomes “Which abilities should 6.0 Vengeance affect, and how?” To answer that, first let’s consider the purpose of Vengeance for a moment. Celestalon put it fairly succinctly during the twittergeddon following Friday’s blog post:

So that things like Shield Block and Shield Barrier can stay competitive with each other.

In other words, it exists to keep point-based active mitigation (Shield Barrier, Word of Glory) competitive with percent-based mitigation (Shield Block, Shield of the Righteous).

To illustrate why that’s an important goal, imagine that Vengeance didn’t exist in Mists of Pandaria. Let’s ignore Bastion of Glory for a moment and say WoG heals for about 30% of your health at a certain gear level. If you’re raiding in a 25-man, you’d almost never cast it, because Shield of the Righteous will mitigate that much damage or more from a single swing, let alone two. But in a 10-man, where the bosses don’t hit as hard, you could almost ignore Shield of the Righteous and chain-WoG yourself.

That disparity in gameplay isn’t ideal. It would be better if your class worked the same way regardless of setting. It’s more immersive if the question you ask yourself when choosing a finisher is “do I need a heal right now” rather than “are there more than X players in my raid.” And this concern isn’t going away in Warlords – in fact, it’s getting more ubiquitous since normal and heroic modes will be flexible.

Another way to phrase the purpose of Vengeance is that it’s there to make sure that active mitigation abilities have resource parity. If Shield of the Righteous and Word of Glory both cost 3 Holy Power, then they have to perform similarly. Not identically, of course – for good, solid, interesting gameplay there should be situations where you’d choose one or the other. But we can’t have one of them be so dominant that you can take the other one off of your bars either.

Bastion of Glory accomplishes this to some degree, because it introduces an interaction between the two, and subsequently a time factor. You can chain-cast Word of Glory, but it will be weak. It gets a lot stronger (and thus more efficient per Holy Power) if you cast a few SotRs first. This interaction inherently makes that choice interesting, and limits the usefulness of strong WoGs to one every 20 seconds or so without artificially adding a cooldown to the spell. It’s a really great design, all told.

But it doesn’t solve everything, because it doesn’t let Word of Glory scale with boss damage, and to compete with Shield of the Righteous for resources, it has to.

Hope Springs Eternal

However, Seal of Insight doesn’t compete for our Holy Power. It’s automatic – we don’t even cast it. There is never a situation where we choose between another Shield of the Righteous and Seal of Insight.

Eternal Flame is a slightly different beast. In concept, it doesn’t compete for Holy Power either because we get the same Word of Glory heal with or without the talent. The added bonus of Eternal Flame is the heal over time, which could be construed as an extra bonus. This is really only true if you have the T16 4-piece bonus though, which disconnects Eternal Flame maintenance from the Holy Power opportunity cost.

In practice, Eternal Flame’s HoT is so strong that without the set bonus, we’re really choosing between spending Holy Power on Shield of the Righteous and spending it on the Eternal Flame heal over time. It turns it into a choice between a Shield of the Righteous that shaves ~300k off of each boss attack for 3 seconds and an Eternal Flame that heals us for 300k every two seconds for 30 seconds. The latter is just far more efficient, and the ability to overlap them so powerful that it isn’t even much of a choice.  In some sense, Eternal Flame becomes our version of Inquisition, with the caveat that we’d rather refresh it at high Vengeance. The gigantic Word of Glory heal becomes a bit of an afterthought, and I think that’s a bit of a problem.

And the weird self-sufficiency effects in 5.4 are all “collateral damage” from Eternal Flame and Seal of Insight due to the direct Vengeance-to-AP conversion. Eternal Flame in particular gets a huge boost from Vengeance thanks to its collection of multiplicative modifiers, which makes it tough to keep other talents (Sacred Shield) competitive with EF over a large range of AP values.

The Last Bastion

Bastion of Glory is part of the problem too. Eternal Flame benefits from Bastion thanks to buffs back in 5.2 when Eternal Flame was far behind Sacred Shield in survivability. And while I advocated for those buffs at the time, in retrospect it was the wrong call. It definitely made Eternal Flame stronger for Protection (though at the time, still not strong enough to be competitive), but did it in the worst possible way. The difference between a 5-BoG EF and a 0-BoG EF is huge, roughly a factor of 3 or 4 depending on mastery levels. And since that factor ends up applying to our huge Vengeance accumulations, the multiplicative nature makes Eternal Flame ludicrously powerful if we can refresh it with 5 stacks and at high Vengeance.

It also introduces a number of annoying gameplay intricacies. For example, is it worth replacing a 5-BoG EF with a 3-BoG one? Your gut would say no, but in many cases (like after a taunt) it is. If you gained a lot of Vengeance, the 3-BoG EF would be significantly stronger. Likewise, sometimes it’s not worth replacing a 3-BoG EF with a 5-BoG EF if you’ve lost Vengeance or if the 3-BoG EF was cast under Bloodlust and/or Avenging Wrath. It’s complicated enough that nobody can do that math in their head on the fly, especially given the lack of any sort of Vengeance display in the base UI. So to take advantage of those nuances, a player needs equally-complex WeakAuras to simplify the problem down to a go/no-go decision they can use to make split-second decisions.
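To give a feel for the kind of go/no-go logic such an aura has to encode, here is a deliberately oversimplified Python sketch (real WeakAuras are written in Lua against the game's API, which I'm not reproducing here). Everything in it is a hypothetical stand-in: the model treats the HoT's strength as attack power times a linear Bastion of Glory bonus times a generic buff multiplier, the per-stack bonus is picked only so that five stacks land in the "factor of 3 or 4" range mentioned earlier, and it ignores haste and remaining duration entirely.

```python
from dataclasses import dataclass

# Hypothetical per-stack bonus: makes a 5-stack cast 3.5x a 0-stack cast.
BOG_BONUS_PER_STACK = 0.5


@dataclass(frozen=True)
class EternalFlameSnapshot:
    """State captured at the moment Eternal Flame is (or would be) cast."""
    attack_power: float            # includes Vengeance at cast time
    bog_stacks: int                # Bastion of Glory stacks, 0 to 5
    buff_multiplier: float = 1.0   # stand-in for temporary healing buffs

    def relative_strength(self):
        """Unitless strength of the HoT under the toy model above."""
        stack_bonus = 1 + BOG_BONUS_PER_STACK * self.bog_stacks
        return self.attack_power * stack_bonus * self.buff_multiplier


def should_refresh(current, candidate):
    """Go/no-go: refresh only if the new HoT would be stronger."""
    return candidate.relative_strength() > current.relative_strength()


# Just after a taunt: old EF had 5 stacks at low Vengeance, and a
# 3-stack refresh is available at much higher Vengeance.
old = EternalFlameSnapshot(attack_power=150_000, bog_stacks=5)
new = EternalFlameSnapshot(attack_power=400_000, bog_stacks=3)
print("Refresh after taunt?", should_refresh(old, new))
```

Even this stripped-down version needs three inputs per snapshot, none of which the base UI surfaces in a convenient way.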

That last bit is what really pushes me into “this is a bad mechanic” territory. It’s not transparent. The UI doesn’t provide clear information about it. It’s not easy for an advanced player to understand, let alone a beginner. It adds a type of depth and complexity to tanking, for sure, but it does so by making the timing of the EF refresh very sensitive to three or four different factors that the player doesn’t have an easy way to monitor outside of add-ons.

And in the process, it removes depth and complexity of timing the Word of Glory heal based on your health or expected damage. So I’m not sure it’s really adding that much depth overall, it’s just shifting it from being aware of the boss, your health, and combat to being aware of three arbitrary indicator dials (Vengeance, Haste, and BoG stacks).

If anything, I’d actually call that a loss. Because it actively dissuades players from using Word of Glory the way it was meant to be used – to react to damage spikes. Now, if you have to use your emergency heal, but you don’t have five Bastion stacks, you’re sacrificing even more long-term survivability to use your emergency heal. You’re actually penalized for using your emergency heal as an emergency heal!

In retrospect, I’m sorry I suggested that fix (though of course it’s not clear my suggestion had anything to do with it being implemented). Because I think it would have been more aptly solved with a simpler one: the “100% more healing when self-cast” solution, or equivalently, just increasing the size of the AP coefficient for protection. While bland, it produces all of the desired effects. It can be tuned such that the spell remains competitive with Sacred Shield, but without the huge swings in power with Bastion of Glory stacks and without subverting the design of Word of Glory.

A Limited Time Only Flame

So how do the developers “fix” all of these problems?

First of all, I think that Vengeance should only affect Word of Glory. It can be balanced such that spending Holy Power on Word of Glory heals for more than SotR would mitigate from a single attack. It should probably be tweaked such that a 5-BoG WoG heals for a little more than SotR would mitigate off of two attacks, but a 0-BoG WoG is close to the single-attack SotR value so that it’s useful no matter how many BoG stacks you have. That’s just a matter of fitting numbers, and keeps WoG an interesting choice over a large variety of content levels. And for clarity, that choice is “Do I need a heal right now to survive the next boss attack, or would I rather put up SotR to increase smoothness over the next two attacks?”

With none of our other healing abilities (Eternal Flame, Seal of Insight, and Sacred Shield) receiving a benefit from Vengeance, the balance of those abilities could be tuned much more finely. The drawback is that they wouldn’t adapt to boss damage, but if the attack power coefficients are chosen appropriately they should remain useful over several tiers of content. Since the spells won’t vary with Vengeance those AP coefficients can be made large enough to keep the skills significant without risking them being overpowered against certain bosses.

And finally, I think the Bastion of Glory interaction with Eternal Flame should be dropped. It has all sorts of unfortunate side effects, and it will be easier to balance Eternal Flame and Sacred Shield when one of them isn’t capable of fluctuating in strength so significantly. Paring both skills down to a single AP coefficient each means they can control the two effects well enough to make them truly competitive, because it will be a simple question of “do you want to absorb X every 6 seconds or heal for Y every 3 seconds,” where X and Y depend only on spellpower and can be independently tuned.

That’s really what I want to see this week, to be honest. While I’m excited to find out about the mechanics of the new version of Vengeance, the details are less important to me than these bigger issues with Eternal Flame, Sacred Shield, and Seal of Insight. I’d really like to see these passive effects toned down to be reasonable, and the only way that will happen is if they aren’t astronomically different in magnitude when they’re buffed by Vengeance, or in Eternal Flame’s case, Bastion of Glory.


A Comedy of Error – Part II

As I said in Part I, I observed some strange error behavior in the 5.4.2 Rotation Analysis post. Now that we’ve had a thorough (and lengthy) review of the statistics of error analysis, it’s time we looked more carefully at the problem that started this whole mess.

Mo’ Iterations, Mo’ Problems

Once again, here was my comment about error from the Rotation Analysis blog post:

The “DPS Error” that Simulationcraft reports is really the half-width of the 95% confidence interval (CI). In other words, it is 1.96 times the standard error of the mean. To put that another way, we feel that there’s a 95% chance that the actual mean DPS of a particular simulation is within +/- DPS_Error of the mean reported by that simulation. There are some caveats to this statement, insofar as it makes some reasonably good but not air-tight assumptions about the data, but it’s pretty good.

I’m actually doing a little statistical analysis on SimC results right now to investigate some deviations from this prediction, but that’s enough material for another blog post, so I won’t go into more detail yet. What it means for us, though, is that in practice I’ve found that when you run the sim for a large number of iterations (i.e. 50k or more) the reported confidence interval tends to be a little narrower than the observed confidence interval you get by calculating it from the data.

So for example, at 250k iterations we regularly get a DPS Error of approximately 40. In theory that means we feel pretty confident that the DPS we found is within +/-40 of the true value. In practice, it might be closer to +/- 100 or so.

So let’s talk about these “deviations.” What caught my attention at first was that, even though the DPS Error reported by SimC was $\pm$ 40 DPS, I could sim the same rotation several times and get values that differed by much more than that, often in the hundreds of DPS. After looking into it more carefully, I’d say that the “$\pm$ 100 or so” I quoted in the last blog post was probably a bit of an under-estimate; $\pm$ 200 to 300 DPS might be a closer estimate to the actual variations I was seeing.

And while this is less than a 0.1% relative error given that we’re talking about DPS means near 400k, it’s still a little disconcerting. First, on a theoretical level, I believe in statistics, so it’s unsettling when they appear not to be behaving properly.  Second, it struck me as very odd that going from 50k iterations to 250k iterations didn’t seem to have a meaningful impact on the error fluctuations. As an experimentalist, I’m familiar with the process of determining how much error I can accept and how much integration time (in this case, iterations) it will take to achieve that level of confidence. So when these sims failed to meet the spec that I set, I took notice.

But a handful of assorted simulations that violate spec isn’t enough information to base a hypothesis on. I knew it wasn’t demonstrating the desired behavior. But to figure out what was wrong, I needed to first figure out exactly what behavior the system was exhibiting. And to do that, I needed more data.

Confidence Boost

In the quoted passage above, I said that what Simulationcraft reports as “DPS Error” is really $1.96 {\rm SE}_{\mu}$, which is the half-width of the 95% confidence interval (CI). The full 95% CI is $\mu_{\rm sample} \pm 1.96 {\rm SE}_{\mu}$, so it’s appropriate to say that when you look at a SimC report, the “DPS” value it reports is accurate to about $\pm$ “DPS Error.” This is a pretty natural way of reporting error, as we’ve seen in Part I.
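As a minimal sketch of that definition, here is how the half-width could be computed from a list of per-iteration DPS values in Python. This follows the formula as described in this post, not SimC's actual source code, and the synthetic normal data (mean 400k, standard deviation 20k) is an arbitrary stand-in for real iteration results.

```python
import math
import random
import statistics


def dps_error(samples, z=1.96):
    """Half-width of the 95% CI: z times the standard error of the mean."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    standard_error = statistics.stdev(samples) / math.sqrt(n)
    return z * standard_error


# Synthetic stand-in for 10,000 iterations of a simulation.
rng = random.Random(0)
samples = [rng.gauss(400_000, 20_000) for _ in range(10_000)]

mean_dps = statistics.fmean(samples)
error = dps_error(samples)
print(f"DPS = {mean_dps:.0f} +/- {error:.0f}")
```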

Thinking back to our dice experiment in Part I, we said that if we repeated the experiment 100 times, we’d expect that about 95 of them would fall within the range $\mu_{\rm sample}\pm 2{\rm SE}_{\mu}$ (I’m rounding 1.96 to 2 here for simplicity). That was the meaning we ascribed to the 95% confidence interval. So one way to test the system is to do exactly that: run the simulation 100 times and take a look at the distribution of sample means.

And just to be abundantly clear about what that means, let’s assume we’re interested in the simulation error when we run it for 100k iterations. We can do that once to get a sample mean and 95% CI. We can then do it 99 more times, running the sim for 100k iterations each time, which gives us 100 sample means from the 100 independent simulations.

Our best guess at the population mean $\mu$ is the mean of those 100 sample means $\mu_{\rm sample}$ (I feel like I need an Xzibit image here…). And we could then empirically determine a value $\delta$ such that 95 of those means fit in the range $(\mu-\delta,\mu+\delta)$. If we did that, then $2\delta$ is our empirical estimate of the 95% CI. We could compare that to twice the value SimC reports as “DPS Error” to check for consistency.

There are a number of ways to make that empirical estimate, but two of them are relatively easy in MATLAB. The first is to use the prctile() function, which we can use to find the DPS values that are the 2.5th and 97.5th percentiles of the data set. The difference of those two values is the empirical estimate of the 95% CI.
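For readers without MATLAB, the percentile approach can be sketched in Python with the standard library. Note that `statistics.quantiles` uses a slightly different interpolation rule than MATLAB's prctile(), so the two won't agree to the last digit; the helper name and the synthetic sample means below are illustrative assumptions.

```python
import random
import statistics


def empirical_ci_width(sample_means):
    """Width of the central 95% of the data (97.5th minus 2.5th percentile)."""
    # n=40 yields cut points at 2.5%, 5%, ..., 97.5%.
    cuts = statistics.quantiles(sample_means, n=40, method="inclusive")
    return cuts[-1] - cuts[0]


# Synthetic stand-in for 100 sample means with a true standard error of 50.
rng = random.Random(0)
true_se = 50.0
means = [rng.gauss(400_000, true_se) for _ in range(100)]

observed = empirical_ci_width(means)
expected = 2 * 1.96 * true_se
print(f"observed width = {observed:.0f}, expected width = {expected:.0f}")
```

With only 100 sample means the observed width is itself a noisy estimate, so modest disagreement with the expected value is normal.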

The second method is more involved, and uses Principal Component Analysis, or PCA. It also goes by a number of other names: eigenvalue decomposition, empirical component analysis, singular value decomposition, and several more. It’s related to finding principal axes in mechanics, if you’re familiar with mechanical engineering concepts. It attempts to find the confidence region (or “confidence ellipsoid”) of the data set, which is the generalization of a confidence interval into higher dimensions. When you apply it to a one-dimensional data set, though, you get the usual confidence interval.

In any event, it’s a powerful linear algebra technique that would require another whole blog post to explain, so if you’re really interested in the guts of it I suggest you read the Wikipedia article. For those that care, I’m using a function from this thread of MATLAB Central, which uses the finv() and princomp() methods from the statistics toolbox. (fun coincidence: I worked in the same building as the author of this code as a postdoc, though in a different department). The only change I’ve made is a minor correction; I’m fairly certain that the line

ab = diag(sqrt(k*lat));

should be

ab = diag(k*sqrt(lat));

so I’ve made that correction. Without the correction, the 95% CI’s the code produces are approximately half the size they should be (because $k\approx 4$) as tested with a normally-distributed data set that I generated for the purpose of testing the code. With that correction, this prediction agrees very well with the percentile-based estimate (as it should!).

So, armed with two techniques to empirically estimate the 95% confidence interval, I set to the task of doing that for various simulation lengths. In other words, run 100 simulations with 50 iterations each, then do it again for 100 iterations, and again for 250 iterations, and then for 500, 1000, 2500, 5000, 10000, 25000, 50000, 100000. I did all of this with the T16H protection paladin profile and “default” settings in SimC.
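The shape of that experiment can be sketched in a few lines of Python. To be clear, this is a toy: each "simulation" here just draws independent normal samples (mean 400k, standard deviation 20k, both arbitrary) instead of running SimC, so it shows what the comparison looks like when the statistical assumptions hold, with the reported and observed widths both shrinking like $1/\sqrt{N}$.

```python
import math
import random
import statistics


def run_sim(rng, iterations, mean=400_000.0, sd=20_000.0):
    """One toy simulation: returns (sample mean, reported CI half-width)."""
    samples = [rng.gauss(mean, sd) for _ in range(iterations)]
    half_width = 1.96 * statistics.stdev(samples) / math.sqrt(iterations)
    return statistics.fmean(samples), half_width


def sweep(iteration_counts, runs=100, seed=1):
    """For each iteration count, compare reported and observed CI widths."""
    rng = random.Random(seed)
    rows = []
    for n in iteration_counts:
        results = [run_sim(rng, n) for _ in range(runs)]
        means = [m for m, _ in results]
        # Reported full width: twice the average reported half-width.
        reported = 2 * statistics.fmean([h for _, h in results])
        # Observed full width: spread of the central 95% of sample means.
        cuts = statistics.quantiles(means, n=40, method="inclusive")
        observed = cuts[-1] - cuts[0]
        rows.append((n, reported, observed))
    return rows


for n, reported, observed in sweep([50, 100, 250, 500, 1000, 2500]):
    print(f"{n:>5} iterations: reported {reported:8.0f}, observed {observed:8.0f}")
```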

That takes a while – the whole set of runs takes 5-8 hours depending on how many threads I use. But at the end, we get a graph that looks something like this:

Error analysis of Simulationcraft results. The blue line is the confidence interval reported by Simulationcraft. Green and red lines are the estimated confidence intervals obtained through PCA and percentile methods, respectively.

It’s a little harder to tell what’s going on in the top plot because it’s a semilog, but the bottom loglog plot shows the problem very clearly. At 1000 iterations ($10^3$) the three error estimates agree very well. However, around 5000 iterations we see the observed error exceeding the reported error, and as we increase the number of iterations further the gap just gets larger. By 100000 iterations ($10^5$), we’re reporting a confidence interval of almost 100 DPS, but observing a confidence interval of nearly 500 DPS.

This is a problem – it means that we’ve effectively hit an “error floor” in SimC, because no matter how many iterations we throw at the problem, the error doesn’t seem to improve. And that’s pretty weird. But why?

Results Hazy, Ask Again Later

The “why” took a little more thinking. I’ve had several discussions over the past month with other SimC devs and a few academics about what might cause this sort of thing. As it turns out, everyone I spoke to had the same first guess that I did. If you remember back to Part I, we said that our error estimates were based on the Central Limit Theorem. Maybe we were violating the CLT somehow, and as a result our actual errors were larger than we expected?

If you recall, the constraints on the CLT were that each iteration needed to be independent and identically distributed. In other words, none of the iterations should depend on any of the previous iterations, and the probability distribution we’re sampling shouldn’t change from iteration to iteration. Of the two, dependence seemed like the more likely culprit.

I should note that while this was the first thought I had, the second thought I had was “but how?” Most people I talked with were similarly stumped at first. The thing that stuck out to us as the most likely culprit also seemed… somewhat unlikely. And that was the “vary_combat_length” option in SimC.

See, the default setting in SimC is to vary the combat length from iteration to iteration to smooth out the impact of cooldowns and other fixed-time-interval effects. To illustrate that concept, let’s say you had a spell with a 1-minute cooldown that gave you a 30-second buff that significantly increased your DPS (say, Avenging Wrath on steroids). If you ran the sim for exactly 1 minute and 30 seconds, you’d get two casts of that spell (once at the pull, once at the 1-minute mark) and you’d have 66.67% uptime on that buff. But if you ran the sim for exactly 2 minutes, you’d have the same two casts but only 50% uptime. Your DPS would look really good in the first sim, and significantly lower in the second sim.
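The uptime arithmetic in that example can be checked with a few lines of Python. This is only a sketch of the hypothetical spell described above (60-second cooldown, 30-second buff, cast on cooldown starting at the pull); the function name and defaults are illustrative.

```python
def buff_uptime(fight_length, cooldown=60.0, duration=30.0):
    """Fraction of the fight with the buff active, casting on cooldown from the pull."""
    active = 0.0
    cast_time = 0.0
    while cast_time < fight_length:
        # The buff may be cut short by the end of the fight.
        active += min(duration, fight_length - cast_time)
        cast_time += cooldown
    return active / fight_length

for length in (90, 120):
    print(f"{length} s fight: {buff_uptime(length):.2%} uptime")
```

Both fights get exactly two casts, but the shorter one spreads the same 60 seconds of buff over less total time.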

So to try and reduce that problem and give a more holistic view of your DPS that accounts for fluctuations in fight length, SimC varies the fight length by up to 20% from the default of 450 seconds. That way you get a spread of cooldown uptimes that more accurately represents an average encounter.

The reason that we thought this was an unlikely candidate was that it wasn’t clear how this violated either of the CLT constraints. See, SimC doesn’t just run arbitrarily for 450 seconds by default. It does that for the first iteration, during which it tallies up the amount of damage you do, and then for subsequent iterations it gives the boss that much health and lets you go to town on it, varying the health accordingly to get longer or shorter runs.

So varying the combat time doesn’t change the relative amount of time you spend in execute range, for example. That’s important, because if you spent e.g. half of the fight in execute range, and you do more DPS in execute range, then you’ve changed the probability distribution being sampled, so we’d be violating the “identically distributed” constraint.

However, the variation in combat length isn’t random either – it follows a predetermined sequence, where it alternates between extremes. As a rough example, it might start with a run that’s 20% shorter than the average, which we’ll call “-20%.” It would follow that with a run that’s 20% longer than average, or “+20%.” And then one that’s -19%, followed by another at +19%, followed by -18%, and so on. Note that these aren’t relative to the previous iteration – they’re all relative to the target length of 450 seconds. So in theory, these shouldn’t be violating the independence clause on that account. But they are somewhat deterministic because of the patterning.
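To make the pattern concrete, here is a small Python sketch that generates the rough example sequence described above. It only mirrors that example; the 1% step and the function itself are assumptions for illustration, not SimC’s actual implementation.

```python
def combat_lengths(target=450.0, max_variation=0.20, step=0.01):
    """Yield fight lengths alternating between shrinking extremes around target."""
    steps = round(max_variation / step)
    for i in range(steps, 0, -1):
        offset = i * step
        yield target * (1.0 - offset)  # shorter than average, e.g. -20%
        yield target * (1.0 + offset)  # longer than average, e.g. +20%

lengths = list(combat_lengths())
print([round(x, 1) for x in lengths[:6]])
print(f"mean length: {sum(lengths) / len(lengths):.1f} s")
```

Every length is computed from the 450-second target rather than from the previous run, and the sequence averages out to the target by symmetry.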

So, it felt unlikely that this was the problem. But we really weren’t sure. So we tested it by repeating the experiment with the command-line argument “vary_combat_length=0” to disable the combat length variation code. And five to eight hours later, the result was this:


Error analysis of Simulationcraft results with “vary_combat_length” disabled. The blue line is the confidence interval reported by Simulationcraft. Green and red lines are the estimated confidence intervals obtained through PCA and percentile methods, respectively.

Well, that didn’t help. So at the very least, the combat length variation code isn’t the only problem. We can’t rule it out completely based on this data, because it’s possible (if unlikely) that it is one of two or more contributing factors. But it certainly looks like the culprit lies elsewhere.

Death and Decay

The next candidate we came up with was a quirk of how the boss health calculation works. I glossed over this above by saying that we determine the boss’s health based on the damage done in the first iteration. But that’s not really the whole story.

There’s no guarantee that the first iteration is a representative sample of your DPS. Maybe in that first iteration you had an unusually low number of crits or Grand Crusader procs, so your DPS was below average. In that case, the health we assign the boss for iteration #2 will be a little low, and you might blow through it in 425 seconds rather than 450 seconds. If we kept using that boss health value, we may find that after a large number of iterations the mean combat length is only 430 seconds rather than our target of 450 seconds.

So Simulationcraft incorporates that information by performing a moving average on boss health as we go. If iteration #2 was significantly shorter, it will add a little health to the boss for the next one. It basically makes an educated guess at how much more health it would take to bring the average back up to 450 seconds. It does that for each iteration, though with some amount of decay built-in to keep things from oscillating out of control. The technique is very good at homing in on an average of 450 seconds of combat after many iterations. This is called the “enemy health estimation model,” and it’s what SimC uses by default.

Unfortunately, it also means that each iteration is slightly dependent on the previous ones. If iterations one through 50 were a little short, then iteration 51 gets a little longer. Again, it’s not clear that this is a strong enough effect to matter, but we just weren’t sure, and it’s a pretty obvious place to check if you’re worried that dependence between iterations is a problem.
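A toy version of that feedback loop shows where the dependence comes from. To be clear, this is not the enemy health estimation model SimC uses (which includes decay and other details not described here). It is a deliberately simplified sketch in which the boss health for each iteration is the target time multiplied by the average DPS of all earlier iterations, with made-up DPS numbers.

```python
import random
import statistics

def simulate_lengths(iterations, target_time=450.0, seed=1):
    """Return per-iteration combat lengths under a running boss-health estimate."""
    rng = random.Random(seed)
    lengths = []
    dps_total = 0.0
    for n in range(iterations):
        dps = rng.gauss(400_000, 20_000)  # hypothetical per-iteration DPS
        if n == 0:
            length = target_time  # first iteration: just run for the target time
        else:
            # Boss health is sized from the DPS of all earlier iterations,
            # which is exactly what couples this iteration to the previous ones.
            boss_health = target_time * (dps_total / n)
            length = boss_health / dps
        dps_total += dps
        lengths.append(length)
    return lengths

lengths = simulate_lengths(2000)
print(f"mean combat length: {statistics.fmean(lengths):.1f} s")
```

Even this crude scheme keeps the mean combat length close to the 450-second target, but only by letting every iteration’s length depend on the history before it.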

There are two ways we can reduce the impact of health recalculation in SimC. The first is to use a time-based model with the command-line option “fixed_time=1”, which tells the sim to run for exactly 450 seconds, period. It will still perform the boss health recalculation from iteration to iteration, but since we’re stopping the sim based on time, that won’t cause excessively long or short runs. This option also respects the user’s choice of the vary_combat_length option, and adjusts the time accordingly unless it’s disabled.

The second way is to use the Fixed Enemy Health model by setting “override.target_health=X”. This forces the boss to have exactly X health every iteration, and the sim ends when the boss runs out of health.  So it automatically disables combat length variation and the health recalculation effect. This is the pinnacle of having independent trials, because it removes any possible dependence on previous runs.

So I ran three more configurations: One with fixed_time=1 and vary_combat_length left at the default of 0.2, one with fixed_time=1 and vary_combat_length=0, and one with target_health=171000000 (roughly appropriate for a 450-second run at ~400k sustained DPS).

Did I mention that each of these takes 5-8 hours?

Days later, here’s what I got out of the experiments:

Error analysis of Simulationcraft results with “fixed_time=1”. The blue line is the confidence interval reported by Simulationcraft. Green and red lines are the estimated confidence intervals obtained through PCA and percentile methods, respectively.

Error analysis of Simulationcraft results with “fixed_time=1” and “vary_combat_length” disabled. The blue line is the confidence interval reported by Simulationcraft. Green and red lines are the estimated confidence intervals obtained through PCA and percentile methods, respectively.

Error analysis of Simulationcraft results with “override.target_health=171000000”. The blue line is the confidence interval reported by Simulationcraft. Green and red lines are the estimated confidence intervals obtained through PCA and percentile methods, respectively.

Now we’re getting somewhere. It seems from this data that the fixed_time setting didn’t change anything, but fixing the target health did. The fixed health simulation gives us results in excellent agreement with the theoretical results. So we really are looking for a violation of the Central Limit Theorem, at least somewhere.

But where? Was it in the health recalculation? Or the combat length variation? Or something else entirely that I overlooked?

Class Warfare

Around this time, one of the other SimC devs asked me if I had tested this with other specs or classes. The thought being that maybe it was an issue specific to paladins. And of course, I hadn’t yet, because each experiment takes five to eight hours to run, and I was in the middle of the last of the three runs above. But it was definitely on my to-do list to run a few other specs as a control group.

So I queued up a few more of these experiments for other specs. For example, using the  T16H retribution paladin profile:


Error analysis of Simulationcraft results. The blue line is the confidence interval reported by Simulationcraft. Green and red lines are the estimated confidence intervals obtained through PCA and percentile methods, respectively.

Note that this is with default settings – the same settings that cause the error anomaly with protection. I ran a few more experiments to test enhancement shamans and protection warriors, with similar results. All of the other classes seemed to be obeying the CLT, even with combat length variation and health recalculation active. And even retribution seemed to be working properly under those conditions. It’s as if the problem was specific to protection paladins!

Which really meant that I thought the problem was something I did in the paladin module – i.e. it was my fault. So of course, I immediately went to digging through the paladin module looking for anything that would link one iteration to the next. Maybe I wasn’t re-initializing everything properly, so the state at the end of one iteration somehow was influencing the next? But after a few hours of combing through the code, I came up empty-handed. Nothing seemed to be persisting between iterations.

So I started debugging, literally running a few simulations and checking the state of the paladin at different break points in the process of the simulation. Combing through all of the relevant properties of the paladin object in Visual Studio, searching in vain for something – anything – that wasn’t being reset properly. And while I didn’t find anything, it did cause me to stumble over the answer in the dark almost by accident.

Fix Me Up, Before You Go Go

What I stumbled across was the fixed_time flag. I was running the T16H protection paladin profile through the simulation with completely default settings, and at one of my breakpoints I happened to notice that the fixed_time flag was active. Needless to say, this was… odd. It shouldn’t be on in a default simulation. Unable to figure out why it was on, I consulted the other devs, and was pointed to an old piece of code that had been hiding in the shadows:

if ( p -> primary_role() != ROLE_HEAL && p -> primary_role() != ROLE_TANK && ! p -> is_pet() ) zero_dds = false;

If you’re not fluent in C++, that’s checking to see that the actor’s role is not “healer” or “tank”, and also that the actor is not a pet. And if the actor is none of those things, it sets a flag to false. Later on, that flag is used to forcibly enable “fixed_time=1” if the flag is true. So in other words, the sim automatically shifts into fixed-time mode if you’re simming a healer or a tank!
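For readers who find Python easier to follow, the logic of that check can be paraphrased as below. This is a sketch, not a translation of the surrounding SimC source: the `Actor` class and role labels are hypothetical, and the assumption that the flag starts out true (and is only cleared when a damage dealer is found) is inferred from the description above.

```python
from dataclasses import dataclass

@dataclass
class Actor:
    role: str            # "dps", "heal", or "tank" (hypothetical labels)
    is_pet: bool = False

def forces_fixed_time(actors):
    """True if fixed-time mode would be forced: no non-pet actor is a damage dealer."""
    zero_dds = True  # assumed starting value: no damage dealers seen yet
    for p in actors:
        if p.role != "heal" and p.role != "tank" and not p.is_pet:
            zero_dds = False
    return zero_dds

print(forces_fixed_time([Actor("tank")]))  # True
print(forces_fixed_time([Actor("dps")]))   # False
```

A lone tank or healer never clears the flag, so the sim ends up in fixed-time mode even though the user never asked for it.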

Now, at the time it was written, this code made sense. Keep in mind that Simulationcraft started out primarily as a DPS spec simulator. While it has the guts to support healers and tanks, it wasn’t until fairly recently that either of those roles was really supported well. Arguably, healers still aren’t, for a variety of reasons, and a lot of the reason that it’s been improving for tanks is because I got involved and started implementing stuff that we wanted to see.

That’s not meant as a shot at the existing SimC devs either, by the way. These folks work incredibly hard to improve and maintain the project, but it’s a hobby for all of us, and there’s more than enough work to be done keeping it running properly for DPS specs. Getting solid support for, say, tanking pretty much requires a dev who has the interest and time to spend implementing tanking stuff, not to mention other devs who are willing to maintain the tanking part of each class module. And that didn’t really happen until I got involved and gave tanks a reason to care about the results (being able to calculate TMI, and correcting a bunch of minor errors in combat, mitigation, and Vengeance calculations).

It’s also why I suspect healers won’t be well-supported until a serious healing theorycrafter decides to say, “here’s what we need the sim to do in order to be useful to us,” and then wade in and make those changes.

But back to the point: if you’re simming a healer, you’re not putting out any DPS. It makes very little sense to base the simulation time on boss health in that scenario, so you’d clearly want to default to a fixed-time model for a healer. That line of code was basically just a catch-all to say, “only use the boss health estimation model for DPS classes/specs.” The fact that it also forced fixed-time mode for tanks was mostly an afterthought, because nobody was using Simulationcraft to simulate tanking at that point.

In any event, this was a giant clue that the problem had to do with the fixed_time option, so we dug into that in more detail. What I learned, mostly from discussion with the other devs, is that fixed-time mode did a bunch of things it really shouldn’t. The root of the problem was that it was still basing the boss’s health percentage on the health recalculation algorithm in this mode. That poses two major problems:

  • The boss still “died” when it reached 0% health, which meant that you could end the simulation earlier than your target time if you happened to be lucky on that iteration (i.e. had above-average DPS).
  • If you had an iteration of below-average DPS, the simulation would hard-stop at the target time. So if you were supposed to run for 450 seconds, and the boss wasn’t dead yet, tough – the simulation just ends.

That seems perfectly logical, but it causes some major CLT violations. Hard-stopping the simulation at 450 seconds is essentially throwing a Heaviside function (or “step function”) into the mix. It’s saying, “we don’t care what happens after this point, and we’re going to ignore it.” But natural variations in DPS output should cause some iterations to be shorter than 450 seconds and other iterations to be longer. The hard-stop only applies to the longer runs, which means we’re artificially affecting some of our iterations but not others.

To see why this is a problem, consider the following two scenarios:

  • An iteration where you had exactly average DPS, and the boss dies exactly at 450 seconds. You enter execute range ~370 seconds into the fight, so you spend about 80 seconds in execute range. Note that this is a little less than 20% of the time, because I’m assuming your DPS goes up in execute range.
  • An iteration where you had bad luck and produced below-average DPS. You don’t enter execute range until ~400 seconds into the fight as a result, so you only get 50 seconds of execute range. The simulation forcibly ends combat at 450 seconds with the boss still having 5%-10% health remaining.

The second scenario should produce even lower DPS than expected, because not only did you have bad luck during the initial part of the iteration, but you were robbed of 30 seconds (or more) of higher-DPS execute time. Statistically, that means that we’re changing the underlying probability distribution, because the relative time spent in execute range is changing significantly from iteration to iteration.

And that violates one of our CLT conditions – each iteration needs to be identically distributed if we want to be able to use the CLT. If we spend 10% of our time in execute range on one iteration, but 20% on another iteration, and 15% on a third iteration, that condition isn’t being adhered to anymore, and we can’t expect our error to conform to the predictions of the CLT.
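A back-of-the-envelope model makes the effect easy to see. The Python sketch below is not SimC code, and its numbers are assumptions chosen to loosely resemble the two scenarios above: a fixed boss health pool sized so that a 400k DPS run dies at exactly 450 seconds, a 20% execute threshold, and a flat 25% DPS bonus while in execute range.

```python
def execute_fraction(dps, boss_health=187_500_000, time_limit=450.0,
                     execute_threshold=0.20, execute_bonus=1.25):
    """Fraction of the fight spent in execute range with a hard stop at time_limit."""
    time_to_execute = (1.0 - execute_threshold) * boss_health / dps
    time_in_execute = execute_threshold * boss_health / (dps * execute_bonus)
    kill_time = time_to_execute + time_in_execute
    end = min(kill_time, time_limit)  # the hard stop only truncates slow runs
    return max(0.0, end - time_to_execute) / end

for dps in (420_000, 400_000, 380_000):
    print(f"{dps} DPS: {execute_fraction(dps):.1%} of the fight in execute range")
```

In this model, runs at or above average DPS always spend the same share of the fight in execute range, because the boss simply dies a little sooner. Only the below-average runs get truncated, so only they lose execute time, which is the asymmetry that skews the distribution.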

Ti-i-i-ime Is On My Side

The correction, which was made in this commit, was to fix the way we calculate health percentage. Instead of using boss health in fixed_time mode, we now ignore boss health entirely and use time to estimate boss health. For example, if you’re running a 450-second simulation and you’re 270 seconds into it, the health_percentage() function just returns the percentage of time left in the simulation: $100\%\times(1-270/450)=40\%$. This fixes both of the problems above: we’re no longer chopping off low-DPS runs and skewing our distribution, and the boss can’t die early on high-DPS runs because the sim calls health_percentage() to determine if the boss is dead yet. And if we rebuild the simulation after that commit and run the T16H protection paladin profile through it, we get this:


Error analysis of Simulationcraft results with default (forced fixed_time=1) settings after fixing the behavior of health_percentage(). The blue line is the confidence interval reported by Simulationcraft. Green and red lines are the estimated confidence intervals obtained through PCA and percentile methods, respectively.

Excellent. We’re now getting proper agreement with the CLT estimate even out as far as $10^5$ iterations. And we can expect that trend to continue as iteration numbers increase because we’re not violating any of the CLT conditions anymore.
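The time-based health estimate is simple enough to sketch in a couple of lines. The Python function below illustrates the idea and is not the actual C++ from the commit; the name matches the function mentioned above, but the signature and the clamp at zero are assumptions.

```python
def health_percentage(elapsed, total_time=450.0):
    """Time-based stand-in for boss health: percent of the fight remaining."""
    remaining = max(0.0, total_time - elapsed)  # clamp so health never goes negative
    return 100.0 * remaining / total_time

print(health_percentage(270))  # 40.0
```

With this definition every iteration enters execute range at the same moment (360 seconds into a 450-second fight, for a 20% threshold), no matter how lucky or unlucky the run was.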

In a later commit, the line of code quoted above was changed to remove the tank role check as well. In other words, we’re no longer running in fixed-time mode all the time, which is fine because we produce stable enough DPS that the health recalculation algorithm should work properly. While that doesn’t have a significant impact on the results, it’s nice to know that we use the same defaults as most other specs (excepting healers, of course).

If you were attentive, you may have noticed that I tested protection warriors and found that they weren’t exhibiting the same error behavior. Now that you know what the problem was, you may ask, “why not?” After all, they’re tanks, so they were also being forced into a fixed-time mode when being simmed. So what gives?

If you guessed “they don’t have an execute range,” pat yourself on the back. Oh sure, warriors have Execute – the entire “execute range” term is named for it, after all. But if you take a quick look at the T16H protection warrior profile, you’ll notice that it isn’t being used. Which makes sense, because a tank that’s actually in danger would rather use that rage on Shield Barrier for more survivability. Since the T16H protection warrior profile doesn’t change the player’s behavior during execute range, it’s irrelevant how much time they spend there, because their DPS doesn’t change when the boss drops below 20%. So the types of variations that caused error bloat for the protection paladin profile simply don’t exist in the protection warrior profile.


If you’re thinking to yourself, “Man I really don’t want to read 4600 words, could you get to the point already,” this section is for you. In short, here’s what happened:

  • The simulation was forcing tanks into a “fixed-time” mode, where the sim runs for X seconds and stops if it reaches that time regardless of boss health.
  • As a result, the relative amount of time spent in execute range could change significantly from iteration to iteration based on your DPS, changing the underlying probability distribution.
  • Changing the underlying probability distribution violates the Central Limit Theorem, and makes Simulationcraft’s reported error estimate inaccurate, far lower than the actual error.
  • We fixed it by (a) changing the way we calculate the boss’ health percentage in fixed-time mode, and (b) not forcing tanks into fixed-time mode in the first place.

For anyone who didn’t skip to the end, I hope this was an enjoyable read, and less technically grueling than Part I was. It was fun (if time-consuming) to write, if only because I get to mix in concepts that I use frequently in a professional (experimental physics) context, like error analysis and experimental design, with theorycrafting and simulation.

I think many people don’t realize how intertwined the two are in practice. I’m sure a lot of theorycrafters, especially newer ones or ones without a strong science background, ignore error entirely. It’s a lot easier to just look at things like mean DPS or HPS, damage per resource spent, or similar metrics. But especially when it comes to simulation, it’s important to know how good your estimates are and whether you can trust them.

Part of my goal in this pair of posts was to provide a good example of how one goes about doing that, and why. Together, they’re a good introduction to how to properly perform error analysis on results and what to look for when you find results that don’t meet your expectations. Hopefully, at least a few theorycrafters come out of reading these posts feeling like they’ve added a new tool to their skill set.

And more generally, that non-theorycrafters leave with a sense of what it means to talk about statistical (i.e. random) error. I’ll consider it a success if a few people walk away from this set of posts saying, “You know, I never understood how this works before, but now I get it.”

A Comedy of Error – Part I

In the 5.4.2 Rotation Analysis post, I mentioned that I was looking into some odd behavior in the SimC error statistics:

I’m actually doing a little statistical analysis on SimC results right now to investigate some deviations from this prediction, but that’s enough material for another blog post, so I won’t go into more detail yet. What it means for us, though, is that in practice I’ve found that when you run the sim for a large number of iterations (i.e. 50k or more) the reported confidence interval tends to be a little narrower than the observed confidence interval you get by calculating it from the data. So for example, at 250k iterations we regularly get a DPS Error of approximately 40. In theory that means we feel pretty confident that the DPS we found is within +/-40 of the true value. In practice, it might be closer to +/- 100 or so.

Over the past two weeks, I’ve been running a bunch of experiments to try to track down and correct the source of this effect. The good news is that with the help of two other SimC devs, we’ve fixed it, and future rotation analysis posts will be much more accurate as a result.

But before we discuss the solution, we have to identify the problem. And to do that, we need a little bit of statistics. I find that most people’s understanding of statistical error is, humorously enough, rather erroneous. So in the interest of improving the level of discourse, let’s take a few minutes and talk about exactly what it means to measure or report “error.”

Disclaimer: While I’m 99.9% sure everything in this post is accurate, keep in mind that I am not a statistician. I just play one on the internet to do math about video games (and in real life to analyze experimental results). If I’ve made an error or misspoken, please point it out in the comments!

Lies, Damn Lies, and Statistics

Let’s start out with a thought experiment. If we’re given a pair of standard 6-sided dice, what’s the probability of rolling a seven?

There’s a number of ways to solve this problem, but the simplest is probably to do some basic math. Each die has 6 sides, so there are 6 x 6 = 36 possible combinations. Out of those combinations, how many give us a sum of seven? Well, there are three ways to do that with the numbers one through six: 1+6, 2+5, and 3+4. However, we have two dice, so either one could contribute the “1” in 1+6. If we decide on a convention of reporting the rolls in the format (die #1)+(die #2), then we could also have 4+3, 5+2, and 6+1. So that’s six total ways to roll a seven with a pair of dice, out of thirty-six possible combinations; our probability of rolling a seven is 6/36=1/6=0.1667, or 16.67%.

We could ask this same question for any other possible outcome, like 2, 5, 9, or 11. If we did that for every possible outcome (anything from 2 to 12), and then plotted the results, it would look like this:

The probability distribution that describes the results of rolling two six-sided dice.

This gives a visual interpretation of the numbers. It’s clear from the plot that an 8 is less likely than a 7 (as it turns out, there are only five ways to roll an 8) and that rolling a 9 is even less likely (four ways) and that rolling a 2 or 12 is the least likely (one way each). What we have here is the probability distribution of the experiment. It tells us that on any given roll of the dice there’s a ~2.78% chance of rolling a 2 (and the same chance of rolling a 12), a 5.56% chance of rolling a 3 (and likewise an 11), and so on.
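If you’d rather not count combinations by hand, the whole distribution can be built by brute force. This short Python sketch enumerates all 36 ordered pairs of faces and tallies the sums.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 (die #1, die #2) pairs produce each total.
ways = Counter(a + b for a, b in product(range(1, 7), repeat=2))
distribution = {total: Fraction(count, 36) for total, count in sorted(ways.items())}

for total, p in distribution.items():
    print(f"{total:>2}: {ways[total]} way(s), P = {float(p):.4f}")
```

The row for 7 shows six ways and a probability of 0.1667, and the rows for 2 and 12 show one way each.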

Now let’s talk about two terms you’ve probably heard before: mean and standard deviation. These terms show up a lot in the discussion of error, so making sure we have a clear definition of them is a good foundation on which to build the discussion. The mean and the standard deviation describe a probability distribution, but provide slightly different information about that distribution.

The mean tells us about the center of the distribution. You’re probably more familiar with it by another name: the average. Both of those names are a bit ambiguous, though. “Average” can refer to several different metrics, though it’s most commonly used to refer to the arithmetic mean. “Mean” is used slightly differently in different areas of math, but when we’re talking about statistics it’s used synonymously with the term “expected value.” The Greek letter $\mu$ is commonly used to represent the mean. If you want the mathy details, it’s calculated this way:

$$ \mu = \sum_k x_k P(x_k)$$

where $x_k$ is the outcome (e.g. “5”) and $P(x_k)$ is the probability of that outcome (e.g. “11.11%” or 0.1111). For our purposes, though, it’s enough to know that the mean tries to measure the middle of a distribution. If the data is perfectly symmetric (like ours is), it tells you what value is in the center. In the case of our dice, the mean is seven, which is what we’d expect the average to be if we made many rolls.

The standard deviation (usually represented by $\sigma$), on the other hand, describes the spread or width of the distribution. Its definition is a little more complicated than the mean:

$$ \sigma = \sqrt{\sum_k P(x_k) (x_k-\mu)^2} $$

But again, for our purposes it’s enough to know that it’s a measurement of how wide the distribution is, or how much it deviates from the mean. A distribution with a larger $\sigma$ is wider than a distribution with a smaller $\sigma$, which means that any given roll could be farther away from the mean. For our distribution, the standard deviation is about 2.42.
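Both formulas are easy to evaluate directly for the two-dice distribution. The Python sketch below applies the definitions of $\mu$ and $\sigma$ given above term by term; the variable names are just illustrative.

```python
from itertools import product
from math import sqrt

sums = [a + b for a, b in product(range(1, 7), repeat=2)]
outcomes = sorted(set(sums))
prob = {x: sums.count(x) / len(sums) for x in outcomes}  # P(x_k)

mu = sum(x * prob[x] for x in outcomes)
sigma = sqrt(sum(prob[x] * (x - mu) ** 2 for x in outcomes))
print(f"mu = {mu:.2f}, sigma = {sigma:.2f}")
```

The sum inside the square root works out to exactly $35/6$, so $\sigma = \sqrt{35/6} \approx 2.42$.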

The thing I want you to note is that neither of these terms tells us anything about error. We aren’t surprised if we roll the dice and get a 10 or 12 instead of a 7. We don’t return them to the manufacturer as defective. The mean and standard deviation tell us a little bit about the range of results we can get when we roll two dice. To talk about error, we need to start looking at actual results of dice rolls, not just the theoretical probability distribution for two dice.

Things Start Getting Dicey

Okay, so let’s pretend we have two dice, and we roll them 100 times. We keep track of the result each time, and plot them on a histogram like so:

The outcome of 100 rolls of two six-sided dice.

Now, this doesn’t look quite the same as our expected distribution. For one thing, it’s definitely not symmetric – there were more high rolls than low rolls. We could express that by calculating the sample mean $\mu_{\rm sample}$, which is the mean of a particular set of data (a “sample”). By calling this the sample mean, we can keep straight whether we’re talking about the mean of the sample or about the mean of the entire probability distribution (often called “population mean”). The sample mean of this data set is 7.40, as shown in the upper right hand corner of the plot, which is higher than our expected value of 7.00 by a fair amount.

We can also calculate a sample standard deviation $\sigma_{\rm sample}$ for the data, which again is just the standard deviation of our data set. The sample standard deviation for this run is 2.52, which is a bit higher than the expected 2.42 because the distribution is “broader.” Note that the maximum extent isn’t any wider – we don’t have any rolls above 12 or below 2 – but because the distribution is a little “flatter” than usual, with more results than expected in some of the extremes and fewer in the middle, the sample standard deviation goes up a little.
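You can repeat this experiment yourself with a few lines of Python. This sketch rolls two virtual dice 100 times and reports the sample statistics; the seed is arbitrary, so the numbers it prints will not match the 7.40 and 2.52 from the run above. It uses `statistics.pstdev`, which treats the data set as the whole population, on the assumption that this is what “just the standard deviation of our data set” means here.

```python
import random
import statistics

rng = random.Random(2014)  # arbitrary seed so the run is repeatable
rolls = [rng.randint(1, 6) + rng.randint(1, 6) for _ in range(100)]

mu_sample = statistics.fmean(rolls)
sigma_sample = statistics.pstdev(rolls)  # standard deviation of this data set
print(f"sample mean = {mu_sample:.2f}, sample standard deviation = {sigma_sample:.2f}")
```

Changing the seed gives a different sample each time, and the sample mean and standard deviation wander around the ideal values accordingly.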

But note that, by themselves, neither $\mu_{\rm sample}$ nor $\sigma_{\rm sample}$ tells us about the error! They’re still just describing the probability distribution that the data in the sample represents. At best, we might be able to compare our results to the theoretical $\mu$ and $\sigma$ we found for the ideal case to identify how our results differ. But it’s not at all clear that this tells us anything about error. Why?

Because maybe these dice aren’t ideal. Maybe they differ in some way from our model. For example, maybe you’ve heard the term “weighted dice” before? What if one of them is heavier on one side? That might cause it to roll e.g. 6 more often than 1, and give us a slightly different distribution. You could call that an “error” in the manufacturing of the dice, perhaps, but that’s not what we generally mean when we talk about statistical error.

So perhaps it’s time we seriously considered what “error” means. After all, it’s hard to identify an “error” if we haven’t clearly defined what “error” is. Let’s say that we perform an experiment – we make our 100 die rolls and keep track of the results, and generate a figure like the one above. And in addition, let’s say we’re primarily interested in the mean of this distribution; we want to know what the average result of rolling these particular two dice will be. We know that if they were ideal dice, it should be seven. But when we ran our experiment, we got a mean of 7.40.

What we really want to know is the answer to the question, “how accurate is that result of 7.40?” Do we trust it so much that we’re sure these dice are non-standard in some way? Or was it just a fluke? Remember, there’s absolutely no reason we couldn’t roll 100 twelves in a row, because each dice roll is independent of the last, and it’s a random process. It’s just really unlikely. So how do we know this value we came up with isn’t just bad luck?

So let’s say the “error” in the sample mean is a measure of accuracy. In other words, we want to be able to say that we’re pretty confident that the “true” value of the population mean $\mu$ happens to fall within the interval $\mu_{\rm sample}-E < \mu < \mu_{\rm sample} + E$, where $E$ is our measure of error. We could call that range our confidence interval, because we feel pretty confident that the actual mean $\mu$ of the distribution for our dice happens to be in that interval. We’ll talk about exactly how confident we are a little bit later.

It should be clear now why comparing our distribution to the “ideal” distribution doesn’t tell us anything about how reliable our results are. We might know that the sample mean differs from the ideal, but we don’t know why. It could be that our dice are defective, but it could also just be a random fluctuation. But since nothing we’ve discussed so far tells us how accurate our measured sample mean is, we don’t know for sure. To get that, we need to figure out how to represent $E$, the number that sets the bounds on our confidence interval.

It’s a common misconception that $E$ should just be the sample standard deviation $\sigma_{\rm sample}$. You may have seen results presented like $\mu \pm \sigma$, or $7.40 \pm 2.52$, to suggest an interval of confidence. That is, generally speaking, not correct. Or at least, very misleading. Because that’s not what the standard deviation means.

What we really want here is something called the standard error, though it’s also commonly called the standard error of the mean. It’s also sometimes (mistakenly or carelessly) called the “standard deviation of the mean,” but we’ll clarify the difference in a second. I like the term “standard error of the mean,” because it makes it clear that this is a measurement of accuracy of the sample mean. As you might guess, it’s closely related to the sample standard deviation, but not quite the same. It’s calculated by dividing the sample standard deviation by the square root of the number of individual “trials,” or dice rolls, $N$:

$${\rm SE_{\mu}} = \frac{\sigma_{\rm sample}}{\sqrt{N}}.$$

This, at long last, is a good measurement of error. It’s worth noting that the standard deviation of the mean is defined similarly, but uses the true standard deviation of the distribution:

$${\rm SD_{\mu}} = \frac{\sigma}{\sqrt{N}}.$$

The reason the two are often used interchangeably is that we generally don’t know what the actual distribution looks like, nor do we know the expected values of $\mu$ and $\sigma$. Sometimes we do, of course; if we have a theory describing the process we’re measuring, then we can often calculate the theoretical values of $\mu$ and $\sigma$. But we don’t always know if our experiment matches the theory as well as we’d like – for example, if one of the dice is weighted and rolls more sixes than ones.

And sometimes, we don’t have a well-described theory at all, we just have a pile of data. This is the case for most Simulationcraft data runs, because we don’t have an easy analytical function that accurately describes your DPS due to any number of factors: procs, avoidance, movement, and so on. In that sort of situation, we can never truly know $\sigma$, so the lines between ${\rm SE}_{\mu}$ and ${\rm SD}_{\mu}$ blur a little bit, and we tend to get sloppy with terminology.
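To make the standard error formula concrete, here is a minimal Python sketch that rolls two simulated dice 100 times and computes the sample mean, sample standard deviation, and standard error. The seed and the helper names (`roll_two_dice`, `summarize`) are illustrative assumptions, and the sketch assumes the usual $N-1$ form of the sample standard deviation, so the numbers it prints will not reproduce the 7.40 and 2.52 from the plot above.

```python
import math
import random
import statistics


def roll_two_dice(rng):
    """Return the sum of two fair six-sided dice."""
    return rng.randint(1, 6) + rng.randint(1, 6)


def summarize(rolls):
    """Return (sample mean, sample standard deviation, standard error of the mean)."""
    n = len(rolls)
    mean = statistics.fmean(rolls)
    std = statistics.stdev(rolls)  # sample standard deviation (N - 1 in the denominator)
    return mean, std, std / math.sqrt(n)


rng = random.Random(2014)  # arbitrary seed, chosen only so the run is repeatable
rolls = [roll_two_dice(rng) for _ in range(100)]
mean, std, se = summarize(rolls)

print(f"sample mean = {mean:.2f}, sample std = {std:.2f}, SE = {se:.3f}")
print(f"rough 95% interval: {mean - 2 * se:.2f} to {mean + 2 * se:.2f}")
```

Changing the seed gives a different sample each time, which is exactly the run-to-run fluctuation the standard error is meant to describe.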

Double Standards

Now, we’ve thrown around a lot of terms that have “standard deviation” in them. It’s no wonder the layperson is easily confused by statistics. So it’s worth spending a moment to make the differences between these terms abundantly clear. Let’s reiterate quickly why we use standard error to describe the accuracy of the sample mean rather than just using $\sigma$ or $\sigma_{\rm sample}$.

We have a theoretical probability distribution describing the result of rolling two 6-sided dice. Here’s what each of the terms we’ve discussed so far tells us:

  • The mean (or “population mean”) $\mu$ tells us the average value of a single roll.
  • The standard deviation $\sigma$ tells us about the fluctuations of any single dice roll. In other words, if we make a single roll, $\sigma$ tells us how much variation we can expect from the mean. When we make a single roll, we’re not surprised if the result is $\sigma$ or $2\sigma$ away from the mean (ex: a roll of 9 or 11). The more $\sigma$s a roll is away from the mean, the less likely it is, and the more surprised we are. Our distribution here is finite, in that we can never roll less than two or more than 12, but in the general case a probability distribution could have non-zero probabilities farther out in the wings, such that talking about $4\sigma$ or $5\sigma$ is relevant.
  • The sample mean $\mu_{\rm sample}$ tells us the average value of a particular sample of rolls. In other words, we roll the dice 100 times and calculate the sample mean. This is an estimate of the population mean.
  • The sample standard deviation $\sigma_{\rm sample}$ tells us about the fluctuations of our particular sample of rolls. If we roll the dice 100 times, we can calculate the sample standard deviation by looking at the spread of the results. Again, this is an estimate of the population’s standard deviation, and it tells us how much variation we should expect from a single dice roll.
  • The standard deviation of the mean ${\rm SD}_{\mu}$ tells us about the fluctuations of the mean of an arbitrary sample. In other words, if we proposed an experiment where we rolled the dice 100 times, we would go into that experiment expecting to get a sample mean that’s pretty close to (but not exactly) $\mu$. ${\rm SD}_{\mu}$ tells us how close we’d expect to be. For example, under normal conditions we’d expect to get a result for $\mu_{\rm sample}$ that is between $\mu-2{\rm SD}_{\mu}$ and $\mu+2{\rm SD}_{\mu}$ about 95% of the time, and between $\mu-2.5{\rm SD}_{\mu}$ and $\mu+2.5{\rm SD}_{\mu}$ about 99% of the time.
  • The standard error of the mean ${\rm SE}_{\mu}$ tells us about the fluctuations of the mean of our particular sample of rolls. Once we actually make those 100 rolls, and calculate the sample mean and sample standard deviation, we can state that we’re 95% confident that the “true” population mean $\mu$ is between $\mu_{\rm sample}-2{\rm SE}_{\mu}$ and $\mu_{\rm sample}+2{\rm SE}_{\mu}$, and 99% confident that it’s between $\mu_{\rm sample}-2.5{\rm SE}_{\mu}$ and $\mu_{\rm sample}+2.5{\rm SE}_{\mu}$.

You can see why this gets confusing. But the key is that the standard deviation and sample standard deviation are telling you about single rolls. If you roll the dice once, you expect to get a value between $\mu+2\sigma$ and $\mu-2\sigma$ about 95% of the time.

Whereas the standard deviation of the mean and standard error tell us about groups of rolls. If we make 100 rolls the sample mean should be a much better estimate of the population mean than if we made only a handful of rolls. And if we make 1000 rolls, we should get a better estimate than if we only made 100 rolls.

So we use the standard deviation of the mean to answer the question, “if we made 100 rolls, how close do we expect $\mu_{\rm sample}$ (our sample mean) to be to $\mu$ (the population mean)?” And we use the standard error to answer the related (but different!) question, “now that I’ve made 100 rolls, how accurately do I think my calculated $\mu_{\rm sample}$ (sample mean) approximates $\mu$ (the population mean)?”

You might wonder what voodoo tricks I played to get these “95%” and “99%” values. These come from analysis of the normal distribution, which is a probability distribution that comes up frequently in statistics. If your probability distribution is normal, then about 68% of the data will fall within one standard deviation in either direction. Put another way, the region from $\mu-\sigma$ to $\mu+\sigma$ contains 68% of the data. Likewise, the region from $\mu-2\sigma$ to $\mu+2\sigma$ contains about 95% of the data, and over 99.7% of the data will fall between $\mu-3\sigma$ and $\mu+3\sigma$.
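These percentages can be checked directly. For a normal distribution, the probability of landing within $k$ standard deviations of the mean is ${\rm erf}(k/\sqrt{2})$, which Python’s standard library can evaluate. The function name `normal_coverage` below is just a label chosen for this sketch.

```python
import math


def normal_coverage(k):
    """Probability that a normal variable lands within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 1.96, 2, 2.5, 3):
    print(f"within +/- {k} sigma: {normal_coverage(k):.4f}")
```

The values for 1, 2, and 3 standard deviations work out to roughly 68.3%, 95.4%, and 99.7%. The “2.5” used for the 99% figures earlier gives about 98.8%, so it’s a convenient round number rather than an exact one, just as 2 is a rounded stand-in for 1.96.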

Our probability distribution isn’t a normal distribution. First of all, it’s truncated on either side, while the normal distribution goes on infinitely in either direction (we’ll never be able to roll a one or 13 or 152 with our two dice). Second, it’s a little too discrete to be a good normal distribution – there isn’t quite enough granularity between 2 and 12 to flesh the distribution out sufficiently. It’s really more of a triangle than a nice Gaussian, though it’s not an awful approximation given the constraints. Luckily, none of that matters! As it turns out, the reason our distribution looks vaguely normal is closely related to the reason that we use the normal distribution to determine confidence intervals.

Limit Break

The Central Limit Theorem is the piece that completes our little puzzle. Quoth the Wikipedia,

the central limit theorem (CLT) states that, given certain conditions, the arithmetic mean of a sufficiently large number of iterates of independent random variables, each with a well-defined expected value and well-defined variance, will be approximately normally distributed.

That’s a bit technical, so let’s break that down and make it a bit clearer with an example. We start with a dice roll (a “random variable”) that has some probability distribution that doesn’t change from roll to roll (“a well-defined expected value and well-defined variance”) and each roll doesn’t depend on any of the previous ones (“independent”). Now we roll those dice 10 times and calculate the sample mean. And then roll another 10 times and calculate the sample mean. And then do it again. And again, and again, and… you get the idea (“a sufficiently large number of iterates”). If we do that, and plot the probability distribution of those sample means, we’ll get a normal distribution centered on the population mean $\mu$.

The beautiful part of this is that it doesn’t matter what the probability distribution you started with looks like. It could be our triangular dice roll distribution or a “top-hat” (uniform) distribution or some other weird shape. Because we’re not interested in that; we’re interested in the sample means of a bunch of different samples of that distribution. And those are normally distributed about the mean, as long as the CLT applies. Which means that when we find a sample mean, we can use the normal distribution to estimate the error, regardless of what probability distribution that the individual rolls obey.
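Here is a small Python sketch of that idea, using a single six-sided die as the “top-hat” distribution. It draws many samples of 30 rolls each, computes each sample mean, and counts how often the sample mean lands within $2\,{\rm SD}_{\mu}$ of the population mean. The sample size, number of samples, and seed are arbitrary choices made for illustration.

```python
import math
import random
import statistics

FACES = [1, 2, 3, 4, 5, 6]
MU = statistics.fmean(FACES)      # population mean of one die (3.5)
SIGMA = statistics.pstdev(FACES)  # population standard deviation of one die


def sample_mean(rng, n):
    """Mean of n independent rolls of a single fair die."""
    return statistics.fmean([rng.choice(FACES) for _ in range(n)])


def fraction_within(rng, n, samples, k=2.0):
    """Fraction of sample means that land within k * SD_mu of the population mean."""
    sd_mu = SIGMA / math.sqrt(n)
    hits = sum(abs(sample_mean(rng, n) - MU) <= k * sd_mu for _ in range(samples))
    return hits / samples


rng = random.Random(7)  # arbitrary seed for repeatability
frac = fraction_within(rng, n=30, samples=5000)
print(f"{frac:.3f} of the sample means fall within 2 SD_mu of the population mean")
```

Even though a single die is flat rather than bell-shaped, the fraction should come out close to the 95% that the normal distribution predicts.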

Now, there are two major caveats here that cause the CLT to break down if they aren’t obeyed:

  • The random variables (rolls) need to be independent. In other words, the CLT will not necessarily be true if the result of the next roll depends on any of the previous rolls. Usually rolls are independent (as they are in our example), but not always. There are two WoW-related examples I can think of off the top of my head.

    Quest items that drop from mobs aren’t truly random, at least post-BC (and possibly post-Vanilla). Most quest mobs have a progressively increasing chance to drop quest items, such that the more of them you kill, the higher the chance of an item dropping. This prevents the dreaded “OMG I’ve killed 8000 motherf@$#ing boars and they haven’t dropped a single tusk” effect (yes, that’s the technical term for it).

    Similarly, bonus rolls have a system where every failed bonus roll will cause a slight increase in the chance of success with your next bonus roll against that boss. So this would be another example where the CLT won’t apply, because the rolls aren’t truly independent.

  • The random variables need to be identically distributed. In other words, the probability distribution can’t be changing in-between rolls. If we swapped one of our 6-sided dice out for an 8-sided or 10-sided die, all of a sudden our probability distribution would change and there would be no guarantee that the CLT would apply.

    You might ask if you could cite either of the two examples of dependence here as examples of non-identical distributions. After all, in each case the probability distribution is changing between rolls. However, that change is due to dependence on previous effects – in a sense, the definition of dependence is “changing the probability distribution between rolls based on prior outcomes.” So dependence is a more specific subset of this category.

If either of those things occurs, then we can’t be sure that the CLT is valid for our situation. Luckily, none of that applies to our dice-rolling example, so we can properly apply the CLT to estimate the error in our set of 100 rolls.

Keep Rollin’ Rollin’ Rollin’ Rollin’

So now that we’ve talked a lot about deep probability theory, let’s actually do that. The standard error of our 100-roll sample is,

$$ {\rm SE}_{\mu} = \sigma_{\rm sample}/\sqrt{N} = 2.52/\sqrt{100} = 0.252 $$

To get our 95% confidence interval (CI), we’d want to look at values between $\mu_{\rm sample}-2{\rm SE}_{\mu}$ and $\mu_{\rm sample}+2{\rm SE}_{\mu}$, or $7.40 \pm 0.504$. And sure enough, the actual value of the population mean (7.00) falls within that confidence interval. Though note that it didn’t have to – there was still a 5% chance it wouldn’t!

We could improve the estimate by increasing the number of dice rolls. For example, what if we rolled 1000 dice instead? That might look something like this:

The result of 1000 rolls of two six-sided dice.

We see that our new sample mean is $\mu_{\rm sample}=6.95$ and our sample standard deviation is $\sigma_{\rm sample}=2.41$. But now $N=1000$, so our standard error is much smaller:

$$ {\rm SE}_{\mu} = \sigma_{\rm sample}/\sqrt{N} = 2.41/\sqrt{1000} = 0.0762$$

As before, we’re 95% confident that our sample mean is within $\pm 2{\rm SE} = 0.1524$ of the population mean in one direction or the other, and sure enough it is.

Of course, we could keep going. Here’s what 10000 rolls looks like:

The outcome of 10000 rolls of two six-sided dice.

And if we calculate our standard error for this distribution, we get:

$$ {\rm SE}_{\mu} = \sigma_{\rm sample}/\sqrt{N} = 2.43/\sqrt{10000} = 0.0243$$

So now we’re pretty sure that the value of 7.01 is correct to within $\pm 0.0486$, again with 95% confidence. Like before, there’s no guarantee that it will be – there’s still that 5% chance it falls outside that range. But we can solve that by increasing our confidence interval (say, looking at $\pm 3{\rm SE}_{\mu}$) or by repeating the experiment a few times and thinking about the results. If we repeat it 100 times, we’d expect about 95 of them to cluster within $\pm 2{\rm SE}_{\mu}$ of 7.00.

You may have noticed that while the confidence interval is shrinking, it’s not doing so as fast as it did going from 100 to 1000. That’s because we’re dividing by the square root of $N$, which means that to improve the standard error by a factor of $a$, we need to run $a^2$ times as many simulations. So if we want to increase our accuracy by a whole decimal place (a factor of 10), we need to make 100 times as many rolls. This is important stuff to know if you’re designing an experiment, because you don’t want your graduate thesis to rely on making five trillion dice rolls. Trust me.
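The square-root scaling is easy to tabulate. This short Python sketch recomputes the three $\pm 2{\rm SE}_{\mu}$ half-widths from the sample standard deviations quoted above, and adds a helper for the “how many more rolls do I need” question. The function names are illustrative.

```python
import math


def ci_half_width(sample_std, n, k=2.0):
    """Half-width of the confidence interval: k standard errors."""
    return k * sample_std / math.sqrt(n)


def rolls_needed(current_rolls, improvement_factor):
    """Rolls required to shrink the standard error by improvement_factor."""
    return current_rolls * improvement_factor ** 2


# Sample standard deviations and roll counts from the three experiments above.
for std, n in [(2.52, 100), (2.41, 1000), (2.43, 10000)]:
    print(f"N = {n:>5}: +/- {ci_half_width(std, n):.4f}")

print(rolls_needed(100, 10))  # one extra decimal place of accuracy starting from 100 rolls
```

Each tenfold increase in rolls only shrinks the interval by a factor of about $\sqrt{10} \approx 3.16$, which is why a full factor of 10 costs 100 times the rolls.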

You probably also noticed that the more rolls we make, the more the sample probability distribution resembles the ideal “triangular” case we arrived at theoretically. That’s to be expected – the more rolls we make, the better the sample approximates the real distribution. This is related to another law (the amusingly-named law of large numbers) that’s important for the CLT, but I don’t have time to go into that here. But it was worth mentioning just because “law of large numbers” is probably the best name for a mathematical law ever.

Finally, I mentioned that our “triangular” distribution for two dice looks vaguely normal, and that this relates to the CLT somehow. Here’s how. Each die is essentially its own random variable with a “flat” or “uniform” probability distribution (you have an equal chance to roll any number on the die). So when we take two of them and calculate the sum, we’re really performing two experiments and finding two sample means (with a sample size of 1 roll each). The sum of those two sample means, which is just twice the average of the sample means, is our result. This is exactly how we phrased our description of the CLT!

The reason we get a triangle rather than a nice Gaussian is that two dice is not “a sufficiently large number of iterates.” There is, unfortunately, no clean closed-form expression for this probability distribution for arbitrary numbers of $s$-sided dice (something called the binomial distribution works when $s$=2, i.e. for coin flips). But if we rolled 5 dice or 10 dice instead of two, and added all of those up, we’d start to get a distribution that looked very much like a normal distribution. And in fact, if you read either of the articles linked in this paragraph, you’ll see that they both become well-approximated by a normal distribution as you increase the number of experiments (die rolls).
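While there’s no tidy formula, the exact distribution for any number of dice is easy to build numerically by repeated convolution. The Python sketch below does that and then measures the largest gap between the exact probabilities and a normal curve with the same mean and variance. The helper names are assumptions made for this example.

```python
import math


def sum_pmf(num_dice, sides=6):
    """Exact probability of each total when rolling num_dice fair dice."""
    pmf = {0: 1.0}
    for _ in range(num_dice):
        nxt = {}
        for total, p in pmf.items():
            for face in range(1, sides + 1):
                nxt[total + face] = nxt.get(total + face, 0.0) + p / sides
        pmf = nxt
    return pmf


def max_gap_to_normal(pmf):
    """Largest absolute gap between the PMF and a normal density with matching mean and variance."""
    mean = sum(x * p for x, p in pmf.items())
    var = sum((x - mean) ** 2 * p for x, p in pmf.items())

    def pdf(x):
        return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

    return max(abs(p - pdf(x)) for x, p in pmf.items())


for n in (2, 5, 10):
    print(f"{n:>2} dice: largest gap from normal = {max_gap_to_normal(sum_pmf(n)):.5f}")
```

The gap shrinks as dice are added, which is the CLT at work: the triangle for two dice is just an early step on the way to a Gaussian.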

World of Stat-craft?

Now that you’ve read through 4000 words on probability theory, you may ask where the damn World of Warcraft content is. The short answer: next blog post. But as a teaser, let’s consider a graph that shows up in your Simulationcraft output:

A DPS distribution generated by Simulationcraft.

When you simulate a character in SimC, you run some number of iterations. Each iteration gives you an average DPS result, which is essentially one result of a random variable. In other words, each iteration is comparable to a single roll of the dice in our example experiment. If we run a simulation for 1000 iterations, that gives us 1000 different data points, from which we can calculate a sample mean (367.7k in this case), a sample standard deviation, and a standard error value.

And all of the same statistics apply here. This plot gives us the “DPS distribution function,” which is equivalent to the triangular distribution in our experiment. The DPS distribution looks Gaussian/normal, but be aware that there’s no reason it has to be. It generally will look close to normal just because each iteration is the result of a large number of “RNG rolls,” many of which are independent. But some of those RNG rolls are not independent (for example, they may be contingent on the previous die roll succeeding and granting you a specific proc, like Grand Crusader). With certain character setups you can definitely generate DPS distributions that deviate significantly from a normal distribution (skewed heavily to one side, for example).

But again, because of the Central Limit Theorem, we don’t care that much what this DPS distribution function looks like. As long as each iteration is independent, we can use the normal distribution to estimate the accuracy of the sample mean. So we can calculate the standard error and report that as a way of telling the user how confident they should be in the average DPS value of 367.7k DPS.

At the very beginning of this post, I said I was looking into a strange deviation from the expected error. What I was finding was that my observed errors were larger than what Simulationcraft was reporting. Next time, we’ll look a little more closely into how Simulationcraft reports error, and discuss the specifics of that effect – why it was happening, and how we fixed it.

5.4.2 Rotation Analysis

In December, I talked about the code I’ve written to automate the testing of Simcraft profiles. In that post, I tackled the two easiest simulations to write: glyphs and talents. In both of those cases, we’re just editing a single line of the .simc file, so it was a fairly simple job of tweaking that line and repeating. Of course, there was the entire superstructure of code surrounding that idea, which is what took far longer than the (relatively) simple logic required to swap out talents and glyphs.

Today I present the results of the other end of the spectrum – one of the most difficult sims to write. Because today we’re going to look at rotations.

If you haven’t read the previous post, I recommend you go back and do so now.  Or at least re-read the “Automating Simcraft” portion of it. I’ll refresh your memory about certain points, but I’m going to assume that you’re familiar with the basics of how this code operates. In short, if you don’t remember that we piece together a .simc file from discrete components (i.e. a player, a gear set, a rotation, a set of glyphs, a set of talents, etc.), then you should probably go re-read that section.

Note that I’ve taken to calling each of these components “blocks” in the rest of this post. That’s what I tend to call them in my head, and it’s faster than typing “component” over and over. Plus, I think it gives a nice visual – sort of like building the .simc file out of a bunch of different distinct Lego pieces.

Rotations Schmotations

You might ask what makes the rotation sim significantly harder than, say, a glyph sim. The short (and woefully incomplete) answer is that it involves changing more than one line of the .simc file we feed to the executable.

I say “woefully incomplete” because that statement encompasses a lot more than just swapping out a single component.  For example, in the glyph simulation, we kept the same player block, gear block, rotation block, and so on, and just swapped out the glyph block. We did that by pre-generating a glyph block for all of the different glyph combinations we were interested in and cycling through them.

On its face, it seems like that same logic could apply to the rotation simulation. We could just generate 100 different rotation blocks that describe the different rotations we’re interested in, and then swap them in and out one by one to get the results. Right?

Wrong. Oh, so wrong…

That might work fine for a really simple rotation simulation where we only consider combinations of basic abilities. For example, we limit ourselves to Crusader Strike, Judgment, Avenger’s Shield, Holy Wrath, Hammer of Wrath, and Consecration. That would be enough to figure out the basic gist of the rotation, for sure.

But it should be obvious that this list is missing a few important abilities. What if we want to include Sacred Shield, or one of our level 90 talents? All of those have to go into the rotation somewhere. And the sim won’t use them unless we’ve talented them. So, first of all, that means we need to swap the talent block out at the same time as the rotation block. And not just that, but we need some way to know which talent block to use when – it’s no good if we use a talent block with Light’s Hammer when we’re testing Execution Sentence rotations. That seems like an obvious and trivial problem to solve, but it’s still an extra moving part we need to consider in a sim that’s already going to be pretty complicated.

Because it’s not just talents we need to worry about, either. Let’s say we want to look at execute-range rotations in particular. We might want to know if Holy Wrath changes priority when Final Wrath is glyphed. But to do that, we need to enable that glyph, or else use it by default. But there may be cases where we don’t want it on, either. So we need to be able to swap glyphs too.

Further, we need to be able to specify conditionals in the action priority list (APL). So that, for example, we can compare




Now, of course, that’s not really a problem in theory, because we could just write each block by hand and take care of all of that. But we might have hundreds of rotations, and the risk of making a small, unnoticed but relevant error in one of them is pretty high when you’re talking about writing that many by hand. Also, if you really expected me to write hundreds of rotation files by hand, you’re kidding yourself.

We’ll still need a good shorthand for it for identifying rotations on tables anyway, and if you’re going to write a robust shorthand, then you may as well automatically generate the rotation blocks from that shorthand. That gives us the consistency we want (because there will never be an error in “HW” in one file that doesn’t exist everywhere else) and makes tables easy to read. But it adds another complication: now we need to write a translator that goes between shorthand and full SimC file, complete with all of the options and conditionals we might want to use.

You can already see why this snowballed into one of the more complicated sims to write. And it’s not even necessarily the hardest – the AoE one may be more annoying still depending on what exactly we want to calculate!

The Nitty Gritty Details

So, in short, this is how the simulation works. I’ve divided the rotations we care about up into groups (which, in a sad turn of events, I’ve called “blocks” in the code…. oops? I’ll be consistent about calling them “groups” here though). Each group has a defined set of talents and glyphs, because for the most part those vary on a group level. So there’s a “Basic” group, an “Execute” group that focuses on Hammer of Wrath and Final Wrath, a “Defensive” group that’s primarily for testing Sacred Shield, and a “Level 90” group that tests all the level 90 talents.

In addition, I have the ability to enable custom talents per rotation. So for example, within the Level 90 group, it will automatically check each rotation to see which level 90 talent it uses and tweak the talent block to enable that talent. It also does this for the Sacred Shield rotations in the Defensive group. I signify this by adding “+custom” to the end of the talent block, which is the flag the code looks for to decide whether it needs to perform this check.

In theory I could do the same thing with glyphs, I suppose, but I found that I didn’t really need to. It wouldn’t be difficult to modify the code to do that in the future if we decide it’s necessary.

The rest of the difficulty was coming up with the abbreviation scheme for abilities and their conditionals. Thinking ahead, I wanted this to be extendable to other classes, so I set it up such that each class can have its own definitions. For a paladin, CS will always mean Crusader Strike, but if we’re simming another class it could translate to something different.

The abilities were fairly easy, since I’ve been using a standard notation for them in the old MATLAB code for years. They are:

Ability Shorthands

| Shorthand | Ability |
|---|---|
| CS | Crusader Strike |
| CSw | Crusader Strike followed by a /wait (see below) |
| HotR | Hammer of the Righteous |
| J | Judgment |
| AS | Avenger's Shield |
| HW | Holy Wrath |
| HoW | Hammer of Wrath |
| Cons | Consecration |
| SS | Sacred Shield |
| ES | Execution Sentence |
| LH | Light's Hammer |
| HPr | Holy Prism |
| SotR | Shield of the Righteous |
| EF | Eternal Flame |
| WoG | Word of Glory |

In the earlier code, we used a bracketing technique for options, which was very powerful, but led to really long rotation names. This time around, I’m trying to keep the names fairly compact for display purposes, so I went with a slightly different method. Each option has a shorthand and gets appended to the ability shorthand with a plus sign (‘+’). The options I have enabled at this point are:

Conditional Shorthands

| Shorthand | Conditional |
|---|---|
| W# | add a /wait after the ability if the cooldown is less than or equal to # seconds |
| GC# | buff.grand_crusader.up or buff.grand_crusader.remains<# |
| DP | buff.divine_purpose.react |
| DPHP# | (buff.divine_purpose.react\|holy_power>=#) |
| FW | glyph.final_wrath.enabled&<=20 |
| HP# | holy_power>=# |
| nt | !ticking |
| nF | target.debuff.flying.down |
| SW | talent.sanctified_wrath.enabled&buff.avenging_wrath.react |
| T# | active_enemies>=# |
| R# | buff.(ability_string).remains<# |

So for example, AS+GC would translate into


Not all of these are in use in the data I’ll present today, but they’re all coded and potentially usable. I expect that we’ll add a bunch of action priority lists to the simulation after we’ve analyzed the results in this post. For example, it might be interesting to see if “Cons+nt” has any effect, but it wasn’t high on my list of priorities when I was putting this together so I didn’t include it.
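To give a feel for what such a translator involves, here is a minimal Python sketch. It is not the actual translator (that lives in the matlabadin repository) and it only covers a handful of entries from the two tables. Several things in it are assumptions: the SimulationCraft action names in `ABILITIES`, the use of “>” to separate abilities within a rotation name, the `actions+=/name,if=condition` output format, and joining multiple options with “&”.

```python
import re

# Hypothetical subset of the ability table; the action names are assumed SimC tokens.
ABILITIES = {
    "CS": "crusader_strike",
    "J": "judgment",
    "AS": "avengers_shield",
    "HW": "holy_wrath",
    "HoW": "hammer_of_wrath",
    "Cons": "consecration",
}

# Subset of the conditional table; {n} marks where the numeric argument (#) goes.
CONDITIONALS = {
    "DP": "buff.divine_purpose.react",
    "HP": "holy_power>={n}",
    "nt": "!ticking",
    "nF": "target.debuff.flying.down",
    "T": "active_enemies>={n}",
}


def translate_entry(entry):
    """Turn one shorthand entry such as 'HW+HP3' into a single action list line."""
    ability, *options = entry.split("+")
    line = "actions+=/" + ABILITIES[ability]
    conditions = []
    for option in options:
        match = re.fullmatch(r"([A-Za-z]+)(\d+(?:\.\d+)?)?", option)
        if match is None:
            raise ValueError(f"cannot parse option {option!r}")
        name, number = match.groups()
        template = CONDITIONALS[name]
        if "{n}" in template and number is None:
            raise ValueError(f"option {name!r} needs a number")
        conditions.append(template.format(n=number))
    if conditions:
        line += ",if=" + "&".join(conditions)  # assumption: options combine with AND
    return line


def translate_rotation(rotation):
    """Translate a full rotation name such as 'CS>J>Cons+nt' into action list lines."""
    return [translate_entry(entry) for entry in rotation.split(">")]


for line in translate_rotation("CS>J>AS>Cons+nt"):
    print(line)
```

Because every rotation goes through the same lookup tables, a typo in one definition shows up everywhere at once rather than hiding in a single hand-written file, which is the consistency argument made above.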

There’s one special case I want to mention. The “wait” conditional works something like this: CS+W0.35 translates to:


As you might expect from the default APL for protection, this almost always nets an increase in holy power generation because it prevents us from doing silly things like CS-X-X-X-CS. That can otherwise happen in situations where one or more of the X’s were spells, so the GCD ends a little before CS becomes available. As a result, we’ll almost always want to follow CS with a wait. Since that comes up a lot, and I didn’t want to type CS+W0.35 all the time in the interest of keeping the rotation abbreviations short and readable, I’ve defined the shorthand “CSw” to implicitly mean “CS+W0.35.”

As a final note, I want to mention that this simulation is limited to GCD-based abilities. In other words, I’m using the same precombat actions and the same finishers in each rotation. I’m basically bolting the rotations below together with the precombat actions and the following default finisher definitions:


This ensures that the changes we see are purely due to any change in holy power generation or dead time in the rotations themselves. And in any event, since our active mitigation is decoupled from the GCD, it’s not really part of our “rotation” in a strict sense. It’s stuff we use when necessary and available based on the resources, not based on whether they’re more or less important than e.g. CS.  We’ll analyze the finisher options specifically in a later sim in much the same way we do here for the rotation. Luckily, that sim will be a lot easier to write!

As usual, all of the code can be found in the matlabadin repository. This sim uses a lot of files, but the master one that controls it all is:

All of the results can be found in the /io/ directory, along with the results of the glyph and talent simulations. The sims are labeled appropriately with “>” replaced by “_”.
(Ex: rotation_paladin_protection_CSw_J_AS_HW_HoW_Cons.html)


We’ll go through each of the rotation groups one at a time, briefly discussing what makes them unique and why we’ve made the choices we have. They all use the default T16N profile gear set (which includes 4T16) and are pitted against the T16N25 TMI calibration boss. The default talents include Unbreakable Spirit, Eternal Flame, and Divine Purpose unless otherwise specified. Everything else should be provided in the details below.

I’ll note that for all of these simulations, I’ve set the number of iterations to 250k. Yes, that’s a lot, but it’s necessary to get the degree of accuracy we want.

The “DPS Error” that Simulationcraft reports is really the half-width of the 95% confidence interval (CI). In other words, it is 1.96 times the standard error of the mean. To put that another way, we feel that there’s a 95% chance that the actual mean DPS of a particular simulation is within +/- DPS_Error of the mean reported by that simulation. There are some caveats to this statement, insofar as it makes some reasonably good but not air-tight assumptions about the data, but it’s pretty good.
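As a rough illustration of that definition, here is a Python sketch that computes a 95% half-width from a list of per-iteration DPS values, plus the reverse calculation. This is an illustration of the formula only, not Simulationcraft’s actual code, and the function names are made up for this example.

```python
import math
import statistics

Z_95 = 1.96  # multiplier for a 95% confidence interval under the normal approximation


def dps_error(iteration_dps):
    """Half-width of the 95% confidence interval for the mean DPS."""
    standard_error = statistics.stdev(iteration_dps) / math.sqrt(len(iteration_dps))
    return Z_95 * standard_error


def implied_spread(error, iterations):
    """Per-iteration standard deviation implied by a reported DPS Error."""
    return error / Z_95 * math.sqrt(iterations)


print(round(implied_spread(40, 250_000)))
```

Running the formula backwards is a handy sanity check: a DPS Error of 40 at 250k iterations would correspond to a per-iteration standard deviation of a little over 10k DPS.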

I’m actually doing a little statistical analysis on SimC results right now to investigate some deviations from this prediction, but that’s enough material for another blog post, so I won’t go into more detail yet. What it means for us, though, is that in practice I’ve found that when you run the sim for a large number of iterations (i.e. 50k or more) the reported confidence interval tends to be a little narrower than the observed confidence interval you get by calculating it from the data.

So for example, at 250k iterations we regularly get a DPS Error of approximately 40. In theory that means we feel pretty confident that the DPS we found is within +/-40 of the true value. In practice, it might be closer to +/- 100 or so.

Why does that matter for us? Well, we want to know if one rotation is better than another in a statistically significant sense. Based on the theoretical estimate, this means that as long as they’re farther apart than 80 DPS, we can trust that the higher-DPS rotation is better. In practice, I think we should expand that bound a bit, at least to 100 DPS, and probably to 200 DPS if we’re going to be generous and assume that there could be other sources of systematic error that we don’t know about. I’ve seen the same rotation sim up to 300 DPS differently from two separate runs, so I’m inclined to be a little more generous in my error estimate than SimC is.

And keep in mind that we’re looking at a mean value of almost 400k DPS in these sims. 400 DPS is a change of 0.1%, which is minuscule, and not likely to swing an encounter one way or another. Even if our sims are accurate to that level, that’s right around the point where you prioritize mental bandwidth over DPS gain and choose the rotation that’s simpler to execute. So I’d probably be hesitant to ascribe any real significance to differences that are smaller than 1000 DPS, which is still less than a 1% change.
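
The thresholds discussed above can be collected into a rough rule of thumb. The sketch below is only one way to encode them: the 200 DPS "noise floor" and the 1000 DPS "practical floor" are the generous bounds suggested in the text, while the function name and the verdict labels are mine.

```python
def compare_dps(dps_a, dps_b, noise_floor=200.0, practical_floor=1000.0):
    """Classify the DPS gap between rotation A and rotation B.

    Returns (difference, percent difference relative to B, verdict).
    """
    diff = dps_a - dps_b
    pct = 100.0 * diff / dps_b
    if abs(diff) < noise_floor:
        verdict = "within noise"
    elif abs(diff) < practical_floor:
        verdict = "probably real, but too small to matter"
    else:
        verdict = "meaningful"
    return diff, pct, verdict


# A 400 DPS gap on a 400k baseline is the 0.1% case from the text.
print(compare_dps(400400.0, 400000.0))
```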

Basic Rotation Group

This group of rotations is focused on determining the order of operations for our basic abilities, excluding talents and execute range. From this, we determine our “ideal” base rotation, which we then go about tweaking in the other groups.

In this set, we use just two glyphs: Focused Shield and Word of Glory. We could have included Divine Protection, but we want to be able to compare the survivability results to those obtained in later groups which use all three glyph slots on DPS glyphs. Plus, there’s really not a lot to learn from glyphing Divine Protection here. It’s our only feasible survivability glyph and it’s so highly situational that there’s no guarantee we’re using it for a given boss.

In addition to the table, the sim spits out the maximum DPS Error measurement of the group (each rotation is fairly similar in that regard, so it didn’t make sense to include it on the table) and the talents and glyphs used:

Max DPS Error: 41
Talents: 312232
Glyphs: focused_shield/word_of_glory

Basic Rotations
Rotation DPS HPS DTPS TMI Var SotR Wait
CS>J>AS>Cons>HW 373603 160013 160353 6212 2062 71.0% 14.5%
CS>J>AS>HW>Cons 379608 159521 159854 4287 971 71.4% 13.2%
CS+W0.3>J>AS>HW>Cons 373814 157738 158054 533 117 73.2% 12.7%
CSw>J>AS>Cons>HW 368204 157862 158182 460 47 73.1% 13.8%
CSw>J>AS>HW>Cons 373591 157666 157983 427 55 73.2% 12.7%
CSw>J>HW>AS>Cons 372798 157616 157932 410 77 73.3% 12.5%
CSw>HW>J>AS>Cons 359765 161552 161890 626 61 69.6% 15.6%
HW>CSw>J>AS>Cons 363466 161565 161905 806 122 69.6% 14.9%
CSw>AS>J>HW>Cons 373952 158483 158804 451 38 72.5% 12.8%
J>CSw>AS>HW>Cons 368396 162575 162942 90576 83031 68.5% 17.4%
J>AS>CSw>HW>Cons 372886 163157 163529 61525 35811 67.9% 17.0%
AS>J>CSw>HW>Cons 372490 163965 164342 174459 57861 67.2% 17.4%
AS>CSw>J>HW>Cons 378485 159759 160092 1971 365 71.2% 13.0%
HotR+W0.35>J>AS>HW>Cons 371877 159558 159894 6714 5015 71.4% 13.2%
AS+GC>CSw>J>AS>HW>Cons 374633 157958 158289 727 145 72.9% 12.7%
CSw>AS+GC>J>AS>HW>Cons 373734 157767 158086 409 52 73.1% 12.7%
CSw>AS+GC>J>HW>AS>Cons 373243 157700 158021 391 43 73.2% 12.5%
CSw>AS+GC>J>HW>Cons>AS 372838 158084 158405 429 79 72.8% 12.3%

Note that you can sort the table by a particular column by simply clicking on that column’s header. The “Var” column simply reports the measurement of “TMI Error,” which is really more of an uncertainty or variance measure due to the nature of the TMI distribution. Basically, treat that column as the +/- on the measured TMI value. The “Wait” column tells us how much time the sim spends waiting while the GCD is available, either because there’s nothing to cast or because we’re hitting the /wait action.

Before sorting, it’s clear that waiting for CS’s cooldown to come up is a significant survivability gain. The more subtle thing to notice is that it’s actually a slight DPS loss, mostly because CS hits like a limp noodle. There are a number of reasons for that, but the primary one is that CS’s damage increases far more slowly with attack power than the rest of our abilities do. So the higher Vengeance gets, the worse CS is compared to just about everything else we could cast.

A lot of the features here are expected. Dropping CSw below anything else in priority gives you a large survivability loss. It’s worth noting that the “CSw>AS+GC>J>*” rotations near the bottom produce some very low TMI results, but I’m still a bit skeptical of these. The SotR uptime isn’t any higher than the default (CSw>J>AS>HW>Cons), nor are the TMI values lower in a statistically significant sense.

If we sort by DPS, we see that the top rotation is actually the one where we don’t wait for CS’s cooldown, again because CS is such a weak ability at this point. But after that one, we have a bunch of rotations that emphasize AS in various ways. This can be summarized with a pretty simple rule of thumb: “if you don’t care about survivability and need max DPS right now, prioritize AS.”

There are a bunch of rotations where I push Holy Wrath up ahead of CS/J/AS. These aren’t interesting from a survivability point of view, because they uniformly increase our TMI. They also seem to uniformly reduce DPS compared to the standard CSw>J>AS>HW>Cons. We’ll have to revisit these in the execute range group where we have Final Wrath glyphed, which is where we might expect a high HW prioritization to bear fruit.

The HotR rotation I threw in has the same wait as CSw, so it’s directly comparable to a CSw rotation. This is really only relevant in cases where you want to know how much single-target damage you’re sacrificing to cleave to adds now that Weakened Blows is applied by both abilities. Nonetheless, we see it’s about a 1700 DPS loss to use HotR instead of CS. That’s not really a big deal in the grand scheme of things; we’re talking about less than a 1% difference. CS and HotR both hit so weakly that it’s almost irrelevant which you use.

I also want to call attention to the TMI and Var columns again quickly. If you sort by either of these, you’ll see that as TMI goes up, so does the variance. This is one significant drawback of the current TMI formula – because it’s an exponential metric, the variance tends to be rather large when TMI is large. Increasing the number of iterations doesn’t end up helping it much, because it’s just not anything resembling a Gaussian distribution.

The two take-home messages I want to get across here are:

  • Unless two TMI values differ by more than the sum of their Var columns, it’s not 100% clear that they’re different in a statistically significant sense. So TMIs of 400 and 500 are roughly identical if their Vars are 100 or more, but you could safely say that a TMI of 400 is better than e.g. a TMI of 1000. We’re looking for order-of-magnitude effects in TMI, because that’s how the metric was constructed.
  • This will be fixed in TMI v2.0, which I’m working on currently. More on that soon, maybe next week if I have time to write.
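
The first rule of thumb is easy to write down as a check. Here is a minimal Python sketch, with a hypothetical function name, that treats two TMI values as distinguishable only if they differ by more than the sum of their Var columns. The example inputs are taken from the Basic Rotations table above.

```python
def tmi_distinguishable(tmi_a, var_a, tmi_b, var_b):
    """True if two TMI values differ by more than the sum of their Var values."""
    return abs(tmi_a - tmi_b) > (var_a + var_b)


# CSw>J>AS>HW>Cons (427 +/- 55) vs. CSw>AS+GC>J>HW>AS>Cons (391 +/- 43)
print(tmi_distinguishable(427, 55, 391, 43))        # False: a 36 gap, 98 combined Var

# CSw>J>AS>HW>Cons (427 +/- 55) vs. J>CSw>AS>HW>Cons (90576 +/- 83031)
print(tmi_distinguishable(427, 55, 90576, 83031))   # True: the gap exceeds the combined Var
```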

Next, let’s look at the execute rotations.

Execute Rotation Group

In this case, we want to find out how we vary the basic CSw>J>AS>HW>Cons rotation in execute range. That means we need to know where to slot in Hammer of Wrath and what (if anything) to do about Holy Wrath when Final Wrath is glyphed.

Since we can already look at the table above to figure out what happens when Final Wrath isn’t glyphed, this group includes it by default along with Focused Shield and Word of Glory.

Max DPS Error: 41
Talents: 312232
Glyphs: focused_shield/word_of_glory/final_wrath

Execute Rotations
Rotation DPS HPS DTPS TMI Var SotR Wait
CSw>J>AS>HW>Cons>HoW 383714 157727 158045 379 30 73.2% 11.2%
CSw>J>AS>HW>HoW>Cons 384536 157678 157999 436 111 73.2% 11.0%
CSw>J>AS>HoW>HW>Cons 383834 157566 157879 431 121 73.3% 10.9%
CSw>J>HoW>AS>HW>Cons 383380 157868 158196 529 135 73.0% 10.9%
CSw>HoW>J>AS>HW>Cons 383612 157968 158297 519 123 72.9% 10.9%
HoW>CSw>J>AS>HW>Cons 383963 158370 158738 2348 761 72.5% 11.0%
CSw>J>HW+FW>AS>HW>HoW>Cons 384673 157751 158072 397 41 73.2% 11.0%
CSw>J>AS+GC>HW+FW>AS>HW>HoW>Cons 384381 157701 158020 416 59 73.2% 11.0%
CSw>HW+FW>J>AS>HW>HoW>Cons 384846 158089 158426 458 65 72.8% 11.0%
HW+FW>CSw>J>AS>HW>HoW>Cons 385184 158096 158435 632 229 72.8% 11.0%

We can clearly see that Hammer of Wrath should slot in ahead of Consecration but behind Holy Wrath. TMI values vary somewhat depending on how far ahead of other abilities you put it, but note that HoW>J and J>HoW don’t differ much because both are 6-second cooldowns, so they don’t generally clash all that often.

However, if we push HoW ahead of CSw we get a significant TMI increase without realizing any sort of DPS gain compared to slotting it behind Holy Wrath. This is a little different than the results we got with the old MATLAB sims in 5.2, which suggested HoW was a DPS increase at the top of the priority queue. My guess is that the change is due to two factors: switching our L45 talent from SS to EF and losing Grand Crusader procs.

In 5.2, we had fewer empty GCDs because we’d be refreshing Sacred Shield every 30 seconds and using up more Grand Crusader procs, which ended up leaving less room for Hammer of Wrath and other fillers. Now, we have a larger number of empty GCDs to work with, so using Hammer of Wrath doesn’t necessarily push another filler back multiple cycles. And since we have those extra GCDs more regularly, it’s not worth pushing it ahead of the basic CS-J cycle; it’s just more efficient to slot it back in wherever it fits without delaying heavy-hitters like AS and Final-Wrath-Glyphed Holy Wrath (can we just call it “Final Wrath” in execute range?).

Speaking of Final Wrath, it looks like that does hit hard enough to be a DPS increase at the front of the queue, for relatively little cost in TMI. The CSw>J>AS+GC>HW+FW>AS>HW>HoW>Cons rotation is particularly interesting in that it gives you a small (~300) DPS boost without sacrificing any holy power generation. But at the same time that difference is right at (or below) our error threshold, so it’s not clear that’s a realizable gain. By the time we’re looking at 0.1% DPS increases, we’re splitting more hairs than we probably should.

So the conclusion here seems to be that the filler order ought to be HW>HoW>Cons, and during execute range you can prioritize “Final Wrath” as high as you want for a DPS gain, realizing that you’re sacrificing a little survivability if you use it instead of a holy power generator.

Next up: Defensive rotations.

Defensive Rotation Group

While I called this the “Defensive” category, it should really just be called the “Sacred Shield” category, since that’s the only defensive spell in here. And with EF being so strong in Siege of Orgrimmar, it’s also mostly irrelevant. But I’m including it for completeness, and to highlight how strong EF really works out to be.

One oversight here is that this group doesn’t take advantage of the T16 4-piece. The default finisher block has lines for SotR usage and Eternal Flame maintenance, but there’s nothing in there for Word of Glory. As a result, we expect to see a drop in SotR uptime corresponding to losing the 4-piece bonus, as well as an increase in TMI. In the future, I’ll be adding a line like this:


which should appropriately use WoG whenever we have 3 stacks of the 4T16 buff to fish for extra Divine Purpose procs. For now, just keep in mind that the results in this group aren’t strictly comparable to the ones with EF for survivability purposes. However, they should still be accurate for comparing the SS rotations against one another if you’re hell-bent on running Sacred Shield.

Max DPS Error: 41
Talents: 313232
Glyphs: focused_shield/word_of_glory/final_wrath

Sacred Shield Rotations
Rotation DPS HPS DTPS TMI Var SotR Wait
CSw>J>AS>HW>HoW>Cons>SS 367667 111879 120831 230953 15250 68.8% 3.3%
CSw>J>AS>HW>HoW>SS+R1>Cons 370778 113784 122153 151427 5737 69.9% 8.9%
CSw>J>AS>HW>HoW>SS+R1>Cons>SS 367419 111758 118767 116705 3947 68.8% 3.3%
CSw>J>AS>HW>SS+R1>HoW>Cons>SS 367205 111699 118253 99323 3247 68.8% 3.3%
CSw>J>AS>SS+R1>HW>HoW>Cons 368861 113136 119949 100877 3655 70.0% 9.1%
CSw>J>AS>SS+R1>HW>HoW>Cons>SS 366914 111684 118101 98289 3051 68.8% 3.3%
CSw>J>AS+GC>SS+R1>AS>HW>HoW>Cons>SS 366800 111667 118048 98425 4289 68.8% 3.3%
CSw>J>AS>SS+R2>HW>HoW>Cons>SS 367064 111661 118041 95695 2790 68.8% 3.3%
CSw>J>AS>SS+R3>HW>HoW>Cons>SS 366863 111692 118097 98717 2810 68.8% 3.3%
CSw>J>AS>SS+R4>HW>HoW>Cons>SS 366868 111676 118065 96511 2942 68.8% 3.3%
CSw>J>AS>SS+R5>HW>HoW>Cons>SS 366886 111665 118051 96597 3131 68.8% 3.3%
CSw>J>AS+GC>SS+R1>AS>HW>HoW>Cons 368770 112856 119307 87275 2789 70.0% 9.1%
CSw>J>SS+R1>AS>HW>HoW>Cons 368547 112834 119190 87898 2947 70.0% 9.1%
CSw>SS+R1>J>AS>HW>HoW>Cons 368234 112800 119149 96260 5332 69.7% 9.2%
SS+R1>CSw>J>AS>HW>HoW>Cons 368466 112689 119040 95784 7286 69.7% 9.1%

First, notice that all of the TMI values on this table are in the 100k range, compared to ~400 when we use Eternal Flame. Some of that is the utter dominance of EF over SS at high AP/Vengeance, some of it is because the Shield of the Righteous uptime is lower by a few percent because we’re no longer leveraging the 4-piece bonus. Note that our SotR uptime is a little higher here than the ~64% range we saw in the 4T16 post; we’re averaging around 69% instead.

You might wonder why that is – after all, in that earlier post we said the 4T16 benefit is about 10% SotR uptime, and we’re not taking advantage of the 4-piece in this group of sims. However, when we talent Sacred Shield we also don’t have to maintain Eternal Flame, which means we can spend that holy power on SotR instead, making up about half of the difference. If we were fishing for extra DP procs with Word of Glory, SotR uptime should actually catch up to what we get with Eternal Flame.

In any event, there’s not a lot to say here. TMI obviously improves as we increase the priority of refreshing SS (“SS+R1” means “refresh SS if it’s got less than 1 second left”), but there’s no advantage to putting it ahead of CS or J. I added the CSw>J>AS+GC>SS+R1>AS>HW>HoW>Cons option at the last minute on a hunch, as I suspected that would truly be the low-TMI option after looking at the rest of the results, and it paid off. I’m not entirely sure why this performs better than the identical rotation with an extra “>SS” tacked onto the end though. It’s clear that it’s causing some kind of holy power generation loss based on the SotR uptimes, but I don’t really see how. Something to investigate for later, I guess.

I also want to draw attention to the fact that refreshing it with 2 seconds remaining seems to be the sweet spot. Waiting until 1 second remains puts it off long enough that sometimes you get short gaps due to the GCD. Refreshing with three seconds or more remaining tends to be no more effective than two seconds. I don’t know offhand why the SS+R3 version scored so poorly, but again, it could just be RNG given the Var column is nearly 3000.
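
For readers unfamiliar with the shorthand, here is a rough Python sketch of how a priority string with an “SS+R” entry could be read. It is not the sim’s actual action list logic. It assumes a hypothetical dictionary of remaining cooldowns, treats a trailing “SS” as an unconditional filler, and ignores the wait behavior implied by “CSw” and conditions like “AS+GC”.

```python
def pick_action(rotation, cooldowns, ss_remaining):
    """Return the first usable token in a '>'-separated priority string.

    cooldowns maps ability tokens to seconds remaining (hypothetical input);
    tokens missing from it are treated as unavailable.
    ss_remaining is the time left on the current Sacred Shield.
    """
    for token in rotation.split(">"):
        if token.startswith("SS+R"):
            # "SS+R2" means: refresh SS if it has under 2 seconds left.
            if ss_remaining < float(token[len("SS+R"):]):
                return token
        elif token == "SS":
            # A bare "SS" at the end fills an otherwise empty GCD.
            return token
        elif cooldowns.get(token, 1.0) <= 0.0:
            return token
    return None  # nothing available: an empty GCD


rotation = "CSw>J>AS>SS+R2>HW>HoW>Cons>SS"
on_cooldown = {"CSw": 1.0, "J": 2.0, "AS": 5.0, "HW": 3.0, "HoW": 4.0, "Cons": 6.0}
print(pick_action(rotation, on_cooldown, ss_remaining=1.5))   # SS+R2
print(pick_action(rotation, on_cooldown, ss_remaining=10.0))  # SS
```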

That’s enough about Sacred Shield, let’s move on to the level 90 talents.

Talent Rotations Group

This is the fun group, where we make use of our “+custom” talent flag. Basically, we’re just swapping the L90 talent appropriately so that we have the ability the rotation calls for.

There are two things we’re checking for in this sim. First, what’s the “default” place to slot each talent into the rotation, ignoring what section of the encounter we’re in. Then, we want to try and fine-tune that by specifying execute rotations to see if there’s an advantage to increasing the priority during execute. We might care about that because once Hammer of Wrath becomes available, we don’t have that many empty GCDs to work with, so we could inadvertently ignore a L90 talent (or at least delay it for a long time) if we slot it behind Hammer of Wrath.

I’ve decided to split this group up into three tables for ease of filtering/sorting.

Max DPS Error: 41
Talents: 312232+custom
Glyphs: focused_shield/word_of_glory/final_wrath

Execution Sentence Rotations
Rotation DPS HPS DTPS TMI Var SotR Wait
CSw>J>AS>HW>HoW>Cons>ES 404748 157849 158167 591 332 73.1% 9.6%
CSw>J>AS>HW>HoW>ES>Cons 406988 157828 158147 454 110 73.1% 9.7%
CSw>J>AS>HW>ES>HoW>Cons 407136 157867 158187 406 46 73.1% 9.7%
CSw>J>AS>ES>HW>HoW>Cons 406419 157757 158077 517 235 73.2% 9.9%
CSw>J>ES>AS>HW>HoW>Cons 406392 157809 158126 433 48 73.1% 9.9%
CSw>ES>J>AS>HW>HoW>Cons 405022 157857 158175 539 117 73.0% 10.0%
ES>CSw>J>AS>HW>HoW>Cons 405441 158113 158437 665 134 72.7% 10.0%
CSw>J>AS>ES+ex>HW>ES>HoW>Cons 407045 157824 158144 432 91 73.1% 9.7%
CSw>J>AS+GC>HW>AS>ES>HoW>Cons 405890 157756 158075 452 125 73.1% 9.7%
CSw>J>AS+GC>HW+FW>AS>HW>ES>HoW>Cons 406777 157838 158157 385 35 73.1% 9.8%

Light's Hammer Rotations
Rotation DPS HPS DTPS TMI Var SotR Wait
CSw>J>AS>HW>HoW>Cons>LH 393512 157962 158280 308 22 73.1% 9.6%
CSw>J>AS>HW>HoW>LH>Cons 393944 158002 158310 329 51 73.1% 9.8%
CSw>J>AS>HW>LH>HoW>Cons 394156 158013 158320 327 80 73.1% 9.8%
CSw>J>AS>LH>HW>HoW>Cons 393962 157918 158221 316 38 73.2% 10.0%
CSw>J>LH>AS>HW>HoW>Cons 394201 158002 158305 324 59 73.1% 10.0%
CSw>LH>J>AS>HW>HoW>Cons 394363 158040 158344 326 50 73.0% 10.0%
LH>CSw>J>AS>HW>HoW>Cons 394678 158286 158595 421 89 72.8% 10.0%
CSw>J>AS>LH+ex>HW>LH>HoW>Cons 393949 158018 158323 279 19 73.1% 9.8%
CSw>J>AS+GC>HW>AS>LH>HoW>Cons 392669 157944 158250 314 34 73.1% 9.7%
CSw>J>AS+GC>HW+FW>AS>HW>LH>HoW>Cons 393873 158029 158335 299 33 73.0% 9.8%

Holy Prism Rotations
Rotation DPS HPS DTPS TMI Var SotR Wait
CSw>J>AS>HW>HoW>Cons>HPr 396882 158186 158505 339 27 72.9% 7.6%
CSw>J>AS>HW>HoW>HPr>Cons 396093 158345 158655 336 37 72.8% 7.8%
CSw>J>AS>HW>HPr>HoW>Cons 395777 158348 158657 363 63 72.7% 7.9%
CSw>J>AS>HPr>HW>HoW>Cons 394603 158170 158476 300 23 72.9% 8.0%
CSw>J>HPr>AS>HW>HoW>Cons 394422 158173 158479 383 65 72.9% 8.0%
CSw>HPr>J>AS>HW>HoW>Cons 393947 158426 158732 343 35 72.7% 8.1%
HPr>CSw>J>AS>HW>HoW>Cons 395775 159554 159877 521 74 71.6% 8.2%
CSw>J>AS>HPr+ex>HW>HPr>HoW>Cons 395674 158362 158669 369 73 72.8% 7.9%
CSw>J>AS+GC>HW>AS>HPr>HoW>Cons 395192 158255 158564 463 115 72.9% 7.7%
CSw>J>AS+GC>HW+FW>AS>HW>HPr>HoW>Cons 395805 158399 158707 388 57 72.7% 7.9%

First, it’s clear that Execution Sentence is our best damage option, with Holy Prism trailing it by roughly 10k DPS and Light’s Hammer coming in at a close third place.

Execution Sentence seems to be a toss-up with Hammer of Wrath initially, with them neck and neck at around 407k DPS. The ES>HoW version is far enough ahead that I’m willing to believe it’s a little better, though again, we’re talking about differences that are right on the boundary of our error level. Still, level 90 talents are more fun than Hammer of Wrath, and when two rotations come this close in DPS that’s as good a criterion as any. This basically boils down to “ES>HoW” during execute range, since outside of execute the two rotations are identical. In other words, the ES rotation should be:


None of the tweaked versions that prioritize things differently in execute range give us a significant improvement over that rotation, so we can rule them out.

Curiously, with LH we would be led to believe that prioritizing it above AS, J, or even CS is a DPS increase. That doesn’t make a lot of sense though: ES hits harder, yet those rotations didn’t exhibit this same behavior. At this point, I’m inclined to believe that something fishy is going on here. I’d call it an outlier, even though it’s several hundred DPS ahead of some of the other options, but it’s not just one rotation. All three of the rotations with LH near the top show the same effect. I’m not sure why that’s happening yet.

That said, if we ignore those three, perhaps on the grounds that it’s a HPG loss, then the same rotation that maximizes Execution Sentence is the best choice here as well. The CSw>J>AS>HW>LH>HoW>Cons rotation is the strongest performer when we’re not putting LH above holy power generators. Though again, the difference between that rotation and LH>HW>HoW or HW>HoW>LH is so small that any of them would be fine.

Holy Prism is an odd duck. It seems to enjoy – no, relish even – hanging out in the last spot. Moving it anywhere higher in the queue is a loss of 800 DPS or more, a large enough gap that we can feel pretty certain it’s statistically significant. Even playing some execute-range tricks with it doesn’t help.

This is actually pretty easy to explain. Consider the following three charts of damage per execute time (DPET) for the rotations CSw>J>AS>HW>HoW>Cons>L90:


DPET for CSw>J>AS>HW>HoW>Cons>ES


DPET for CSw>J>AS>HW>HoW>Cons>LH


DPET for CSw>J>AS>HW>HoW>Cons>HPr


Note that the DPET on Execution Sentence is far higher than any of our other spells, which is why it’s worth prioritizing ahead of HoW and Cons. The only reason it isn’t worth pushing higher in the queue is that we have enough gaps in our rotation that it’s better to use high-damage, low-cooldown spells like Holy Wrath first to minimize empty GCDs.

The DPET on Light’s Hammer is lower than ES, but still above everything else, so most of the same logic applies. The weird unexplained exceptions we talked about earlier still stand, and I’m inclined to chalk those up to error (either in the LH results or in the ES results – I’m not really sure which!).

But the DPET on Holy Prism is only on par with Hammer of Wrath and Consecration. This is mostly because it doesn’t scale as well with attack power as the other Level 90 options. Or to state that more precisely, the spell’s attack power coefficient is similar to that of Consecration and Hammer of Wrath (all around 0.7ish, I believe), so in the high-Vengeance regime they all do about the same damage. Light’s Hammer and Execution Sentence have significantly larger attack power coefficients, and thus do a lot more damage in that regime.

Now, it’s worth noting that this doesn’t mean Holy Prism is badly balanced. The cooldown of Holy Prism is only 20 seconds, compared to 60 seconds for ES and LH. In theory, you could get 3 casts of Holy Prism off in the same time that you cast one of either of the other level 90 talents. And those three Holy Prisms would total more damage than a single LH cast, though less than a single ES.

But those three Holy Prisms also cost three GCDs to the single GCD used by LH or ES. And that hurts Holy Prism in the “rotation priority” department, because it means we’re far more likely to be pushing something else back, effectively extending the cooldown of another spell and cutting into the DPS gain.

And if you have three spells that do very similar amounts of damage, one with a sub-6-second cooldown (HoW), one with a sub-9-second cooldown (Cons), and one with a 20-second cooldown (HPr), which one do you use first? Generally speaking, the ones with the shortest cooldown, because you usually lose less DPS by pushing the long-cooldown spell back than you do by pushing the short-cooldown spells back. See Wrath-era Retribution theorycrafting for another example of this, where Crusader Strike was prioritized over harder-hitting spells simply because its cooldown was much shorter.
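
A back-of-the-envelope version of that argument: a spell that hits for D and is cast on cooldown every c seconds contributes D/c DPS. If every cast is pushed back by d seconds, it contributes D/(c+d) instead. The sketch below uses made-up numbers (equal damage for all three spells, a 1.5 second delay, and round 6/9/20 second cooldowns) simply to show that the same delay costs far more on a short cooldown.

```python
def dps_lost_to_delay(damage, cooldown, delay):
    """DPS lost when every cast of a spell is pushed back by `delay` seconds."""
    return damage / cooldown - damage / (cooldown + delay)


DAMAGE = 100_000.0   # hypothetical: all three spells hit equally hard
DELAY = 1.5          # hypothetical: each cast slips by roughly one GCD

for name, cooldown in [("HoW", 6.0), ("Cons", 9.0), ("HPr", 20.0)]:
    loss = dps_lost_to_delay(DAMAGE, cooldown, DELAY)
    print(f"{name}: {loss:.0f} DPS lost")
```

With these inputs the loss shrinks steadily as the cooldown gets longer, which is why the 20-second spell ends up as the one that gets pushed back.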

Another reason for the discrepancy in DPET (and DPS) in our L90 talents is that LH and Holy Prism have some utility that they’re being balanced around. Both spells do a good bit of healing. Light’s Hammer works a lot like a raid cooldown, while Holy Prism does less of it but does it up-front. Holy Prism also has availability going for it, in that you can use it more frequently – something that anyone who’s tried to pick up a group of loose adds will recognize as a life saver.

In any event, that was a slight tangent; the take-home message of this last table is that Holy Prism gets to bring up the rear in our priority list.


This was an incredibly long post, and I didn’t even begin to go over the results in the sort of detail that I could given more time. But I’m pretty sure I hit most of the more important things. Still, it’s worth summarizing what we learned, or at least reinforced.

From this data, we’d ideally want to follow this rotation:


with the caveat that I’m obviously assuming you’re taking Eternal Flame instead of Sacred Shield, that you’re not doing any fancy Holy Wrath prioritization during execute range, that you’re glyphing Focused Shield and Glyph of Word of Glory, and that you’re ignoring whichever two talents you don’t currently have chosen.

We know that not waiting up to about a third of a second for CS to come off of cooldown is a notable DPS gain, as is prioritizing Avenger’s Shield. In both of those cases we suffer a noticeable decrease in survivability, however.

We also know that it’s a small DPS gain to push Holy Wrath higher in the queue during execute range if we’re using the Final Wrath glyph, but that this comes with a small survivability loss as well.

We know that if we’re taking Sacred Shield, we want to slot it in somewhere in the filler section to refresh it when the duration is almost up. It should probably be a gain to tack it on to the end of the queue to fill an empty GCD as well, but the data is inconclusive here, so the jury’s still out on that one.

And of course the Level 90 talent results are already incorporated into the rotation given above.

It’s also worth noting what we didn’t check here and to be clear about the limitations of this data set. We haven’t attempted to try any additional L75 talent options, so all we have is Divine Purpose data. Holy Avenger shouldn’t vary things too much, insofar as most of HA’s effect is simply more off-GCD SotR spammage. But it could cause rotations that try to increase DPS by prioritizing something over a holy power generator to fail miserably because each holy power generator is basically adding 2/3 of a SotR in damage during HA. On the other hand, they’re also less effective outside of HA than they would be with Divine Purpose, so who knows! Note also that we’re tanking the boss full-time here, so the effective uptime of Holy Avenger isn’t being considered.

Likewise, we didn’t test how Sanctified Wrath affects things. We’re already fairly sure that pushes a J+SW up ahead of CSw in priority, but we don’t know if it changes filler priority at all. Those are all on the list of things to add for next time.

We’re also simming the most bland encounter possible: solo-tanking Patchwerk forever. There’s no movement, no sudden or predictable damage bursts from a boss special, no significant variation in damage patterns (i.e. it’s a steady stream of melee+DoT damage, not an oscillating pattern of heavy melee followed by heavy magic followed by heavy melee and so on…). Basically none of the things that make real encounters interesting.

So keep that in mind when interpreting the results. I may say “this data suggests X is better than Y,” but I’m always doing that within the context of this particular set of constraints. It’s reasonable to assume that it generalizes fairly well to other situations, but it won’t always, and it’s almost certainly not going to be iron-clad enough to be correct for every encounter.

As usual, a smart tank should be looking for those inconsistencies and adapting their play to the encounter rather than blindly relying on “but Theck said so!”


5.4.2 WeakAuras Strings

It’s been a little while since 5.4 was released, and I’ve still been tweaking my WeakAuras here and there as I go. I’ve finally made enough tweaks that I thought it was worth sharing with the class.

Again, the updated Paladin auras can all be grabbed at along with auras for all of the other class/spec combinations I use regularly.

Weakened Blows

The first change is actually the removal of an aura that doesn’t have much meaning anymore. Now that Crusader Strike applies Weakened Blows, there’s no reason to be tracking its uptime. So I’ve removed that aura entirely and shifted the Eternal Flame & Sacred Shield icons over to fill the empty space.

Priority Row Shuffle

I’ve tweaked the order of the spells on the priority row a bit. I’ve been using Holy Prism more frequently lately – or to be more specific, I’ve been swapping that talent more frequently and using all three choices. Nowadays, my sims are telling me that all three of these choices fall above Consecration in priority. And more importantly, I found that I tended to forget about those spells since I had their icons so far off to the right.

So I’ve re-ordered the last few icons on the priority row. I’ve moved Execution Sentence, Light’s Hammer, and Holy Prism to the left and moved Consecration and Sacred Shield to the right.  Now the order looks like:

CS – J – AS – HW – HoW – (ES/LH/HPr) – Cons – SS

Swapping Consecration and Sacred Shield is a last-minute change that I made after recording the videos (and taking the screenshots) shown later in this post, so in those Consecration appears to be way off to the right. It looks a little cleaner now with that last-minute change though (which is why I decided to make it!).

Tier 16 4-piece Indicator

The first new indicator is one that tells you the status of your Tier 16 4-piece bonus. When you reach 3 stacks of Bastion of Glory, you get a buff called Bastion of Power that makes your next Word of Glory or Eternal Flame free. It’s a very simple matter to track this buff in WeakAuras so you know when it’s available.

The indicator pulses if you have a 5-stack of Bastion of Glory to remind you that it’s at full strength. As per this comment on last week’s blog post, refreshing the buff immediately at 5 stacks of BoG tends to be an ideal strategy in the steady-state (i.e. at constant Vengeance). In practice you’d want to consider your current Vengeance level, of course.

The two new indicators I’ve added in 5.4.2, also showing the adjusted layout now that the Weakened Blows indicator is gone.

Eternal Flame Stoplight

To that end, I’ve added a new indicator. In this comment, Zil asked me if I could write an aura that would tell you whether refreshing Eternal Flame would give you a larger or smaller HoT. So I wrote up this “stoplight” indicator to give us that information.

Every time you cast EF, it calculates the strength of that EF and stores that value (much like the text indicators store the effective HP used, BoG level, haste, and AP you had at the time of the cast). It then calculates the value of a new EF given the current conditions and compares that to the stored value.

If the new value is at least 10% larger than the existing one, the indicator turns green. If it’s not, it stays red. Yes, this is the reverse of how I have the text indicators working, but if that bothers you the colors are easily tweaked on the display tab of each aura. I tend to think of green as “good”: in this case the stoplight means “It would be good to cast EF” while the text auras are telling me about the status of the existing EF (“Your current EF is good, don’t mess with it”). I suspect most people will only use one or the other anyway.

There’s also a text indicator that shows exactly how much better the new EF will be. It’s literally just showing you (new EF value)/(old EF value) as a percentage, so if it reads 115% (as it does in the image above) it means recasting EF at this point will be a 15% increase in healing throughput. Note that this percentage can get very large in cases where you have a one-HP Eternal Flame active and you’re sitting on 5 stacks of Bastion of Glory.
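The comparison described above can be sketched in a few lines. This is a Python illustration only: the actual aura is WeakAuras code, and the function and class names here are hypothetical. The 10% threshold and the (new EF value)/(old EF value) percentage come from the description above; how the two EF values are calculated is left out.

```python
from dataclasses import dataclass

# Threshold from the post: green once a fresh cast would be at least
# 10% stronger than the Eternal Flame that is currently stored.
GREEN_THRESHOLD = 1.10


@dataclass
class StoplightState:
    ratio_percent: int  # what the text indicator shows, e.g. 115
    color: str          # "green" or "red"


def evaluate_stoplight(stored_value: float, new_value: float) -> StoplightState:
    """Compare a hypothetical new EF against the value stored at cast time."""
    if stored_value <= 0:
        # Assumption for this sketch: a stored EF value must exist.
        raise ValueError("no stored Eternal Flame value to compare against")
    ratio = new_value / stored_value
    color = "green" if ratio >= GREEN_THRESHOLD else "red"
    return StoplightState(ratio_percent=round(ratio * 100), color=color)


# Example: a recast worth 15% more than the stored EF turns the light green.
state = evaluate_stoplight(stored_value=1000.0, new_value=1150.0)
print(state.ratio_percent, state.color)
```

Keeping the ratio and the color in one result mirrors the way the stoplight and its text readout are driven by the same pair of numbers.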

Note that this indicator takes into account everything, as far as I know. It should accurately reflect changes in mastery, haste, Bastion of Glory stacks, spellpower, holy power, crit, and even Avenging Wrath. The only things I’ve omitted are constant factors like the 50% increase from self-casting (which you should always have) and the 5% Seal of Insight healing bonus (which you should probably also always have, since I don’t think many players are switching to Seal of Truth/Righteousness).

The video below shows the indicator during development at 4x the final size to make it easier to see how it works:


It didn’t occur to me to write a Sacred Shield stoplight until writing this post, but I’ll probably put one together in the next few weeks. I’ll toss it on pastebin, update the WeakAuras page with a link to it, and probably tweet about it, but I probably won’t give it its own blog post.
Jan 15 2014 edit: SS stoplight added, bundled with the EF stoplight. Also added crit scaling to the EF stoplight.

Aura Group Re-organization

Finally, I’ve had to re-organize the aura groups a little bit. Adding the code to support the EF Stoplight aura caused one of the other groups to get too large for Ace Serializer, which in turn broke importing. So I had to split them up. They’re now organized a little differently. The three big groups haven’t changed:

Theck – Prot – Priority Row
Theck – Prot – Cooldowns Row
Vengeance Bars

Those aura groups are all independent and work perfectly well all by themselves. You can import any combination of those and they should work seamlessly.

All of the auras that give you specific information about Vengeance, Sacred Shield, and Eternal Flame now have a dependency: the “Vengeance/SS/EF Helper Auras” group. This group contains the code that saves a snapshot of your stats when you cast EF or SS, which is why it’s required for the other aura sets to work. They all perform calculations on that information to determine what to display, so without it, they don’t work.

Vengeance/SS/EF Helper Auras (required for the three sets below)

And finally, the auras that display EF and SS information; again, none of these work without the aura group linked above:

SS/EF Vengeance Bar Overlays
Vengeance/SS/EF text indicators
EF/SS Stoplight Auras

(Technically speaking, the simple Vengeance text indicator doesn’t require the helper auras, so if you’re a non-paladin tank class and want that, you can just grab the “Vengeance/SS/EF text indicators” group and delete everything with EF/SS in the title and that Vengeance indicator should still work).

Final Product

And after all of that, here’s how it looks in practice on a target dummy:

If you want to see the Vengeance Bar indicators at work, check out the old 5.4 video on the WeakAuras Strings page. And as always, that page contains the auras for all of the other classes I play or have played. If you have a question about what addon I’m using to create a certain UI element, check out the UI Construction and Key Bindings post, which should still be mostly accurate. If that doesn’t answer your question, feel free to ask in the comments.

Other Stuff

A handful of quick comments regarding Simulationcraft stuff before I go:

  • There’s a bug with Execution Sentence in Simcraft at the moment (in 542-1 at least, and several earlier builds). For some reason it’s not ramping up the damage of each tick appropriately for protection, even though it works perfectly for retribution. This is a non-trivial issue caused by some piece of code deep in the core, and at this point I couldn’t even give you a completely satisfactory answer for why it happens. But top men (meaning: people more competent than I) are on it. Top men, I say. Hopefully should be fixed for next build.
  • There’s also a small bug in the last couple revisions that causes Eternal Flame to be slightly undervalued. I put some code in there to handle the hotfix applied in September (when EF’s self-healing bonus was nerfed from 100% to 50%) and forgot to take it out once the spell database was updated to include that information. So EF was only getting a 25% bonus from being self-cast rather than a 50% bonus for a while, thus being undervalued by about 17%. Oops. Bad Theck. This will be fixed in version 542-2.
  • I’m almost done with the MATLAB automation code that runs the rotation simulation. This turned out to be a much larger and more annoying project than I expected (and I already expected it to be large and annoying). It’s arguably the most complicated of all of the sims because I had to allow the possibility of using different glyphs and talents for each rotation. Luckily, I should be able to finish it this week and have it ready for a blog post next week. Also luckily, the rest of the automation sims should be far easier to code than this one was.
  • Once those are done, I have the fun job of deciding how to make them translatable to other classes (if at all). I have to see if any of this code runs in Octave or FreeMat (unlikely, I use a lot of fancy structure and cell stuff), and if not, decide whether to translate all of this to another language so that other theorycrafters/players can contribute and use the code. I could also entertain the possibility of integrating some of these features into SimC itself in the long run (ex: a talent simulation would be pretty simple, I think), but that’s something I’ve yet to discuss with the other SimC devs.
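As a quick check on the Eternal Flame bullet above, the "about 17%" figure follows from comparing the two self-cast multipliers. This is a minimal Python sketch; the function name is hypothetical and the only inputs are the 25% and 50% bonuses quoted in that bullet.

```python
def undervaluation(applied_bonus: float, intended_bonus: float) -> float:
    """Fraction by which healing is undervalued when the wrong bonus is applied."""
    return 1.0 - (1.0 + applied_bonus) / (1.0 + intended_bonus)


# 25% applied instead of the intended 50%: 1 - 1.25/1.50
print(f"{undervaluation(0.25, 0.50):.1%}")
```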

That’s it for today.


Posted in Tanking, Theck's Pounding Headaches | 25 Comments

Itemization Value of 4T16

Last week, Fouton from Icy Veins asked me whether I had tried to determine an “ilvl value” for the tier 16 4-piece set bonus. Stated another way, the problem he was trying to solve was twofold:

  1. Is it worth using lower-ilvl tier pieces instead of non-set pieces just for the 4-set bonus?
  2. If so, how much lower? Is it worth using LFR tier instead of heroic warforged non-set?

Unfortunately, I didn’t have an answer for him. I knew the 4-piece was powerful, of course. There was no question that using tier pieces over warforged loot from the same difficulty level was a survivability gain. But I had never really looked into whether it would make sense to use much lower-ilvl tier instead of warforged gear.

I was unconsciously assuming that if you had access to warforged loot from e.g. heroic, then you also had access to off-set gear from that same difficulty mode, so the most you would care about is a 6-ilvl difference. But especially for guilds progressing through normal, or guilds at the mercy of the personal loot system of LFR/Flex modes, that’s not necessarily a good assumption. Surely there are cases where a player has an LFR tier chest or helm and warforged normal-mode off-set from a different boss, and wants to know what to wear?

So I threw together a few quick profiles in Simulationcraft to test this.


As a control group, we’ll just use the T16 normal-mode protection paladin profile. This uses four pieces of normal-mode T16 (head, shoulders, chest, and gloves) with non-warforged Legplates of Unthinking Strife as the one off-set piece. Note that none of the gear in this profile has valor upgrades applied. The stat breakdown is given below:

T16N Stats
Stat Amount
Strength 19540
Stamina 47990
Expertise Rating 5107
Hit Rating 2607
Crit Rating 1112
Haste Rating 15677
Mastery Rating 7602
Armor 60112
Dodge Rating 180
Parry Rating 1526

The rest of the setup is pretty much what you’d expect. Talents are Eternal Flame, Unbreakable Spirit, Divine Purpose, and Light’s Hammer, glyphs are Focused Shield, Alabaster Shield, and Divine Protection. 

I then worked up five different variant gear sets to compare. The first is a set where we downgrade two of our tier pieces to LFR level. We choose the chest and the shoulders for this, since the tier helm and gloves both have haste on them. Since both chest and shoulders are expertise/mastery pieces with expertise reforged into haste, we lose a chunk of those secondary stats as well as some strength, stamina, and armor.

Since we don’t really want to deal with the hassle of reforging each gear set to cap expertise, we cheat a little bit by adding a shirt to the gear set that will put us over the cap. While this adds a little ambiguity to our results, it should be a larger boon to the non-set arrangements than the tier sets.

After doing all of that, our second gear set looks like this:

T16N-LFR Stats
Stat Amount Diff
Strength 18786 -754
Stamina 46506 -1484
Expertise Rating 6345 N/A
Hit Rating 2607 0
Crit Rating 1112 0
Haste Rating 15503 -174
Mastery Rating 7065 -537
Armor 59329 -783
Dodge Rating 180 0
Parry Rating 1526 0
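The "Diff" column here (and in the tables that follow) is just each stat minus the corresponding T16N control value. A small Python sketch of that bookkeeping, using a few rows copied from the two tables above (the function name is hypothetical):

```python
# Values copied from the T16N and T16N-LFR tables above.
T16N = {"Strength": 19540, "Stamina": 47990, "Haste Rating": 15677,
        "Mastery Rating": 7602, "Armor": 60112}
T16N_LFR = {"Strength": 18786, "Stamina": 46506, "Haste Rating": 15503,
            "Mastery Rating": 7065, "Armor": 59329}


def stat_diff(variant: dict, control: dict) -> dict:
    """Per-stat difference of a variant gear set relative to the control set."""
    return {stat: variant[stat] - control[stat] for stat in control}


for stat, diff in stat_diff(T16N_LFR, T16N).items():
    print(f"{stat}: {diff:+d}")
```

Expertise (and later hit) is marked N/A in the tables rather than differenced because the fake shirt distorts those values.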

For the next set, we replace the chest and shoulders with normal-mode off-set pieces. In each case we’ve gone for maximizing haste, so we’ve chosen Chestplate of Congealed Corrosion and Darkfallen Shoulderplates. In both cases we’ve used the warforged (ilvl 559) version and applied two valor upgrades for a net ilvl of 567. Since we’re using a hacked shirt with 2500 expertise on it, we’ve chosen not to reforge the shoulders and have used a crit->mastery reforge on the chest. This gives us the maximum bang for our buck since none of that extra itemization has to go into expertise.

The stats for that gear set look like this (note that “Diff” is still in reference to T16N):

T16N-WF Stats
Stat Amount Diff
Strength 20045 505
Stamina 48630 640
Expertise Rating 5896 N/A
Hit Rating 2607 0
Crit Rating 1827 715
Haste Rating 18053 2376
Mastery Rating 6821 -781
Armor 60551 439
Dodge Rating 180 0
Parry Rating 1526 0

The next set takes the previous one to the extreme and uses the heroic warforged versions of both chest and shoulders.

T16N-HWF Stats
Stat Amount Diff
Strength 20576 1036
Stamina 49676 1686
Expertise Rating 5896 N/A
Hit Rating 2607 0
Crit Rating 1928 816
Haste Rating 18420 2743
Mastery Rating 7044 -558
Armor 60958 846
Dodge Rating 180 0
Parry Rating 1526 0

In our final two gear sets, we go to the other extreme: what if we force the player to use four or all five LFR tier pieces, including the severely sub-optimal dodge/mastery legs? We’ll be kind and reforge the dodge on those legs to haste, and continue to compensate for expertise and hit caps by using a fake shirt.

T16N-4LFR Stats
Stat Amount Diff
Strength 18032 -1508
Stamina 45021 -2969
Expertise Rating 6182 N/A
Hit Rating 3992 N/A
Crit Rating 1112 0
Haste Rating 14971 -706
Mastery Rating 7065 -537
Armor 58686 -1426
Dodge Rating 180 0
Parry Rating 1353 -173

T16N-5LFR Stats
Stat Amount Diff
Strength 17680 -1860
Stamina 44406 -3584
Expertise Rating 6182 N/A
Hit Rating 3500 N/A
Crit Rating 1112 0
Haste Rating 13789 -1888
Mastery Rating 7262 -340
Armor 58294 -1818
Dodge Rating 821 641
Parry Rating 1353 -173

We take all six of these gear sets and run them through a 50k-iteration simulation against the T16N25 TMI boss. Anything not explicitly mentioned is identical to the defaults in the T16N profile.


Here’s what we get out the other side:


And summarizing the important bits in table format:

T16N 230.5 73.25% 380k 149834 149540
T16N-LFR 967.0 72.82% 377k 153704 153381
T16N-WF 4125.1 64.32% 386k 160163 159818
T16N-HWF 1705.0 64.90% 390k 157680 157362
T16N-4LFR 1627.3 71.93% 371k 156588 156240
T16N-5LFR 3457.7 70.25% 363k 157307 156937

It should be immediately apparent from the table that the T16N gear set performs the best for survivability. It has the lowest TMI by a large margin and the highest SotR uptime.

Using normal warforged off-set pieces (T16N-WF) may be a gain of 2376 haste, but you actually lose about 9% SotR uptime, which means losing the 4-piece is costing you over 10% SotR uptime all by itself. And of course, smoothness (as measured by TMI) suffers greatly; the TMI is about 20 times higher, which means the spikes are roughly 27% larger on average.

Upgrading those off-set pieces to heroic warforged (T16N-HWF) pieces cuts your losses somewhat, but still gives significantly worse results than the control set. It’s not a large increase in haste or SotR uptime over the normal warforged configuration, but the extra stamina drops the TMI to around 1700, still about 18% larger spikes than T16N.

The T16N-LFR gear set, on the other hand, outperforms both of the off-set configurations. The TMI is only about 4 times worse than T16N, corresponding to a 13% increase in spike size, but the SotR uptime isn’t that much lower. So there’s no question that using 2 pieces of LFR tier (chest and shoulders) to get the 4-piece bonus gives superior survivability to using two well-itemized heroic warforged items in those slots to get extra haste.
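The spike-size percentages quoted in the last three paragraphs are consistent with a rule of thumb in which every factor of 3 in TMI corresponds to spikes roughly 10% larger. That rule is an assumption here: it is inferred from the quoted numbers, not stated in this post. A Python sketch of the conversion under that assumption:

```python
import math


def spike_increase_percent(tmi: float, tmi_reference: float,
                           percent_per_factor_of_3: float = 10.0) -> float:
    """Approximate spike-size increase implied by a TMI ratio.

    Assumes (not stated in the post) that each factor of 3 in TMI
    corresponds to spikes about 10% larger.
    """
    return percent_per_factor_of_3 * math.log(tmi / tmi_reference) / math.log(3)


# TMI values from the results table, with T16N (230.5) as the reference.
for name, tmi in [("T16N-LFR", 967.0), ("T16N-HWF", 1705.0), ("T16N-WF", 4125.1)]:
    print(name, round(spike_increase_percent(tmi, 230.5)))
```

Under this assumption the LFR and HWF sets come out at 13% and 18%, matching the text, and the WF set comes out at about 26% using the exact TMI ratio, close to the 27% obtained from the rounded "20 times" figure.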

If you instead force the use of four or five LFR tier pieces, the situation gets worse. That’s a significant loss of haste and stamina, so the TMI is predictably much higher. 4LFR is roughly equivalent to the T16N-HWF set in TMI, making up for the significant stamina reduction with the higher SotR uptime of the 4-piece bonus. It’s solidly ahead of the T16N-WF gear set in both categories.

5LFR is still better than the WF set that uses normal-mode warforged off-set, both in terms of TMI and SotR uptime. 5LFR gives higher SotR uptime than the HWF set, but it trails in TMI thanks to the extra stamina and secondary stats (~5k haste) of the heroic warforged gear. That said, I don’t think this situation will be very common – players that have access to heroic warforged off-set should rarely need to resort to LFR pieces to complete their tier set.

There are two other things I want to point out about this data. Note that the higher-ilvl sets also convey slightly higher DPS, which is something to consider. The difference isn’t large (less than 3%), but on a serious DPS check that might be worthwhile.

Also note that all of these results assume you’re using Eternal Flame and Divine Purpose. If you’re talenting Sacred Shield, then you can still game this effect with free Word of Glory casts to fish for more Divine Purpose procs, but the benefit will be reduced somewhat. And of course, if you’re not using Divine Purpose then the 4-piece bonus won’t help your SotR uptime at all, though it will still make you more survivable by virtue of removing the opportunity cost of having to heal yourself with Word of Glory.


I’m hesitant to assign an equivalent ilvl value to the 4-piece bonus for a few reasons. The first is something most people don’t think about: not all ilvls are created equal. The head, chest, and leg slots give you more stats per ilvl than the shoulder and glove slots do, so the exact ilvl value will depend on the particular slots in which you’re making the sacrifice. In addition, it will depend a bit on which off-set gear you have; we’ve only looked at two specific choices (shoulders/chest), so we’d get a different answer if we considered the head, glove, or leg slots.

However, it’s clear that under the right conditions the tier bonus is stronger than trading up 52 ilvls in two slots (the difference between T16N-LFR and T16N-HWF). We also know that it’s roughly equivalent to trading up 52 ilvls in shoulder/chest and gaining 25 ilvls in head/gloves (though in this case, with equivalent tier rather than off-set).

Beyond that we’d have to guess a little, or run more sims where we compare the tier sets to other sets that use only off-set pieces of much higher quality. That introduces the benefit of the 2-piece bonus as well, though that’s probably a relatively small effect. It’s clear that four heroic warforged off-set pieces would beat out four or more LFR tier pieces based on the data we already have. It seems unlikely that a set with four heroic warforged off-set would be able to compete with the T16N set though.

The take-home message here is that the 4-piece can be really, really strong if used properly, and it’s worth resisting the temptation of even significantly-higher-ilvl gear to keep it. In all but the most extreme cases, such as trading multiple LFR tier pieces for multiple heroic warforged off-set pieces, keeping the 4-piece is going to be the better call.

Again, that comes with some caveats: it assumes you’re talenting Eternal Flame and Divine Purpose. If you swap from Divine Purpose to Holy Avenger for an encounter, then the benefit is reduced (though not eliminated – it still makes EF easier to maintain); if you don’t use either DP or EF, then the benefit is smaller still, and depends on how often and effectively you use WoG as an emergency heal.

Posted in Simcraft, Tanking, Theck's Pounding Headaches, Theorycrafting, Uncategorized | 18 Comments

A Letter to Celesty Claus

Every Winter Veil, children of both factions write letters to Greatfather Winter and ask for toys and games. In the meantime, their parents are writing letters and saying prayers to a completely different deity: Celesty Claus, the great celestial dragon that maintains the cosmic (class) balance. Legends say that he flies through the sky on Winter Veil Eve showering the world with nerfs and buffs, and the occasional meteor by accident (one of the inherent downsides of automated shooting star delivery systems).

A rare picture of the elusive Celesty Claus.

Classes that were good that year are happy to wake up on Winter Veil morning to find buffs in their stockings. However, classes that were bad check their stockings with trepidation, because they know they’re only likely to receive nerfs.

The rest of the year is usually spent bickering about who got the best loot from Celesty Claus and why everyone else needs to be nerfed because they’re clearly overpowered in PvP. And asking for ponies.

Of course, he never brings ponies, because he’s heartless. I don’t mean that in a derogatory way, but in a literal, anatomical way. Dude’s made of stars, he’s powered by fusion reactions, he doesn’t have a need for meat and sinew. How many ponies do you know that have survived the heat – not to mention radiation exposure – of a body made of stars? So wishing for a pony is pretty stupid unless you want char-broiled irradiated pony.

This is my letter to Celesty Claus for this year, specifically for protection paladins.

Dear Celesty Claus,

I know you’re a busy man… dragon… spectral titan construct… thing. So I’ll dispense with the milk and cookies and get right to the point. Which is asking you for stuff.

1. Please bring me a version of Holy Wrath that doesn’t have the damage-splitting effect. I get the original goal – a long time ago in a continent far away it was a neat way to give Retribution AoE damage that wasn’t “free” without adding another spell to their arsenal. But this is 2014, Retribution doesn’t even have Holy Wrath anymore. It’s ours now, and really should be designed around our needs.

And right now, we need snap aggro. We’re already strong on up to three targets thanks to Avenger’s Shield.  And our sustained aggro on large groups of mobs is also fine thanks to Consecrate. But the difficulty is picking up aggro on groups of 5+ mobs so that our sustained aggro can do its thing. On large groups, Holy Wrath hits weakly enough that it can’t compete with things like Dizzying Haze and Thunder Clap.

I realize that removing the damage splitting effect is a buff to Holy Wrath, and a buff to our sustained AoE DPS/aggro as well. I’m happy to accept a nerf to Consecration to balance out sustained DPS to make Holy Wrath a more useful spell.

2. Please bring us equitable talent choices on our level 45 tier. Eternal Flame is extremely strong even without our tier 16 four-piece set bonus. It really needs to be nerfed a little more in order for Sacred Shield to be a competitive option.

Likewise, Selfless Healer is in a pitiable state for protection. An instant Flash of Light, while nice, still costs a GCD, doesn’t heal for as much as a full-strength Word of Glory with 5 stacks of Bastion of Glory, and doesn’t come with the fringe benefits of Eternal Flame. Please give it some love so that somewhere, some protection paladin will feel like it’s worth taking.

If Selfless Healer could allow Flash of Light to be cast off of the GCD for protection, that would help a lot. But it also needs to heal for a lot more to make up for the fact that it doesn’t give you the long-term smoothness of Eternal Flame or Sacred Shield. Those two talents prevent spikes before they start by giving you predictable healing or absorption at regular intervals. For Selfless Healer to be able to compete with those two proactive talents, it has to be a very effective reactive choice.

It should really gain the full increase from Bastion of Glory so that the talent remains competitive as we stack mastery. Ideally, a Flash of Light cast with 3 stacks of Selfless Healer and 5 stacks of Bastion of Glory should heal for quite a bit more than a Word of Glory with 5 stacks of Bastion so that it’s your first go-to reactive tool. It’s trying to compete with two strong “over-time” effects, so it should condense the raw healing or absorption of those effects into a single huge shot. If it doesn’t heal for 80% of your health, it’s not really going to be competitive with an Eternal Flame that heals for 60%-70% and gives you several times your health over 30 seconds.

3. Please bring me a version of Consecrate that benefits from haste. As I’m sure you know, we paladins love haste, almost to the point of irrationality. And while Sanctity of Battle helpfully reduces Consecrate’s cooldown as we stack haste, it doesn’t change its tick interval. It ticks at fixed one-second intervals no matter what your haste level is.

The problem that arises here is that when we’re at high levels of haste, we can be in a position where we re-cast Consecrate before the previous one is done ticking. Since we can’t have two Consecrates on the ground, we end up clipping the earlier cast and losing ticks, reducing Consecrate’s damage per cast.

In a single-target situation, it’s fine for Consecrate to be our lower-DPS filler that remains a low priority. However, in AoE situations it is a much higher priority. That reduces the effect of haste on our many-target sustained threat. It’s almost like the spell suffers from diminishing returns with respect to haste.
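To put rough numbers on the clipping problem, here is a small Python sketch. It rests on several assumptions that are not stated in this letter: a 9-second duration and 9-second base cooldown for Consecrate, a cooldown that scales as base/(1 + haste) through Sanctity of Battle, and a recast the instant the cooldown finishes. The one-second tick interval is the one described above.

```python
import math


def ticks_per_cast(haste: float, duration: float = 9.0,
                   base_cooldown: float = 9.0, tick_interval: float = 1.0) -> int:
    """Ticks a Consecrate delivers before the next cast replaces it.

    Assumptions for this sketch: 9 s duration and base cooldown,
    cooldown = base_cooldown / (1 + haste), recast exactly on cooldown.
    """
    cooldown = base_cooldown / (1.0 + haste)
    lifetime = min(duration, cooldown)
    return math.floor(lifetime / tick_interval)


for haste in (0.0, 0.3):
    print(f"haste {haste:.0%}: {ticks_per_cast(haste)} ticks per cast")
```

Under those assumptions, 30% haste cuts each cast from nine ticks to six, which is the shrinking damage per cast described above.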

More importantly, it makes it trickier to use properly for novice paladins. You lose DPS if you recast it early, but you lose more DPS by bumping it lower in priority during AoE situations. The default unit frame even shows a little timer for it, which could be misleading for a novice. I’d just like to see it work more seamlessly with Sanctity of Battle so that it feels less awkward.

4. Please make seals interesting again. I remember losing auras. It was sad to lose something iconic, but at the same time they had devolved into a “set it and forget it” mechanic that didn’t add a lot of fun game play. If it’s something I won’t change for hours and has a minimal effect on my experience, it’s probably not worth keeping.

The problem is that seals feel very much the same way for protection. Seal of Truth has been neutered to the point that the DPS increase is negligible. Seal of Righteousness is similarly weak. And most importantly, Seal of Insight is such a strong survivability component that it is almost never worth giving up for either of the other two. You could remove Seal of Truth and Seal of Righteousness from protection and most tankadins wouldn’t even notice.

But the idea behind the Warlords of Draenor talent “Seal of Faith” is interesting. We would trade a bunch of damage output for healing output. Of course, it doesn’t make a whole lot of sense right now, because we don’t have the supporting tools to make that useful. But if we had a more extensive toolkit of healing spells, I could imagine using that talent to help my raid survive heavy raid damage phases.

I don’t think I’ll ever take that talent, because having Holy Shield back is just too cool, but it’s the thought that counts.

And in this case, I’d love to see all seals work on this basic principle of having a more significant effect on your play. Seal of Insight could be the default “tanking” seal that gives you a big chunk of survivability by increasing armor and healing throughput. Seals of Truth and Righteousness could sacrifice a lot of that self-healing to grant other benefits that are primarily useful while not tanking, much like Seal of Faith sacrifices damage for more (possibly) raid-healing or off-healing capability.

The one fear is that being able to swap between highly disparate modes could cause tank imbalances. We’ve seen this before, where one tank was able to switch from high damage output to high survivability by toggling stances, and it caused plenty of problems. It’s really something that all tanks need to be able to do in similar capacities to be balanced.

But the alternative is to just redesign or eliminate seals. Seal twisting just isn’t very fun for the same reason most retribution paladins dislike Inquisition.  Spending resources now for a zero-damage GCD feels bad, even if the math says it’s an overall DPS increase. And for protection, the damage increase is rarely, if ever, worth the large survivability sacrifice of dropping Seal of Insight. If seals aren’t getting redesigned, I’d rather just see each spec get one seal: Seal of Insight for protection and holy, Seal of Truth for retribution.

If you really want to go a little radical with the redesign option, give us one “passive” seal and make the others active abilities that operate like cooldowns. Seal of Righteousness could replace the active seal, granting its usual effect for 15-20 seconds on a one-minute cooldown, and then automatically swap back to the “default” seal after the effect has ended. That would give us the ability to actually use Seal of Righteousness for a temporary AoE damage boost without costing us two GCDs.

5. Please bring us an end to the raid cooldown arms race. While it’s nice to be able to contribute something to the raid group, the sheer number of raid cooldowns being tossed around is getting absurd. Many encounters are being designed around rotating raid cooldowns to survive. While there’s certainly some level of coordination involved in that, I think it makes the game less fun for healers. It also leads to class stacking on encounters where those cooldowns are not equitable.

I feel that raid cooldowns should be limited to one role, and that role should probably be healers. In 20-man mythic, the number of healers should be more stable than it is in current 25-man heroics. While the number of tanks will be stable as well, the temptation to sacrifice a little DPS for another raid cooldown would be strong.  Sacrificing an entire player worth of DPS for a raid cooldown is much more punishing and also more strategic, since you would do that on a fight where you presumably want more healing to begin with.

Raid cooldowns should, in my mind, be a finite resource that you have to use intelligently and carefully. Precision tools to deal with only the most difficult situations. Rotating cooldowns to trivialize an entire 30-60 second period of an encounter just feels cheesy to me, as is having enough Devotion Auras to throw at every single instance of a boss’s raid-wide damage ability.

Yes, I will be sad to give up Devotion Aura. But I will be happier with raiding as a whole, so it’s a sacrifice I’m willing to make.


P.S. Sorry about killing your cousin Elegon every week for the past six months or so. He was… um… corrupted by the Mogu or something, so it was justified. Each week. Really. On the bright side, he dropped this great mount that looks just like you!


Happy Holidays!

Happy Holidays from everyone here at Sacred Duty. See you next year!


Posted in Humor, Tanking, Theck's Pounding Headaches, Uncategorized | 22 Comments

MATLAB Automation Code

So it’s finally time to unveil a project I’ve been working on intermittently for the last three months.  If you recall, in the past I maintained a suite of MATLAB DPS simulations that attempted to determine optimal rotations, stats, glyphs, talents, weapons, and so on.  When the Finite State Machine (FSM) rotation modeler screeched to a grinding halt due to a combination of haste effects and long cooldowns, that suite of simulations was put on hold while I searched for a new (and faster) way to run simulations.

And as I’ve mentioned before, Simulationcraft was the solution I settled on.  It had the speed and accuracy I needed, and I wouldn’t have to do all the work myself thanks to the extensive set of contributors.  It also held the promise of unifying my DPS and survivability simulations into one simulation package rather than maintaining separate code for each.  And it even has built-in stat weight generation, so I wouldn’t need to replicate that functionality of the old sims.

I spent a good chunk of the summer and early fall getting Simcraft’s paladin module up to date and inventing and programming a new tanking metric.  But that was only the first step.  I now had the simulation back-end to do the heavy lifting, but I still needed some way to do batch processing.  To re-create the glyph simulation, for example, I needed code that would run Simulationcraft over and over for a bunch of glyph configurations and analyze the results.

There are a number of ways I could go about doing that.  I could write simple batch files in DOS, or more realistically in another language like Perl (which I’d then have to learn, since I don’t actually know Perl).  But what I really want to put together are giant tables of data and graphs.  And there’s one language I know that’s exceptionally good at handling giant tables of data and graphs.  MATLAB.

There was a bunch of grunt work involved, like writing functions that handled all sorts of mundane tasks: writing and reading strings to and from text files, regular expressions to pull the data I wanted out of Simcraft’s text output files, code to do simple tasks like making sure the path information was correct, code to automatically handle the caching and regeneration of results when I update to a new version of Simcraft, and code to output data into a table format that I can copy/paste directly into the blog.  None of that was very interesting, even though it was probably 90% of the work involved.

The interesting part, which is what I’m going to write about today, is how the code works and the output it produces.

Automating Simcraft

Simcraft operates by reading in an input file containing all of the relevant information about a character.  What we want to do is basically tweak portions of that input file to see what changes in the result.

There are a few different ways we could go about that.  For example, to test glyphs we could just have a “default_glyph_simulation_character.simc” file and edit that file over and over to change the glyph setup.  That wouldn’t be terribly hard to code, but it has a few downsides.  The main one is that it can be an issue for caching, which I’ll discuss a little later.

Instead, I went with a very versatile setup where I modularize the input file.   In other words, I split the .simc file into component parts: a player section, a glyphs section, a talent section, a gear section, a rotation section, and a boss section.  To run a sim, I just stitch the component parts together.  For example, I can combine the default player, talent, gear, rotation, and boss sections with various different glyph sections to create my different glyph setups.  I can save each of these combinations as a new .simc file, and by labeling them appropriately I’ll have the input file for each individual simulation.

This is useful for debugging, of course, but it also means that if we want to write a different comparison, we can reuse a lot of code.  Instead of swapping in and out the glyph component, we might swap in and out the talent component to create a talent sim, using most of the same automation logic.
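To make the stitching idea concrete, here is a rough Python sketch. It is not the MATLAB code behind these results; the function names, the section order, and the placeholder component text are all assumptions made up for illustration.

```python
# Hypothetical section order, following the six components listed above.
SECTION_ORDER = ["player", "glyphs", "talents", "gear", "rotation", "boss"]


def stitch_simc(components):
    """Join the component texts, in a fixed order, into one .simc input."""
    missing = [name for name in SECTION_ORDER if name not in components]
    if missing:
        raise KeyError(f"missing components: {missing}")
    return "\n".join(components[name].strip() for name in SECTION_ORDER) + "\n"


def build_variants(defaults, slot, variants):
    """Swap one component (e.g. "glyphs") for each variant, keeping the rest."""
    return {
        label: stitch_simc({**defaults, slot: text})
        for label, text in variants.items()
    }


if __name__ == "__main__":
    # Placeholder text only; real components would hold actual .simc content.
    defaults = {name: f"# {name} section" for name in SECTION_ORDER}
    sims = build_variants(defaults, "glyphs", {"AS_AW_DA": "# glyph set A"})
    print(sims["AS_AW_DA"])
```

Switching from a glyph comparison to a talent comparison would then just mean passing "talents" as the slot, which is the code reuse described above.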

It’s also helpful for caching the results.  One of the downsides of running simulations is that it can take a lot of time.  So it’s helpful to keep and reuse results that shouldn’t have changed.  If I run a sim with 50k iterations, I want to store the results so I can just call up those results later on rather than having to re-run the whole 50k iteration sim again. This essentially replaces a several-minute simulation run with a millisecond data read operation.

But you have to be careful about how you do that.  For example, if the input .simc file changes, we’d obviously want to re-run the full 50k iteration sim to generate new results rather than call up old results that may or may not have any relevance to the current problem.  By keeping a separate .simc file for every individual sim in a comparison, I can do that sort of checking and easily call up saved results when they should still be relevant. And of course, it will re-run the sim if it looks like anything important has changed (i.e. any of the inputs or the Simulationcraft executable are newer than the output).
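The "is anything newer than the output?" check can be sketched in a few lines of Python. This is only an illustration of the idea, assuming file modification times are a good enough signal; the function name and arguments are hypothetical, not the actual MATLAB implementation.

```python
import os


def needs_rerun(output_path, input_paths):
    """Return True if the cached output is missing or older than any input.

    input_paths would hold the stitched .simc file and the simulator
    executable (an assumption about what counts as an "input").
    """
    if not os.path.exists(output_path):
        return True
    output_time = os.path.getmtime(output_path)
    return any(os.path.getmtime(path) > output_time for path in input_paths)
```

If this returns False, the saved results can simply be read back in; otherwise the full sim gets re-run and the output file overwritten.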

So, to illustrate how this all works, let’s assume we’re writing a glyph simulation.  We start by defining defaults for the components we won’t vary.  In other words, we start with a default player:


and default talents:


and default gear:

#T16N Gear Set


# Gear Summary
# gear_strength=19365
# gear_stamina=36396
# gear_expertise_rating=5107
# gear_hit_rating=2607
# gear_crit_rating=1112
# gear_haste_rating=15677
# gear_mastery_rating=7602
# gear_armor=60112
# gear_dodge_rating=180
# gear_parry_rating=1526
# meta_gem=indomitable_primal
# tier16_2pc_tank=1
# tier16_4pc_tank=1
# main_hand=siegecrafters_forge_hammer,weapon=mace_2.60speed_10257min_19051max,enchant=windsong

and a default action priority list:

# Snapshot raid buffed stats before combat begins and pre-potting is done.


We then come up with a list of all of the different glyph combinations we’re interested in and create .simc component files for those as well.  For example, there’s an “AS_AW_DA.simc” file that just contains:


and similar files for every other combination we care about.  We then piece together a complete .simc file from the default components and one of the glyph components, and run that sim to get our .html and .txt output files.  And then we do it again for a different glyph component file, and then again, and so on until we have results for all of them.

The last part is just collecting and displaying the data by reading those output files, searching for the relevant information, and arranging it in data tables or graphs.  That’s mostly done by filtering the text output files with regular expressions, and isn’t all that interesting.  However, the results it spits out are interesting.
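For the curious, the filtering step looks roughly like the Python sketch below. The sample text is an invented stand-in, not real Simcraft output, and the metric labels are assumptions; the real patterns have to be matched to whatever the text report actually contains.

```python
import re

# Invented sample in a "Name: value" layout; real output will differ.
SAMPLE_OUTPUT = "Player: example\n  DPS: 367429.0  Error: 78.0\n  TMI: 588.1\n"


def extract_metrics(text, names=("DPS", "Error", "TMI")):
    """Pull the first numeric value following each 'Name:' label."""
    found = {}
    for name in names:
        match = re.search(rf"\b{re.escape(name)}:\s*(-?\d+(?:\.\d+)?)", text)
        if match:
            found[name] = float(match.group(1))
    return found


if __name__ == "__main__":
    print(extract_metrics(SAMPLE_OUTPUT))
```

Labels that never appear in the text are simply left out of the result, so a missing metric shows up as a missing key rather than an exception.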

Glyph Comparison

Below is the data from the first run of the completed glyph comparison.  The defaults being used are all shown above except for the boss component, which is just the TMI standard T16N25 boss.  This is a list of every possible glyph combination using the following glyphs:

AS – Alabaster Shield
AW – Avenging Wrath
BH – Battle Healer
DA – Devotion Aura
DP – Divine Protection
FW – Final Wrath
FS – Focused Shield
HW – Harsh Words
IT – Immediate Truth
WoG – Word of Glory

There are a few omissions here. Some glyphs are basically useless for simulation (Holy Wrath, for example), so they’ve been ignored.  Double Jeopardy is missing because it’s not programmed properly in Simcraft at this point – something I hope to remedy during the holidays.  I should also note that the Harsh Words glyph doesn’t do anything in the default setup since Eternal Flame is the chosen talent.  I can fix that in a variety of ways, the easiest of which is probably just to add an APL entry to offensively cast WoG if the glyph is present.

But otherwise, that list should cover all of the major glyphs that affect DPS and survivability.  I’ve ignored minor glyphs since none of them have a significant impact.
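Enumerating the combinations is the easy part. A minimal Python sketch, assuming three glyph slots, that slot order doesn't matter, and that unused slots are padded with "E" the way the tables in this post show them (the underscore-joined labels mimic the "AS_AW_DA.simc" naming):

```python
from itertools import combinations

GLYPHS = ["AS", "AW", "BH", "DA", "DP", "FW", "FS", "HW", "IT", "WoG"]


def glyph_combinations(glyphs=GLYPHS, slots=3):
    """List every way to fill up to `slots` slots, padding with 'E'."""
    labels = []
    for count in range(slots + 1):
        for chosen in combinations(glyphs, count):
            padded = ("E",) * (slots - count) + chosen
            labels.append("_".join(padded))
    return labels


if __name__ == "__main__":
    labels = glyph_combinations()
    print(len(labels), labels[:3])
```

Under those assumptions, ten glyphs give 1 + 10 + 45 + 120 = 176 labels, each of which would map to one glyph component file.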

Below is a sortable list of the data. Since it’s long, I’ve spoilered it so you can open and close it.  While I haven’t included error metrics on the table, the maximum DPS error in this data set is 88, which is roughly 0.02% of the DPS values involved.  Note that “E” stands for an empty glyph slot.


Rather than dig through all of that data to come up with important conclusions, I’ve also programmed it to generate a table showing DPS for single-glyph configurations.  That table is shown below.  DPS error data is provided here, along with the DPS difference between that configuration and having no glyphs (“Delta”).  Delta is thus the DPS gain due to adding that glyph in isolation, to within +/- the error (“Err”).

Glyph DPS Err Delta HPS DTPS TMI SotR
E 367429 78 0 157433 157771 588.1 73.0%
AS 372376 78 4947 157454 157791 594.9 73.0%
AW 367355 79 -74 157486 157823 836.9 73.0%
BH 367471 78 42 156043 156789 18810.1 73.0%
DA 367334 78 -95 157443 157779 650.6 73.0%
DP 367323 83 -106 149574 149868 213.6 73.0%
FW 370402 83 2973 157422 157756 737.7 73.0%
FS 382948 78 15519 157464 157802 683.5 73.0%
HW 367394 78 -35 157451 157787 574.5 73.0%
IT 367187 80 -242 157468 157805 546.6 73.0%
WoG 374192 78 6763 157464 157801 598.9 73.0%

This table basically shows us that Focused Shield is the largest DPS gain we can get against single targets by a large margin.  Coming in second place is the Glyph of Word of Glory, thanks to all the EF casts we’re using in this profile, followed by Alabaster Shield.  Final Wrath is a distant fourth, and pretty much nothing else has a significant effect on our DPS output.

Some of the deltas are a little bigger than the “Err” column even though they should have literally zero effect (ex: Immediate Truth given that we’re using Seal of Insight), which suggests that the error bounds SimC is reporting probably aren’t generous enough.  I don’t remember whether it’s reporting a 95% confidence interval or something else, so I’ll probably have to dig through the statistics module and figure out what I need to do on my end to get more realistic error bounds.
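The Delta column, and the check for deltas that exceed the reported error, can be reproduced from the DPS and Err columns of the single-glyph table. The Python below is just a sketch of that arithmetic with hypothetical function names; it compares each delta against that row's Err only, which is the same rough comparison made above rather than a proper significance test.

```python
# (DPS, Err) pairs copied from the single-glyph table.
SINGLE_GLYPH = {
    "E": (367429, 78), "AS": (372376, 78), "AW": (367355, 79),
    "BH": (367471, 78), "DA": (367334, 78), "DP": (367323, 83),
    "FW": (370402, 83), "FS": (382948, 78), "HW": (367394, 78),
    "IT": (367187, 80), "WoG": (374192, 78),
}


def dps_deltas(rows, baseline="E"):
    """DPS of each configuration minus the no-glyph baseline."""
    base_dps = rows[baseline][0]
    return {glyph: dps - base_dps for glyph, (dps, _err) in rows.items()}


def deltas_beyond_error(rows, baseline="E"):
    """Glyphs whose |Delta| is larger than that row's reported Err."""
    deltas = dps_deltas(rows, baseline)
    return sorted(
        glyph for glyph, (_dps, err) in rows.items()
        if glyph != baseline and abs(deltas[glyph]) > err
    )


if __name__ == "__main__":
    print(dps_deltas(SINGLE_GLYPH)["IT"])
    print(deltas_beyond_error(SINGLE_GLYPH))
```

Immediate Truth, Devotion Aura, and Divine Protection all land in the second list alongside the real DPS glyphs, which is the symptom described above.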

Anyway, we can also make two other useful tables out of this data.  The first would be to sort it in order of descending DPS to get the top 10 DPS combinations.  We should expect that FS+WoG+AS is on top, followed by FS+WoG+FW.  And indeed if we ask MATLAB to generate that table, we find:

Top 10 DPS Combinations
Glyphs DPS Err %Err HPS DTPS TMI SotR
AS FS WoG 395388 85 0.00% 157420 157757 819.0 73.0%
FW FS WoG 393219 88 0.00% 157469 157805 566.7 73.0%
AS FW FS 391406 86 0.00% 157456 157796 571.9 73.0%
AW FS WoG 390205 85 0.00% 157464 157796 632.7 73.0%
FS IT WoG 390107 85 0.00% 157439 157773 572.1 73.0%
E FS WoG 390094 85 0.00% 157433 157771 536.3 73.0%
BH FS WoG 389977 86 0.00% 156020 156758 18379.6 73.0%
FS HW WoG 389911 85 0.00% 157495 157830 2258.5 73.0%
DA FS WoG 389881 85 0.00% 157418 157752 533.8 73.0%
DP FS WoG 389830 85 0.00% 149573 149868 384.7 73.0%
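Building a "top N" table is just a sort on one column. A small Python sketch, using a few rows copied from the table above (the tuple layout and function names are assumptions for illustration):

```python
# (glyphs, DPS, TMI) for a handful of rows from the table above.
RESULTS = [
    ("AS FS WoG", 395388, 819.0),
    ("FW FS WoG", 393219, 566.7),
    ("DA FS WoG", 389881, 533.8),
    ("DP FS WoG", 389830, 384.7),
]


def top_dps(results, n=10):
    """Rows with the highest DPS first."""
    return sorted(results, key=lambda row: row[1], reverse=True)[:n]


def lowest_tmi(results, n=10):
    """Rows with the lowest TMI first."""
    return sorted(results, key=lambda row: row[2])[:n]


if __name__ == "__main__":
    print([row[0] for row in top_dps(RESULTS, n=2)])
    print([row[0] for row in lowest_tmi(RESULTS, n=2)])
```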

I wouldn’t trust the TMI results to better than +/-50% here because we’re clearly running into the “self-sufficiency” problem I discussed in an earlier blog post.  In other words, I doubt the difference between the top 6 DPS specs is at all significant; it’s probably just noise.  On the other hand, the significant jump we see when using Battle Healer is real.  I’m also not 100% sure what’s causing the higher TMI for the FS/HW/WoG combination – I’m guessing it’s a bug in how SimC handles Harsh Words and Eternal Flame (likely guess: it’s automatically casting WoG offensively when EF is cast, but still granting the player the HoT?).  Something to add to my holiday to-do list, I guess.

Finally, we could also make a “Best TMI combinations” list:

Lowest 10 TMI Combinations
Glyphs DPS HPS DTPS TMI Err %Err SotR
AS DP IT 372263 149585 149879 186.2 10.80 5.80% 73.0%
DP FS IT 382505 149600 149895 194.3 13.40 6.90% 73.0%
E AS DP 372551 149533 149828 195.9 18.90 9.60% 73.0%
AW DP WoG 374264 149555 149852 196.7 14.00 7.10% 73.0%
AW DA DP 367330 149520 149816 197.1 18.70 9.50% 73.0%
DA DP WoG 374264 149524 149820 199.3 22.30 11.20% 73.0%
DA DP IT 367326 149549 149845 201.3 16.50 8.20% 73.0%
E DA DP 367376 149505 149798 204.7 36.00 17.60% 73.0%
AW DP IT 367226 149577 149873 205.3 31.50 15.30% 73.0%
DA DP FW 370381 149562 149856 205.5 19.20 9.30% 73.0%

I’m not sure there’s much to learn from this particular table.  DP is really the only big survivability glyph we have since Devotion Aura isn’t on the default APL.  So this list is essentially “10 random configurations that include DP.”

For reference, all of the results of these simulations are hosted on the matlabadin project in the “trunk\simc\io\” folder.  So if you’re curious about any of the individual simulations, you can just look up the “glyph_X_Y_Z.html” file corresponding to that sim and see exactly what the setup and results were.

Talent Comparison

I’ve also written the talent comparison; it works basically the same way the glyph one does, but cycles through all the possible talent combinations.  I’ve only considered the ones that have an effect on combat (L45, L60, L75, L90).  The max DPS error on this table is 84, again roughly 0.02%.

The default glyph configuration for these sims is


though after looking at the results of the glyph comparison, maybe it should be FS/WoG/DP. Or FS/WoG/AS to try and cut down on the self-sufficiency problem, though that would also affect Unbreakable Spirit’s valuation significantly.

SH – Selfless Healer
EF – Eternal Flame
SS – Sacred Shield
PU – Hand of Purity
US – Unbreakable Spirit
CL – Clemency
HA – Holy Avenger
SW – Sanctified Wrath
DP – Divine Purpose
ES – Execution Sentence
LH – Light’s Hammer
HP – Holy Prism
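Cycling through the talent combinations can be sketched in Python as follows. The six-digit strings and the digit-to-talent mapping are inferred from the tables in this post (for example, 312232 is listed as EF/US/DP/LH); the fixed "31" prefix for the first two tiers and the function names are assumptions for illustration.

```python
from itertools import product

# Choices 1-3 for each tier that is varied, in the order they appear above.
TIERS = {
    "L45": ["SH", "EF", "SS"],
    "L60": ["PU", "US", "CL"],
    "L75": ["HA", "SW", "DP"],
    "L90": ["ES", "LH", "HP"],
}
FIXED_PREFIX = "31"  # first two tiers are held constant in the tables


def talent_strings():
    """All six-digit talent strings with the last four tiers varied."""
    return [
        FIXED_PREFIX + "".join(str(digit) for digit in digits)
        for digits in product((1, 2, 3), repeat=len(TIERS))
    ]


def decode(talents):
    """Translate a talent string into the abbreviations listed above."""
    varied = talents[len(FIXED_PREFIX):]
    return [names[int(digit) - 1] for names, digit in zip(TIERS.values(), varied)]


if __name__ == "__main__":
    print(len(talent_strings()), decode("312232"))
```

Four tiers with three choices each gives 81 talent strings to simulate.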


In this case, rather than picking out “single-talent” combinations (since those really don’t exist), I’ve picked a handful of relevant ones for a shortlist.

Talent Short List
Talents L45 L60 L75 L90 DPS HPS DTPS TMI SotR
311212 SH US HA LH 390997 121134 153348 753032.4 71.0%
311222 SH US SW LH 388631 124118 161326 1080163.3 63.0%
311232 SH US DP LH 384737 122424 152032 500356.3 70.0%
312212 EF US HA LH 391302 153414 153716 423.1 71.0%
312222 EF US SW LH 388657 161300 161688 563.9 63.0%
312232 EF US DP LH 388086 149540 149832 216.3 73.0%
313212 SS US HA LH 382266 107749 109451 25367.7 69.0%
313222 SS US SW LH 379472 114575 117620 37985.6 62.0%
313232 SS US DP LH 376220 106990 108622 13449.7 69.0%

Here we see that Eternal Flame consistently beats Sacred Shield by a large margin for survivability (TMI in the hundreds vs. TMI in the tens of thousands).  The slight DPS gain of EF over SS is due to GCD clashes (remember, Glyph of Word of Glory isn’t chosen in the defaults).  Unbreakable Spirit is basically a no-brainer thanks to Divine Protection, so there’s no reason to vary that.  Within a group, Divine Purpose consistently gives lower TMI than the other two L75 talents.  I stuck with Light’s Hammer across the board so that we could compare the L45 and L75 talents more directly, though I should probably add a few more combinations to this list so it highlights the difference in the L90 talents.  Luckily we get some of that from the next two tables: Top 10 DPS and Lowest 10 TMI.

Top 10 DPS Specs
Talents L45 L60 L75 L90 DPS Err %Err HPS DTPS TMI SotR
312313 EF CL HA HP 403360 76 0.00% 159434 159823 3014.5 71.0%
312113 EF PU HA HP 403339 76 0.00% 159400 159787 3775.0 71.0%
312213 EF US HA HP 403333 77 0.00% 153305 153621 520.2 71.0%
311113 SH PU HA HP 403248 76 0.00% 111387 160509 8468304.7 71.0%
311213 SH US HA HP 403211 76 0.00% 112976 153377 1180576.5 71.0%
311313 SH CL HA HP 403172 76 0.00% 111397 160498 8471353.9 71.0%
311323 SH CL SW HP 401415 69 0.00% 114830 168578 9243176.7 63.0%
312323 EF CL SW HP 401393 69 0.00% 167421 167950 3624.0 63.0%
311123 SH PU SW HP 401375 69 0.00% 114837 168569 9262060.2 63.0%
312223 EF US SW HP 401307 70 0.00% 161101 161520 772.7 63.0%

This list suggests that Holy Avenger and Holy Prism are the two dominant DPS talent choices this tier.  The L60 talent is irrelevant, and the L45 talents are similarly irrelevant in this table because of the lack of Glyph of Word of Glory.  Since nothing in the APL utilizes Selfless Healer yet, that’s basically an “empty” talent choice.  Combinations with EF and SH are interchangeable with regard to DPS because neither costs us any GCDs; SS combinations don’t show up because it does cost GCDs and pushes back DPS abilities.

From this list, it looks like in T16 normal gear, HA>SW>DP for DPS in the L75 slot.  This is a bit surprising, as I expected Divine Purpose to have a better showing here.  I haven’t quite figured out the rationale for why it’s performing so poorly for DPS in these sims.

Holy Prism rising to the top is also a bit of a surprise, but it makes sense.  We can cast three Holy Prisms every minute compared to a single Light’s Hammer or Execution Sentence.  Three Holy Prisms have always been more damage than either of those two alternatives, even early in the expansion.  However, those three Holy Prisms cost three GCDs.  When we were using Sacred Shield as our go-to L45 talent and getting Grand Crusader procs from attacking, we simply didn’t have those spare GCDs.  Switching to Eternal Flame and losing some Grand Crusader procs opened up enough GCDs that we can fit Holy Prism in very seamlessly.

I should also note that the default APL may not be properly optimized for Holy Prism yet.  That’s another thing we’ll have to refine once I have time to write the rotation comparison.  But that should only make Holy Prism better, not worse.

As far as the last two L90 talents, we can get a little bit of information from the next table.

Lowest 10 TMI Specs
Talents L45 L60 L75 L90 DPS HPS DTPS TMI Err %Err SotR
312232 EF US DP LH 388086 149540 149832 216.3 31.30 14.50% 73.0%
312231 EF US DP ES 388627 149833 150132 272.4 54.20 19.90% 73.0%
312233 EF US DP HP 399801 149415 149725 329.6 65.80 20.00% 73.0%
312212 EF US HA LH 391302 153414 153716 423.1 101.50 24.00% 71.0%
312211 EF US HA ES 391824 153606 153907 442.7 93.20 21.00% 70.0%
312213 EF US HA HP 403333 153305 153621 520.2 49.40 9.50% 71.0%
312222 EF US SW LH 388657 161300 161688 563.9 63.50 11.30% 63.0%
312221 EF US SW ES 388194 161475 161867 735.9 291.20 39.60% 63.0%
312223 EF US SW HP 401307 161101 161520 772.7 155.80 20.20% 63.0%
312332 EF CL DP LH 388118 155544 155885 914.6 84.60 9.30% 73.0%

Now, this is the table for lowest TMI, but the first thing I want to point out is about L90 talents and DPS.  The first three rows of this table are identical except for the L90 talent, and it’s clear from those rows that Holy Prism has a significant lead in DPS (around 11k DPS).  Execution Sentence comes in second and Light’s Hammer a close third, separated by only about 550 DPS.  I’m hesitant to put too much stock in the survivability value of the three talents given that both Execution Sentence and Holy Prism are always being cast offensively here (though Holy Prism still heals you via the secondary effect, obviously).  The TMI spread is also fairly small, so I hesitate to trust the order anyway.

However, turning our attention back to survivability, the dominance of EF+US+DP here is pretty clear, sweeping the top three spots.  Swapping DP for HA results in a small decrease in survivability, mostly through a loss of SotR uptime.  Swapping HA for SW is another clear loss, as is giving up US for Clemency in the last row.  Note that the default APL doesn’t use Hand of Purity, otherwise I suspect that EF+PU+DP+LH would have taken that last spot.  Add another thing to my holiday “to-do” list.

Realistically, I need to fix the “self-sufficiency” problem before I can rely on these TMI lists.  I may use some trickery to /cancelaura Vengeance periodically to try and reduce the problem.  We’ll see though – if the beta for Warlords of Draenor comes out any time soon, I may just start focusing on that since we’re basically done with content for this expansion anyway.


In short, I now have the ability to automate comparisons using SimC, much in the same way I used to do with my old MATLAB DPS simulations.  I’ve gotten the first two done (glyphs and talents), and they mostly confirm things we already knew.

The glyph simulation reinforces that Divine Protection is the only glyph that has a large survivability benefit (and again, that’s situational – on a fight with a big magical burst, you still wouldn’t use it).  It also confirmed that our best DPS glyphs are Glyph of Focused Shield, Glyph of Word of Glory, Glyph of the Alabaster Shield, and Glyph of Final Wrath, in that order.

The talent simulation reiterated that Eternal Flame is stronger than Sacred Shield for raw smoothness, and that Unbreakable Spirit is strong when you’re using Divine Protection on cooldown.  In the L75 talent category, it showed us that for DPS, Holy Avenger > Sanctified Wrath > Divine Purpose, but for survivability Divine Purpose > Holy Avenger > Sanctified Wrath.  And finally, in the L90 category it suggests that Holy Prism is significantly better than Execution Sentence or Light’s Hammer for DPS if you’re using Eternal Flame.  It didn’t tell us much about their survivability value, though.

My next task is probably to get more of the simulations online.  The weapon simulation should be relatively easy, the rotation simulations less so (but arguably more interesting to write).  I also want to make some refinements to the settings for these two existing simulations based on some of the things I’ve noticed while writing this blog post, and probably based on things that other people notice and post in the comments.  I’m happy to entertain feedback on what I can improve here, since these sims are clearly still in a fairly rough stage of development.

In parting, I want to leave you with an interesting thought though.  Nothing about the code is all that paladin-specific. I’m bolting together .simc files that all contain paladin “stuff,” but the bolting together part is mostly class-agnostic.  Why is that important?

Well, consider: what if someone were to write default component .simc files for, say, a Frost mage? With a few minor tweaks, this automation suite could then run nearly identical simulations for a Frost mage.  Or a Protection warrior, or Blood DK, or… you get the idea.


Post-Blizz-Con Wrap-Up, Part 2

In the last post, I ranted about time travel and lore. This time, I’m going to talk about some of the mechanical changes that were announced at BlizzCon.

Stats, Reforging, and Gear

There were a lot of different gear-related changes that I’m lumping together in this one category because they’re all somewhat related.  It’s hard to say which is the “biggest” or “most important” of these changes, because several of them are (literally) game-changing. So we’ll go through them in no particular order.

First, gear will no longer have a specific primary stat.  If a piece of plate drops, and you’re in Holy spec, it’ll have intellect and stamina as its primary stats.  If you switch to Retribution or Protection, the item will suddenly have strength and stamina as primaries.  This is a pretty huge change, because it basically makes the big three primary stats irrelevant on the bulk of gear.  Every piece of plate, leather, and mail will always have stamina and whatever primary stat your spec uses.  In some sense, it consolidates strength, agility, and intellect into one flexible primary stat.

I don’t think many players will argue that this is a bad thing. You’ll automatically have up-to-date gear for all of your off-specs, so hybrid classes aren’t punished as much for wanting to be fluent in more than one spec.  The gear may still not be ideal because necks, rings, cloaks, and trinkets will only have secondary stats, some of which are only relevant to certain specs. But it’ll be a large improvement over always using last-tier’s gear for your off-spec, especially since you’ll have current-tier set bonuses.

We’re also getting a few new secondary stats, with the major three being Amplify, Readiness, and Multistrike.  These should work much the same way the Siege of Orgrimmar trinket procs do.  Readiness is just the cooldown reduction effect we’ve seen on trinkets, and will apply to a select few abilities based on your spec.  Adding X% multistrike will give you two chances (X/2% each) to do an additional 30% damage (or healing) with each attack (or heal).  And X% amplify increases your crit and multistrike multipliers as well as giving you X% more haste, mastery, spirit, readiness, and armor from gear.
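Taking that description of multistrike at face value (two chances, each at X/2%, each worth an extra 30%), the average gain is easy to work out. The Python below is only a sketch of that arithmetic; the mechanic was still preliminary at this point, and the function name and defaults are hypothetical.

```python
def expected_multistrike_multiplier(multistrike_pct, chances=2, bonus=0.30):
    """Average damage/healing multiplier from multistrike as described above.

    Each of the `chances` rolls succeeds with probability
    (multistrike_pct / chances) percent and adds `bonus` of the original hit.
    Expectation is linear, so the rolls don't need to be independent.
    """
    chance_per_roll = (multistrike_pct / chances) / 100.0
    return 1.0 + chances * chance_per_roll * bonus


if __name__ == "__main__":
    print(expected_multistrike_multiplier(10))
```

In other words, under this model each 1% of multistrike works out to about a 0.3% average increase.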

It’s worth noting that all three of these stats can be considered “tanking” stats.  Readiness gives you more frequent access to cooldowns like Guardian of the Ancient Kings, Divine Protection, and Ardent Defender.  Multistrike works on healing as well as damage, so while the details are still a little vague, it’s likely that it will work on effects like Seal of Insight and Eternal Flame.  Sacred Shield is a little dicier, but it could be made to work by simply having a chance to apply multiple absorb bubbles; it’s just not clear whether it will or not.  Amplify is obviously a tank stat because it gives you more of everything: haste, mastery, armor, readiness, as well as larger crits (for Eternal Flame) and larger multistrikes.

Armor is also making a return as a secondary stat on specific items (namely necks, rings, and other non-plate gear), so we’ll have another secondary stat to throw into the mix.  I didn’t lump armor in with the “major” three simply because armor isn’t really new.  It’s still nice to have it back though; armor was always a powerful stat even though it’s passive.

Having four new “tanking” secondary stats is good, because the other bombshell piece of news is that four secondary stats are being removed entirely.  Hit and expertise are gone, making juggling the hit and expertise caps a thing of the past. I predicted we’d see a change to these stats, but I didn’t anticipate both of them disappearing because it would reduce the number of possible stats on gear too much. But the addition of three new stats more than makes up for that.  Also note that while bosses will still have a chance to parry attacks from the front (so that melee DPS still have to stand behind them), tanks will have a passive that bypasses that effect. So as a nice little side effect, the “tank expertise” penalty is going away as well.

I didn’t expect dodge and parry to be completely removed for similar reasons, though I did expect a change. But again, given four new secondary stats to play with, we really won’t end up missing these two.  It’s worth noting that the dodge and parry mechanics aren’t going to be completely gone – we will still dodge and parry attacks passively, we just won’t have the ability to stack them via secondary stats.  It’s likely that we will still build up dodge and parry over the course of an expansion through our primary stats just like we do today.  So strength will essentially be our avoidance stat, and we won’t have to worry about choosing it since it comes on gear by default.

Of less concern to tanks, they’re changing the way that DoT snapshotting works.  In short, DoTs won’t snapshot anymore; instead, the tick amounts will update dynamically based on your current stats.  This will mean that specs like Affliction Warlocks won’t be quite as skill-dependent, because your DPS won’t drop as much if you accidentally re-apply DoTs a little too early after buffs wear off.  That’s good and bad – good if you think the skill differential between an average Affliction Warlock and an expert one was too big, bad if you didn’t think it was large enough.  Since I don’t get enough time on my Warlock to stay in practice anymore, it’s arguably a buff for me, so I’m not too worried. But I can see how some Warlock mains might be peeved.

Again, while it’s not of that much relevance to us, it’s worth discussing how the new mechanic will work.  The tentative model I overheard during BlizzCon discussions is that every DoT/HoT will have its usual fixed duration, and we’ll just get partial ticks at the end.  So for example, let’s consider Eternal Flame, a 30-second HoT that ticks in 3-second intervals. If we have 20% haste, those ticks will occur at 2.5-second intervals (3/1.20), so we’ll get 12 ticks instead of 10.  If we increase that to 25% haste, the ticks will be 2.4 seconds long (3/1.25=2.4), so the first 12 ticks will take 28.8 seconds.  Then we’ll get a partial tick at 30.0 seconds that will be half-strength (because it will be a 1.2-second long tick rather than a 2.4-second long tick, and 1.2/2.4=0.5).  Presumably Sacred Shield will work in a similar fashion.
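The tick arithmetic in that example can be checked with a short Python sketch. It follows the tentative model described above (fixed duration, hasted tick interval, one partial tick at the end) and uses exact fractions to avoid floating-point rounding; the function name and the integer-percent haste argument are assumptions for illustration.

```python
from fractions import Fraction


def hot_ticks(duration_s, base_interval_s, haste_pct):
    """Return (full ticks, strength of the final partial tick, tick interval).

    Arguments are expected to be whole numbers, e.g. hot_ticks(30, 3, 25)
    for a 30-second HoT with 3-second base ticks at 25% haste.
    """
    interval = Fraction(base_interval_s) * 100 / (100 + Fraction(haste_pct))
    total = Fraction(duration_s) / interval
    full = total.numerator // total.denominator
    partial = total - full  # fraction of a full-strength tick
    return full, partial, interval


if __name__ == "__main__":
    print(hot_ticks(30, 3, 20))
    print(hot_ticks(30, 3, 25))
```

At 20% haste this gives 12 full ticks and no partial tick; at 25% haste it gives 12 full ticks at 2.4-second intervals plus a half-strength partial tick, matching the numbers above.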

With the changes to hit, expertise, dodge, and parry, they’ve also decided that reforging isn’t necessary, and have removed that in 6.0.  This sparked mixed reactions from the players I spoke with.  Sure, we don’t need it to maintain hit and expertise cap anymore, or to balance our dodge and parry ratings.  And the changes to DoT/HoT snapshotting will get rid of most (but maybe not all) relevant haste caps in the game.  But reforging still narrowed the gap between a well-itemized piece and a poorly-itemized piece of equal ilvl.  That has its advantages, especially when it comes to allocating loot in smaller raids.  I’m not sure reforging absolutely had to go in this environment.  But it seems the decision is that keeping reforging just isn’t worth the hassle when its impact is so marginal.  It’s not a decision I’ll argue against, since I don’t have strong feelings about reforging either way.

They also talked about having fewer gem slots on gear and paring down enchants to cover fewer slots, though with more options for each slot.  That means the level of customization we have on gear will be going down a little bit.  Whereas now, we can stuff every socket full of haste gems and use haste enchants to rack up an extra 8% haste or so, we probably won’t be able to do the same thing in Warlords of Draenor.

Tanking Mechanics

One of the most significant announcements is something that wasn’t actually said outright, but merely implied. You see, one of my predictions was that all tanks were going to move to a “DPS tanking” model similar to what Monks, Druids, and now Paladins use.  And while I don’t remember them explicitly addressing that topic (maybe they did in a panel that I’ve forgotten), they almost didn’t have to. The removal of dodge and parry from gear itself was enough to guarantee that such a transition was happening.  The fact that all of the new secondary stats have a clear impact on survivability as well as damage output just further reinforces it.  So we can expect to see big changes to Warrior and Death Knight mechanics once 6.0 goes into beta to embrace haste and crit as true survivability stats.

It’s not clear yet whether every stat will have to have a tanking impact.  For example, right now Paladins don’t benefit much from critical strike rating unless they take Eternal Flame, and even then the impact is fairly small.  It may be that crit rating will still be our dump stat in Warlords of Draenor.  But it wouldn’t take much to make it at least a contender.  If Seal of Insight were able to crit, that would give crit rating some baseline value.  We could also get a secondary mechanic to help bolster it – something like a small HoT effect when certain spells crit, for example.  We’ll just have to wait and see what Blizzard decides on that front, I guess.

Vengeance = VICTORY!

There is, however, no question as to my favorite change.  While it wasn’t announced outright during the convention (again, maybe it was during the Q&A and I just missed it, but it’s doubtful), it came out during discussions with developers at the after-parties.  While I got a chance to talk with a few devs in various degrees of detail at BlizzCon, I wasn’t the only one, so I don’t feel bad about sharing it.

-Vengeance changed to increase tanking abilities, rather than pure AP. He wants Tank DPS to be roughly 75% of a DPS’ output.

Is it tacky to declare victory? Because we’ve suggested exactly this solution several times before.

In all seriousness, this is a huge change for a number of reasons. Mel and I have been blogging about vengeance for a long, long, long, long time.  Many of the more blatant problems have been cleaned up by hotfixes along the way, but some of the core problems remain.  One of those is that our DPS as tanks depends sensitively on taking damage.  That makes our damage drop off during off-tanking periods unless we play awkward taunting games to keep Vengeance high, and more importantly it makes playing through solo content infuriating because we do so little damage.

When 80% of your damage comes from having a raid boss nearby, dailies become an infuriating exercise.  I no longer even think about doing dailies as prot, because for an entire expansion now, I’ve had to switch to Ret to be even remotely efficient with my time.  And as I’ve mentioned in earlier blog posts, the feeling of loss of control over your own DPS potential is somewhat demoralizing, because it takes control away from a role that is obsessed with having control in the first place.

This change reverses all of that.  If our default output is nearly 75% of a regular DPS class, we’ll actually be able to perform solo content in a sane amount of time.  The only concern I have is that we may be too strong in PvP situations, but maybe that’s intentional.  Players have been bemoaning the inability to PvP as a protection spec, so maybe this will bring that back.  And anyway, it’s not like balanced world PvP exists anymore.

I’m ecstatic about this change for another reason: I’ll finally be able to evaluate my performance easily with logging sites again.  It’s incredibly annoying to realize that you have absolutely no idea what DPS you should expect to be able to do on an encounter.  You can compare to other guilds’ logs, but there are so many variables involved that the comparison is nearly meaningless.  Your DPS swings drastically with a number of different factors, including your guild’s strategy and which of your tanks happens to be tanking first.

Un-linking Vengeance from DPS fixes a lot of that, which means I can finally make more useful comparisons between myself, my co-tank, and other tanks.  It will also make tank DPS balance a little easier to achieve on Blizzard’s end, because the range of AP values over which the five tanking classes need to be roughly equivalent just became a lot smaller.

Looking Forward

There’s really no connecting thread that links all of these different ideas, so it’s hard to come up with a conclusion for this post.  The best I can do is to say that there are a lot of different exciting and awesome changes coming in Warlords of Draenor, and you should be as excited about it as I am!

Even though I think the story is sort of hackneyed, the mechanics changes are great and foreshadow what will likely be the best expansion for tanks yet. We’re getting many of the significant changes that we’ve asked for during MoP: a less frustrating and more functional version of Vengeance, consistency between the stats we want and the stats that show up on our gear, the removal of boring stats like dodge and parry, the elimination of the tank expertise penalty, and much more.

That’s not to say there aren’t changes we can still hope to see.  I plan on vigorously campaigning for Holy Wrath to lose/modify its meteor effect so that we once again have a functional many-target snap aggro tool.  And Meloree will tell you that the game still lacks a good mechanic to tie DPS to tanks, completing the DPS-Tank-Healer trinity.  That role used to be filled by threat, but I think that ship has long since sailed.  But it’s hard to look at the wealth of other quality of life and toolkit improvements we’re receiving and not be very pleased with the direction Warlords of Draenor is taking.

Posted in Tanking, Theck's Pounding Headaches, Theorycrafting

Post-Blizz-Con Wrap-Up, Part 1

I had a great time at BlizzCon, and met a lot of great people.  It’s really fun to meet someone in person that you’ve only ever interacted with online, either via Twitter, in blog comments, or in-game.

Unfortunately, this year I didn’t take great notes on where I was and who I met every day, so I’m not going to try and quickly write up a recap.  While I have a lot of great stories, unless I sit down and carefully re-trace my steps I’m guaranteed to forget someone or something, and then people will just feel disappointed that I forgot them.  If I have time later on, maybe I’ll try and put together that sort of recap.  But probably not – the further we get from BlizzCon, the less I’ll remember and the less relevant it will be.  Plus, there are more than a few people who would probably prefer I didn’t share all of the stories I have from when they were drunk.  You know who you are. 😛

So I’ll just say that I had a blast meeting everyone.  I had a lot of great conversations, met a lot of people I already knew and respected, and met a lot of people that I barely knew at all beforehand.  But I was very happy to meet all of you, whether you’re e-famous, or a theorycrafter, or a player that’s read the blog once or twice, or just an avid player that recognized my name.

Instead, this blog post is going to focus on the game-related news from BlizzCon, and my reactions to that information.  And today, we’re going to start with something that you probably never expected to see on Sacred Duty: A lore discussion!

No, not that type of lore!

Time Is On My Side…..

There are very few things I disliked about the new announcements.  For the most part, I’m on board with all of the changes they’re making in Warlords of Draenor.  But the story is one of those things that will never completely sit well with me.

Now, to be fair, I wasn’t that enthusiastic about Mists of Pandaria at first either.  For too many years, I’d assumed Pandaren were a joke race that would never see the light of day in WoW, and it was hard to break that mental stigma.  Plus, really, Pokemon in my WoW? And yet, here we are two years later, and they totally pulled it off. Mists has arguably been one of the stronger expansions, both in terms of gameplay and in terms of story and lore.  Even though I’ve never participated in a single pet battle, I know plenty of people who have done so and enjoyed it.  Even though I thought Pandaren were a joke, they turned out to be a well-fleshed-out race complete with culture and history that made them pretty bad-ass.  So it’s clear that first impressions can be misleading.

However, my objection to the Warlords of Draenor story is more fundamental.  Buckle up, it’s physics time.

You see, I absolutely hate time travel in video games.  As a physicist who specialized in things like quantum teleportation and superluminal pulse propagation, I’m intimately familiar with the important and fundamental role causality plays in physics.  And that’s made me fairly intolerant of any sort of “faster than light” or “time travel” suggestions, whether it’s in the media or in a game’s story.  When my own research was featured on Slashdot, I was of course ecstatic, but still a little dismayed at the sensationalist presentation.  I have a self-created mental block on the entire idea of time travel.

So the “we’re going back in time” nature of the Caverns of Time has always sort of bothered me. While I love getting to see the stories from the earlier Warcraft games recreated in WoW, it’s always been a struggle for me to reconcile the time travel elements with the rest of WoW’s story.  And don’t even get me started on the whole Rhonin / Dragon Soul storyline – it was around that time that I threw my hands up in despair and completely gave up on WoW’s story ever making sense again.

Because frankly, time travel in games and movies has rarely been done well in my experience.  Generally, the amount of hand-waving that has to be done to justify time travel just creates new paradoxes that make the whole thing feel silly to me.  Once you allow the possibility of time travel, I feel like a narrative loses a lot of its motivation.  Striving to kill <current Big Bad> becomes much less suspenseful if you know that some intrepid time-traveler will just come fix it if you screw up.  Not to mention the inherent paradox there: if they’re coming back to fix it, shouldn’t they be here now? Why aren’t they?

We Are The Worlds

WoW’s take on time travel is a fairly standard one.  As far as I can tell, it mimics the “many worlds” interpretation of Quantum Mechanics.  In layman’s terms, every time a choice is made, the world splits into two or more different timelines, one for each possible outcome of the decision.  So for example, there’s a timeline where Garrosh dies at the end of Siege of Orgrimmar, and a timeline where he’s spared and imprisoned by Taran Zhu.

When we go back in time to the Black Morass or The Battle for Mt. Hyjal, we’re sticking within our own timeline and trying to prevent the Infinite Dragonflight from altering the events of our timeline.  Which, again, raises the silly paradox of one-upmanship: if we succeed, why don’t they just go back in time again to re-alter it?

On the other hand, in the End Time instance, we’re traveling to a future timeline where Deathwing wasn’t defeated. And in the Well of Eternity dungeon, we’re going back in time to take the Dragon Soul so that Thrall can use it in the future.  But if we take the Dragon Soul out of the past, doesn’t that change how the rest of history unfolds? Why do we return to a present that seems basically unchanged from how we left it?

The only rational explanation for this… well, ok, let me step back a moment. There is no rational explanation for this, because the whole pile of time travel nonsense is inherently irrational.  But the least irrational way to rationalize this is to assume that actions in one timeline don’t necessarily affect the others.  So maybe we’re going back to alternate timelines that are eerily similar to, but not the same as, our “own” timeline.  So yeah, we stole the Dragon Soul from another timeline, which then caused all sorts of chaos in that timeline, probably resulting in the deaths of millions of people or something.  But it’s okay, because those people all sucked anyway, since they weren’t from our timeline.

I shouldn’t have to point out the multitude of problems with that interpretation.  What makes our timeline (or us) special? Is the Thrall in our timeline the same as the Thrall in other timelines? Am I the same person in this timeline that I am in another? How do I know which one is the “real” one, or are we all real?  And what does that say about individuality or free will if I’m just one of an infinite number of Theck clones in different timelines, all of whom have the audacity to jump into each other’s timelines and fuck around with them?

All aboard the Crazy Train, I guess!

The story of Warlords of Draenor, as it’s been explained to us, takes this concept to the next level by linking different timelines more strongly.  It’s depicted graphically below, though I can’t take any credit for the diagram; I stumbled across it in a post by Klaudandus on Maintankadin.  Garrosh manages to travel back to old Draenor and create a new split in the timeline.  The “Alpha” timeline on the diagram is the one we know and have played through, where Garrosh never went back in time.  The “Beta” timeline is the one in which Garrosh alters the events in that timeline to create the Iron Horde.  He then somehow opens a new portal that links the Beta timeline to a point in the future of the Alpha timeline.

Mock-up of Warlords of Draenor’s time-travel story. Note that I didn’t come up with this graphic, I got it from the forum post linked in the text. If you know/are the creator, please contact me so I can give you the appropriate credit!

The reason I say it “takes this to the next level” is that instead of allowing a few individuals to hop around on the Timeline Superhighway, it’s essentially creating an on-ramp linking Interstate (“Intertime?”) Beta Draenor to Future Route Alpha Azeroth, so that anybody can hop back and forth between the two and cause chaos.  And instead of connecting two instants in time (whatever the hell an “instant in time” means in this hackneyed excuse for a coordinate system), it’s a continuous link, such that there’s a one-to-one correspondence between the time and date in Beta Draenor and the time and date in Alpha Azeroth, just with a pretty hefty offset.  I’m not exactly sure what that offset is – years? tens of years? hundreds of years? Does time even have any meaning if we’re going down this time-traveling rabbit hole?

What Did That Cat Ever Do To You?

So yeah, clearly I’m not a fan of the time traveling crap.  It’s just too problematic.  I know most people can just gloss over it and enjoy the ride, but I’m not one of those people. To me it just smacks of lazy storytelling in the same way that the many worlds interpretation smacks of lazy science.  Even calling it “science” is being generous, in fact. A large proportion of the scientific community (probably the majority of it) doesn’t consider the many worlds interpretation to be science at all, because it is inherently not falsifiable through any means we can conceive of.  In that sense, it’s no better than a religion, because we can’t test it. Scientists have long since accepted that there are other, slightly less insane ways to rationalize quantum mechanical behavior.  Though, for the reader’s amusement, I’ll note that to do so we say that we give up on “reality” to keep causality, which sounds even more off-the-wall.  But it actually does make sense once you rigorously define what we mean by “reality.”

In short, it’s the “Schrödinger’s cat” explanation you’ve probably heard about but never really understood.  You put a cat in a sealed box with some sort of random mechanism to kill the cat.  Traditionally it’s a vial of poison triggered by a chunk of nuclear material, but anything that is both random and fatal will work, so it could just as well be a pistol triggered by sunspots or a high-voltage arc triggered by seismic activity or a time traveler. Also, clearly physicists are awful human beings for doing this to a cat. Poor cat.

I’ve been waiting to use this in some context for *years*

But the point of this gruesome example is that once the box is sealed, we don’t know whether the cat is dead or alive.  We could assign a probability to each (i.e. the cat has a 50% chance of being alive), but we don’t know for sure until we open the box.  In quantum mechanical terms, until we open the box the cat is both dead and alive! Or more precisely, it’s in a “superposition state” where it is simultaneously dead and alive.  It’s only when we open the box that the cat “decides” which it is for sure.

Now that may seem ludicrous, and to be fair it is ludicrous for a cat for a few reasons.  But it’s not ludicrous for quantum-mechanical systems, which is what the principle really applies to.  A quantum-mechanical particle has a number of properties (spin, momentum, position, energy, etc.) that aren’t completely decided until it interacts with something in such a way that the property needs to have a fixed value.  For another analogy, assume the particle could be one of two colors: red or blue.  Until it interacts with another object in such a way that its color matters (for example, someone observing it), it isn’t one color or the other, it’s in a superposition state of being both red and blue.  Note that this isn’t the same as being purple!

This is what we mean by giving up “reality.”  We have this inherent notion that objects have fixed, well-defined properties – a tennis ball is yellow, my car is blue, and at any given point in time those two objects have a particular position and velocity.  We call that concept “reality” because we assume that each object has “real” properties – i.e. that the tennis ball really is a tennis ball and won’t suddenly become a baseball.  But on the quantum-mechanical level, some of that goes out the window.  If the ball could be a tennis ball or a baseball, it’s both until we make a measurement.  And when we measure it, we have a random chance of discovering that it’s either (i.e. it isn’t just that we don’t know which type of ball it is until we measure, it’s that it literally isn’t one or the other until we measure, at which point it randomly decides which one it is, as if it were flipping a coin).
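For the numerically inclined, here’s a rough sketch of why “both until measured” is different from “one or the other, we just don’t know which.”  It isn’t a real physics simulation, just a toy model of the red/blue particle from above; the names `RED`, `BLUE`, `mix`, and `ignorance` are all made up for illustration.  The idea is that a superposition and a hidden coin flip give the same 50/50 statistics if you measure the color directly, but they behave differently if you first let the red and blue parts interfere with one another.

```python
import math

# Hypothetical two-color particle: a state is a pair of amplitudes (red, blue).
RED = (1.0, 0.0)
BLUE = (0.0, 1.0)

def probabilities(state):
    """Born rule: the chance of each color is the squared magnitude of its amplitude."""
    red_amp, blue_amp = state
    return (abs(red_amp) ** 2, abs(blue_amp) ** 2)

def mix(state):
    """Hadamard-style transformation that lets the red and blue amplitudes interfere."""
    red_amp, blue_amp = state
    s = 1 / math.sqrt(2)
    return (s * (red_amp + blue_amp), s * (red_amp - blue_amp))

def ignorance(transform):
    """Classical coin flip: the particle is secretly RED or BLUE (50/50) before `transform`."""
    p_red = probabilities(transform(RED))
    p_blue = probabilities(transform(BLUE))
    return tuple(0.5 * r + 0.5 * b for r, b in zip(p_red, p_blue))

superposition = mix(RED)  # equal amplitudes for red and blue

# Measured directly, the two cases look identical: an even split between red and blue.
direct_super = probabilities(superposition)
direct_coin = ignorance(lambda state: state)

# Mix once more before measuring and they come apart.
mixed_super = probabilities(mix(superposition))  # amplitudes interfere: essentially always red
mixed_coin = ignorance(mix)                      # still an even coin flip

print(direct_super, direct_coin)
print(mixed_super, mixed_coin)
```

The hidden-coin-flip particle stays an even split no matter what, while the superposition comes out red essentially every time after the second mix.  That difference is the sense in which the particle “literally isn’t one or the other” beforehand.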

The fun part of all of this is that giving up “reality” is a choice we make.  To be able to explain experimental results, specifically with regard to entanglement, we have to give up either reality or causality.  That means that, technically speaking, we could give up causality if we wanted to preserve the fixed nature of things.  Most scientists have decided that reality is the one we need to give up though.  There’s far more evidence that causality is preserved (both in quantum mechanics and other branches of physics), which leaves us with the conclusion that our intuition based on macroscopic objects simply doesn’t apply at the quantum level.

Mother May I?

In addition to all of the paradoxes and concerns I’ve raised already, perhaps the biggest issue I have with time travel is the notion of free will.  If you assume that time travel exists, and that people can do so willy-nilly, then eventually you need to accept that the timeline you’ve experienced has already been altered in every conceivable way possible by every person that cared to interfere with it from all times in the future.  At which point, what’s the use in caring about anything? It’s hard to make it feel like your actions matter when there’s the ever-present threat of a time traveler erasing everything you did.  And if what you’re experiencing is already the result of those efforts, did you really have a say in how things turned out? Or are you just dancing to the tune of some time-traveling puppeteer?

Some of that can be explained away by making time travel difficult, expensive, or limiting it in some other way. If only a few select people can pull off such a feat, it’s a little more palatable.  Or so the reasoning goes, I guess? I don’t really buy it, because those arguments always assume that future technological advances will never reduce that cost. The concept of nearly-instant global communication between any two people would have been unfathomable to society even 100 years ago.  Yet today we have cell phones that let us do exactly that.  And when in the future, someone discovers the time-travel equivalent of cell phones, then what? Better yet, why haven’t they brought that technology back to us already?

I should also point out that the whole “limited” point of view falls sort of flat for WoW. We have a giant portal connecting two timelines now. And for years we’ve had countless adventurers skipping back and forth through the Caverns of Time as if they were on a day cruise to Lets-Take-A-Shit-All-Over-Continuity’s-ville.

Really, the only “good” variation of time travel I’ve ever seen in a video game is in the Assassin’s Creed series.  I’m sure that biologists and geneticists reel at the entire pseudo-scientific “genetic memory” concept that the game invokes.  But the mechanic works well for a lowly physicist like me.  By retrieving “memories” encoded deep in DNA, a modern-day protagonist can go back and re-live the experiences of one of their ancient ancestors through a virtual interface.  In other words, you’re playing a video game in which your character… plays a video game about their ancestors!

Yeah, it’s sort of like that.

Really though, the Animus mechanic solves all of the major problems with time-travel in games, because it’s distinctly not time travel.  It puts strict causal constraints on the problem, because you can go back and re-live the environment and world, but you can’t do anything that alters the course of history.  While it’s more constricting from a story-telling point of view, it’s also a lot more sane.

Or like that.

Well, ok, mostly sane.  Just don’t take that guy’s word for it.

In Part 2, we’ll talk about some of the actual mechanics changes that were talked about at BlizzCon, and what I think about those.  Which is probably far more relevant given the nature of this blog.  Bet you never thought you’d see a 2000-word rant on game lore at Sacred Duty!  Maybe I should have saved this post for April Fools’ Day?

Posted in Humor, Theck's Pounding Headaches