## Protection L100 Talents

I want to talk a bit about our level 100 talents, but not in the way you might expect. This post isn’t going to be full of numbers, nor is it going to be a comparison of how the talents perform in a raid setting. In fact, there’s very little quantitative analysis of the talents in it at all. What I want to talk about is how the talents feel on a purely qualitative level.

Obviously that makes much of this post opinion, rather than fact. So keep that in mind – none of this is based on numbers, it’s all based on how I feel about the talents, and most of these opinions were formed well before I started running simulations to figure out how well they perform. You’ll get the performance posts soon enough, once I have time to write those up.

Holy Shield

Holy Shield is my favorite of the talents, but I’ll admit that I may be unreasonably biased just because of the name. During MoP beta, I campaigned for Shield of the Righteous to be renamed Holy Shield, because Holy Shield is just more iconic. In fact, I’d still love to see the names swapped. I’d love Holy Shield to be our active mitigation and Shield of the Righteous be the talent.

Regardless though, there’s a lot to like about this talent. It plays into the “block tank” theme that our kit was ostensibly based around before we lost that niche to warriors and Shield Block. It seems like Warlords is trying to bring a little bit of that back. We have the Improved Block Draenor perk that brings our block value up to 40%, and both of our tier set bonuses center around blocking as well. The two-piece gives us Faith Barricade, increasing our block chance by 50% after casting Avenger’s Shield, and the four-piece gives us a chance to proc Defender of the Light every time we block, boosting our block value by 50%.

Not only do we block more often with the talent, but we get the unique ability to block spell damage, which is pretty cool. If that doesn’t sound cool to you, consider that with a little mastery-stacking we can reach block cap while Faith Barricade is active, because the buff’s effect isn’t subject to diminishing returns. So if we save Avenger’s Shield for that large incoming magical attack, we can guarantee 40% mitigation of that attack through blocking. And if we’re lucky and Defender of the Light is active, we’ll mitigate 90% of it.

The damage return is just fun for nostalgic reasons. It brings back a bit of the BC- and Wrath-era “round up all the things” strategy that some of us miss. The damage isn’t shabby either, at 50% of your attack power. In an AoE situation, this talent should really shine. The downside is that it obviously doesn’t provide any damage output when you’re not tanking, but if the coefficient is tuned properly that can still be tweaked to balance it with the other two talents.

So if you couldn’t tell, I’m pretty positive on Holy Shield. Unfortunately, I’m not as positive on our other options.

Seraphim

Seraphim is an interesting idea. 50% of the time (15 second duration, 30 second cooldown) you become a giant ball of bad-ass with inflated stats. Seems fun, especially since it comes with a pretty animation. Seems like a great choice for fights with tank swaps, since you can pool up holy power to prepare for the taunt and get higher effective uptime out of the talent.

What really bothers me about this talent, though, is the holy power cost. Five holy power is steep. One of the things that Blizzard finally learned after Cataclysm was that a resource system is sort of meaningless if all you do is build up to the cap and then dump. That’s why we got Boundless Conviction in Mists – to turn holy power into a real resource that we could pool and spend, and make meaningful decisions about how and when we do either.

And yet… despite learning that 3-HP ability costs in a 3-HP world were limiting and frustrating, here we have a 5-HP ability in a 5-HP world. Apparently the lesson wasn’t taken to heart. I’d much, much rather have a 3-HP version of Seraphim that gave us 800 of each stat for 15 seconds or 1000 of each stat for 12 seconds than the current version. That 5-HP cost is just going to feel awkward when we’re used to spending only 3 at a time.

With a 5-HP cost, we’re almost guaranteeing that we won’t be using Shield of the Righteous in the ~5-6 seconds before Seraphim comes off of cooldown. Otherwise we’ll be delaying Seraphim and getting less bang for the buck out of our talent choice. Divine Purpose procs might help with that, and we could perhaps assume that our last SotR will cover the first part of that period. And of course, maybe that period is while we’re off-tanking if we’re talking about a tank swap scenario. But for regular old steady-state “take it in the face all day erry day” tanking, this is opening us up to a potentially-dangerous spike window. And turning into an invincible angel isn’t very effective if you’re dead before you get to cast it.

That said, when we get to cast it, it will be nice. We get about 4% reduced damage intake from the versatility rating, about 6% dodge from the critical strike rating, 10% haste, 9% crit, 9% mastery, 15% multistrike, and a fair bit of mitigation and attack power from the bonus armor. And we’ll still get one or two SotRs off during that 15 seconds. It really does live up to its billing as a miniature (or not-so-miniature) cooldown. I wouldn’t be surprised if it’s significantly stronger than Divine Protection, though probably not as good as Guardian of Ancient Kings.

However, it’s going to be very, very strange in the first tier of content while holy power income is still limited. You won’t, for example, want to take Eternal Flame and Seraphim together because they’ll be competing for resources, so Seraphim and Eternal Flame are almost mutually exclusive – they may as well be on the same tier of talents.

There’s also the question of “how many cooldowns is too many?” As part of the tank squish, Blizzard toned down cooldowns across the board. Yet we still have three baseline cooldowns (Divine Protection, Guardian of Ancient Kings, and Ardent Defender) and now two talented options (Holy Avenger and Seraphim). I wouldn’t be surprised if we end up chaining all of those to be nigh-invincible for minutes on end.

Last but not least, it also adds a button to an already busy spec. Many classes lost a lot of buttons this expansion, but not us. We’ve lost relatively few. I guess Seraphim can go where Avenging Wrath used to be on my key binds. But I’m a little disappointed that we’ve gained very little ground in the massive key bind disarmament process, and gaining a new tier of activated abilities isn’t helping with that.

So, overall Seraphim is interesting, and could be fun, but I’m a little worried about it feeling awkward and giving us a little too much cooldown coverage. I guess you could say I’m sort of neutral on this talent. Don’t hate it, don’t really love it either.

Empowered Seals

Empowered Seals is my least favorite talent from a design standpoint. In principle, I’m behind the idea of making seals more interesting. Right now, seals are set-it-and-forget-it buffs for protection. It’s not even worth swapping to Righteousness in AoE situations, because in most cases you need the self-healing from Insight to stay alive. And it’s not even worth the GCD to swap to Seal of Truth for single-target damage. But while I’d like to see seals be interesting, I’m not a fan of this implementation.

First of all, I really don’t like seal twisting. And this talent is advertised as the talent for people who like seal twisting. I’m sure there must be a few people out there who do like seal twisting, but I’m not one of them. Let me explain why.

Empowered Seals may as well be called “Holy Maintenance Buffs.” Having one maintenance buff in a spec is fine, in my opinion. As an example, I had no problem with Inquisition, even though many Retribution paladins complained. Maybe it was a bit annoying when the duration was only 30 seconds, but especially in 5.4 with a 1-minute duration, it was really hard to complain about Inquisition. Not that it stopped people from doing so, of course, which eventually led to it being pruned in Warlords.

When I discussed this with Meloree, he was quick to chide me for jumping on the “maintenance buffs suck” bandwagon. And he has a point. Buff maintenance has its strong points. In particular, it tends to be very good at differentiating players by skill. Even if you invoke a macro to take most of the thought out of it, you’re still choosing which GCDs to use to refresh those buffs, and that decision making process takes skill. And there are players who really like that play style – likely the same people who love seal twisting.

So I want to make it clear that my distaste for the talent isn’t because I hate all maintenance buffs. Having one or two maintenance buffs is perfectly reasonable, and in fact is even desirable. In Mists of Pandaria, we have two buffs that qualify as “maintenance,” or at least “actively managed” buffs: Shield of the Righteous and either Sacred Shield or Eternal Flame. And I feel like that game play has worked out rather well for us. It certainly hasn’t felt cumbersome to manage those two buffs.

However, juggling multiple maintenance buffs can quickly sap the fun out of a spec. Cataclysm-era subtlety rogue felt a lot like that, when I was fooling around on an alt in sub-par gear. While it was interesting to learn how to effectively keep Slice and Dice, Rupture, Hemorrhage, and Recuperate all up at the same time, it ended up feeling like I never really got to spend those combo points showing the boss the pointy end of my Eviscerate.

The entire seal twisting concept is all about swapping seals every 8-10 seconds. By definition, it’s adding up to 3 more maintenance buffs we have to watch and maintain. And those are added to the two maintenance buffs we already have. If you consider that we might be spending up to 3 GCDs every ~20 seconds to cycle seals, and another one on Sacred Shield (I’m cheating here a bit, because I know it’s simming ahead of EF) every 30 seconds, you’re looking at spending 18%-30% of your GCDs on maintaining buffs.

That just seems excessive to me. Compare that to MoP Retribution, which lost Inquisition in part because it felt like too much of an annoyance to maintain. It only spent 1 GCD every 60 seconds, which is a paltry 2.5% of your GCDs at most, and that was considered too much. How do you think Rets will respond to using even 10% of their GCDs to keep two of those buffs up? How much worse will it feel for a starter prot in low haste gear using almost one third of their GCDs to maintain buffs?
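If you want to check that arithmetic yourself, here’s a rough Python sketch. The GCD lengths are my assumptions (1.5 seconds unhasted, 1.0 second at the haste floor), and the function name is just for illustration. It lands at roughly 18% to 28% for the seal-twisting prot case, in line with the range quoted above, and 2.5% for Inquisition.

```python
def maintenance_fraction(gcd_seconds, buffs):
    """Fraction of all GCDs spent on buff upkeep.

    buffs is a list of (gcds_spent, period_seconds) pairs.
    """
    return sum(count * gcd_seconds / period for count, period in buffs)


# Empowered Seals: up to 3 GCDs every ~20 s; Sacred Shield: 1 GCD every 30 s.
prot_buffs = [(3, 20.0), (1, 30.0)]
# MoP Inquisition: 1 GCD every 60 s.
ret_buffs = [(1, 60.0)]

# Assumed GCD lengths: 1.0 s (haste floor) and 1.5 s (no haste).
for gcd in (1.0, 1.5):
    prot = maintenance_fraction(gcd, prot_buffs)
    ret = maintenance_fraction(gcd, ret_buffs)
    print(f"GCD {gcd:.1f}s: prot {prot:.1%}, ret {ret:.1%}")
```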

Even if seals were off-GCD (and thus didn’t interfere with the rotation), I think I would dislike it, because you’d still be juggling five different maintenance buffs, which is a little more than I think is fun. Ultimately, I think that sort of gameplay ends up in one of two places. It either turns into a game of watching buffs rather than playing your character, or you get an addon or macro to remove the thought from the process. Neither of which is ideal for making a spec feel fun to play.

To add to that, we now have one of the more complex rotations in the game. Many specs lost spells and saw their rotation simplified. We lost only periphery spells; none of our core rotation spells were removed. And our rotation (including off-GCD AM) already involved more buttons than many DPS specs do:

• Hammer of the Righteous
• Judgment
• Avenger’s Shield
• Holy Wrath
• Consecration
• Hammer of Wrath
• Execution Sentence or Light’s Hammer or Holy Prism
• Sacred Shield (possibly)
• Shield of the Righteous
• Word of Glory or Eternal Flame

Empowered Seals adds three more spells to that list. Even if you already had keybinds for them, we’re approaching “John F@#!ing Madden” territory. And that’s all assuming you’re not doing anything to contribute to raid utility. The developers have said they like leaving a few empty GCDs so players can make use of those utility spells. With Empowered Seals, you can kiss those empty GCDs good-bye.

To me, the whole idea just sort of feels awkward and not very fun, which is the same opinion I had of seal twisting in Wrath. Several people have pointed out that you can use a castsequence macro to do the seal cycling with one button, though that’s a gripe unto itself. When the best thing you can say about a talent is, “well, it’s not so bad when you create a macro to do most of the thinking for you,” I think you need to critically re-evaluate whether that talent is a good design.

There’s another problem with Empowered Seals: maintaining balance between the talents. Even without numbers, we can make a philosophical argument for why this will inevitably be a problem. Empowered Seals is situated in a tier with a passive option (Holy Shield). Let’s ignore Seraphim for now. By default, Empowered Seals has to be tuned to be noticeably better than Holy Shield. Why?

If the difference in steady-state performance between Empowered Seals and Holy Shield is tiny, why would you bother taking the active option? It would be both more reliable and easier to just take the equivalent passive option in that case. At least with Seraphim, you’re comparing an always-on Holy Shield to something that has an on/off cycle, and there can be pros and cons for each. But the buffs from Empowered Seals are essentially “always-on” buffs that cost 3 GCDs every 20 seconds, so you’re comparing two static effects, one of which takes more effort.

Likewise, if it takes an enormous amount of concentration to pull off Empowered Seals for a small gain, then why would you bother? At that point, it’s just shelved as a bad talent because it’s ineffective and few will bother to take it. We have historical precedent for this sort of thing – if you have a few hours (days?), go read Cynwise’s excellent set of articles on The Decline and Fall of Warlocks, particularly the third post in the series, which examines the same topic through a different lens. A class, spec, or talent that requires a lot more effort or involves a lot more complexity to achieve similar results tends to get marginalized and ignored.

So by design, for it to not be a “bad” talent, it has to provide some noticeable advantage over Holy Shield. And that’s really the issue, because that effort/performance breakpoint is different for each player. For a player going into Mythic progression as soon as it opens, that sort of complexity is something we generally just deal with, because we have the skill to perform the rotation even if we don’t like it. It really only becomes a choice for players that aren’t concerned about min/maxing their performance or don’t have the skill to pull it off.

From one perspective, that makes it perfect for a talent. Since the skill threshold varies from player to player, making it a talent allows players to choose it based on whether they actually can pull it off or not. In practice though, it doesn’t work that way. We’ve seen it happen over and over again in WoW, and every time the “passive but weaker under ideal conditions” option was the one that everyone avoided. Even by players who really should have been taking it, because they couldn’t handle the complexity of the stronger option. The weaker option was perceived as only for “bad” players, and nobody wants to be bad! Furthermore, I really don’t want to see Holy Shield become the “bad players” option. It’s too iconic for that sort of fate.

They might be able to tune it so that the margin is “close enough” to make it a valid choice, but it’s unlikely for quite a few reasons. One, it’s a razor-thin margin they need to hit; two, the target will vary wildly with player skill; three, the target will drift with gear and content. It’s somewhat naive to believe that any set of three talents can be balanced perfectly just based on numbers. It’s far better if the tier gives three solid but slightly different options that shine in different situations.

For example, the choice between Divine Purpose, Holy Avenger, and Sanctified Wrath is pretty solid. Divine Purpose gave the highest average uptime on SotR and scaled best with gear, but wasn’t controllable. Holy Avenger gave lower uptime, but gave it to you in the form of an extra cooldown that you could control. Sanctified Wrath’s MoP incarnation was somewhat lacking – it would have been good if it was the high-DPS but middle-of-the-road option, but it ended up being an afterthought instead. Its Warlords implementation looks pretty solid so far though – higher average Holy Power income and lots of extra damage through Holy Wrath. Each has a niche it can fill, and you may pick different ones for different encounters because each favors different encounter mechanics.

I think they *can* make the level-100 tier interesting in a similar fashion. But it will be interesting only if the three talents provide different strengths. For example, if they turned out something like this:

• Holy Shield – Passive, always-on, best average survivability, average damage
• Seraphim – Highest short-term survivability and burst damage (during buff), but slightly lower average survivability and damage (due to the fallow periods).
• Empowered Seals – Most flexible. Lets you swap from high single-target damage mode (SoT) to high AoE damage mode (SoR) to high-survival mode (SoI), but you can’t have all at once. Average performance (i.e. cycling SoI while tanking and SoT while off-tanking) should be close to the baseline set by Holy Shield.

You can see the idea here. Rather than picking the “winner” in that tier, you pick the talent that suits your purpose. Empowered Seals might be a 5% DPS increase with SoT up, but a ~5-10% sacrifice in survivability compared to Holy Shield. And vice versa if you’re running with SoI. Empowered Seals would give you the flexibility to adapt to the encounter rather than be stuck with the 30-second Seraphim cycle or the passive Holy Shield benefit.

But none of that works if Empowered Seals is effectively a passive effect that costs you 3 GCDs. If you have two passive choices, you pick the one that works best. If sims tell us that it’s worth spending those GCDs and losing Seal of Insight procs to keep up the buffs we get from Empowered Seals, then we take that over Holy Shield.

The way to make the talent interesting is to make our seals interesting, but I don’t think that seal twisting is the way to do it. Making seal twisting extremely powerful by making each seal grant a very powerful buff is putting lipstick on a pig – it’s covering what I feel is bad gameplay by just making the numbers big. I’d rather see Empowered Seals change seals in a substantial way that gives us interesting choices.

So rather than making Empowered Seals a rotational gimmick tied to Judgment, I’d rather see it actually empower our seals. Make SoT do more significant single-target damage. Make SoR do more significant AoE damage. Make Seal of Insight do more significant self-healing.

Better yet, make each of them into a mini cooldown. Maybe the talent gives you an “Empower Seals” spell that actually empowers whatever seal you have active to make it stronger. Using it doubles the effect of your seals for 20 seconds, with a 1 minute cooldown. Now, every minute you get to make an interesting decision.

Do I want single-target damage? Switch to Truth and pop Empower Seals for a burn phase. Do I want AoE damage for picking up and burning down some adds? Switch to Righteousness and pop Empower Seals for 20-seconds of AoE burst. Do I want more survivability? Use ES with SoI active and you have another mini cooldown. This fills a different niche than Seraphim in that you get to choose your benefit every minute, and to be balanced it would give you a bigger boost to that area than Seraphim does.

That’s interesting seal game play that actually involves making choices, rather than just mindlessly cycling through maintenance buffs.

## Simulationcraft Automation Tutorial – Part II

In the last post, we went over the interface of the Automation Tool and demonstrated how to use it to perform simple talent, glyph, and gear comparisons. Today we’re going to dive into the last comparison type: Rotations.

This comparison type is really the highlight of the tool. While the other comparison types are neat, this one is amazing. It’s an incredibly powerful way to test different rotations against one another, though of course, with that power comes some added complexity.

Interface

First, let’s take a look at how the interface changes when we choose “Rotation” from the Comparison Type drop-down box.

The Rotation Comparison Interface.

The first thing you’ll notice is that everything is enabled. This mode uses all of the text boxes. Though you’ll notice that the “Default Rotation” text box has been renamed “Actions Header,” and the previously-unused box in the lower left is now the “Actions Footer” box. We’ll talk about what those do shortly. The center box is now renamed the “Rotation Configurations” box, and the right-hand-side Rotation Abbreviations box is now editable.

I want to draw your attention to the text in the Rotation Configurations box though. It may look peculiar to you unless you’ve read some of my earlier MATLAB work, like the 5.4.2 Rotation Analysis. If you have read that post (or others like it), you might recognize these as rotation shorthands:

Shorthand rotation configurations.

The first one is equivalent to a rotation where your priority order is Crusader Strike, then Judgment, then Avenger’s Shield. When we hit Import!, it translates this to an actor with that action priority list:

The generated actor for CS>J>AS

How does it know that CS means crusader_strike and so on? That’s what the Rotation Abbreviations sidebar is for. So let’s look at that in more depth.

Abilities

The Rotation Abbreviations sidebar defines all of the shorthands and their longhand equivalents. It’s essentially the dictionary the tool uses to try and make sense of the input you give it. It’s divided into sections by headings marked with six colons, the first of which is “:::Abilities, Buffs, Glyphs, and Talents:::” as shown below.

The Abilities section of the Rotation Abbreviations sidebar.

As you can see, we’ve defined shorthands for a bunch of different abilities using the syntax:

shorthand=longhand

When you hit Import!, the tool’s shorthand decoder takes the shorthand you give it and checks each ability against the shorthand side of each line of this section. If it finds one, it replaces that text with the longhand version. So for example, it searches for “CS” in this section, finds the “CS=crusader_strike” entry, and then creates an action priority list line for crusader_strike. As you can see in the example above, we use the greater than sign (>) in the shorthand to indicate a new line on the action priority list. So “CS>J>AS” becomes a three-line action priority list with crusader_strike, judgment, and avengers_shield.
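To make that lookup concrete, here’s a rough Python sketch of the idea. It is not the tool’s actual code; the function name `decode_rotation` and the three-entry table are just illustrative, and the “actions+=/” prefix is borrowed from the generated action lists shown elsewhere in this post.

```python
# Illustrative subset of the Abilities section (shorthand=longhand).
ABILITIES = {
    "CS": "crusader_strike",
    "J": "judgment",
    "AS": "avengers_shield",
}


def decode_rotation(shorthand, abilities=ABILITIES):
    """Turn 'CS>J>AS' into one action priority list line per ability."""
    # Shorthands are case-insensitive, so normalise the keys first.
    lookup = {key.upper(): value for key, value in abilities.items()}
    lines = []
    for token in shorthand.split(">"):  # '>' starts a new list line
        key = token.strip().upper()
        if key not in lookup:
            raise KeyError(f"unknown shorthand: {token!r}")
        lines.append("actions+=/" + lookup[key])
    return lines


print("\n".join(decode_rotation("CS>J>AS")))
```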

This section also contains the definitions for particular buffs, glyphs, or talents. We’ll see how they get used a little later on; for now, don’t worry about them.

Note that this text box is editable, meaning you can define your own shorthands if you want to. If you really want to use “Co” instead of “Cons” to represent consecration, you can change it! Likewise, if you want to add abilities (or buffs, glyphs, or talents) that aren’t already on the list, you can add new definitions by typing them into this section. Some specs already have a bunch of built-in shorthands written by SimC devs for that class. If your spec doesn’t, see the end of this post for information on how you can submit a list.

The shorthand syntax is much more powerful than just allowing you to define strings of abilities though. Most action priority lists use conditionals, so the tool has a way to let you do that in shorthand as well.

Options

The second actor uses the shorthand “CS>J>AS+GC” to demonstrate the use of options. If you look further down the Rotation Abbreviations sidebar, you’ll see a section labeled “Options”:

The Options section of the sidebar.

On this list you can see that we’ve defined an option “GC=buff.grand_crusader.react.” The tool uses the plus sign (+) as the indicator that you’re trying to specify an option, which is just a conditional for the use of that ability. The syntax “Ability+Option” translates into “ability,if=option” during the decoding process. Thus, “AS+GC” translates to “avengers_shield,if=buff.grand_crusader.react.”

The final example just shows that you can combine different options with logical operators, just like you can with conditionals in action priority lists. The entry

AS+GC&(DP|!FW)

translates to

actions+=/avengers_shield,if=buff.grand_crusader.react&(buff.divine_purpose.react|!glyph.final_wrath.enabled&target.health.pct<=20)

Obviously this is a nonsense conditional; it’s just there to demonstrate the flexibility you have with the system. With appropriate ability and option definitions, you can create almost any action priority list you can dream of here. One word of warning, however: note that while you can use logical operators like &|+-*/<=(), you cannot use >, because the decoder thinks that’s you telling it you’re starting a new action priority list line.
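Here’s a rough Python sketch of how “Ability+Option” decoding with logical operators could work. Again, this is an illustration rather than the tool’s real implementation: the function names are made up, and the “DP” definition is inferred from the translated output above. Because the rotation is split on “>” before anything else happens, a “>” inside a condition would be cut in half, which is exactly the limitation just described.

```python
import re

ABILITIES = {"CS": "crusader_strike", "J": "judgment", "AS": "avengers_shield"}
# "GC" is from the Options section; "DP" is inferred from the example output.
OPTIONS = {
    "GC": "buff.grand_crusader.react",
    "DP": "buff.divine_purpose.react",
}


def decode_condition(expression):
    """Swap every known option shorthand for its longhand, keep the rest."""
    def swap(match):
        word = match.group(0)
        return OPTIONS.get(word.upper(), word)
    return re.sub(r"[A-Za-z_]\w*", swap, expression)


def decode_entry(entry):
    """'AS+GC&!DP' -> 'actions+=/avengers_shield,if=...'."""
    # Only the first '+' separates the ability from its condition.
    ability, plus, condition = entry.partition("+")
    line = "actions+=/" + ABILITIES[ability.strip().upper()]
    if plus:
        line += ",if=" + decode_condition(condition)
    return line


def decode_rotation(shorthand):
    # Splitting on '>' first is why '>' cannot appear inside a condition.
    return [decode_entry(entry) for entry in shorthand.split(">")]


print("\n".join(decode_rotation("CS>J>AS+GC&!DP")))
```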

One way around that limitation is to use the pound sign (#), which is a special character definition in options. For example, the option

HP#=holy_power>=#

lets you specify an option like “HP3” that will translate to “holy_power>=3”. The # sign will match any number including decimals, so it would properly translate HP3.2 into “holy_power>=3.2”. This should let you get around the limitation of not being able to use the greater than symbol in a logical expression.
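One way to picture the # wildcard is as a regular expression capture group. The Python sketch below is my own illustration (the helper name and the number pattern are assumptions, not the tool’s code): it turns the “#” in the shorthand into a pattern that matches an integer or decimal, then pastes the captured number into the longhand.

```python
import re

# Matches an integer or a decimal such as 3 or 3.2.
NUMBER = r"(\d+(?:\.\d+)?)"


def expand_numeric_option(token, shorthand, longhand):
    """Expand e.g. 'HP3' using the definition 'HP#=holy_power>=#'.

    Returns None when the token does not fit the shorthand.
    """
    prefix, _, suffix = shorthand.partition("#")
    pattern = re.escape(prefix) + NUMBER + re.escape(suffix)
    match = re.fullmatch(pattern, token, re.IGNORECASE)
    if match is None:
        return None
    return longhand.replace("#", match.group(1))


print(expand_numeric_option("HP3", "HP#", "holy_power>=#"))
print(expand_numeric_option("HP3.2", "HP#", "holy_power>=#"))
```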

There’s another special sequence for options that can be useful when an option depends on the ability it’s modifying. In an option, the text “$ability” will be replaced with the name of the current ability. So for example, since we have the definition “AC#=action.$ability.charges>=#”, a line like “AS+AC1” would translate to

avengers_shield,if=action.avengers_shield.charges>=1.

Obviously this is nonsense since Avenger’s Shield doesn’t have charges, but this could be useful for abilities that do, like Shield Block.
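As a rough illustration of the “$ability” substitution (my own Python sketch with made-up helper names, not the tool’s code), the option is expanded first and then the placeholder is filled in with the longhand of whatever ability sits to the left of the plus sign:

```python
import re

NUMBER = r"(\d+(?:\.\d+)?)"
ABILITIES = {"AS": "avengers_shield"}
OPTIONS = {"AC#": "action.$ability.charges>=#"}


def expand_option(token, ability):
    """Expand one option shorthand for the given longhand ability name."""
    for shorthand, longhand in OPTIONS.items():
        prefix, has_number, suffix = shorthand.partition("#")
        pattern = (
            re.escape(prefix)
            + (NUMBER if has_number else "")
            + re.escape(suffix)
        )
        match = re.fullmatch(pattern, token, re.IGNORECASE)
        if match:
            if has_number:
                longhand = longhand.replace("#", match.group(1))
            # "$ability" becomes the ability this option is attached to.
            return longhand.replace("$ability", ability)
    raise KeyError(f"unknown option: {token!r}")


def decode_entry(entry):
    short_ability, plus, short_option = entry.partition("+")
    ability = ABILITIES[short_ability.strip().upper()]
    if not plus:
        return ability
    return f"{ability},if={expand_option(short_option, ability)}"


print(decode_entry("AS+AC1"))
```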

Operators

The third section of the Rotation Abbreviations sidebar is called “Operators,” and it defines abbreviations that allow for more flexible creation of operators. An operator requires an additional abbreviation which it acts upon, called the “operand.” The syntax for this is “Operand.Operator”, like so:

GC.BA

When the decoder encounters the syntax “X.Y”, it checks for X in the list of Ability shorthands and checks for Y in the list of Operator shorthands. If it finds both, it will replace “X.Y” with the operator longhand, and then look for the text “$operand” in the result, replacing any instances of that text with the longhand form of the ability.

The list of operators.

In our example, the ability list contains “GC=grand_crusader” and the operator list has “BA=buff.$operand.react.” So the decoder will first replace “GC.BA” with “buff.$operand.react”, and then replace the “$operand” in that with “grand_crusader.” The final result is

buff.grand_crusader.react.

You might wonder why we would want to use an operator when we can just define an option to handle the entire conditional expression. The answer is primarily one of flexibility. The “.BA” operator lets us test for a buff with any abbreviation we’ve already defined in the Abilities section. Likewise, with the “.CD” operator, we can check the cooldown of any of those abilities. By using operators, we don’t need to define a new option for each cooldown, buff, or debuff that we want to check for.

Operator/operand pairs like “GC.BA” or “CS.CD” are just special types of options, so just like other options they can be combined using logical operations. The decoder will properly parse “AS+GC.BA&CS.CD0.5&E” as

avengers_shield,if=buff.grand_crusader.react&cooldown.crusader_strike.remains>=0.5&target.health.pct<=20
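Here’s a rough Python sketch of that operator/operand decoding. It is an illustration only: the “CD#” and “E” definitions are inferred from the parsed output above rather than copied from the sidebar, and the single-pass regular expression is my own choice (it keeps already-expanded longhand from being scanned a second time).

```python
import re

ABILITIES = {
    "AS": "avengers_shield",
    "CS": "crusader_strike",
    "GC": "grand_crusader",
}
# "E" is inferred from the example output, not taken from the sidebar.
OPTIONS = {"E": "target.health.pct<=20"}
# "BA" is from the post; "CD#" is inferred from the example output.
OPERATORS = {
    "BA": "buff.$operand.react",
    "CD#": "cooldown.$operand.remains>=#",
}

# Either an Operand.Operator pair (with optional number) or a bare word.
TOKEN = re.compile(
    r"(?P<operand>[A-Za-z_]\w*)\.(?P<op>[A-Za-z_]+)(?P<num>\d+(?:\.\d+)?)?"
    r"|(?P<word>[A-Za-z_]\w*)"
)


def decode_condition(expression):
    def swap(match):
        word = match.group("word")
        if word is not None:  # plain option such as "E"
            return OPTIONS.get(word.upper(), word)
        operand = ABILITIES.get(match.group("operand").upper())
        number = match.group("num")
        key = match.group("op").upper() + ("#" if number else "")
        template = OPERATORS.get(key)
        if operand is None or template is None:
            return match.group(0)  # unknown pair: leave it untouched
        result = template.replace("$operand", operand)
        return result.replace("#", number) if number else result
    return TOKEN.sub(swap, expression)


def decode_entry(entry):
    short_ability, plus, condition = entry.partition("+")
    line = ABILITIES[short_ability.strip().upper()]
    return line + (",if=" + decode_condition(condition) if plus else "")


print(decode_entry("AS+GC.BA&CS.CD0.5&E"))
```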

Just as with abilities and options, you can define your own operators by adding them to the appropriate section of the sidebar. There are a few things to keep in mind when defining your own shorthands in any of the three sections.

• Shorthands are all case-insensitive, so if you define “AS=avengers_shield” both “AS” and “as” will be matched and converted by the decoder. This means you can’t use “AS,” “As,” and “as” to stand for different abilities.
• Each section of the sidebar is independent. So you can define
  • “AS=avengers_shield” in the “:::Abilities:::” section,
  • “AS=cooldown.avengers_shield.remains” in the “:::Options:::” section, and
  • “AS=action.$operand.charges” in the “:::Operators:::” section

  and the decoder will be smart enough to use the appropriate one based on syntax. So “AS+AS&AS.AS” would parse just fine.
• The Rotation Abbreviations sidebar is saved when you exit the program, so if you add shorthands they will be there when you start the program up again. However, the sidebar is completely reset every time you choose a new class or spec from the drop down boxes, and your changes will be discarded. This is done partly because it’s far easier than saving the text of the sidebar separately for each class, and partly to make it easy for me to add abbreviations in the future. If there’s a shorthand abbreviation you feel should be added to a spec’s defaults, please contact me (here or via an issue ticket) and I’ll add it to the list.
• The one thing you should absolutely not do is change the section labels in the sidebar. The code uses the “:::stuff:::” pattern to separate the sidebar into three separate tables of shorthand/longhand pairs, which it then uses to perform the replacement. If you delete one of the colons, the program will probably crash when you hit the Import! button.

Now that we have all the shorthand pieces, let’s see how we use the tool to perform a simple rotation comparison.

Using The Rotation Comparison Tool

For our first foray into rotation comparisons, we’re going to stick to something simple. Let’s test the ideal priority of our holy power generators: CS, J, and AS+GC. With only three abilities, there are six different permutations we could consider:

Our first rotation comparison.

When we hit Import!, we get a list of six actors, each of which has the appropriate action priority list entries.

The generated profile.

You’ll note that the rotation comparison automatically gives each of these actors a name based on the shorthand you provide. You can override this automatic naming by providing your own as an option, just as we did in the earlier comparisons.
For example, changing the first line to “CS>J>AS+GC name=Alice” will add a “name=Alice” line to the end of the profile which will overwrite the automatically-generated name. If we run the simulation, we again get a report like before, showing how each actor performed: Report for Rotation Comparison #1 The resulting rotation comparison report. Since we’re only running this with 1000 iterations, the error is about 10 DPS, so the rankings aren’t all that conclusive. In fact, there’s a much more glaring error in these results that would call their validity in question, which we’ll discuss in more detail shortly. However, it’s clear from this example how you might go about comparing different rotations. And note that just as we did with the earlier comparisons, we can tweak the talents, glyphs, or gear of any actor by adding commands after the shorthand, like so: CS>J>AS+GC name=Alice talents=1111111 glyphs=alabaster_shield CS>AS+GC>J name=Bob talents=2222222 glyphs=word_of_glory But wait, there’s more! Actions Header and Footer Let’s say we want to apply some actions to all actors. For example, if you look at that simulation you might have noticed the glaring error I mentioned earlier: none of the actors ever cast Judgment! In fact, they can’t, because none of them have a seal active, because we never told them to cast a seal. Normally that’s done in the precombat action list, which we never provided. Likewise, none of them actually use active mitigation abilities, so the TMI results are essentially random and don’t reflect the holy power generation of each action list. We could define a shorthand for SotR and put it in every one of the action lists, like “SotR>CS>J>AS+GC,” but that would be a bit tedious and make each configuration harder to read. Doubly so if we added shorthands and entries for Word of Glory and Eternal Flame. And it still doesn’t solve our seal problem. This is where the Action Header and Footer text boxes come to the rescue. 
They act as bookends for each rotation configuration, allowing you to specify some actions that apply to every actor. So let’s edit the Action Header and Footer boxes and see how this works.

[Figure: Editing the Action Header and Footer]

The tool will now sandwich each configuration between the text in these two boxes. When we hit Import!, we now have actors which will cast Seal of Insight before combat and use Shield of the Righteous, Holy Wrath, and Consecration:

[Figure: The profile generated after adding text to the Actions Header and Footer boxes.]

Running this simulation gives us a little better differentiation between the different holy power generator priorities:

Results for Rotation Comparison #2

[Figure: The results of our revised simulation.]

Another good example is adding an “actions+=/sacred_shield” line to the end of the Actions Footer to refresh Sacred Shield any time you have an empty GCD. I’m sure you can imagine other uses as well.

Note that since the tool is just splicing together text, you’re not limited to putting just actions in the Actions Header or Footer. Any valid SimC syntax that you want to add to the end of each actor would work, such as “position=back” or “tmi_window=5.” In fact, you could add those to the Default Gear text box to achieve the same effect.

Rotation Syntax: Part Deux

What if I want to add something in-between two shorthands? As a (somewhat contrived) example, what if I want to try all of the permutations of CS>J>(other stuff)>AS? As it turns out, you can do that several different ways.

If you prefer sticking with shorthand, you could obviously define a shorthand for (other stuff). To make this example more concrete, let’s say that “other stuff” is our level 90 talents, e.g. ES>LH>HPr, or /execution_sentence/lights_hammer/holy_prism in longhand. We could define X=execution_sentence/lights_hammer/holy_prism and then define rotations CS>J>X>AS, CS>AS>X>J, and so on to create our permutations.

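To make the shorthand expansion concrete, here is a minimal Python sketch of the idea. It is not the tool’s actual code: the `ABBREVIATIONS` table, the `decode` and `rotations_around` helpers, and the assumption that every shorthand token becomes exactly one “actions+=/…” line are all hypothetical, chosen only to mirror the CS>J>X>AS example.

```python
from itertools import permutations

# Hypothetical abbreviation table; the real sidebar defines its own pairs.
ABBREVIATIONS = {
    "CS": "crusader_strike",
    "J": "judgment",
    "AS": "avengers_shield",
    "X": "execution_sentence/lights_hammer/holy_prism",
}


def decode(shorthand, table=ABBREVIATIONS):
    """Turn 'CS>J>X>AS' into one 'actions+=/...' line per shorthand token."""
    return ["actions+=/" + table[token] for token in shorthand.split(">")]


def rotations_around(fixed, others):
    """Permute `others`, keeping `fixed` in the second-to-last slot."""
    return [">".join(p[:-1] + (fixed,) + p[-1:]) for p in permutations(others)]


# Six configurations: CS>J>X>AS, CS>AS>X>J, and so on.
configs = rotations_around("X", ["CS", "J", "AS"])

for line in decode(configs[0]):
    print(line)
```

Each string in `configs` is what you would type on one line of the Rotation Configurations box; `decode` only shows roughly what the longhand might look like after Import!.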
If the list of “other stuff” is so long that this would be unwieldy, we could make use of another input mode of the Rotation Configurations text box. You see, the text box can also handle full, text-based action lists that span multiple lines. However, in this input mode, we have to separate each actor with a blank line, much like we did in the gear comparison. That might look something like this:

[Figure: A longhand action list declaration.]

When we hit the Import! button, this text is faithfully reproduced between our Actions Header and Footer. The tool is smart enough to (a) recognize that we’re using a blank line to separate actors, and (b) only try to decode shorthand lines, which are lines containing one or more “>” symbols that do not start with the text “actions.” In fact, we can even mix full action declarations and shorthands, like so:

[Figure: A mixture of longhand and shorthand definitions.]

You can hit Import! to see that the decoder translates the shorthand lines into longhand for you, and places them in the appropriate order with all of the rest of the text for that actor. The only limitation is that it will not try to decode single-action shorthands, so “CS” won’t be translated, because the decoder treats any line that doesn’t contain a “>” as plain text. This is why I didn’t use “AS+GC” in Alice’s configuration above. If you need to, you can get around this by “cheating” a bit and chaining the ability together twice. For example, “CS>CS” would be decoded into two crusader_strike lines. The second one would just be redundant; it wouldn’t change any of the actor’s behavior.

If your “other stuff” was very long, you could put it in a simple text file (e.g. middle_of_rotation.simc) and use that file name in place of the explicit action description. In other words, you’d replace actions+=/execution_sentence/lights_hammer/holy_prism with middle_of_rotation.simc.
And you could even define an abbreviation “X=middle_of_rotation.simc” on the sidebar if you wanted, so that you could use it in shorthands.

You really have an incredible amount of flexibility in how you define rotations with this tool. There’s one major limitation that I want to mention though. If you use a new blank line to separate actors, you need to be consistent and do it for every actor. In other words, the input shown below will not work – or rather, it will translate to two different actors rather than three!

[Figure: Mixing single-line and multiple-line configurations.]

This only produces two actors, the second one having CS>J>AS>CS>AS>J as its rotation. In other words, don’t do this!

Final Comments

By the end of this tutorial, you should be able to use the Automation Tool to compare all sorts of different things. The goal was to make the tool as useful as possible while still keeping it simple to use. Any theorycrafter that’s familiar with basic SimC syntax should be able to fire up the GUI and use the tool without having to mess around with the command line or batch processing. I’ll be making heavy use of the tool myself in the coming months as I run (and post) simulations investigating our optimal play style in Warlords of Draenor.

If you have a question about the tool or a suggestion for improvement, please don’t hesitate to contact me. Leaving a comment here on the blog is fine, or you can catch me in the #simulationcraft irc channel on irc.stratics.com. If you have a long list of shorthand abbreviations to add to the sidebar, you should contact me rather than posting the entire list here, as I have a google doc set up for that which will make it easier to copy/paste your shorthands into the code.

There are several things that aren’t well-supported, and may not ever be. The most important is probably the use of multiple action lists and the /run_action_list command.
A lot of profiles use this to separate out single-target and multiple-target rotations, or make more complicated rotation paths a little clearer to read. There’s no inherent syntax for defining separate action lists in the shorthand code, though of course you could write them out in full or use .simc files and cleverly stitch them together.

That said, this tool was never intended for that type of fine-detail tuning. It uses the same philosophy as our TC101 experimental tests – simplify down to as few variables as possible. So for example, you might use this tool to test different single-target rotations (with one boss), and then use those results to determine the ideal single-target action priority list. You could then add a second boss in the Footer and create a bunch of configurations to test 2-target action lists, and so on. Once you’ve determined each of those independently, you would create a master profile that called the appropriate action list based on the number of targets. By breaking it down into several smaller problems, this tool can still save you plenty of time and effort compared to running sims by hand.

## Simulationcraft Automation Tutorial – Part I

If you follow me on twitter, you know that I’ve been intermittently commenting on the new Automation Tool that I’ve been building for Simulationcraft. I’m happy to say that the tool is more or less finished at this point, complete with documentation!

For those that haven’t heard about this project before, the Automation Tool allows you to quickly generate profiles to compare different talent, glyph, gear, or rotation configurations. For example, if you want to compare 10 different talent configurations, the tool will generate a profile containing 10 different players (“actors”), each of which is identical except for their talents. You can then run that simulation and look at the report to see which one of the actors performed the best, and by how much.

The point of this post is to provide a tutorial that supplements the documentation. I’m going to give you a step-by-step guide that shows how to use the tool to perform different types of comparisons and, hopefully, show some of the tool’s impressive feature set in the process.

Interface

To start using the tool, fire up Simulationcraft (version 602-1 or later, a beta of which you can download here), click on the “Import” tab, and then choose the “Automation” sub-tab. You should be looking at a screen that looks like this:

[Figure: Simulationcraft’s Automation Tab – First Look]

Let’s walk through the interface quickly. At the top left, we have a drop-down box that lets us choose a “Comparison Type” (currently set to “None”), and to the right there is a help text area telling us that we need to choose a comparison type to get started.

In the left column, we have a bunch of defaults. We have drop-down boxes to choose Class, Spec, Race, and Level, and some text boxes to define default talents, glyphs, gear, and rotation. The “Default Talents” box is filled in with a default “0000000” (meaning no talents), and the “Default Glyphs” box has some dummy glyphs for protection paladins. “Default Gear” has some placeholder gear, and “Default Rotations” is empty. There’s also a greyed-out “Unused” box at the bottom.

In the center column, we have another greyed-out “Unused” box, and a “Footer” text box. Finally, the right column has a greyed-out “Rotation Abbreviations” box with some placeholder text. We can ignore all of the greyed-out boxes for now. Once we choose a comparison type, some of these boxes will become active and their titles will change to describe what they do.

Since this is a paladin blog, let’s set our default class and spec. We’ll click on the Class drop-down box and choose “Paladin,” and then choose “Protection” from the Spec drop-down. We’ll leave the Race and Level as “Blood Elf” and “100” respectively.
You may have noticed that the Rotation Abbreviations text box changed when you chose a class and spec – we can ignore that until we get to the section on rotation comparisons.

We also need to specify some default talents, glyphs, and gear. For now, we’ll leave the talents and glyphs text boxes alone (they already have defaults), but the gear box needs some attention. We’ll take the gear a premade level 100 character starts with and put it in this box (you can automatically generate this using the Simulationcraft addon). The left column should now look something like this:

[Figure: Defining some defaults.]

For now we’ll leave the Default Rotation empty. As it turns out, if we don’t specify a rotation, the sim is smart enough to use the class module’s default rotation, just like it does when you import your character from the armory.

Now that we’ve gotten some defaults put in, let’s set up our first comparison.

Talent Comparisons

The first choice in the Comparison Type drop-down box is “Talents.” When we make that selection, the large text box in the center column becomes enabled and the name changes to “Talent Configurations.” In addition, the text in the box changes to some example talent configurations: “0000000,” “1111111,” and “2222222.” The other change is that the Default Talents text box becomes greyed out, indicating it’s no longer being used. Finally, the help text area gives us some instructions on how to set up a talent comparison.

[Figure: Selecting the “Talents” comparison type modifies the interface.]

Now let’s talk about exactly what this all means. Because we’ve chosen the “Talents” comparison type, we’ve told the tool that we want to compare several different talent configurations. It’s responded by greying out the Default Talents box and giving us a new place to list the talent configurations we want to try.

Each talent configuration is specified as a 7-digit number, with each digit corresponding to a talent tier.
The first number is our level 15 talent, the second number is our level 30 talent, and so on. A zero means we haven’t chosen a talent, and a one, two, or three indicates we’ve chosen the talent in the first, second, or third column. So “0000000” is a character that has no talents, “1111111” is a character that’s taken all of the talents in the first column, and so on. Each new talent configuration is specified on a new line, with no blank lines between them (this is important).

When we hit the Import! button in the bottom right-hand corner, the tool will generate a profile containing three level 100 Blood Elf protection paladins. Each paladin will have the same glyphs and gear, but the first one will have “talents=0000000,” the second will have “talents=1111111,” and the third will have “talents=2222222.” Try this yourself; it should automatically swap you to the Simulate tab and look like this:

[Figure: The profile generated by our talent automation.]

As you see, the tool conveniently gives us simple names for our paladins corresponding to their talent selections, and each one is the same except for the “talents=XXXXXXX” line.

So that’s the basic function of this tool. It takes the text you supply in each of the boxes and Frankensteins it together to create multiple actors that you can simulate together. Before we do that, though, let’s go back to the Automation tab and make these paladins a little more unique.

Since the tool is simply combining bits of text, it’s very flexible. It will let us specify additional text that is specific to each actor. For example, let’s change the text in the Talent Configurations box to this:

[Figure: We can modify the configurations by specifying additional options for each actor.]

Again, each line is specifying a different actor. But now we have some extra options, like a name, for each actor. Each new option is separated by a single space, so we can string options together like we have for Bob and Eve.
The tool is smart enough to put these extra options at the end of the actor’s profile, so the “glyphs=alabaster_shield” command we’ve added to Bob’s profile will overwrite the default glyphs. Likewise, Eve will be sporting a new phony_helm with 100 stamina and will be standing behind the boss rather than in front of him. Also note that for Eve I’ve included “talents=” in the talent definition; the sim is smart enough to look for that and adjust its output appropriately.

If you hit “Import!” and scroll down to the end, it should look like this:

[Figure: Our profile after changing the text in the Talent Configurations box.]

As you can see, the “name=Bob” and “glyphs=alabaster_shield” are at the end of the third profile, and all of Eve’s options are at the end of her profile.

Since the Automation Tool puts the profile directly on the Simulation tab, it works just like a profile you import from the armory or another source. The simulation will use whatever settings you’ve chosen on the Options tab, including number of iterations, boss type, fight style, stat scaling calculation settings, and so on. Here’s the report that Simulationcraft generates when we hit the Simulate! button:

Report for Talent Simulation

This is a raid simulation, so at the very top it shows a summary of all actors, showing DPS, HPS+APS, DTPS, and TMI charts:

[Figure: After simulating, we get a raid report detailing how well each actor performed.]

We can quickly see here that Eve’s configuration does the most DPS and takes the least damage, while Alice has the highest HPS and gets the best TMI score. The report also has detailed results for each actor that we can scrutinize if we want to figure out why a particular actor is performing better than another.

This example was a bit contrived due to the talent choices, but you should be able to see how you would use this tool to compare individual talents.
Comparing one actor with “talents=0000001” to another with “talents=0000002” and a third with “talents=0000003” would let you compare the spec’s level 100 talents with one another in the absence of other talents.

However, there are some warnings to keep in mind here. This sim uses the default action priority list, so if the APL isn’t properly optimized for a given talent the results will reflect that. In this example, the APL hasn’t been optimized properly for L100 talents, so even if we had looked solely at those talents we wouldn’t want to draw any conclusions about their relative strength from these results.

It’s also worth remembering that talents can interact; the value of Empowered Seals could be different if you’ve chosen Sacred Shield than if you’ve chosen Eternal Flame, since Sacred Shield requires GCDs and Eternal Flame doesn’t. And we’ve also added some variables by tweaking certain players’ glyphs and gear rather than using the defaults.

In general, it’s good to keep that issue in mind when analyzing the results of this sort of simulation. There are plenty of variables involved, and you should consider all of them before drawing conclusions. As we’ve discussed in earlier TC101 articles, trying to control as many variables as possible leads to better results and an easier analysis. The tool is designed to do that as much as possible for you by setting default glyphs, gear, and rotations, but in the interest of flexibility it gives you ways to circumvent those controls.

Glyph Comparisons

If we go back to the Automation tab and switch the Comparison Type to “Glyphs,” the Default Talents text box becomes active again and the Default Glyphs box becomes greyed out. The “Talent Configurations” box is renamed “Glyph Configurations” and gets a new set of placeholder text specifying two different glyph configurations. The help text area has also changed to provide instructions for a glyph comparison.

[Figure: The interface changes slightly when choosing a Glyphs comparison type.]

As you might guess from the screenshot, the syntax for a glyph comparison is very similar to a talent comparison. Each set of glyphs is defined exactly as they are in any other profile, as a tokenized (i.e. lowercase, spaces replaced with underscores, all other symbols excluded) list separated by forward slashes. Each new line is a new configuration, as before. In our example above, one actor has Alabaster Shield and Focused Shield, while the second has Final Wrath and Word of Glory.

We could add more actors if we wanted by adding more lines to the central text box. For example, let’s add a third configuration that uses only Alabaster Shield – then we’ll be able to compare the first and third configuration to see how much DPS the Focused Shield glyph provides. We can also specify options like we did in the talent comparison, and to demonstrate that we’ll name this new configuration Al.

[Figure: You have invited Al to the party.]

At this point, I also want to talk about the Footer text box at the bottom. This box gets added to the very end of the profile, and only once. It’s there so you can specify global options that you wouldn’t want or need to repeat once for each actor. So for example, we could type “optimal_raid=0” in this box to turn off raid buffs (if we were too lazy to do that on the Options tab), or use this area to specify a custom boss or bosses. To illustrate that, let’s turn off raid buffs and specify a pair of bosses, like this:

[Figure: Using the Footer text area to turn off raid buffs and spawn a pair of bosses.]

If we hit Import!, we see that we now have three paladins with different glyph configurations, and our Footer text has been faithfully reproduced at the end of the profile. Like before, the tool is stitching together the text in each of the boxes to create the profile.

[Figure: The generated profile for a glyph comparison.]

Unlike the talent sim, this gives each paladin a generic “G_#” name rather than trying to come up with a descriptive name based on the talent configuration. So the ability to rename each actor using additional options, like we did with Al, is pretty helpful here.

Again, we can hit Simulate and generate a report, linked below.

Report for Glyph Comparison

[Figure: Raid report for a glyph comparison.]

Looks like Al did the most damage here. Remember that he’s using Alabaster Shield, while paladin G_0 is using both Alabaster Shield and Focused Shield. And while Focused Shield would be a DPS increase against a single target, here we have three targets, so it’s a DPS loss.

Wait, three targets? Didn’t we specify only two Fluffy Pillow bosses? What happened? If you look at the report, you’ll see both Fluffy Pillows and a TMI_Standard_Boss_T17N:

[Figure: Oops, a T17N boss snuck into this sim.]

That happened because I had the “TMI Standard Boss” option on the Options->Globals tab set to “T17N.” Remember that since we’re using the Simulate tab of the GUI, the sim will use all of the options you’ve specified, including that boss option. If we want to just use the bosses we have defined in the Footer section, we need to change the TMI Standard Boss drop-down to the “custom” setting. If you do that and run the simulation again, you’ll get a report with just our two Fluffy Pillow bosses, and the DPS results of G_0 and Al will be closer together.

There’s not a lot more to say about glyph comparisons that I haven’t already said about talent comparisons. The two are pretty similar from the tool’s point of view. You can modify the talents of a particular actor by adding “ talents=xxxxxxx” as an option after their glyphs are specified, just like you could modify glyphs in the talent comparison. If you do use that flexibility, just be aware of it when you’re analyzing your results.

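The text-stitching behaviour described for talent and glyph comparisons can be sketched roughly as follows. This is a guess at the general shape, not the tool’s real implementation: `build_profile`, the exact set of default lines, and the `paladin=G_#` header are assumptions made for illustration. The point is only the ordering, with per-actor options placed after the defaults so that a later definition overrides an earlier one, and the Footer appended once.

```python
def build_profile(defaults, configurations, footer=""):
    """Stitch one actor per configuration line, then append the footer once."""
    blocks = []
    for index, line in enumerate(configurations):
        # First token is the glyph list; anything after it is an extra option.
        glyphs, *options = line.split(" ")
        actor = ["paladin=G_%d" % index]  # assumed generic actor header
        actor += defaults                 # shared defaults (level, race, talents, ...)
        actor.append("glyphs=" + glyphs)
        actor += options                  # last, so they override what came before
        blocks.append("\n".join(actor))
    if footer:
        blocks.append(footer)             # global options appear only once
    return "\n\n".join(blocks)


# Hypothetical inputs mirroring the glyph comparison above.
defaults = ["level=100", "race=blood_elf", "spec=protection", "talents=0000000"]
configurations = [
    "alabaster_shield/focused_shield",
    "final_wrath/word_of_glory",
    "alabaster_shield name=Al",
]
profile = build_profile(defaults, configurations, footer="optimal_raid=0")
print(profile)
```

Adding “ talents=1111111” to any configuration line would place that option after the default `talents=` line for that actor, which is why it wins.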
Before we move on to gear comparisons, I want you to go to the Comparison Type drop down and switch back to “Talents.” You’ll notice that the central text box reverts to the input we used for the talent comparison. This central text box stores its content separately for each configuration type, so you don’t have to worry about losing your old inputs when you swap between one comparison type and another. It should also save this text (and most of the other text boxes, in fact) in-between sessions, just like most of the other options in SimC do.

Gear Comparisons

Now I want you to change the Comparison Type to “Gear.” As with the other settings, the help text area updates to describe a gear comparison, and the central text box is renamed. This time it’s set to “Gear Configurations,” and some new example text appears in the center. The Default Glyphs text box is re-enabled and the Default Gear box is disabled.

[Figure: The Gear Comparison interface.]

Since defining an entire gear set on a single line would be a bit cumbersome, the gear comparison uses a slightly different syntax. Instead of putting each configuration on a new line, it allows you to use multiple lines for a single configuration to make it easier to see what gear you’re defining. Each configuration is now separated by a blank line.

So the text we’re starting with here defines two gear configurations, each only containing a head and a neck. If we wanted to, we could flesh these configurations out by adding shoulders, chest, etc. until each was a full gear set. We could also add more configurations as long as each new configuration was separated by a blank line. And we could add options (like a name) to each configuration by adding lines to that configuration. For example, here’s what it looks like if we add a third actor named “premade” that uses our premade character’s full gear set:

[Figure: Adding a third gear set, with a name.]

That said, you can see how this would get unwieldy fast.
If we wanted to define 10 gear sets, all of which used the default premade gear but had a slightly different weapon, we’d be doing a lot of copy/pasting. Luckily, there’s a slightly easier way. I’m going to take our premade gear and save it as a simple text file, like this one:

base_gear.simc

I’m going to put that file in the “profiles” directory of my simulationcraft folder, like so:

[Figure: Adding a simc file to the profiles directory.]

Now I’m going to go back to the Automation tab and create three actors, each of which uses a different weapon. I’ll use this file as the base gear set by calling it directly, and then simply add a new weapon for each set on a separate line. The result looks like this:

[Figure: Our modified gear comparison.]

And when we hit Import!, this is what we get:

[Figure: The generated profile.]

Again, you’ll notice that it gives each actor a default name like “G_1,” which is why I’m naming each gear set something descriptive. Since Simulationcraft’s input is all text-based, you can actually tell it to read in entire text files on the Simulate tab like we’re doing here. It will check the profiles folder, find the base_gear.simc file, and read that file in line by line as if we had typed the entire gear set right there in the profile. Very convenient!

Keep in mind that we had to put the “main_hand=” line after the line containing “base_gear.simc” for this to work. base_gear.simc has its own definition of a main hand weapon, so if we want our new weapon to override it, it has to come second. This works just like a variable in a programming language – if I define “x=1” and then in a later line of code define “x=2”, the second definition will overwrite the first.

If we hit Simulate!, we get a report for the three actors as usual. Note that I’ve deleted the text in the Footer section and set the TMI Standard Boss drop-down back to “T17N” here.

Report for Gear Comparison

[Figure: The results of our gear comparison.]

Each of the weapons has slightly different secondary stats, so they’ll all produce slightly different DPS, HPS, and DTPS numbers. TMI is largely unaffected by such a small change in the gear set.

It shouldn’t be hard to imagine uses for this simulation mode. As in this example, we could use it to test out all of the weapons within a tier to see which gives the highest total DPS. We could also define several different complete gear sets to compare combinations of tier pieces and off-set pieces, different enchanting and gemming strategies, and so on.

Also note that while I didn’t show it, you can add comments to the text in the Gear Configurations box as well. You can add a line like “#config one” anywhere in the configuration and it will show up in the generated profile.

Next Time – Rotations

As it turns out the final Comparison Type, which lets you compare rotations, is far more complex than the three we’ve demonstrated today. So much more complicated that it really deserved its own blog post. So stay tuned for next time, when we go over the most powerful part of the tool: the Rotation Comparison.

## TC401: Avoidance Diminishing Returns in WoD

Avoidance and diminishing returns are topics we’ve spent a fair amount of time discussing in the past. In 2012, I wrote a series of three blog posts discussing avoidance diminishing returns in Mists of Pandaria, chronicling the background of the formulas the community uses, numerically fitting data sets to determine the diminishing returns coefficients, and eventually discussing what those equations mean for gearing. We used similar fitting techniques to find values for druids and monks. Later that year, we used a more precise data set collected with the Statslog addon to fine-tune the values for paladins and uncover some wonky rounding in the block calculation. And in 2013, we performed an exhaustive analysis including lots of pretty surface plots to get accurate values for all of the tank specs.

It seemed at first like Warlords of Draenor was going to be an easy expansion in that regard. Celestalon has outright given us the strength-to-parry and agility-to-dodge coefficients and the diminishing returns coefficients for each class. While the provided coefficients are in the format of their internal diminishing returns formula, which is a little different from the version the community usually uses, translating between the two is not difficult.

And yet….

First, some quick tests uncovered a bug in the calculation of base parry, which Blizzard subsequently fixed. And in the past two weeks, I’ve discovered another oddity, which I’ve detailed below.

As usual with TC101 articles, I’ll give the results up front for reference (and those that don’t want to scroll to the bottom to get them), and then start talking about how we performed the tests.

What’s Changed

The diminishing returns (DR) formulas for WoD have changed very little from the Mists versions. In fact, the most significant changes are to the contributions that are not affected by DR. In Mists, base strength and agility add an amount of parry or dodge, respectively, that is unaffected by DR. In Warlords, that contribution has changed in the following ways:

• Strength no longer grants parry for druids and monks, and agility no longer grants dodge for paladins, warriors, and death knights.

• They have subtracted out the “non-DR” parry and dodge contributions from base strength and agility, such that base parry and dodge should be exactly 3.00% (the default for all players and NPCs) plus any race- or spec-based passives. For example, a night elf would have 5.00% base dodge thanks to Quickness, as would a paladin thanks to Sanctuary.

However… there are a few quirks.

• They have subtracted out the contribution due to the class portion of base strength, but not the portion due to the racial strength modifier. That racial strength modifier’s contribution is also not affected by DR. So a gnome warrior (-5 racial strength modifier) would have 2.97% base parry (-0.0283% due to the -5 strength racial modifier). The same is true for the racial agility modifier’s contribution to dodge.

• Finally, there’s probably a small error in the strength subtraction, because all strength tanks (regardless of race) are getting a “phantom” 0.0004178% parry, or about 0.0739 strength worth of parry, that isn’t affected by diminishing returns.

The last quirk is the interesting one, because it’s a bug. Not a game-breaking bug, mind you; we’re talking about 0.0004% parry. For most people, it’ll never even be perceptible. Nonetheless, it’s worth noting because it’s just barely large enough that it could cause a slight discrepancy between calculated and character sheet parry values if not accounted for. For example, if your theorycrafted parry is 21.0199%, but the actual value is 21.0203%, your character sheet will read 21.02% instead of 21.01% (as we mentioned last post, character sheet values are almost always floored, not rounded).

I’ve informed Blizzard of the bug, but I’ve been told that given their programmers’ workload and the tiny magnitude of the effect, we should not expect to see it fixed.

Warlords of Draenor Diminishing Returns Formulas

Since Blizzard has kindly provided us with the diminishing returns coefficients, I’ve chosen to use their formula rather than the community’s traditional one. In the next section I’ll show how to convert the coefficients provided by Blizzard into the conventional ones you might be familiar with.

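As a quick numerical check of the quirks described above, here is a small Python sketch. It assumes the strength-to-parry conversion of 176.3760684 strength per 1% parry quoted with the constants in this post and the 0.0739 “phantom” strength; the helper names `base_parry` and `character_sheet` are made up for illustration.

```python
import math

STR_PER_PARRY_PCT = 176.3760684  # strength per 1% parry, i.e. 1 / Q_S
PHANTOM_STR = 0.0739             # the unexplained extra strength worth of parry


def base_parry(race_str_mod, phantom_str=PHANTOM_STR):
    """Base parry in percent: 3.00 plus the non-DR racial and phantom strength."""
    return 3.00 + (race_str_mod + phantom_str) / STR_PER_PARRY_PCT


def character_sheet(value_pct):
    """Character sheet values are floored (not rounded) to two decimals."""
    return math.floor(value_pct * 100) / 100


gnome = base_parry(-5)   # gnome warrior, -5 racial strength modifier
human = base_parry(0)    # no racial modifier, phantom parry only

print(gnome, character_sheet(gnome))
print(human, character_sheet(human))
print(character_sheet(21.0199), character_sheet(21.0203))
```

The last line reproduces the 21.0199% versus 21.0203% example: the two values differ by less than the phantom parry, yet they floor to different character sheet readings.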
In the following equations, $\text{ParryFactor}$, $\text{DodgeFactor}$, $\text{BlockFactor}$, $\text{VerticalStretch}$, and $\text{HorizontalShift}$ are the class-specific Blizzard DR coefficients given in the table at the end of this section. $Q_S$ and $Q_A$ are the level-dependent and class-specific strength-to-parry and agility-to-dodge conversion factors. $\text{classBaseStr}$ and $\text{classBaseAgi}$ are class-dependent base stat values, and $\text{raceStrMod}$ and $\text{raceAgiMod}$ are the racial stat modifiers. $\text{Strength}$, $\text{Agility}$, $\text{Mastery}$, $\text{parryRating}$, and $\text{dodgeRating}$ are your character sheet values for each of these stats.

Note that I’m giving these equations in percent form. In other words, a $\text{bonusParry}$ of 15.3% is 15.3 in the equation. If using decimal form (e.g. $\text{bonusParry}=0.153$) there is an extra factor of 100 in the term with $\text{VerticalStretch}$. The easiest way to accommodate this is to just multiply $\text{VerticalStretch}$ by 100.

The diminishing returns equations for parry are:

$\text{baseParry} = 3.00 + \left ( \text{raceStrMod} + 0.0739 \right ) \times Q_S$

$\text{bonusParry} = \left ( \text{Strength} - \text{classBaseStr} - \text{raceStrMod} \right ) \times Q_S + \text{parryRating} / 162$

$\text{totalParry} = \text{baseParry} + \text{bonusParry} / \left ( \text{bonusParry} \times \text{ParryFactor} \times \text{VerticalStretch} + \text{HorizontalShift} \right )$

The diminishing returns equations for dodge are:

$\text{baseDodge} = 3.00 + \text{raceAgiMod} \times Q_A + \text{racial/spec passives}$

$\text{bonusDodge} = \left ( \text{Agility} - \text{classBaseAgi} - \text{raceAgiMod} \right ) \times Q_A + \text{dodgeRating} / 162$

$\text{totalDodge} = \text{baseDodge} + \text{bonusDodge} / \left ( \text{bonusDodge} \times \text{DodgeFactor} \times \text{VerticalStretch} + \text{HorizontalShift} \right )$

The diminishing returns equations for block are:

$\text{baseBlock} = 3.00 + \text{spec passives}$

$\text{bonusBlock} = \text{round}[~\text{Mastery} \times Q_M \times 128~] / 128$

$\text{totalBlock} = \text{baseBlock} + \text{bonusBlock} / \left ( \text{bonusBlock} \times \text{BlockFactor} \times \text{VerticalStretch} + \text{HorizontalShift} \right )$

The tables below summarize the constants in these formulas for different tank specs.

Constants By Class

| Constant | Death Knight | Druid | Monk | Paladin | Warrior |
|---|---|---|---|---|---|
| ParryFactor | 0.634 | 1 | 1.659 | 0.634 | 0.634 |
| DodgeFactor | 1.659 | 1 | 0.3 | 2.259 | 1.659 |
| BlockFactor | 1 | 1 | 1 | 1 | 1 |
| VerticalStretch | 0.00665 | 0.00665 | 0.00665 | 0.00665 | 0.00665 |
| HorizontalShift | 0.956 | 1.222 | 1.422 | 0.886 | 0.956 |
| classBaseStr | 1455 | 626 | 626 | 1455 | 1455 |
| classBaseAgi | 1071 | 1284 | 1284 | 455 | 889 |

$Q_S = \begin{cases} 1/176.3760684 & \text{Death Knight, Paladin, Warrior} \\ 0 & \text{otherwise} \end{cases}$

$Q_A = \begin{cases} 1/176.3760684 & \text{Druid, Monk} \\ 0 & \text{otherwise} \end{cases}$

$Q_M = \begin{cases} 1 & \text{Paladin} \\ 0.5/2.2 & \text{Warrior} \\ 0 & \text{otherwise} \end{cases}$

Race Stat Modifiers

| Race | raceStrMod | raceAgiMod |
|---|---|---|
| Human | 0 | 0 |
| Dwarf | 5 | -4 |
| Night Elf | -4 | 4 |
| Orc | 3 | -3 |
| Tauren | 5 | -4 |
| Undead | -1 | -2 |
| Gnome | -5 | 2 |
| Troll | 1 | 2 |
| Blood Elf | -3 | 2 |
| Draenei | 1 | -3 |
| Goblin | -3 | 2 |
| Worgen | 3 | 2 |
| Pandaren | 0 | -2 |

Background

Blizzard’s formula for diminishing returns looks a little different than the one the community generally uses. If you read community guides (including my old posts on the subject), we tend to use a two-parameter formula:

$\text{totalParry} = \text{baseParry} + 1 / \left ( \frac{1}{C_p} + \frac{k}{\text{bonusParry}} \right )$

where $C_p$ is the parry cap (or dodge cap $C_d$ or block cap $C_b$ for the other formulas) and $k$ is a class-dependent constant.

It’s not hard to show that this formula is identical in form to the Blizzard ones: multiply both numerator and denominator of the second term on the right-hand side by $\text{bonusParry}$ to get:

$\text{totalParry} = \text{baseParry} + \text{bonusParry} / \left ( \text{bonusParry} / C_p + k \right )$

In this form it’s clear that $k = \text{HorizontalShift}$, and that there’s a very simple relationship between $C_p$ and the remaining two constants:

$C_p = 1/ \left (\text{ParryFactor} \times \text{VerticalStretch} \right )$

and similarly for $C_d$ and $C_b$, substituting the appropriate “Factor” constant. There’s also a fourth $C_m$ and $\text{MissFactor}$ that I’ve omitted since we don’t have ways to change our miss chance.

Despite the fact that Blizzard’s formula has three parameters, each individual formula only really uses two. $\text{ParryFactor}$ and $\text{VerticalStretch}$ are redundant parameters, since they only ever show up multiplied together. In theory, they could eliminate one of them entirely in favor of the other; e.g. eliminate $\text{VerticalStretch}$ and merge that into the $\text{Factor}$ variable. Then every equation would have a different $\text{Factor}$ variable (just as ours had a different cap constant $C_p$, $C_d$, etc.), but the same class-based $\text{HorizontalShift}$.
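The conversion from Blizzard’s coefficients to the community-style caps is a one-liner, so here is a minimal Python sketch of it. The factor values are the ones from the constants table above; the names `FACTORS`, `cap_from_factor`, and `caps_for_class` are just illustrative choices of mine, not anything from the game or from Simulationcraft.

```python
# Blizzard DR coefficients, copied from the "Constants By Class" table.
VERTICAL_STRETCH = 0.00665

# class name -> (ParryFactor, DodgeFactor, BlockFactor)
FACTORS = {
    "Death Knight": (0.634, 1.659, 1.0),
    "Druid": (1.0, 1.0, 1.0),
    "Monk": (1.659, 0.3, 1.0),
    "Paladin": (0.634, 2.259, 1.0),
    "Warrior": (0.634, 1.659, 1.0),
}


def cap_from_factor(factor, vertical_stretch=VERTICAL_STRETCH):
    """Community-style cap: C = 1 / (Factor * VerticalStretch)."""
    return 1.0 / (factor * vertical_stretch)


def caps_for_class(name):
    """Return the parry, dodge, and block caps for one class."""
    parry_factor, dodge_factor, block_factor = FACTORS[name]
    return {
        "C_p": cap_from_factor(parry_factor),
        "C_d": cap_from_factor(dodge_factor),
        "C_b": cap_from_factor(block_factor),
    }


if __name__ == "__main__":
    for name in FACTORS:
        caps = caps_for_class(name)
        print(f"{name:12s} C_d={caps['C_d']:.4f} "
              f"C_p={caps['C_p']:.4f} C_b={caps['C_b']:.4f}")
```

The constant $k$ needs no conversion at all, since it is just $\text{HorizontalShift}$.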
We can also plug in the values Celestalon gave us to see how accurate our previous fitting session was:

Calculated Cap Values

| Class | $k$ | $C_d$ | $C_p$ | $C_b$ |
|---|---|---|---|---|
| Death Knight | 0.956 | 90.6425 | 237.1860 | 150.3759 |
| Druid | 1.222 | 150.3759 | 150.3759 | 150.3759 |
| Monk | 1.422 | 501.2531 | 90.6425 | 150.3759 |
| Paladin | 0.886 | 66.5675 | 237.1860 | 150.3759 |
| Warrior | 0.956 | 90.6425 | 237.1860 | 150.3759 |

Empirically Determined Cap Values (from Aug 2013)

| Class | $k$ | $C_d$ | $C_p$ | $C_b$ |
|---|---|---|---|---|
| Death Knight | $0.956$ | $90.6425(74) \pm 0.000010$ | $237.186(14) \pm 0.00015$ | - |
| Druid | $1.222$ | $150.3759(38) \pm 0.000041$ | - | - |
| Monk | $1.422$ | $501.253(48) \pm 0.00032$ | $90.642(44) \pm 0.00014$ | - |
| Paladin | $0.886$ | $66.56744(62) \pm 0.0000060$ | $237.1860(40)\pm 0.000055$ | $150.3759(469)\pm 0.0000094$ |
| Warrior | $0.956$ | $90.64254(65) \pm 0.0000052$ | $237.1860(91) \pm 0.000057$ | $150.375(68) \pm 0.00015$ |

Comparing the tables, it’s clear that our fitting system nailed all of these. Most were accurate out to 4 decimal places, and the remaining ones out to three decimal places. Almost all of them were accurate to the reported precision in the table. That gives us a lot of confidence in our fitting algorithms and techniques, which is why we’re able to believe that we can detect a systematic error of 0.0004% parry. Now, let’s see how we figured that out.

Testing The Parry Equation

Based on several discussions with Celestalon, we know that the original intent was something like this:

$\text{baseParry} = 3.00\%$ (exact)

$\text{bonusParry} = (\text{Strength} - \text{classBaseStr} - \text{raceStrMod})\times Q_S + \text{parryRating}/162$

$\text{totalParry} = \text{baseParry}+\text{bonusParry} / \left ( \text{bonusParry}\times\text{ParryFactor}\times\text{VerticalStretch}+\text{HorizontalShift}\right )$

In other words, the parry from the class contribution to base strength (which is not affected by DR) is completely negated, such that base parry is exactly 3% for each class.
However, we also discovered that this wasn’t the case, and were subsequently told that the racial strength modifier is not negated, and that this racial modifier is affected by DR. The “edit” and follow-up post he’s provided since are actually in error, as we’ll see shortly. So, taking his word from before the edits, what we should be seeing is:

$\text{bonusParry} = (\text{Strength} - \text{classBaseStr} )\times Q_S + \text{parryRating}/162$

Unfortunately, that’s not what the game is doing. For example, let’s take a naked Gnome Warrior. Here are the relevant values:

$\text{classBaseStr} = 1455$
$\text{raceStrMod} = -5$
$\text{Strength} = 1450$
$\text{parryRating} = 0$
$\text{ParryFactor} = 0.634$
$\text{VerticalStretch} = 0.00665$
$\text{HorizontalShift} = 0.956$

If you plug those values into the formulas above, you get (keeping only 10 decimal places):

2.9703430316%

If you use GetParryChance() in-game though, you get (again, to 10 decimal places):

2.9720702171%

A difference of ~0.002%. Not that far off… but this should be nearly exact, since we’re using the same formula the game does.

At first, I thought that perhaps this was because Celestalon rounded the $\text{ParryFactor}$, $\text{VerticalStretch}$, and $\text{HorizontalShift}$ values he gave us in the Theorycrafting thread. However, last week I took a few quick data sets to test this hypothesis, and convinced myself that isn’t the issue. In fact, given the accuracy and agreement we see with the empirically-obtained values from August 2013, the values he provided in the Theorycrafting thread may well be exact, and not rounded at all!

To illustrate why, here’s the data for the naked gnome warrior, all retrieved through the Statslog addon with minor tweaks to get it working again on beta. Statslog uses the World of Warcraft API functions UnitStat("player",1), GetCombatRatingBonus(CR_PARRY), and GetParryChance() to retrieve this information, giving us very high precision results.
Again, I’ve only kept 10 decimal places for the table since we’d be ecstatic to fit to even that precision. To collect this data, I just added or removed gear to change my total stats.

Gnome Warrior Data

| $\text{Strength}-\text{classBaseStr}$ | $\text{parryRating}/162$ | $\text{totalParry}$ |
|---|---|---|
| -5 | 0.0000000000 | 2.9720702171 |
| -5 | 1.2962962389 | 4.3203210831 |
| 205 | 1.2962962389 | 5.5452437401 |
| 426 | 2.2037036419 | 7.7356786728 |
| 550 | 2.7530863285 | 8.9868812561 |
| 674 | 2.7530863285 | 9.6833515167 |
| 798 | 3.3024692535 | 10.9137287140 |
| 964 | 3.9814815521 | 12.4860315323 |
| 1130 | 3.9814815521 | 13.3895244598 |
| 1351 | 4.8888888359 | 15.4365730286 |
| 1517 | 5.6172838211 | 16.9933910370 |
| 1738 | 6.5246915817 | 18.9761753082 |
| 1904 | 6.5246915817 | 19.8289909363 |
| 2028 | 7.0000000000 | 20.8874912262 |
| 2249 | 7.0000000000 | 22.0019493103 |
| 2121 | 6.4753084183 | 20.8898067474 |
| 2245 | 6.9876542091 | 21.9709510803 |

I put that data in MATLAB and used it to fit values for $\text{VerticalStretch}$ and $\text{HorizontalShift}$ with very strict tolerances on accuracy but very loose tolerances on $\text{VerticalStretch}$ ($\pm 0.0001$) and $\text{HorizontalShift}$ ($\pm 0.001$). This means that MATLAB’s fitting algorithms will attempt to determine the values very accurately, but will also allow them to vary anywhere within the tolerance on each constant. Note that I’ve chosen those tolerances to provide complete flexibility in rounding those values and then some – in other words, any value of those constants which rounds to the ones given by Celestalon is fair game in this fit, and even some values outside of that range (e.g. 0.00666 for $\text{VerticalStretch}$).

The formula, fit details, and plot are given below. In this fit, $x$ is our definition of $\text{bonusParry}$ above, which includes the strength contribution (in this case negative) of the racial strength modifier.

General model: fitresult(x) = 3+0*0.0056697+x/(x*0.634*vs+hs)

Coefficients (with 95% confidence bounds):

• vs = 0.006668 (0.006661, 0.006675)
• hs = 0.9559 (0.9559, 0.956)

[Figure: Gnome warrior parry fit.]
That looks pretty good – hs ($\text{HorizontalShift}$) is about right, though the parameter vs ($\text{VerticalStretch}$) has to be higher than is reasonable to be rounded to a value of 0.00665. However, the proof that something’s wrong is in the plot of the residuals – the difference between the fitted curve and the actual data. Here are those differences in tabular form:

Gnome Warrior parry fit residuals

| Fit | Data | Residual |
|---|---|---|
| 2.9703 | 2.9721 | -0.0017 |
| 4.3190 | 4.3203 | -0.0013 |
| 5.5442 | 5.5452 | -0.0010 |
| 7.7352 | 7.7357 | -0.0005 |
| 8.9866 | 8.9869 | -0.0003 |
| 9.6832 | 9.6834 | -0.0002 |
| 10.9137 | 10.9137 | -0.0000 |
| 12.4862 | 12.4860 | 0.0001 |
| 13.3897 | 13.3895 | 0.0002 |
| 15.4369 | 15.4366 | 0.0003 |
| 16.9937 | 16.9934 | 0.0003 |
| 18.9763 | 18.9762 | 0.0002 |
| 19.8291 | 19.8290 | 0.0001 |
| 20.8875 | 20.8875 | -0.0000 |
| 22.0018 | 22.0019 | -0.0002 |
| 20.8898 | 20.8898 | -0.0000 |
| 21.9708 | 21.9710 | -0.0002 |

Again, it’s not off by much… but ~0.002% is enough to cause rounding errors that lead the character sheet value to differ by 0.01% compared to calculations (i.e. SimC’s output). So it’s worrisome.

Furthermore, these residuals tell us something else about the fit. A graphical representation is a little more revealing, in this case:

[Figure: Gnome warrior parry fit residuals.]

It may not be clear to a layperson, but someone who’s been fitting data for years will instantly recognize the meaning of that plot. The fact that there’s curvature indicates some sort of systematic error. If we have the correct formula, the residual plot shouldn’t have curvature – it should look almost like a random scatter plot. For example, like this plot from 2012:

[Figure: Dodge residuals from August 2012 post.]

If you read through that post, you’ll notice that I used the same technique of looking for details and patterns in a residuals plot to identify a systematic error in our assumed value of the agility-to-dodge conversion factor, as well as to figure out that the game performs a binary rounding operation on $\text{bonusBlock}$.
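If you don’t have MATLAB handy, you can still see the systematic error by computing residuals directly. The following Python sketch evaluates the “intended” formula (racial modifier inside the DR term) with Celestalon’s constants held fixed at their quoted values, and subtracts the in-game data from the gnome warrior table. Note the assumption: because nothing is being refit here, these residuals will not match the table above, which came from a fit where vs and hs were allowed to float. The function names are my own.

```python
# Constants for a warrior, as quoted earlier in this post.
Q_S = 1 / 176.3760684
PARRY_FACTOR = 0.634
VERTICAL_STRETCH = 0.00665
HORIZONTAL_SHIFT = 0.956

# (Strength - classBaseStr, parryRating / 162, totalParry reported in-game)
GNOME_WARRIOR_DATA = [
    (-5, 0.0000000000, 2.9720702171),
    (-5, 1.2962962389, 4.3203210831),
    (205, 1.2962962389, 5.5452437401),
    (426, 2.2037036419, 7.7356786728),
    (550, 2.7530863285, 8.9868812561),
    (674, 2.7530863285, 9.6833515167),
    (798, 3.3024692535, 10.9137287140),
    (964, 3.9814815521, 12.4860315323),
    (1130, 3.9814815521, 13.3895244598),
    (1351, 4.8888888359, 15.4365730286),
    (1517, 5.6172838211, 16.9933910370),
    (1738, 6.5246915817, 18.9761753082),
    (1904, 6.5246915817, 19.8289909363),
    (2028, 7.0000000000, 20.8874912262),
    (2249, 7.0000000000, 22.0019493103),
    (2121, 6.4753084183, 20.8898067474),
    (2245, 6.9876542091, 21.9709510803),
]


def intended_total_parry(strength_minus_base, rating_parry):
    """Intended formula: base parry is exactly 3, everything else is under DR."""
    x = strength_minus_base * Q_S + rating_parry
    return 3.0 + x / (x * PARRY_FACTOR * VERTICAL_STRETCH + HORIZONTAL_SHIFT)


def residuals(data):
    """Model minus observation for every row of the data set."""
    return [intended_total_parry(s, r) - observed for s, r, observed in data]


if __name__ == "__main__":
    for (s, r, observed), res in zip(GNOME_WARRIOR_DATA, residuals(GNOME_WARRIOR_DATA)):
        print(f"{s:6d} {r:13.10f} {observed:14.10f} {res:+.7f}")
```

A residual column that drifts steadily from negative to positive as the stats increase, rather than scattering around zero, is the same warning sign as the curvature in the plot.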
The point this time around is that with such loose tolerances on the two DR coefficients, the fit should have no trouble producing a very clean residual plot. That tells us that either

1. our DR coefficients lie outside the region we’ve provided, which means Celestalon doesn’t know how to round (unlikely), or
2. something is wrong with the formula.

My first thought was to try the obvious thing: assume Celestalon lied! Ok, not really, but I thought maybe he was mistaken about the racial stat modifier being affected by DR. So I moved it outside the DR calculation in my code. In other words, I now let

$\text{baseParry} = 3 + \text{raceStrMod}\times Q_S$

$x = \text{bonusParry} = ( \text{Strength} - \text{classBaseStr} - \text{raceStrMod} )\times Q_S + \text{parryRating}/162$

Doing that gives us:

[Figure: Gnome warrior parry residuals, mark II.]

General model: fitresult(x) = 3+-5*0.0056697+x/(x*0.634*vs+hs)

Coefficients (with 95% confidence bounds):

• vs = 0.006654 (0.006652, 0.006656)
• hs = 0.9559 (0.9559, 0.9559)

You might note that the residuals have gone down considerably here – the largest is now $-4\times 10^{-4}$ rather than $-1.7\times 10^{-3}$, which suggests that the hypothesis is likely correct. The parry from racial strength modifiers is probably not being affected by diminishing returns.

But what bothered me is that there’s still curvature on the residuals plot – that means we still don’t have the formula right. Just to make sure I wasn’t loony, I tried this with dwarf and draenei warriors too. Both races exhibited similar curvature in the residuals plot, so it’s not just gnomes acting oddly.

Next, as a sanity check, I tried a human warrior. Since a human warrior has a racial modifier of zero, the curvature should disappear if that’s the cause of the problem. I also added $Q_S$ as another fit parameter just in case that value was incorrect, unlikely as that was since it was also given to us by Celestalon.
Adding that parameter turns our curve fit into a surface fit, so the residual plot will be 3-dimensional. Here’s what it looked like:

[Figure: Human warrior parry fit residuals.]

General model: fitresult(x,y) = 3+0/q+(x/q+y)/((x/q+y)*vs*0.634+hs)

Coefficients (with 95% confidence bounds):

• vs = 0.006655 (0.006652, 0.006657)
• hs = 0.9558 (0.9556, 0.9561)
• q = 176.4 (176.3, 176.5)

The curvature there should be clear despite the viewpoint of the plot. So this curious result tells us that this isn’t a problem limited to racial strength modifiers. There’s still something else missing.

And it didn’t take much guessing to stumble across the solution. Let’s assume there’s some extra amount of strength that’s giving parry, and that it’s not subject to diminishing returns. In other words, we let

$\text{baseParry} = 3.00 + (\text{raceStrMod} + R)\times Q_S$

Let’s also assume that $Q_S$ is accurate as given, so we can go back to curve fits rather than surface fits. Then, the fit and residuals for a human warrior look like this:

[Figure: Human warrior parry residuals, mark II.]

General model: fitresult(x) = 3+(0+r)*0.0056697+x/(x*0.634*vs+hs)

Coefficients (with 95% confidence bounds):

• vs = 0.00665 (0.00665, 0.00665)
• hs = 0.956 (0.956, 0.956)
• r = 0.07385 (0.07367, 0.07404)

This is what a residual plot should look like. It’s mostly numerical noise at the ~$1\times 10^{-6}$ level, and it’s more or less randomly distributed. It’s not entirely clear why we even have that much noise, because $\text{HorizontalShift}$ and $\text{VerticalStretch}$ are allowed to vary by $\pm 0.001$ and $\pm 0.00001$ respectively, so it’s not due to rounding of those values. It may just be some subtlety of the calculation that differs between Blizzard’s implementation and my MATLAB formula. Regardless, it isn’t that important – nobody will ever notice a 0.000001% change in parry chance.
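You can get a rough estimate of $R$ without any fitting toolbox by holding the DR coefficients fixed and solving the $\text{baseParry}$ equation for $R$ one data point at a time. The sketch below does that in Python for three rows of the gnome warrior table from earlier. It assumes the racial modifier sits in $\text{baseParry}$ (the mark II hypothesis) and that vs and hs are exactly 0.00665 and 0.956; it is a back-of-the-envelope check, not the MATLAB fit described above, and the function name is hypothetical.

```python
Q_S = 1 / 176.3760684
PARRY_FACTOR = 0.634
VERTICAL_STRETCH = 0.00665
HORIZONTAL_SHIFT = 0.956
RACE_STR_MOD = -5  # gnome

# (Strength - classBaseStr, parryRating / 162, totalParry reported in-game)
SAMPLE_ROWS = [
    (-5, 0.0000000000, 2.9720702171),
    (-5, 1.2962962389, 4.3203210831),
    (2245, 6.9876542091, 21.9709510803),
]


def phantom_strength(strength_minus_base, rating_parry, observed_total):
    """Solve totalParry = 3 + (raceStrMod + R) * Q_S + DR(bonusParry) for R."""
    # bonusParry with the racial modifier removed from the DR term
    bonus = (strength_minus_base - RACE_STR_MOD) * Q_S + rating_parry
    diminished = bonus / (bonus * PARRY_FACTOR * VERTICAL_STRETCH + HORIZONTAL_SHIFT)
    # Whatever is left over after 3% and the diminished part is (raceStrMod + R) * Q_S
    undiminished = observed_total - 3.0 - diminished
    return undiminished / Q_S - RACE_STR_MOD


estimates = [phantom_strength(*row) for row in SAMPLE_ROWS]
mean_estimate = sum(estimates) / len(estimates)

if __name__ == "__main__":
    for row, r in zip(SAMPLE_ROWS, estimates):
        print(row, f"R = {r:.5f}")
    print(f"mean R = {mean_estimate:.5f}")
```

Each parry value only carries so many digits, and dividing by $Q_S$ magnifies any error by a factor of about 176, so the per-point estimates scatter a bit. Averaging them (or doing a proper fit) tightens that up.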
In any event, the key point here is that Humans are magically getting around 0.07385 Strength worth of parry chance (about 0.0004187%) that isn’t affected by DR. The likely explanation is that the subtraction that negates the parry from $\text{classBaseStr}$ isn’t quite correct, since we know they’ve been fiddling with that recently. It may be something as simple as rounding – if whatever formula determines a warrior’s base strength produces a value of 1455.07385, the game could be adding 1455.07385 strength worth of parry rating, but only subtracting off the rounded 1455 strength worth of rating.

Repeating this with our gnome (-5 racial strength modifier) or dwarf (+5 racial strength modifier) revealed a few more details. If the racial strength modifier was included in $\text{bonusParry}$ (and thus affected by DR), I was only able to get a fit with “good” residuals if I was flexible with $\text{HorizontalShift}$ and $\text{VerticalStretch}$ – for example, if I let $\text{VerticalStretch}=0.00661$ and $\text{HorizontalShift}=0.9562$. And that was further complicated by the fact that those values would have to be allowed to be different for each race, which is obviously wrong since we know they’re class-dependent constants that are independent of race.

If I tightened the restrictions on $\text{HorizontalShift}$ and $\text{VerticalStretch}$ such that they are nearly exact as given, the curvature started to appear again. For example:

[Figure: Gnome warrior parry residuals, mark III.]

General model: fitresult(x) = 3+(0+r)*0.0056697+x/(x*0.634*vs+hs)

Coefficients (with 95% confidence bounds):

• vs = 0.00665 (fixed at bound)
• hs = 0.956 (fixed at bound)
• r = -0.1561 (-0.3099, -0.002212)

However, if I move the racial strength bonus into $\text{baseParry}$ (outside of the DR calculation), we get good residuals:

[Figure: Gnome warrior parry residuals, mark IV.]

General model: fitresult(x) = 3+(-5+r)*0.0056697+x/(x*0.634*vs+hs)

Coefficients (with 95% confidence bounds):

• vs = 0.00665 (0.00665, 0.00665)
• hs = 0.956 (0.956, 0.956)
• r = 0.07391 (0.07368, 0.07413)

Again, recall that the $\text{bonusParry}$ value $x$ I feed to this equation is adjusted for the location of the racial modifier in the code in each case. Note the amount of “phantom” strength $r$ being added here: 0.07391, awfully close to the value we found for humans.

As it turns out, we get the same amount of “phantom” strength if we fit a gnome death knight (0.07387) or a dwarf warrior (0.07382), and a very similar value from a draenei warrior (0.07369). Loading up a human paladin (and adjusting $\text{HorizontalShift}$ appropriately) gives a phantom strength value of 0.07392, confirming that this isn’t a class-based thing and doesn’t depend on the DR calculation at all. In short, any tank class that gets a strength-to-parry conversion is getting this phantom amount of parry, regardless of race, DR coefficients, or gear.

Testing with a monk shows that they have a fixed 3.0000000% parry, suggesting that the agility tanks are not getting this bonus. This is why I’ve framed it as a phantom amount of strength rather than a phantom amount of parry. We can’t say for sure if the agility tanks still have that phantom strength contribution or not, because either way they don’t get any parry from it.

Testing The Dodge Equation

The obvious next question was whether agility tanks were getting a similar effect. To test, I just assumed that the racial agility modifier worked the same way as the strength modifier did for the strength tanks.
I created a Night Elf monk, which has a +2% dodge racial bonus and +4 racial agility modifier, and tried the formulas:

$\text{baseDodge} = 5.00 + (\text{raceAgiMod}+R)\times Q_A$

$x=\text{bonusDodge} = (\text{Agility}-\text{classBaseAgi}-\text{raceAgiMod})\times Q_A + \text{dodgeRating}/162$

along with the usual DR formula to determine $\text{totalDodge}$. The result was pretty good:

[Figure: Night elf monk dodge fit residuals.]

General model: dfit(x) = 5+(4+R)*0.0056697+x/(x*0.3*vs+hs)

Coefficients (with 95% confidence bounds):

• R = 2.345e-005 (-2.033e-005, 6.723e-005)
• hs = 1.422 (1.422, 1.422)
• vs = 0.00665 (0.00665, 0.00665)

Looks like our hunch was correct, and that racial base agility is granting dodge that isn’t affected by diminishing returns. Note also that the fitted $R$ is consistent with zero, so there’s no sign of a phantom agility contribution here. Just to be sure, we should test a few more times though. Let’s double check this with a dwarf monk, which has a racial agility modifier of -4, and a base dodge of 3%:

[Figure: Dwarf monk dodge fit residuals.]

General model: dfit(x) = 3+(-4+R)*0.0056697+x/(x*0.3*vs+hs)

Coefficients (with 95% confidence bounds):

• R = 2.95e-005 (-4.765e-006, 6.376e-005)
• hs = 1.422 (1.422, 1.422)
• vs = 0.00665 (0.00665, 0.00665)

Looks good. Running a night elf druid through the fitting algorithm produces similar results:

[Figure: Night elf druid dodge fit residuals.]

General model: dfit(x) = 5+(4+R)*0.0056697+x/(x*1*vs+hs)

Coefficients (with 95% confidence bounds):

• R = 1.693e-005 (-1.27e-005, 4.655e-005)
• hs = 1.222 (1.222, 1.222)
• vs = 0.00665 (0.00665, 0.00665)

This pretty much confirms that we’ve got it right, and it’s reasonable to expect that it works the same way for the rest of the druid and monk races.

Testing The Block Equation

There isn’t really much to say here. The block equation is unchanged from Mists, and it appears they haven’t tinkered with it. The exact same fitting code that worked in Mists works now, and produces a great fit. Nonetheless, I updated it to match the other functions and to use the Blizzard formulation:

[Figure: Human paladin block fit residuals.]

General model: bfit(x) = 13+R+x/(x*1*vs+hs)

Coefficients (with 95% confidence bounds):

• R = 7.405e-006 (-2.273e-005, 3.754e-005)
• hs = 0.886 (0.886, 0.886)
• vs = 0.00665 (0.00665, 0.00665)

Similarly good fits were obtained with all of my warrior test subjects, suggesting that the block DR equations are working the same way they have been.

Conclusions

So we’ve confirmed that

• Racial strength modifiers grant parry that is not affected by parry DR.
• Racial agility modifiers grant dodge that is not affected by dodge DR.
• There’s some phantom parry being added to all strength-based tanking classes, but not to agility tanking classes.
• Block diminishing returns seem to be working exactly like they should be.

The obvious question is, “what will Blizzard do about it?” The parry bug is a difference of 0.0004%, which is insignificant in the grand scheme of things. It only matters to crazy people like me that want to be able to replicate the character sheet values 100% of the time and stamp down those rare 0.01% rounding errors. So it didn’t surprise me in the least that the response was that I should just build it into my models, because it was too small to matter and their programmers had better things to do. I can’t argue with that at all, really.

Similarly, I don’t think there’s any reason they need to move the racial strength modifier back out of $\text{baseParry}$ and into $\text{bonusParry}$, or to remove it altogether. It’s entirely arbitrary how that works, and it only matters insofar as theorycrafters want to know how to model it properly. It seems like a waste of time for them to change that around at this point just for the sake of cleaning up the equations.

So I think it’s safe to say that these are the diminishing returns formulas that we’ll be using throughout Warlords. I’ve already programmed the changes into Simulationcraft and run some tests with a few characters to make sure they’re working.
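For anyone who wants to drop these into their own spreadsheet or script, here is a compact Python sketch of the three final formulas from this post. To be clear, this is not the Simulationcraft code; the function and parameter names are my own for illustration, and the caller supplies the class constants from the tables above. One more assumption to flag: Python’s built-in round() sends exact halves to the nearest even integer, which may or may not match what the game does for $\text{bonusBlock}$.

```python
VERTICAL_STRETCH = 0.00665
Q_DEFAULT = 1 / 176.3760684   # strength-to-parry / agility-to-dodge factor
PHANTOM_STRENGTH = 0.0739     # undiminished "phantom" strength for strength tanks


def diminish(bonus, factor, horizontal_shift):
    """Blizzard-form DR: bonus / (bonus * Factor * VerticalStretch + HorizontalShift)."""
    return bonus / (bonus * factor * VERTICAL_STRETCH + horizontal_shift)


def total_parry(strength, parry_rating, class_base_str, race_str_mod,
                parry_factor, horizontal_shift, q_s=Q_DEFAULT):
    """Parry in percent; racial and phantom strength bypass DR."""
    base = 3.0 + (race_str_mod + PHANTOM_STRENGTH) * q_s
    bonus = (strength - class_base_str - race_str_mod) * q_s + parry_rating / 162
    return base + diminish(bonus, parry_factor, horizontal_shift)


def total_dodge(agility, dodge_rating, class_base_agi, race_agi_mod,
                dodge_factor, horizontal_shift, q_a=Q_DEFAULT, passive_dodge=0.0):
    """Dodge in percent; racial agility and racial/spec passives bypass DR."""
    base = 3.0 + race_agi_mod * q_a + passive_dodge
    bonus = (agility - class_base_agi - race_agi_mod) * q_a + dodge_rating / 162
    return base + diminish(bonus, dodge_factor, horizontal_shift)


def total_block(mastery, q_m, horizontal_shift, passive_block=0.0, block_factor=1.0):
    """Block in percent; bonusBlock is rounded to the nearest 1/128."""
    base = 3.0 + passive_block
    bonus = round(mastery * q_m * 128) / 128
    return base + diminish(bonus, block_factor, horizontal_shift)


if __name__ == "__main__":
    # Naked gnome warrior from the data table: 1450 strength, no parry rating.
    print(total_parry(1450, 0, 1455, -5, 0.634, 0.956))
```

As a usage check, the naked gnome warrior comes out at about 2.97207%, which agrees with the in-game 2.9720702171% to well within the noise discussed earlier.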
A Word On Theorycrafting

This installment was a little more complicated than the earlier TC101 articles. In particular, I talked a lot about “fitting” the data, but didn’t go into any detail on how that’s done. However, explaining how to fit data using MATLAB’s curve fitting toolbox or associated methods would be somewhat useless, because the likelihood is that you don’t have MATLAB at home. It’s more likely that you’d be putting the data in Excel or a Google documents spreadsheet and attempting to fit the data that way.

Unfortunately, that way is a lot less flexible (which is why I use MATLAB instead!). The built-in fitting functions are more limited, for one thing. If your data is linear, then you’re all set, but if not you often have to be creative. You can’t quickly and easily define and change a custom fitting function, at least to my knowledge.

But there are plenty of tutorials on how to do this sort of thing online. Fitting linear (i.e. $y=mx+b$) or polynomial data (i.e. $y=a + bx + cx^2 + dx^3 + …$) is incredibly easy, and you’ll find plenty of hits from a simple Google search. Fitting nonlinear data, like our diminishing returns equations, is more complicated, but there’s a great guide from California State Polytechnic University, Pomona that details how you’d go about fitting a complicated equation in Excel.

Ultimately, though, your first step should be to ask yourself whether you need that level of precision. For something like this, most players don’t, and would be satisfied with being within 0.01% of the character sheet avoidance value. If you decide you do need more accuracy, that’s when you take stock of the tools you have at hand to deal with the problem, and decide if they’re suitable. If they are (maybe you have Excel and the time to learn how to use the Solver), then great! If not, though, you might need to seek out someone who does have the knowledge and/or tools you need.
Remember, theorycrafting is a collaborative process, so there’s nothing wrong with asking for assistance. Sometimes just having another set of eyes looking at the data, or looking at it with a different tool, will crack a tricky problem wide open.

## TC101: How Stats Are Calculated

Primary attribute calculations seem like they should be a pretty simple topic. So simple, in fact, that most players don’t even think about how they’re done. Most theorycrafters don’t either until they try to write a spreadsheet that models a character and notice that their math doesn’t work out. I wonder how many times the following scenario has been re-enacted over the past few years:

“Okay, my Paladin has 1455 base strength and 2378 strength from gear, for a total of 3833. With the 5% bonus for wearing plate armor, that should give me $3833\times 1.05=4024.7$. The 5% stats raid buff should raise that to $3833\times 1.05\times 1.05=4225.9$. My character sheet only gives an integer, so it should round that to 4226. Right?”

[Figure: 4224 is not equal to 4226]

“Wait, what??”

As it turns out, stat calculations are one of the more convoluted things in the game, and I suspect that (until now) few theorycrafters have modeled all the nuances with complete accuracy. Blizzard tosses a few floor() and round() functions into the mix at seemingly arbitrary places, which makes it tougher to reverse engineer.

Over the past month or so, I’ve been collecting data from beta and working on determining exactly where and how the stat calculations are rounded. This is a Theorycrafting 101 article because this process is a great example of the sort of thing I spoke about in part 1: starting with a basic model and adding complexity until all the details work. After we go over the formulas for calculating stats, we’ll go step-by-step through the process I used to test and determine the formulas.

Primary Attribute Formulas

Here’s how your character sheet attributes are calculated.
First, we define some conditional values.

$\text{match} = \begin{cases}1.05 & \text{if armor matches class} \\ 1.00 & \text{otherwise} \end{cases}$

$\text{epicurean} = \begin{cases} 2 & \text{if pandaren} \\ 1 & \text{otherwise} \end{cases}$

$\text{alchemy} = \begin{cases} 2 & \text{if alchemist} \\ 1 & \text{otherwise} \end{cases}$

$\text{multiplier}$ is the total multiplier from buffs and other effects. So for example, if the only buff you have active is Blessing of Kings,

$\text{multiplier} = \begin{cases} 1.05 & \text{for STR/INT/AGI} \\ 1.00 & \text{otherwise} \end{cases}$

Similarly, Fortitude would be a $\text{multiplier}$ of 1.10, Guarded By The Light would give 1.25, and so on. Multiple effects are multiplicative, so a protection paladin with Fortitude active would have a stamina multiplier of $1.10\times 1.25=1.375$.

We then calculate the total base $B$ and gear $G$ contributions before multipliers:

$B = \text{race_base}+\text{class_base} + \text{heroic_presence} + \text{endurance}$

$\begin{align} G = \text{gear_stat} &+ \text{round}[~\text{food_stat}~] \times \text{epicurean} \\ &+ \text{round}[~\text{flask_stat}~] \\ &+ \text{round}[~\text{potion_stat}~] \\ &+ \text{round}[~\text{trinket_proc_stat}~] \end{align}$

And generate a “composite” value $C$ that incorporates the matching multiplier:

$C = \text{floor}[~G\times \text{match}~]+B\times \text{match}$

Finally, your character sheet mouseover tooltip reads: Strength CS_Total ( CS_Base + CS_Bonus ), with:

$\text{CS_Total} = \text{floor}[~C \times \text{multiplier}~ ]$

$\text{CS_Base} = \text{floor}[ ~B \times \text{match}~ ]$

$\text{CS_Bonus} = \text{CS_Total} - \text{CS_Base}$

Building The Model

Now that we’ve got the math out of the way, let’s see how we determined it. First of all, let’s go back to our original example. If we log on to the beta PvP server and create a new level 100 Paladin, this is what we get:

[Figure: When I grow up, I want to run a camel farm.]

As you can see, the character has 1455 base strength and 2378 “bonus” strength.
Since we haven’t chosen a spec yet, and we’re completely unbuffed, these values should properly reflect our total base strength and strength from gear, respectively. We can double-check this, because Celestalon gave us the full list of base stats for each class ($\text{class_base}$) as well as the racial base stat modifiers ($\text{race_base}$). A human’s $\text{race_base}=0$ and a paladin’s $\text{class_base}=1455$, so it’s quite clear that our character sheet’s giving us the correct base value. Likewise, you could go through and add up all the strength on each piece of gear to confirm that the sum is 2378. Together, that makes the 3833 total given on the character sheet.

In other words, so far we’ve got the following skeletal formulas:

$B = \text{class_base}+\text{race_base}$
$G = \text{gear_stat}$
$\text{CS_Base} = B$
$\text{CS_Bonus}=G$
$\text{CS_Total}=B+G$

Those aren’t final, of course – we’re going to be adding to them and correcting them as we go. This is also useful because it means when we go to test other classes, we can look at the unbuffed values before we choose a spec to grab our $B$ and $\text{gear_stat}$ values.

Adding Armor Skills

Now let’s choose a spec. When we spec Retribution, we get the 5% increase to strength from the Armor Skills passive. Which gives us slightly different numbers:

[Figure: We can respec her. We have the technology. We can make her …stronger…faster.]

The only thing we’ve changed is to add a multiplier of 1.05 thanks to the armor specialization passive. Yet, as you might have guessed from our earlier example, our new value is not just 1.05 times our un-specced value of 3833. $3833\times 1.05 = 4024.7$, yet our character sheet reads 4023! Let’s see what’s going on here.

First, let’s consider the base value. We started with 1455, and $1455\times 1.05=1527.8$. The character sheet reads 1527, though, which tells us that the character sheet is taking the result of the calculation and applying a floor() function.
In fact, this isn’t much of a surprise – it’s been known for a while that all of the values on the character sheet are floored rather than rounded. The game still uses the full-precision values when it does calculations though, so you don’t have to worry about stat points being “wasted” due to rounding/flooring.

Similarly, the 2378 strength from gear has become 2496 bonus strength. If we check the math, $2378\times 1.05 = 2496.9$. So the bonus strength is also being floored, not rounded.

Our total strength is just the sum of the two floored values, which tells us that some of this flooring is happening before it’s displayed on the character sheet, otherwise we should have 4024 strength, as mentioned earlier. This is also useful information: since we know that the three character sheet values are linked through basic addition, we really only need to find correct formulas for two of them, and the third will fall into place automatically.

Now, none of this is really news. These details have been known for some time, since it’s very easy to stumble across. But this is how someone, at some point, had to go about determining it.

So now we can update our skeleton formulas slightly to incorporate the new details:

$B = \text{class_base}+\text{race_base}$
$G = \text{gear_stat}$
$\text{CS_Base} = \text{floor}[ ~B \times \text{match}~]$
$\text{CS_Bonus}=\text{floor}[~ G\times \text{match}~]$
$\text{CS_Total}=\text{CS_Base}+\text{CS_Bonus}$

The trick here is that there’s some ambiguity about those floor functions.
For example, according to this, our $\text{CS_Total}$ could be expressed as:

$\text{CS_Total} = \text{floor}[~B\times \text{match}~] + \text{floor}[~G\times \text{match}~],$

but in reality, we’d get the same result from either of the following formulas as well:

$\text{CS_Total} = \text{floor}[~\text{floor}[~B\times \text{match}~] + G\times \text{match}~]$

$\text{CS_Total} = \text{floor}[~B\times \text{match} + \text{floor}[~G\times \text{match}~]~]$

So we don’t really know which is correct yet. The one thing we can rule out is having no floors.

Incorporating Multipliers

Now let’s apply Blessing of Kings to see how that interacts with these formulas:

[Figure: It’s good to be the King.]

Our first guess might have been that Kings would work the same way as the matching multiplier. In other words, our character sheet base should be $1455\times 1.05\times 1.05=1604.1$, or 1604. But it’s clear that’s not how it works, because our base value hasn’t changed – it’s still 1527. Likewise, if we treat the bonus strength contribution the same way, we’d get $2378\times 1.05\times 1.05 = 2621.7$, which is far too low. Something else is happening here.

If we naively take our total strength before Kings (4023) and multiply by the Kings modifier, we get $4023\times 1.05=4224.2$, which is exactly right after applying a floor function. Interesting! This actually gives us a hint as to how to proceed, but I’m going to blithely ignore it for instructional purposes.

That detail does make it pretty clear that base strength is affected by Kings, though. But the game’s accounting adds that extra strength to the green bonus strength value rather than the white base strength value in the tooltips.

For the moment, let’s go with our earlier (and unstated) assumption that the game calculates things the way we have, such that “base” and “bonus” strength are calculated independently and then summed to get the total. We’ll find out shortly if this is a good assumption or not.
The bonus strength value is then, roughly speaking,

$G\times \text{match}\times \text{multiplier} + B\times \text{match}\times (\text{multiplier}-1)$

with the caveat that there may be some floors going on in there. We can turn that “may” into a “must” by checking the math:

$2378\times 1.05 \times 1.05 + 1455\times 1.05\times 0.05 = 2698.1$

which should give us a bonus strength value of 2698, not the 2697 that the character sheet shows. So we know that in order for this formulation to be correct, there has to be a floor happening somewhere before we get to the value that the game floors to show on the character sheet. We can also rule out any uses of round() here, because that would always give 2698.

This is the same ambiguity I spoke of earlier – we knew that the character sheet values were being floored, but we weren’t 100% sure where. Including the Kings multiplier here has clarified that there needs to be at least one extra floor() floating around in our hypothetical formula, but doesn’t tell us exactly where. So we have to try them all, and see if we can rule any of them out.

There are four logical scenarios that use only one floor (in addition to the final floor used to display the value on the character sheet). They are:

$F1=\text{floor}[~G\times \text{match}~]\times \text{multiplier} + B\times \text{match}\times (\text{multiplier}-1)$

$F2=\text{floor}[~G\times \text{match}\times \text{multiplier}~] + B\times \text{match}\times (\text{multiplier}-1)$

$F3=G\times \text{match}\times \text{multiplier} + \text{floor}[~B\times \text{match}~]\times (\text{multiplier}-1)$

$F4=G\times \text{match}\times \text{multiplier} + \text{floor}[~B\times \text{match}\times (\text{multiplier}-1)~]$

If we put our data into this, with $G=2378$, $B=1455$, $\text{match}=1.05$, and $\text{multiplier}=1.05$, they give us the following results:

$F1=2697.19$

$F2=2697.39$

$F3=2698.10$

$F4=2697.75$

This rules out $F3$, because the final floor() on the character sheet would leave that as 2698, which is wrong.
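For anyone who wants to replay that arithmetic, here is a minimal Python sketch of the four candidates. It is only a checking aid, not anything from the game or from Simulationcraft; the function name `bonus_candidates` is made up for illustration, and exact fractions are used (my choice) so that binary floating point can’t nudge a value across an integer boundary.

```python
from fractions import Fraction
from math import floor

# Exact rationals: 1.05 is not representable in binary floating point.
MATCH = Fraction(105, 100)       # 5% armor-matching bonus
MULTIPLIER = Fraction(105, 100)  # 5% Blessing of Kings

def bonus_candidates(G, B, match, mult):
    """Return the four one-floor candidates for the pre-display bonus value."""
    return {
        "F1": floor(G * match) * mult + B * match * (mult - 1),
        "F2": floor(G * match * mult) + B * match * (mult - 1),
        "F3": G * match * mult + floor(B * match) * (mult - 1),
        "F4": G * match * mult + floor(B * match * (mult - 1)),
    }

ret = bonus_candidates(G=2378, B=1455, match=MATCH, mult=MULTIPLIER)
for name, value in ret.items():
    # The character sheet applies one final floor to whatever was computed.
    print(f"{name}: {float(value):.4f} -> sheet would show {floor(value)}")
```

With the ret paladin’s numbers, $F1$, $F2$, and $F4$ all floor to 2697, and only $F3$ lands on 2698.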
But this test can’t distinguish between $F1$, $F2$, and $F4$. Any of those could still be correct. Unfortunately, that’s all the information we can extract from our premade Ret paladin.

We learn a little more if we swap specs to protection. In protection spec, we get a 5% increase to stamina from the Armor Skills passive (so $\text{match}=1.05$ again) along with the 25% increase to stamina from Guarded By The Light. Starting with $B=890$ base stamina and $G=3250$ from gear, and using $\text{multiplier}=1.25$, the formulas give us:

$F1=4498.63$

$F2=4498.63$

$F3=4499.13$

$F4=4498.63$

And looking at the character sheet….

[Image caption: One of these things is not like the others.]

Uh oh. The only one that worked was $F3$, which we’ve already ruled out with our ret paladin. So this tells us that none of those formulas are correct. And increasing the number of floors doesn’t help, because it just makes the values even smaller, which won’t satisfy the protection data. For example,

$F5=\text{floor}[~G\times \text{match}\times \text{multiplier}~]+\text{floor}[~B\times \text{match}~]\times (\text{multiplier}-1) = 4498.50$

Still no dice. We can’t add round functions either, because then the ret data isn’t satisfied.

As it turns out, we’re going about this the wrong way. Our original hypothesis – that the game calculates the base and bonus values individually and sums them to get the total – has to be false. This isn’t really news, because theorycrafters have been using an alternative formulation for years. But it’s a good example of coming up with a hypothesis, testing it, and ultimately ruling it out, which is something that happens all the time in theorycrafting. Now that we’ve done so, let’s see if we have more luck with another hypothesis.

A Change Of Approach

The traditional approach goes something like this: rather than trying to calculate base and bonus strength individually, let’s try deriving correct formulas for the base and total values. Then the bonus value will be determined using basic subtraction.
As we’ll see, this approach is much more successful.

Just as we did for bonus strength, we’ll construct formulas for total strength using $B$, $G$, $\text{match}$, and $\text{multiplier}$. We also know there has to be at least one floor in there somewhere based on our armor matching modifier test. There are six obvious ways to do this using one or two floor functions:

$T1 = \text{floor}[~G\times \text{match}~]\times \text{multiplier}+B\times \text{match}\times \text{multiplier}$

$T2 = \text{floor}[~G\times \text{match}\times \text{multiplier}~]+B\times \text{match}\times \text{multiplier}$

$T3 = G\times \text{match}\times \text{multiplier}+\text{floor}[~B\times \text{match}~]\times \text{multiplier}$

$T4 = G\times \text{match}\times \text{multiplier}+\text{floor}[~B\times \text{match}\times \text{multiplier}~]$

$T5 = \text{floor}[~G\times \text{match}~]\times \text{multiplier}+\text{floor}[~B\times \text{match}~]\times \text{multiplier}$

$T6 = \text{floor}[~G\times \text{match}\times \text{multiplier}~]+\text{floor}[~B\times \text{match}\times \text{multiplier}~]$

We could also come up with two more methods using two floors, but they would require that we be flooring before the $\text{multiplier}$ in one term but not in the other, which seems unlikely. If none of these hold up, we’ll revisit that idea.

Plugging in our ret paladin stats of $G=2378$, $B=1455$, $\text{match}=1.05$, and $\text{multiplier}=1.05$, we get the following results:

$T1=4224.94$

$T2=4225.14$

$T3=4225.10$

$T4=4225.75$

$T5=4224.15$

$T6=4225.00$

So right out of the gate, we can cross off four of the formulas ($T2$, $T3$, $T4$, and $T6$). Only $T1$ and $T5$ give a value consistent with the character sheet, which shows 4224.
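The same elimination can be sketched in a few lines of Python. As before, this is just a checking aid under my own assumptions: `total_candidates` is a hypothetical helper name, exact fractions stand in for the game’s arithmetic, and the target value 4224 is the Kings-buffed total worked out above.

```python
from fractions import Fraction
from math import floor

def total_candidates(G, B, match, mult):
    """Return the six candidate formulas for total stat, before the display floor."""
    g, b = G * match, B * match
    return {
        "T1": floor(g) * mult + b * mult,
        "T2": floor(g * mult) + b * mult,
        "T3": g * mult + floor(b) * mult,
        "T4": g * mult + floor(b * mult),
        "T5": floor(g) * mult + floor(b) * mult,
        "T6": floor(g * mult) + floor(b * mult),
    }

SHEET_TOTAL = 4224  # ret paladin's total strength with Kings applied
ret = total_candidates(2378, 1455, Fraction(105, 100), Fraction(105, 100))

# Keep only the candidates whose floored value matches the character sheet.
survivors = [name for name, value in ret.items() if floor(value) == SHEET_TOTAL]
print(survivors)  # ['T1', 'T5']
```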
And there’s something notable about those two formulas: we can factor out $\text{multiplier}$ in each of them:

$T1 = \left (~ \text{floor}[~G\times \text{match}~]+B\times \text{match}~ \right )\times \text{multiplier}$

$T5 = \left (~ \text{floor}[~G\times \text{match}~]+\text{floor}[~B\times \text{match}~] ~ \right ) \times \text{multiplier}$

That observation will come in handy later. For now, we need to figure out which one of these two formulas is correct.

It’s worth noting that $T5$ is what’s frequently been used for calculating base stats in most theorycrafting works. This is what Simulationcraft has been using throughout MoP, for example. It’s pretty close, and generally gives you answers that are correct. But as we noticed while working on the WoD code, on some rare occasions, it’s off by one.

You can see why, as well: Let’s say that $B=919$ and that both $\text{match}$ and $\text{multiplier}$ are 1.05. Then $B\times \text{match}=964.95$. If we just multiply by $\text{multiplier}$ like we do in formula $T1$, we’d get a subtotal of 1013.20. But if we floor that 964.95 before multiplying, as $T5$ does, we get 1012.20. The two formulas would give answers that differ by one, with $T1$ giving a slightly higher value than $T5$.

So to test this discrepancy, we need to find a character with just the right amount of stat bonus. This gets easier the higher $\text{multiplier}$ is, so let’s try with our prot paladin, whose $\text{multiplier}=1.3750$ for stamina once we apply the 10% stamina buff. Again using $B=890$, $G=3250$, and our $\text{match}=1.05$, we get:

$T1 = 5976.44$

$T5 = 5975.75$

Demonstrating the “off-by-one” error. And what does the character sheet say?

[Image caption: Conclusive Evidence!]

We’ve already shown earlier that this value has to be floored rather than rounded, so this removes any ambiguity. $T1$ is the only formula that’s still standing. We now know conclusively how our total stats are calculated, at least so far.

Since we can factor the $\text{multiplier}$ out, it makes some sense to define a “composite” subtotal $C$ to make the math a little easier.
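As a rough sketch of what that composite subtotal buys us, the Python below computes base, bonus, and total under the $T1$ formulation and compares the total against the older $T5$ variant. The function names are mine, not Simulationcraft’s, and the exact-fraction arithmetic is an assumption made to keep the floors trustworthy.

```python
from fractions import Fraction
from math import floor

def sheet_values(G, B, match, mult):
    """Character-sheet (base, bonus, total) under the T1 formulation."""
    composite = floor(G * match) + B * match  # C: only the gear term is floored
    base = floor(B * match)
    total = floor(composite * mult)
    return base, total - base, total          # bonus follows by subtraction

def total_t5(G, B, match, mult):
    """Older T5 variant: the base term is also floored before multiplying."""
    return floor((floor(G * match) + floor(B * match)) * mult)

match = Fraction(105, 100)
prot_mult = Fraction(1375, 1000)  # 1.25 * 1.10 for protection stamina

prot = sheet_values(3250, 890, match, prot_mult)
prot_old = total_t5(3250, 890, match, prot_mult)
ret = sheet_values(2378, 1455, match, Fraction(105, 100))
print(prot, prot_old, ret)
```

For the prot stamina numbers the two totals come out as 5976 and 5975, which is the off-by-one described above, while the ret strength numbers reproduce 1527 base, 2697 bonus, and 4224 total.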
In other words, we define it such that $C = \text{floor}[~G\times \text{match}~]+B\times \text{match}$, and then our character sheet base and total values are just

$\text{CS_Base} = \text{floor}[~ B\times \text{match}~]$

$\text{CS_Total} = \text{floor}[~C\times \text{multiplier}~]$

And of course, $\text{CS_Bonus} = \text{CS_Total}-\text{CS_Base}$. This gives you the bulk of the formulation provided in the beginning of this post.

The Nitty-Gritty Details

We’re not finished yet, though. Because even after this, we were able to observe some odd off-by-one errors due to certain special effects. For example, how are flat stat-bonus buffs like potions, trinket procs, flasks, and food incorporated into this? What about racial effects like Endurance, Heroic Presence, and Epicurean? So I set out to do some more testing.

Endurance was the easiest to test. Consulting our handy tables, all of the classes have 890 base stamina, and taurens get a racial modifier of +1. So before we choose a spec, our tauren paladin test subject (lovingly named Testbeef) should have 891 stamina. Instead…

[Image caption: There’s some extra beef here…]

Endurance gives 197.055 stamina at level 100. The tooltip is, as usual, floored, but we can get an exact value directly from the spell data using Simulationcraft’s spell_query tool:

[Image caption: SimC to the rescue.]

It’s clear from this that Endurance is being counted as base stamina by the game (890+1+197 = 1088). In other words, we can update our formula for $B$:

$B = \text{race_base}+\text{class_base}+\text{endurance}$

And it’s fairly simple to confirm that all of the other formulas work out to accurate values as given.

Next up is Heroic Presence, so we create a draenei paladin. It isn’t hard to determine how this works:

[Image caption: Heroic base strength.]

The tooltip for Heroic Presence reads 65, but spell_query reveals that the actual value is 65.25.
Draenei get a 1-point racial strength modifier, so it’s pretty clear that our base strength is just 1455+1+65=1521, meaning that Heroic Presence is also added into $B$:

$B = \text{race_base}+\text{class_base}+\text{endurance}+\text{heroic_presence}$

There’s one caveat here, which is that we can’t know for sure from this data whether Endurance or Heroic Presence are being floored before they’re added to $B$. Both of them have low enough decimal values that it would be unlikely to matter and difficult to test. Heroic Presence’s value was 130.5 several beta builds ago, and I was able to confirm that it isn’t being rounded (though that was unlikely anyway). But I couldn’t rule out a floor(). To do so now, we’d need modifiers such that $0.25\times \text{match}\times \text{multiplier} > 1$ to test Heroic Presence (or $0.055\times \text{match}\times \text{multiplier}>1$ for Endurance), which we don’t have. So it probably doesn’t matter very much, but it’s worth noting in case we find a situation where we can explicitly test that.

Epicurean was the most interesting test, because first we had to figure out how stat buffs worked. For example, is the amount of stat given by food floored or rounded? It turns out that by being clever, we can test both at once.

First, we searched through to find some foods that would be useful to us. The two we ended up using were Serpent Brew of Serenity and Hearty Elekk Steak. Serpent Brew has an intellect bonus of 24 according to the buff it grants. However, if we check the spell data…

[Image caption: Sneaky Sneaky]

… it apparently gives a buff of 23.816 intellect. That is a strong case for flat stat-bonus buffs being rounded, not floored, which makes them the exception to just about everything else. We can double-check this by rolling a level 100 monk, using Zen Pilgrimage to visit Master Chang at the Peak of Serenity, and testing this:

[Image caption: I took off all of poor Foodtest’s gear and didn’t give her a spec to make this easier to see. I’m sort of a jerk like that.]
This tells us that the buff is definitely rounded, not floored, because there are only two ways to get 48 intellect from a 23.816-intellect buff on a pandaren, and neither of them involves a floor:

$F1 = \text{round}[~\text{food_stat}\times 2~]$

$F2 = \text{round}[~\text{food_stat}~]\times 2$

To figure out which one we have, we employ the new Hearty Elekk Steak, which is conveniently available from Savage Flaskataur, Esq. in Stormwind or Orgrimmar. This food is almost magical, in that the spell data says it grants a 187.49999 stamina buff. Even spell_query rounds this to 187.5, which led to a little confusion until we sussed out the issue. But what it means for us is that if $F1$ is correct, we’ll see a buff that’s 375 stamina on our pandaren, but if formula $F2$ is correct it will only be 374. And what is it?

[Image caption: The steaks are low.]

So apparently, food buffs are rounded, and Epicurean is applied after the rounding occurs. It’s also fairly easy to test that these are affected by $\text{match}$ and $\text{multiplier}$ just like gear contributions are, so we can fold this directly into our definition of $G$:

$G = \text{gear_stat} + \text{round}[~\text{food_stat}~] \times \text{epicurean}$

Testing the others follows a similar process. Spell data tells us that a Flask of the Earth adds 170.926 stamina, but in-game both the tooltip and character sheet show that it grants 171 stamina. So flasks are clearly rounded as well. The Alchemy profession buff has already been turned off, so we can safely ignore that. A Potion of Mogu Power grants 455.793 strength according to spell data, but shows up as 456 in-game, suggesting that potions are also rounded. Testing with a few different trinket procs also showed that they were all rounded, not floored.

The one thing we haven’t shown yet is whether these effects are rounded individually or after-the-fact. The Epicurean data strongly suggests each is rounded independently, but a more rigorous test would be nice.
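The two food tests are easy to replay numerically. The sketch below is illustrative only: the helper names are invented, the round-half-up rule is my assumption (neither food value sits on a tie, so the choice doesn’t affect the outcome), and the buff values are the spell-data figures quoted above.

```python
from fractions import Fraction
from math import floor

def round_half_up(x):
    """Round to the nearest integer, halves up (Python's round() sends ties to even)."""
    return floor(x + Fraction(1, 2))

EPICUREAN = 2  # pandaren racial: doubles the stats from food

def double_then_round(food_stat):  # F1 in the text
    return round_half_up(food_stat * EPICUREAN)

def round_then_double(food_stat):  # F2 in the text
    return round_half_up(food_stat) * EPICUREAN

serpent_brew = Fraction("23.816")    # intellect, from spell data
elekk_steak = Fraction("187.49999")  # stamina, from spell data

brew = (double_then_round(serpent_brew), round_then_double(serpent_brew))
steak = (double_then_round(elekk_steak), round_then_double(elekk_steak))
floored_brew = (floor(serpent_brew * EPICUREAN), floor(serpent_brew) * EPICUREAN)
print(brew, steak, floored_brew)
```

Serpent Brew gives 48 either way (and 47 or 46 if floored instead), so it can only rule out flooring; the steak separates the two rounding orders, giving 375 for $F1$ and 374 for $F2$. None of this settles whether several buffs are rounded one at a time or after being summed, which needs its own test.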
To do so, we need a pair of buffs that give a different value when rounded separately than together – e.g. both ending in 0.5 or greater (with the two fractional parts summing to less than 1.5), or both ending between 0.25 and 0.49. Digging through the spell data, we come up with Beer Basted Crocolisk, which grants 35.724 strength and stamina, and Flask of Steelskin, which grants 178.62 stamina. If we use both items, we should get 215 stamina if each is rounded individually, but only 214 stamina if they’re rounded after being added together. A few unlucky crocolisks later, we have our confirmation:

[Image caption: Only about four crocolisks were harmed during the filming of this test.]

This is the last piece of the puzzle, and finishes confirming the equations given in the beginning of this post.

Closing Thoughts

You can see that we’ve done a pretty exhaustive job of testing edge cases to make sure our formula for base stats works correctly in all circumstances. If we were content to be accurate to within one point of stat, we could have stopped very early on. But it was important to me that the stats Simulationcraft spits out match your character sheet all the time. Ultimately that lack of accuracy reflects poorly on the sim, even if “off-by-one” errors have no significant effect on the overall simulation results.

And as you’ve now seen, trying to cover all of those edge cases often takes some careful thought about what those cases are and how you can test them. Often it means ruling out all but one hypothesis, and sometimes you need very specific items or gear combinations to distinguish between different hypotheses. Most combinations of food+flask wouldn’t reveal the difference between the two rounding schemes proposed; we had to seek out a very specific pair based on the numerical constraints of the problem.

But that’s the sort of work theorycrafters do – wading through the minutiae to build models that are as accurate as possible.
And sometimes, even an “easy” task like calculating player attributes ends up having a ton of details that you never thought about before.

## TC101: Testing Simulationcraft

In the last two installments, we talked about what it means to theorycraft and spent some time discussing experimental design. Today, we’re going to talk about how Simulationcraft fits into that picture.

Simulationcraft is a numerical model of the game and its mechanics. It’s a fairly powerful theorycrafting tool, much like a good spreadsheet, but significantly more flexible. The downside of that flexibility is that the learning curve is a little steeper than using a spreadsheet. And unfortunately, a lot of players don’t really understand how to use that tool properly, leading them to mistakenly conclude that the tool isn’t very good.

As a beginner theorycrafter, there are two primary ways Simulationcraft may fit into your work. The first is as a contributor helping to improve SimC’s modeling. You may find yourself performing in-game tests to determine mechanics, and then comparing those tests to similar experiments in Simulationcraft to verify that SimC has the mechanics coded properly. Note that this doesn’t require any knowledge of C++, just enough familiarity with the program to tweak a character profile or action priority list.

The second way it may fit into your work is the obvious (and more common) complement: taking advantage of that model by using it to discover new techniques and determine optimal play patterns. This could be tweaking action priority lists to find the best rotation for a given circumstance, testing different gear sets to find a “best in slot” arrangement, or estimating the true value of a glyph, talent, or set bonus. In other words, using the model to answer the sorts of specific questions that come up in the course of optimizing your character.
In this blog post, we’ll address comparing in-game experiments to Simulationcraft outputs. A tool is only useful if you can trust that it produces accurate results, and while that’s a good assumption for actively-maintained class modules, it may not be good for ones that have sat dormant for some time. In a future blog post, I’ll talk more about using Simulationcraft for discovery and optimization.

When we want to validate Simulationcraft results, what we’re really doing is designing and performing a pair of experiments. One is our in-game experiment, which gives us the measuring stick for our SimC output. If the SimC output deviates significantly from what we observe in-game, then something is pretty clearly wrong.

But we’re also designing a second experiment, which is the simulation itself. Just as we do with the in-game experiment, we have control over the gear, talents, glyphs, and other character properties for the simulation. We also have control over the experimental procedure by way of the action priority list. Simulationcraft takes care of the data collection for us, so we only need to worry about analysis.

If you’ve never used Simulationcraft, there’s a pretty good (if slightly out-of-date) Starter’s Guide on the wiki. As an aside, this is another thing that we could really use help with and doesn’t require coding knowledge: people dedicated to keeping that wiki up-to-date for the benefit of new users.

Dissection of a Simulationcraft Profile

As an example, let’s consider a particular character profile. The following profile or “simc file” is a mock-up of a tier 17 normal-mode profile from Simulationcraft’s Warlords of Draenor development branch. It uses the gear that a level 100 pre-made paladin has on beta, so it’s easy to make exactly this character for testing purposes.
    paladin="Paladin_Protection_T17N"
    level=100
    race=blood_elf
    role=tank
    position=front
    professions=Blacksmithing=600/Enchanting=600
    talents=http://us.battle.net/wow/en/tool/talent-calculator#bZ!201121.
    glyphs=focused_shield/alabaster_shield/divine_protection
    spec=protection

    # This default action priority list is automatically created based on your character.
    # It is a attempt to provide you with a action list that is both simple and practicable,
    # while resulting in a meaningful and good simulation. It may not result in the absolutely highest possible dps.
    # Feel free to edit, adapt and improve it to your own needs.
    # SimulationCraft is always looking for updates and improvements to the default action lists.

    # Executed before combat begins. Accepts non-harmful actions only.

    actions.precombat=flask,type=earth
    actions.precombat+=/food,type=chun_tian_spring_rolls
    actions.precombat+=/seal_of_insight
    actions.precombat+=/sacred_shield,if=talent.sacred_shield.enabled
    # Snapshot raid buffed stats before combat begins and pre-potting is done.
    actions.precombat+=/snapshot_stats
    actions.precombat+=/mogu_power_potion

    # Executed every time the actor is available.

    actions=/auto_attack
    actions+=/arcane_torrent
    actions+=/holy_avenger,if=talent.holy_avenger.enabled
    actions+=/divine_protection
    actions+=/guardian_of_ancient_kings
    actions+=/eternal_flame,if=talent.eternal_flame.enabled&(buff.eternal_flame.remains<2&buff.bastion_of_glory.react>2&(holy_power>=3|buff.divine_purpose.react|buff.bastion_of_power.react))
    actions+=/eternal_flame,if=talent.eternal_flame.enabled&(buff.bastion_of_power.react&buff.bastion_of_glory.react>=5)
    actions+=/shield_of_the_righteous,if=holy_power>=5|buff.divine_purpose.react|incoming_damage_1500ms>=health.max*0.3
    actions+=/crusader_strike
    actions+=/judgment
    actions+=/avengers_shield
    actions+=/sacred_shield,if=talent.sacred_shield.enabled&target.dot.sacred_shield.remains<5
    actions+=/holy_wrath
    actions+=/execution_sentence,if=talent.execution_sentence.enabled
    actions+=/lights_hammer,if=talent.lights_hammer.enabled
    actions+=/hammer_of_wrath
    actions+=/consecration,if=target.debuff.flying.down&!ticking
    actions+=/holy_prism,if=talent.holy_prism.enabled
    actions+=/sacred_shield,if=talent.sacred_shield.enabled

    head=primal_gladiators_plate_helm,id=111211
    neck=primal_gladiators_choker_of_cruelty,id=111207
    shoulders=primal_gladiators_plate_shoulders,id=111213
    back=primal_gladiators_cloak_of_prowess,id=111206
    chest=primal_gladiators_plate_chestpiece,id=111209
    wrists=primal_gladiators_armplates_of_victory,id=111182
    hands=primal_gladiators_plate_gauntlets,id=111210
    waist=primal_gladiators_girdle_of_cruelty,id=111174
    legs=primal_gladiators_plate_legguards,id=111212
    feet=primal_gladiators_warboots_of_prowess,id=111178
    finger1=primal_gladiators_signet_of_cruelty,id=111219
    finger2=primal_gladiators_signet_of_accuracy,id=111220
    trinket1=primal_gladiators_medallion_of_cruelty,id=111229
    trinket2=primal_gladiators_insignia_of_victory,id=111233
    main_hand=primal_gladiators_hacker,id=111198,enchant=dancing_steel
    off_hand=primal_gladiators_shield_wall,id=111221

    # Gear Summary
    # gear_strength=2407
    # gear_stamina=3248
    # gear_crit_rating=1242
    # gear_haste_rating=370
    # gear_mastery_rating=591
    # gear_armor=4366
    # gear_parry_rating=9
    # gear_multistrike_rating=701
    # gear_versatility_rating=224

If you’re new to Simulationcraft, it’s worth spending a few minutes discussing how profiles work. I’ll give a brief overview, but there is much more thorough documentation available on the Simulationcraft Wiki.

A simc file is just a text file with the “.simc” extension – you can open it in your favorite text editor (I generally use Notepad++ on Windows, but the built-in Notepad application works just fine). Each line in the file tells the simulation one piece of information it needs to operate. For example, paladin="Paladin_Protection_T17N" tells the sim that we’re defining a new paladin (called an “actor” in SimC lingo) and we want to name him “Paladin_Protection_T17N.” If we wanted to, we could change that line to paladin="Bob" and the sim would work exactly the same, but our paladin would suddenly be named Bob.

Likewise, subsequent lines tell the sim that Bob is a level 100 blood elf tank with Blacksmithing and Enchanting professions. It continues to specify talents, glyphs, and spec.

The lines that start with a pound sign (#) are called comments. These are lines that are for informational purposes only, to help explain what’s going on. The simulation skips over them entirely when interpreting the file. This also means that if we want to disable something in the profile, we can put a “#” before that line to make it invisible to the sim.

The next thing the profile specifies is the action priority list, or APL for short. This is where we specify our experimental procedure, by defining what (and under what conditions) the player will cast. The first section of lines which start with “actions.precombat” define the things we’ll be doing before combat starts, like applying flasks and food, choosing a seal, and pre-potting. This section is only run once, at the beginning of the simulation.
The next section starting with “actions=/auto_attack” is the APL the sim uses during combat (also known as the “default” APL). You might note that the first line starts with “actions=” and the second with “actions+=”; this is an under-the-hood quirk related to C++ and the simulation internals, but it’s worth mentioning briefly.

The line “actions=/auto_attack” defines a new text variable (known as a ‘string’ in computer science terminology) that contains “/auto_attack” and nothing else. In C++, “+=” is an operator that means “take the existing value of this variable and add whatever comes after to it.” So for example, in the pair of lines

    x=2;
    x+=3;

the first line assigns the value 2 to the variable $x$, and the second adds 3 to the value of $x$. After executing both lines, $x$ would contain the value 5. When using += with strings, it just concatenates the two strings. So the two lines

    actions=/auto_attack
    actions+=/arcane_torrent

would leave an actions variable that contained /auto_attack/arcane_torrent. This is how SimC handles action priority lists – they’re just long strings of action names and conditions separated by slashes. The practical implication of this is that the very first action on the list has to be defined as actions=/action_name, otherwise the sim won’t know how to parse the input.

The final section of the profile defines the character’s gear, one slot at a time. You’ll note that for most of these, we just specify the slot (e.g. “head”) and set it equal to an item descriptor containing the name and item id. A normal profile would also include enchants or gems, but I’ve removed most of these since the pre-made gear doesn’t come with enchants or gems. We don’t need to tell it all of the item stats, as it will reconstruct those stats from the game data based on the item id.

Note that the name of the item isn’t important. We could call each of these items whatever we wanted.
The sim will spit out a warning on the report if the names don’t match, but it will dutifully perform the simulation anyway assuming we know what we’re doing. I still recommend writing the item names in however, because the warning is quite useful when you accidentally make a typo in an item id (and thus aren’t using the item you thought you were!).

We can also override the stats on an item, or create an entirely fake item with whatever stats we want on it. One thing I’ll frequently do is abuse the “shirt” slot to tweak a character’s stats. If I want to give the character 10k more mastery and 5k haste rating, I might add a line like

    shirt=thecks_shirt_of_haxx,stats=10000mastery_5000haste

to arbitrarily tweak the character’s stats.

Note that the “# Gear Summary” section at the end of the profile is completely irrelevant and unnecessary. Every line starts with a “#” so the simulation completely ignores it. This section is automatically generated, either by the script that puts together this profile or by the code that imports characters from the armory. You’re free to delete it if you don’t want it cluttering up the end of the character profile.

If it looks like a daunting task to put together all of that from scratch, you’re in luck. You can import your character from the armory and Simulationcraft will automatically generate your profile, along with a default action priority list. You can then go hacking away at it from there to make it fit your experiment, as we’ll do shortly. The Starter’s Guide explains how to do that.

However, if you’re on the PTR or Beta, you obviously can’t import from the armory. To help with that, I’ve written an addon that will generate a profile for your character in-game, which can then be copy/pasted into Simulationcraft. The addon is named, as you might guess, Simulationcraft.
This is also useful if you want to test a bunch of configurations without having to log in and out repeatedly to update the armory; just change gear, type /simc, and copy/paste the new profile.

Back to Experiments

Now that we know what a SimC character profile looks like, let’s return to the topic at hand. Our profile is essentially the definition of our Simulationcraft “experiment.” We want to compare the results, so the simulation input should model the in-game experiment as closely as possible, and it’s natural to expect that our constraints on the in-game experiment carry over to the simulation input. Thus, all of our earlier discussion about experimental design is equally applicable to designing the simulation input.

For example, we want to try and minimize or eliminate dynamic effects that could compromise our results. We probably don’t want our strength to change during the test, so we wouldn’t be using potions. As such, our profile shouldn’t include pre-potting. We may decide to comment out that line of the profile, as well as any line in the combat APL which used a potion (if there was one). We could also just delete those lines if we’re sure we’ll never use them again – for example, if we’ve saved this as a separate copy somewhere and will only use it for this specific experiment.

Since Primal Gladiator’s Insignia of Victory has a strength proc, we probably don’t want to use it during our testing. So we’d comment that line out in the profile and remove it from our character during the in-game test, just to make sure it didn’t taint our results. The Dancing Steel enchant on the weapon similarly has to go (the premade doesn’t actually have enchanted weapons – I just added this to the profile to illustrate the point). Recall that we talked about making other gear changes in the previous blog post due to versatility on gear. Any other gear changes we make in-game should also be reflected in the profile we feed to SimC.
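If you find yourself making these edits often, they’re easy to script. The Python below is a rough sketch of my own, not part of Simulationcraft: `disable_lines` and `fix_list_heads` are hypothetical helpers that comment out unwanted lines and then make sure the first surviving line of each action list uses “=” instead of “+=”, following the first-action rule described earlier. Including “flask” in the pattern list is just for illustration.

```python
import re

def disable_lines(profile_text, patterns):
    """Comment out every active line containing any of the given substrings."""
    out = []
    for line in profile_text.splitlines():
        if line and not line.startswith("#") and any(p in line for p in patterns):
            line = "#" + line
        out.append(line)
    return out

def fix_list_heads(lines):
    """Make the first active line of each action list use '=' rather than '+='."""
    seen = set()
    fixed = []
    for line in lines:
        m = re.match(r"(actions(?:\.\w+)?)(\+?=)(.*)", line)
        if m:
            name, _op, rest = m.groups()
            if name not in seen:       # first active entry of this list
                seen.add(name)
                line = f"{name}={rest}"
        fixed.append(line)
    return fixed

profile = """\
actions.precombat=flask,type=earth
actions.precombat+=/food,type=chun_tian_spring_rolls
actions.precombat+=/seal_of_insight
actions.precombat+=/mogu_power_potion
actions=/auto_attack
actions+=/crusader_strike
trinket2=primal_gladiators_insignia_of_victory,id=111233"""

edited = fix_list_heads(disable_lines(profile, ["flask", "potion", "trinket2"]))
print("\n".join(edited))
```

Commenting rather than deleting keeps the original lines around in case a later test needs them again. Removing something like an enchant, which lives on the same line as the item, still has to be done by hand.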
Likewise, we’re probably not going to bother using flasks or food in our in-game experiment just for convenience. Again, we should comment out or remove those lines if that’s the case (and remember: if you remove or comment the first line of the list, you’ll need to change the new first line from actions.precombat+=/ to actions.precombat=/).

However, note that there are cauldrons in Shattrath (Outland) on beta that give you full raid buffs and critical strike flask and food buffs. If you plan on using the cauldron, you’d want to modify these lines to reflect that. For reference, they would look something like this:

    actions.precombat=flask,type=greater_draenic_critical_strike_flask
    actions.precombat+=/food,type=blackrock_barbecue

edit: It looks like paladins are bugged here and getting critical strike flask/food buffs regardless of spec. Other classes are getting a flask and food buff matching their spec’s secondary stat attunement. Thanks to Megan (@_poneria) for catching this.

Which brings us to another issue: raid buffs. On beta, the cauldrons let you apply the full suite of raid buffs. But you may not always have access to that – maybe you’re testing something on live servers, or just testing in an area that doesn’t have these cauldrons handy, or turning some of them off to specifically test the way one of those buffs interacts with something.

Simulationcraft is designed assuming you’re in a raid and you want all of those raid buffs, including Bloodlust/Heroism. If we want to disable them, we need to tell the simulation that. If you’re using the graphical user interface (GUI), you can toggle each buff on the Options -> Buffs/Debuffs pane. If you want to do it in the simc file, it only takes a single line of code:

    optimal_raid=0

That line, usually placed between the character details (level/race/etc.) and the action lists, turns off all of the externally-provided raid buffs, including Bloodlust.
You’ll still be able to use any that your class brings as long as you have it in the APL. For example, if we added blessing_of_kings to the precombat action list we’d get the benefit of the 5% stats buff, even if we set optimal_raid=0. Likewise, if we want to enable specific buffs, we can do so using overrides in the code or the checkboxes in the GUI.

By now, it should be clear that we’re going to have to go over the character profile with a fine-toothed comb to make sure it lines up as much as possible with our in-game test. Let’s say that for our in-game test, we’ve decided to attack a boss-level dummy with our level-100 pre-made character. We’ll only use auto-attacks, Crusader Strikes, and Judgments, while in protection spec and without any raid- or self-buffs. We won’t use any glyphs or talents that affect the damage of either spell, and we’ll un-equip our second trinket (which has a strength proc that we don’t want polluting our data).

Looking through the profile, there’s a lot of extra fluff in here that we don’t need. We’re not going to be using Holy Avenger during this test, because it changes the amount of damage Judgment does. Since we’re just testing the damage of a few abilities, we can remove everything not related to those abilities from the action priority list. We’ll also get rid of all of the precombat actions other than applying Seal of Insight, and turn off all external raid buffs with the optimal_raid flag.

There’s one more thing we need to change, though it isn’t obvious or intuitive. By default, Simulationcraft uses the average damage of an ability rather than making an actual damage roll. It does this mostly to save some time, because it executes a little faster. And in a normal simulation, where you’re making lots and lots of damage rolls and running for a few thousand or more iterations, using the average value instead of making individual damage rolls doesn’t have a significant effect on the statistics of the results.
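A toy illustration of that point, in Python: this is not Simulationcraft’s code, and the damage range, the uniform roll, and the `simulate` helper are all assumptions made up for the sketch. It compares a run where every hit uses the midpoint of the range against one where each hit is rolled individually.

```python
import random

def simulate(n_hits, lo, hi, average_range, rng):
    """Return (mean, min, max) over n_hits damage events with range [lo, hi]."""
    if average_range:
        hits = [(lo + hi) / 2] * n_hits                      # every hit is the midpoint
    else:
        hits = [rng.uniform(lo, hi) for _ in range(n_hits)]  # one roll per hit
    return sum(hits) / n_hits, min(hits), max(hits)

rng = random.Random(12345)  # fixed seed so the run is repeatable
averaged = simulate(50_000, 900.0, 1100.0, True, rng)
rolled = simulate(50_000, 900.0, 1100.0, False, rng)

print("averaged: mean=%.1f min=%.1f max=%.1f" % averaged)
print("rolled:   mean=%.1f min=%.1f max=%.1f" % rolled)
```

The two means agree closely, but the averaged run reports the same number for its minimum and maximum, while the rolled run spreads out across nearly the whole range.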
However, for this particular experiment we care a lot about it, because we’re going to want to compare the minimum and maximum damage values of our in-game tests to the values the simulation predicts. So we have to add the line average_range=0 to the profile somewhere. After doing all of that, Bob’s character profile looks like this:

    paladin="Bob"
    level=100
    race=blood_elf
    role=tank
    position=front
    professions=Blacksmithing=600/Enchanting=600
    talents=http://us.battle.net/wow/en/tool/talent-calculator#bZ!201121.
    glyphs=focused_shield/alabaster_shield/divine_protection
    spec=protection
    optimal_raid=0
    average_range=0
    iterations=50000

    # Executed before combat begins. Accepts non-harmful actions only.
    actions.precombat=/seal_of_insight
    # Snapshot raid buffed stats before combat begins and pre-potting is done.
    actions.precombat+=/snapshot_stats

    # Executed every time the actor is available.
    actions=/auto_attack
    actions+=/crusader_strike
    actions+=/judgment

    head=primal_gladiators_plate_helm,id=111211
    neck=primal_gladiators_choker_of_cruelty,id=111207
    shoulders=primal_gladiators_plate_shoulders,id=111213
    back=primal_gladiators_cloak_of_prowess,id=111206
    chest=primal_gladiators_plate_chestpiece,id=111209
    wrists=primal_gladiators_armplates_of_victory,id=111182
    hands=primal_gladiators_plate_gauntlets,id=111210
    waist=primal_gladiators_girdle_of_cruelty,id=111174
    legs=primal_gladiators_plate_legguards,id=111212
    feet=primal_gladiators_warboots_of_prowess,id=111178
    finger1=primal_gladiators_signet_of_cruelty,id=111219
    finger2=primal_gladiators_signet_of_accuracy,id=111220
    trinket1=primal_gladiators_medallion_of_cruelty,id=111229
    #trinket2=primal_gladiators_insignia_of_victory,id=111233
    main_hand=primal_gladiators_hacker,id=111198
    off_hand=primal_gladiators_shield_wall,id=111221

Considerably shorter! Note that while I deleted many lines, I simply commented out the second trinket slot, in case I decided I wanted to test with that trinket later.
I’ve also added iterations=50000 to specify how many iterations I want to run (the default value is 1000). In practice, we may as well set our number of iterations high to improve our statistical knowledge of what the simulation is producing, even though we clearly don’t plan on logging several days’ worth of in-game testing. The more iterations we use, the more likely it is that we hit our extreme minimum and maximum values for each ability.

Now that we’ve got both experiments (in-game and simulation) nailed down, let’s perform both of them and analyze the results.

Collecting Data

The Simulationcraft output generated by this character profile is here. While your usual method of reading a SimC report probably involves spending some time looking at the sections that summarize the overall stats like DPS, HPS, and so on, we’re not that interested in those. We’re going to skip right down to the “Abilities” section, which looks like this:

The Simulationcraft report’s Abilities section. A veritable goldmine of information.

This section gives you a great breakdown of statistics for each ability. It tells you stuff like how much DPS or HPS that ability does, how many times it’s cast per iteration (“Execute”) and the average time between casts (“Interval”), the average hit and crit sizes as well as the average damage per cast (“Avg”), and so on. Most people have at least seen this section before, though you may not have seen the new pretty version (with icons!) that we’ve implemented for WoD. What many people don’t know, but is crucial to you as a theorycrafter, is that we can get even more information. If you click on the ability’s name, it will expand that section to give you a lot more detail:

Expanding the ability entry gives loads of additional information.

This is a full stats breakdown for that ability. Of most relevance to us is the table that shows the statistics for each possible result of the action.
By looking at the row labeled “hit” in the “Direct Results” column, we can see exactly how many of our casts were hits (79.47%) and their minimum, maximum, and mean values for the simulation overall (2675 to 3058 damage). There’s also plenty of other information here that you might find useful, including a bunch of details about the spell data near the bottom of the expanded section. If there’s interest, I may write another blog post in the future discussing what all of this stuff is, but for now let’s settle for being able to get our minimum and maximum values from the table. If we expand the sections for Judgment and melee, we find that Judgment’s hit damage ranges from 5523 to 5524, and our melee attacks hit for between 2384 and 2704.

Now let’s look at the results of the in-game test. I smacked around a raid boss target dummy for about five minutes to collect the following data set. If you go to the “Damage Done” tab and mouse over the bars, you’ll see the breakdown by result type:

The Warcraft Logs ability damage breakdown tooltip.

Here we see that our minimum and maximum melee attacks hit for 2396 and 2702, respectively. We can extract similar limits for Judgment (5524-5525) and Crusader Strike (2685-3052). Now that we have the data we want, let’s analyze it.

Analyzing Data

We can summarize all of our relevant data in a quick table:

Damage Results, Hits Only

| Ability | Min (SimC) | Max (SimC) | Min (Game) | Max (Game) |
|---------|------------|------------|------------|------------|
| CS      | 2675       | 3058       | 2685       | 3052       |
| J       | 5523       | 5524       | 5524       | 5525       |
| Melee   | 2384       | 2704       | 2396       | 2702       |

The first thing to note here is that for CS and Melee, SimC gives lower minimum bounds and higher maximum bounds. That’s to be expected, because we ran the simulation for a long time, but our in-game test was pathetically short (about 5 minutes). With only 50-100 casts, we just haven’t taken enough in-game data to reasonably expect to hit the boundaries. But it’s good enough to illustrate the basic process.
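The comparison in the table can also be done mechanically. The sketch below is in Python and is not part of Simulationcraft or Warcraft Logs; the dictionary layout and the function names `game_within_sim` and `check_all` are illustrative assumptions. It only checks whether each in-game range sits inside the corresponding simulated range.

```python
# Hit-damage bounds copied from the table above, as (min, max) pairs.
SIMC = {"CS": (2675, 3058), "J": (5523, 5524), "Melee": (2384, 2704)}
GAME = {"CS": (2685, 3052), "J": (5524, 5525), "Melee": (2396, 2702)}


def game_within_sim(sim_range, game_range):
    """True if the in-game [min, max] lies inside the simulated [min, max]."""
    sim_min, sim_max = sim_range
    game_min, game_max = game_range
    return sim_min <= game_min and game_max <= sim_max


def check_all(sim, game):
    """Map each ability name to whether its in-game range fits the sim range."""
    return {name: game_within_sim(sim[name], game[name]) for name in sim}


if __name__ == "__main__":
    for ability, ok in check_all(SIMC, GAME).items():
        print(ability, "consistent" if ok else "needs a closer look")
```

With these numbers, CS and melee pass, while Judgment is flagged because its in-game maximum (5525) is above the simulated maximum (5524).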
We’d be a bit surprised if our in-game maximum was higher than our simulation maximum, or likewise if the in-game minimum was lower than the simulation minimum. While this could happen, statistically speaking it’s very unlikely for a long sim. That would be a strong indicator that our formula (in SimC) is off somehow, and we’d need to design an in-game experiment to test that. For example, we might have to collect data from a few hundred CS casts at several different AP values so that we can determine the proper AP coefficient. You may have noticed that the Judgment data doesn’t quite agree. Judgment is easy because (again, at least for the moment) it doesn’t have a damage range. If the damage formula the game uses spits out 5524.3, it’ll generate damage values of 5524 and 5525. The game does a floor(result+random(0,1)) to determine how often it uses each, so we can also use the frequency of each result as a debugging tool. Our simulation contains a systematic error in that it’s always off by exactly 1 damage. This could be due to an errant AP or SP coefficient (though Simulationcraft is actually extracting those directly from Blizzard’s spell data) or an errant base damage value (Judgment’s spell data still indicates it has a base damage of 1), or something else entirely. One way to check is to do a hand-calculation. The spell data claims that the SP coefficient is 0.5021 and the AP coefficient is 0.6030, and that it does a base damage of 1. You can get all of this information from the game files using Simulationcraft’s spell_query function, shown below (command-line only): Simulationcraft’s spell_query command and output for Judgment. The base damage and SP/AP coefficients are in Effect #1. What we call the base value is actually really the “Scaled Value” in the spell data. 
The default way WoW calculates ability damage is to add the spell power and attack power contributions to the base damage and then apply multipliers, or $${\rm damage} = ({\rm base\_damage} + {\rm SP\_coeff}*{\rm SP} + {\rm AP\_coeff}*{\rm AP}) * {\rm multipliers}.$$ Judgment is a rare spell that has both an SP coefficient and an AP coefficient – most spells only have one or the other. As for multipliers, we know that the Improved Judgment Draenor perk should boost the damage by 20%. Our versatility will also increase it by 1.72% based on the in-game tooltip (or by hand, 224 rating gives 224/130=1.7231% extra damage, or a multiplier of 1.017231). So if we want to calculate Judgment’s damage by hand, we could multiply all of that together appropriately: $${\rm damage} = (1 + 0.5021*4095 + 0.6030*4095)*1.2*1.017231 = 5525.25$$ That’s curious. This formula suggests we should be seeing 5525-5526 damage, which is higher than either of our experimental observations. We’re pretty confident in the AP and SP coefficients though, as well as the multipliers that get tacked on. So something else must be going on. By the way, I didn’t just fabricate this error for the blog post – I actually ran into this while writing it up, and ended up spending about 30 minutes figuring out the answer. So you’re witnessing real theorycrafting happening (albeit with a slight time lag, of course). At this point, we’d probably start trying things. I went into MATLAB and tried variations on that formula, particularly tweaking the way base damage is included since I suspected that to be the source of the error. It turns out that wasn’t the case, because no sane variation matched the damage range and the frequency of each result. Out of fifty casts, we have one 5524 result and forty-nine 5525 results, suggesting that we need to be getting something in the 5524.9ish region from our hand-calculation. 
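To make the hand-calculation and the rounding rule concrete, here is a small Python sketch. It is not Simulationcraft code, and the function names are made up for illustration. It uses the coefficients and multipliers quoted above and assumes the floor(result+random(0,1)) rule described earlier, under which the fractional part of the unrounded damage is the chance of seeing the higher of the two integer values.

```python
import math

# Values quoted in the text above.
BASE_DAMAGE = 1.0
SP_COEFF = 0.5021
AP_COEFF = 0.6030
IMPROVED_JUDGMENT = 1.2   # Draenor perk, +20%
VERSATILITY = 1.017231    # 224 rating / 130 rating per percent


def judgment_damage(attack_power, spell_power):
    """Unrounded Judgment damage from the default damage formula."""
    raw = BASE_DAMAGE + SP_COEFF * spell_power + AP_COEFF * attack_power
    return raw * IMPROVED_JUDGMENT * VERSATILITY


def rounding_outcomes(damage):
    """Return {integer damage: probability} for floor(damage + U(0, 1))."""
    low = math.floor(damage)
    frac = damage - low
    return {low: 1.0 - frac, low + 1: frac}


if __name__ == "__main__":
    dmg = judgment_damage(4095, 4095)
    print(f"hand calculation: {dmg:.2f}")
    for value, chance in rounding_outcomes(dmg).items():
        print(f"  {value}: {chance:.1%}")
    # One 5524 and forty-nine 5525s out of fifty casts suggests a
    # fractional part near 49/50, i.e. unrounded damage just below 5525.
    print(f"implied by the log: about {5524 + 49 / 50:.2f}")
```

Run with the character sheet value of 4095 for both AP and SP, this predicts a mix of 5525 and 5526 results, which is exactly the mismatch with the in-game log described above.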
Eventually I fired up Visual Studio and started debugging, which led me to notice that it was using 4094 AP during the damage calculations, even though it was reporting 4095 AP in the output. That accounts for the discrepancy between the SimC results and the in-game results, which is great, but it doesn’t explain why the hand-calculation doesn’t match. However, it gave me a hint as to what was wrong. The character has 3616 strength, and thus starts with 3616 attack power before we apply the multiplier from our mastery. The 13.24% mastery we have increases attack power by that amount, so our net result should be

$${\rm Attack\ Power} = 3616*(1+0.1324) = 4094.7584$$

The character sheet is clearly rounding this up to 4095. Simulationcraft was applying a floor() function to turn it into 4094, at least for damage calculations. But neither of those gives the observed damage range, as we’ve seen. The solution seems obvious here – what if attack power isn’t an integer? Let’s try that calculation one more time using the full decimal value of 4094.7584:

$${\rm damage} = (1 + 0.5021*4094.7584+ 0.6030*4094.7584)*1.2*1.017231 = 5524.9284$$

Aha! That perfectly fits the range we observed in-game. Most of the time, we’ll get 5525, but once in a rare while we’ll get 5524. In the experiment, that’s exactly what we observed. So not only have we validated Judgment’s damage formula, we’ve also discovered that our attack power and spell power values aren’t integers, they’re floating point values!

Why is that important to you as a theorycrafter? Well, if you use the integer values your character sheet gives you, it means you’re reducing the precision of your estimates by rounding them to the ones digit. As a result, you wouldn’t trust any results you get to be accurate to any more than about $\pm 1$ damage. In all likelihood, your results might be off by one, just like our original hand-calculation was.
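As a rough illustration of how much the rounding of attack power matters, the Python sketch below repeats the hand-calculation for the three candidate values: floored (what Simulationcraft was using), exact, and rounded (what the character sheet shows). The function names are hypothetical, and, as in the formulas above, the same value is used for both AP and SP.

```python
import math

STRENGTH = 3616
MASTERY_BONUS = 0.1324  # 13.24% attack power from mastery


def attack_power(strength=STRENGTH, mastery=MASTERY_BONUS):
    """Unrounded attack power after the mastery multiplier."""
    return strength * (1 + mastery)


def judgment_damage(power):
    """Unrounded Judgment damage, using the same value for AP and SP."""
    return (1 + 0.5021 * power + 0.6030 * power) * 1.2 * 1.017231


def damage_range(power):
    """The two integer results floor(damage + U(0, 1)) can produce."""
    low = math.floor(judgment_damage(power))
    return (low, low + 1)


if __name__ == "__main__":
    ap = attack_power()
    candidates = {
        "floored AP": math.floor(ap),
        "exact AP": ap,
        "rounded AP": round(ap),
    }
    for label, power in candidates.items():
        print(label, power, damage_range(power))
```

Only the exact value lands in the 5524-5525 range seen in game; the floored value gives the 5523-5524 range the simulation reported, and the rounded value gives the 5525-5526 range from the first hand-calculation.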
In practice, there are ways to quantify this (for example, on a crit the error might increase to $\pm 2$ or $\pm 3$). But as a rough rule of thumb it’s good enough to know that you might be off by one or two in the digit you’re rounding.

More Complicated Testing

Of course, this was just a simple test of ability damage. You can do quite a lot more with Simulationcraft; it all comes down to tweaking the character profile to fit whatever situation you’re trying to test. Sometimes that might not even require an in-game test for comparison. For example, you might decide to enable the fixed_time flag and count the number of ability uses to see if haste is being taken advantage of properly in the simulation – something you could compare to a simple hand calculation. You could perform similar tests to validate the uptimes of certain buffs or effects.

On the other hand, sometimes you need a more complicated profile to test something like an interaction between two different abilities or effects. Often, that involves using conditionals on the action list. To illustrate that, let’s say we had a set bonus that gave us a chance on melee attack to proc a buff called “Super Judgment” which increased Judgment’s damage by 10%. We might want to know whether that bonus is multiplicative or additive with the Improved Judgment perk.

In case it’s not clear what that means, let’s say Judgment does $X$ damage before either effect. If the two effects are additive, then the total damage including both effects would be

$$T = X * (1 + 0.2 + 0.1) = 1.3*X.$$

If the two are multiplicative, then the total damage would be

$$T = X*(1+0.2)*(1+0.1) = 1.2*1.1*X = 1.32*X.$$

Since Judgment appears to do fixed damage (at least, right now…) this would be pretty easy to test. If it suddenly got a damage range, then we’d need to take a bunch of data and determine which version is correct based on the minimum and maximum damage values that we observe, just like we did above for Crusader Strike and melee attacks.
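The two stacking rules are easy to compare numerically. This Python sketch uses a made-up pre-bonus damage of 1000 for $X$ and hypothetical function names; the 20% and 10% bonuses are the Improved Judgment perk and the imaginary “Super Judgment” buff from the example above.

```python
def additive_total(base, bonuses):
    """Bonuses are summed first, then applied once: X * (1 + b1 + b2 + ...)."""
    return base * (1 + sum(bonuses))


def multiplicative_total(base, bonuses):
    """Each bonus is applied as its own multiplier: X * (1 + b1) * (1 + b2) * ..."""
    total = base
    for bonus in bonuses:
        total *= 1 + bonus
    return total


if __name__ == "__main__":
    x = 1000.0            # hypothetical damage before either effect
    bonuses = [0.2, 0.1]  # Improved Judgment, "Super Judgment"
    print("additive:      ", additive_total(x, bonuses))
    print("multiplicative:", multiplicative_total(x, bonuses))
```

The gap between the two predictions is 2% of $X$, so for an ability hitting in the thousands it is far larger than the one-point rounding effects discussed earlier and should be easy to spot in a log.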
If we want to find out whether Simulationcraft has this correct, we could just ask a developer. But it might be just as fast to run a test ourselves. With the APL,

    actions=/auto_attack
    actions+=/judgment,if=buff.super_judgment.react

we would limit ourselves to using Judgment only when the buff was active. The react in that statement just tells the sim to consider the player’s reaction time – in other words, the buff.super_judgment.react conditional evaluates to true if the buff has more than a few hundred milliseconds remaining. Running the simulation for 50k or 100k iterations (which is relatively fast as long as you’re not doing anything fancy, like calculating stat weights) would give us pretty good maximum and minimum damage bounds that we could check against our in-game data.

Another neat trick that most players aren’t aware of is the “Sample Sequence” part of the report. It’s buried in the “Action Priority List” section, shown below:

A Simulationcraft report’s Action Priority List section.

This section tells you about the action priority list you’re using, but at the bottom you get a sample cast sequence for the player. This can get really ugly if your APL has lots of different spells, especially if some are off-GCD like Shield of the Righteous or Word of Glory. Nonetheless it’s a tool you can use to try and debug rotations.

For our simple APL, it’s quite useful. We might expect a nice sequence of CS-J-E-CS-E-J-CS-E-E, where the E’s are all empty GCDs. In other words, the sequence of casts would be CS-J-CS-J-CS, or 34343 if we replace each abbreviation with the number on the action priority list. Since that sequence repeats, our sample sequence in the report should be an unending string that looks like 34343343433434334343. If we look at what the sim produces, we get a single 2 in the front to indicate we’re starting our auto-attacks (in SimC we only cast this once at the beginning to turn them on).
But after that, we get the sequence 3434343-34343-34343-3434343; not quite what we were expecting. This is something we might want to investigate, because it tells us that sometimes the simulation is casting Judgment instead of Crusader Strike when they are both available, in theory. I also want to draw your attention to two other sections of the report that are useful to theorycrafters. The “Statistics & Data Analysis” section, shown below, gives you a thorough statistical breakdown of major encounter metrics like DPS, DTPS, TMI, and so on. Bob’s Statistics & Data Analysis section. Note that you can change the confidence intervals used by modifying the confidence option, as documented in the wiki. This section can be very useful if you want quantitative information about the distribution of the data across all iterations. Finally, you may already know about the “Stats” section, which documents your character’s stats: Bob’s character stats. This section can be immensely useful when trying to sync up in-game results to simulations. Comparing these stats to your character sheet values is a good way to identify discrepancies between the profile you’re simulating and the character you’re using to perform your in-game testing. In fact, I’ve spent a fair amount of time comparing this table to the stats given on the character sheet in beta to make sure we’re doing all of those calculations properly. The process of doing that led to some interesting discoveries about primary stats (hint: they’re not integers either – more on that in a future blog post!). How Not To Succeed In Theorycrafting Obviously, as your test gets more complicated, so does your APL. Eventually, it may include entire rotations. Which brings us to one of the biggest mistakes that we see beginners make. They fire up Simulationcraft, import their character, hit simulate, and then immediately compare their results to their most recent week’s raid logs. 
If you’ve been keeping up with this series of posts, you almost certainly recognize the error that was just made. Unfortunately, a lot of players don’t. And when the two don’t match very well, they decide that Simulationcraft must be in error and conclude that the tool is useless. I’d guess that the vast majority of people that tell me that Simulationcraft’s modeling isn’t very good are actually just using it wrong. Or in tech support speak, PEBKAC. However, having recently graduated from the Theck School Of Designing Good Experiments, you know that to have any hope of comparing an in-game result to a simulation, the two need to be as similar as possible. And a real raid environment is very different than a simulation in which you smack Patchwerk around. There is no encounter in Siege of Orgrimmar that is well approximated by a simple Patchwerk-style encounter in Simulationcraft – they all have some component that makes the comparison a little suspect. That certainly doesn’t mean the results are useless. We often glean insight from how a class performs in a Patchwerk encounter, and generalize that to apply it to real encounters. In some ways, a real encounter is a series of little Patchwerk sections interwoven with periods of movement, cleaving, and other mechanics. But it does mean that you generally won’t get the same DPS values when you compare a raid log to a Patchwerk simulation. Also note that you can do a lot more than just Patchwerk in SimC – there are a variety of different fight styles, and you can add your own custom raid events and customize the boss’s action priority list to try and mimic real boss encounters. If you’re going to try to test a rotation, you want to stick to the same principles you would use for a more basic test. The rotation you perform in game needs to match the action priority list you set up as closely as possible, as do the character properties, gear, talents, buffs, and so on. 
This is one of the hardest things to test since it can be tricky to perform a flawless rotation for long enough to collect a sufficient amount of data. Making a few mistakes probably won’t completely invalidate your results, but keep in mind that it’s very easy to sneak some systematic or random error into your comparison via your actual in-game rotation. And from the other side of things, it can pay to make sure the simulation is really doing what you think it is. For example, our simple CS/J rotation isn’t doing what we expect for some reason, and while it wasn’t very relevant in that test since we were only checking ability damages, it would be very relevant if we were trying to test a rotation. Before you try your in-game test, use output data like the Sample Sequence, ability interval times, and number of casts to make sure that your simulated rotation is what you’ll be replicating in-game. Going Further So far, this series has covered the bulk of the material necessary to start doing your own theorycrafting. There are lots of nitty-gritty details we could talk about, but I’m trying to write an introductory guide rather than an encyclopedia. I’m hoping to write a series of smaller blog posts over the course of the beta period tackling specific issues that highlight some of those details that you might not otherwise encounter. The one big omission is what I’d call “high-level theorycrafting” in an analogy to “high-level languages” in programming. The name is a little misleading, in that it doesn’t imply particularly complicated or amazing work. Instead, it’s “high-level” because it glosses over a lot of the details and assumes the underlying tool is accurately handling those details. To explain the etymology of that idea: C++ is one of many “high-level languages” because the person writing the code doesn’t have to worry about the ugly details of moving each bit of data from one memory location to another. 
By comparison, assembly (sometimes called “machine code”) is a “low-level language,” because you have to write out every single operation the processor performs. It’s tedious and difficult work, and not the sort of language you’d want to write an entire program in. Instead, we have an interface (called the compiler) that lets us write in a high-level language like C++, and translates that high-level code into low-level code for us.

What I’ve taught you so far is “low-level theorycrafting.” You now know how to move all the bits around, one by one. You can test the most basic interactions in the game, describe them mathematically, and confirm whether or not those mechanics are properly represented in Simulationcraft. This is some of the hardest work theorycrafting has to offer, but also some of the most basic and important work that needs to be done.

“High-level theorycrafting” is in many ways a lot easier. You fire up Simulationcraft and start tweaking the action priority list or gear set, and take notes on the simulation outputs. This is, in fact, how most people get their start in theorycrafting. There’s a fair chance that if you’ve read through this entire series of blog posts, you’ve already tried it. Maybe you ran your character through Simulationcraft twice with different trinkets to see which was better, or tweaked a line on an action priority list to see if it gave a DPS boost. All of that qualifies as high-level theorycrafting in my book.

The problem with starting at that level is that you’re not yet equipped to know whether you can trust your results. If I handed you a magical black box that was able to evaluate a bridge design and tell you whether it would fall down or not, and asked you to design a bridge, could you? You could fumble around with designs of bridges you’ve seen, and maybe even get the box to approve your design. But you’re relying on the box being correct, and you wouldn’t have the tools to determine whether it’s making a mistake.
That’s why real bridge engineers start by learning basic physics concepts like forces and kinematics, and work their way up to being able to design entire bridges. (Aside: these “magical black boxes” really do exist in bridge engineering – they’re software packages that do a lot of complicated math/physics to evaluate designs, and like any software package sometimes they have bugs. That’s why you have several real (human) bridge engineers double- and triple-check the work before you start construction.) That’s why we took the route we did through these posts. Because by building up your skills from the basics, you now have the knowledge and skills required to generalize to more complicated systems like rotations or gear sets. When you get results you don’t expect from your high-level work, you’ll be able to dig into the meat of the output and figure out the low-level reason why. While there are certainly some tips and tricks that are helpful when doing “high-level theorycrafting” in Simulationcraft, you don’t really need them to get started. I’m not even certain they warrant an entire blog post, but I hope to put one together to discuss those ideas in more detail anyway. I hope you’ve enjoyed this little tutorial, and more importantly I hope you’ve found it useful. As usual, I’m happy to entertain questions in the comments section if there’s anything you feel I’ve left out or want more information on. In addition, any suggestions you have for future installments of TC101 are most welcome. Theorycrafting Resources To end this series, I’d like to leave you with a few references you can go to for additional learning and/or help. The Simulationcraft Wiki has a lot of information about how to tweak profiles to get what you want. We try to keep the wiki up-to-date, but the documentation often lags development a little bit. When in doubt, you can always fire up your favorite IRC client and hop into our IRC channel at irc.stratics.com (#simulationcraft) and ask for help. 
Many of the devs make a habit of being in there and providing assistance, especially to theorycrafters that are interested in helping contribute. Note that we do eat and sleep from time to time, so don’t be discouraged if you don’t get an answer instantly – you may just have to try again another time or day when people are there. The Elitist Jerks forums are still a solid place for many classes. There have been complaints that the community is slowly dwindling, which may be true. I still post results there because the level of discourse is pretty high and posters tend to be pretty good at critical analysis. More importantly for the new theorycrafter, there’s a wealth of good posts and discussions from previous expansions to wade through that detail a lot of the game’s core systems. Some of that information is old, but much of it is still relevant, and the posts can be great examples of how to thoroughly research a topic and report your work. The MMO-Champion forums are another place you can check for theorycrafting, though just like Elitist Jerks, it varies in quality on a class-to-class basis. The Icy Veins and Wowhead forums may have information, but in my experience they tend to focus more on class guides and advice than on theorycrafting discussion. There are also a host of class- or role-specific sites. Tankspot is a good resource for tank-spec theorycrafting (especially warriors), as is Maintankadin for paladins, The Inconspicuous Bear for Guardian druids, How To Priest for priests, #Acherus chat for DKs, Altered Time for mages, and so on. I’m sure there are other sites for specific classes, but those are the ones I know off the top of my head. As a theorycrafter, you should probably already be aware of the major sites for your particular class. Both Wowhead and MMO-Champion’s wowdb are databases of spell information that can be helpful if you know what you’re looking for. 
Both have useful features like a “Modified By” tab that tells you what other spells affect an ability, which can help track down undocumented effects or set bonus spells. Wowhead also has a neat Changelog that shows you how the tooltip has changed in each patch. But don’t forget, tooltips can lie!

WoWWiki and WoWpedia are both potential resources, though their information is frequently out of date. But they can still be quite useful for archival information, like how item stats are calculated (obviously changing in WoD, but…) or how spell resistances used to work.

There are also plenty of personal blogs that discuss theorycrafting topics. One of the more general ones is Hamlet’s blog, where he posts a mix of healing theorycrafting, critical/conceptual analysis of WoW (especially beta) mechanics, and mathematical treatments of mechanics. For example, in my last blog post I linked to his discussion of how WoW spells calculate damage range, and in the past he’s posted on topics like how HoT mechanics work, how specific trinkets work, and how to compute uptimes of proc-based buffs. Digging through his archives is a great way to learn a little about math and WoW mechanics.

There are far too many personal blogs to list all of them, so I won’t even attempt to try (that way I can’t accidentally miss someone and piss them off). Instead, if you have a blog where you talk about theorycrafting topics, please post a comment with a link to your blog and a brief description of what you do, particularly what class or classes you work on. As a theorycrafter, you should figure out which blogs cover material specific to your class and keep up with them. Taking some time to browse through their archives will probably teach you a lot as well.

## TC101: Experimental Design

In the previous post, we talked about what theorycrafting means and worked through a basic example of beginner theorycrafting.
In this post, I want to go into a little more detail about the laboratory-based part of theorycrafting – in other words, designing and carrying out in-game “experiments” to test how mechanics work.

WoW Experiments

Luckily, “experiments” in WoW are pretty simple in a relative sense. While the entire system may be complicated, we generally have a good idea about how things work and what’s causally related (and not). For example, I know when I press the key for Crusader Strike, my character will cast Crusader Strike on my target, if possible. I know that the damage it deals depends on a few factors: my weapon damage, my own stats and temporary buffs, any damage-increasing debuffs on the target, the target’s armor mitigation (which depends on its armor and both of our levels), and so on. Even if we don’t know exactly how those relationships work – that’s what we’re testing after all – we know that they exist or might exist.

Likewise, we can also quickly rule out a lot of variables. We don’t expect our Crusader Strike damage to depend on the time of day or the damage that another player is doing to a different target. This sounds silly, but it’s actually a pretty big deal. In real experiments (i.e. in a research laboratory), there are loads of external factors that can affect results, and we have to take great care to identify and eliminate (or at least minimize) those factors. WoW experiments are incredibly easy because we don’t have to do much of that at all.

To illustrate that thought, let me give you a real-life example. As an undergraduate, I spent one summer doing nuclear physics research at the University of Washington. One of the research groups there was making precise force measurements to test General Relativity. Their setup involved a very specially-designed arrangement of masses and a smaller (but still hefty) hanging mass oscillator driven by a small motor. When they made their measurements, they found a deviation from what they expected.
After hours and hours of brainstorming, adjustments, repeating the experiment, and what not, it was still there. They looked at every external factor they could think of that might affect the result, and nothing seemed to be the culprit. After a few months of this, the grad students were beginning to think that maybe they had made a breakthrough discovery.

Their advisor, however, wasn’t as convinced. He made them continue searching for the error. I think he even made them build a second copy of the experiment from scratch to cross-check the results. In any event, eventually they narrowed down the culprit to the motor. As it turned out, the one they had been sent did not meet the manufacturer’s specifications (which had been pored over and chosen very carefully for exactly this reason!), and was malfunctioning in a very subtle way that caused the anomaly they observed.

In WoW experiments, we rarely, if ever, need to worry about being influenced by factors that aren’t immediately obvious. Generally, we have a very limited set of variables to work with, so identifying and isolating problems is pretty easy.

Basic Experimental Design

Before performing any experiment, you should first make sure you can answer (or have at least tried to answer) all of these questions:

1. What am I trying to test (i.e. what question am I trying to answer)?
2. What am I going to vary (and how)?
3. What am I going to hold constant?
4. What am I going to measure (and how)?
5. How much data do I need to take?

The first is pretty obvious – it’s hard to perform an experiment if you don’t have a clue what it is you’re trying to determine. Using our example from the previous post, if my question is “How does Judgment’s damage vary with attack power,” then the obvious answer to (1) is that we’re going to test whether Judgment’s damage changes when we change our character’s attack power. So far so good.

Variables

Our question also gives us the answers to questions (2) and (3).
We’re going to vary attack power, and we want to (ideally) keep everything else constant. Implicit in this is a central tenet of experimental design that you should try to adhere to as often as possible: only vary one thing. That one thing is called the independent variable, in this case our attack power. In an ideal world, we only ever have one independent variable, so that we know for sure that whatever change we see in the measurement is due to that variable.

For a concrete example of that, let’s say we have an ability that depends on both weapon damage and spell power. If we make our measurements in such a way that we’re changing both our weapon and our spell power, we have a giant mess. We’d have to untangle it to determine how much of the change in damage was due to the weapon and how much was due to the change in spell power. That task may not be impossible, but it usually significantly complicates our data analysis.

In some cases, that complication is unavoidable. For example, if you look back at the most recent diminishing returns post, you’ll see that I was performing surface fits to three-dimensional sets of data with two independent variables. I had to do that to properly and accurately fit the two constants of the diminishing returns equation simultaneously, and it only worked as well as it did because we already had a good idea what the formula looked like and what those constants were from prior experience and single-independent-variable experiments. In general, I wouldn’t recommend this technique to a beginner theorycrafter.

So, in our case our independent variable is attack power, and we’re going to keep all of the other potential variables constant. These constants are often called controls, though I prefer to call them “fixed variables” or “variables held constant” because they are variables, just ones you don’t want to vary. So our list of fixed variables includes crit, mastery, multistrike, versatility, etc.
This poses a potential problem for us, though. We haven’t yet answered the question, “how are we going to vary our attack power?” Normally, we would do this by putting on or taking off gear to change our strength. That seems pretty straightforward, but since gear has secondary stats, we’re also changing our crit, mastery, etc. at the same time!

Sometimes you can get around this by using certain temporary effects. For example, if we have several trinkets with varying amounts of attack power on them (and nothing else), we could swap them around to isolate attack power as the independent variable. But we aren’t always that lucky, so generally we’re going to need to make some compromises here.

We might know that some of these are irrelevant – crit, for example, won’t change our results as long as we’re filtering out or adjusting for crits. Likewise, we probably don’t care what our multistrike chance is as long as we’re ignoring multistrike results. In cases where we’re sure, we can be more lenient about letting those factors vary. Since we know that crit and multistrike are independent events, we can safely ignore them as long as we’re careful during our data collection (see below).

But sometimes we’re not sure about it – for example, we may not know whether mastery does or doesn’t affect the damage of Judgment. And for another example, Chaos Bolt damage does increase with crit. Since we don’t always know, it’s safer to try and keep everything else constant when possible.

As an experiment designer, your goal is to juggle these constraints. You’ll be searching for ways to isolate a particular variable (say, attack power) while keeping certain other variables constant. But you’ll also have to decide when it’s acceptable for another variable to change value during an experiment, sometimes by confirming that the variable is irrelevant to the test at hand. Often this means thinking critically about how you’ll design your experiment.
For example, if you were testing ability damage, you could safely ignore hit rating, but only if you made sure you ignored misses when you tallied up your results.

When testing ability damage in previous expansions, we generally just took off gear to change attack power. This wasn’t a problem because hit, miss, crit, haste, dodge, and parry were all independent from the raw (non-crit) damage done by abilities when attacking a target dummy. Multistrike doesn’t appreciably complicate matters, but as you might guess, versatility is a big problem. Which means we have at least one serious constraint on our experiment: we want to use gear that doesn’t have any versatility. Again, if we didn’t have any other choice, we could use advanced techniques to get around this constraint, but it’s far simpler if we just adhere to the constraint.

This brings up another major constraint on most in-game experiments: we don’t want any procs that change our stats temporarily. Otherwise, we’ll have periods where our experimental conditions have changed, which will make the data difficult or impossible to analyze properly. So generally, we want to take off any trinkets or special gear that has stat-changing or damage-increasing procs, and we want to avoid using weapon enchants like Dancing Steel that give temporary buffs.

Measurements

Which finally brings us to question (4). We’re going to measure Judgment’s damage, obviously. The thing we’re measuring is called the dependent variable, because its value may depend on the independent variable.

But we don’t just need to know what we’re measuring, we also need to make sure we know how we’re going to perform that measurement and what we’re going to do with the result. For example, am I going to cast Judgment and write down the value listed in the combat log on a piece of paper? That might be fine if I only need one or two casts, but could quickly become cumbersome if I plan on collecting hundreds of casts worth of data.
More commonly, we’d turn on combat logging (via the /combatlog slash command) so we have a text record of everything. From there, we could upload that result to a parsing site like Warcraft Logs, import the log into MATLAB, write a quick script to scrape the log and convert to a CSV file we can open in Excel, or any number of other analysis methods.

Similarly, within those steps, are we just going to count normal Judgment hits and ignore crits and multistrikes entirely? Or are we going to try to use that extra data, say, by dividing each crit by two and each multistrike by 0.3 and using the adjusted values? The latter gets us more data faster, which could reduce the amount of time it takes. But it relies on two very specific assumptions: that crits do 2x as much damage as a normal hit and multistrikes do 0.3x. If either of those is wrong, for example because we have a crit meta gem on, or our spec’s multistrikes have a different modifier, then our data is polluted.

Furthermore, we know we’re going to cast Judgment, but we haven’t specified how we’re going to cast it. Are we going to cast it on cooldown? Usually that’s the case, but sometimes we might have to wait longer to avoid another unwanted effect (think a single-target version of the Double Jeopardy glyph). Are we going to cast it on a single dummy, or tab-target around between different ones (for example, to test Double Jeopardy)? If so, must those dummies all be the same level? Multi-target considerations are obviously even more important when testing AoE abilities.

Are we only going to cast Judgment, or are we going to cast other things while we’re at it? Maybe we’ll do a single data collection run that combines multiple tests – say, simultaneously measuring Judgment, Crusader Strike, and Avenger’s Shield damage. If we’re casting multiple things, are we sure they don’t interact at all? All of these are questions you’ll need to consider when deciding on your experimental method or procedure.
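The crit and multistrike adjustment described above can be sketched in a few lines of Python. This is only a minimal illustration, assuming the two multipliers quoted in the text (2x for crits, 0.3x for multistrikes); the function names and the sample events are hypothetical, and the events are assumed to have already been parsed out of a combat log.

```python
from statistics import mean

# Assumed multipliers, taken from the text: crits deal 2x damage and
# multistrikes deal 0.3x. Verify both before trusting adjusted values.
CRIT_MULTIPLIER = 2.0
MULTISTRIKE_MULTIPLIER = 0.3


def normalize_hit(amount, is_crit=False, is_multistrike=False):
    """Convert one logged value to its non-crit, non-multistrike equivalent."""
    if is_crit:
        amount /= CRIT_MULTIPLIER
    if is_multistrike:
        amount /= MULTISTRIKE_MULTIPLIER
    return amount


def normalized_mean(events):
    """Average the adjusted damage over (amount, is_crit, is_multistrike) tuples."""
    return mean(normalize_hit(*event) for event in events)


# Hypothetical Judgment events, as if already parsed from a combat log.
events = [
    (1000.0, False, False),  # normal hit
    (2000.0, True, False),   # crit
    (300.0, False, True),    # multistrike
    (600.0, True, True),     # multistrike that also crit
]
print(normalized_mean(events))
```

If the adjusted values from crits or multistrikes cluster around a different number than the normal hits do, that is a hint that one of the assumed multipliers is wrong for your character.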
No matter what we decide to do with the data or how we decide to collect it, we should know the entire plan ahead of time to make sure we’re collecting the right thing. It can be incredibly frustrating to take several hours’ worth of data, and later during analysis find out that your measurements depend on another factor that you forgot to record.

This means that every time we swap gear, we might want to write down everything – the value of all potential variables (strength, agility, intellect, attack power, spell power, mastery, crit, multistrike, haste, versatility) – before we start taking data. That way, if we start analyzing our data and find an anomaly, we have some information we can use to determine what the problem was.

For example, maybe we accidentally removed a piece with versatility on it that we intended to keep on for the entire test. If we’ve been recording versatility before every new set of data, we might be able to catch that after-the-fact and be able to salvage that data set, or at least know why we need to exclude it. Without that knowledge, we might have to re-take the data. Again, if you’re very certain that a particular stat doesn’t matter (haste, for example, in our case), you can skip recording it.

In practice, I rarely record everything. By now, I’m familiar enough with what factors matter that I generally write down only the handful of things I care about. That sort of intuition will come with time, practice, and familiarity with the mechanics. But even I make mistakes, and end up re-taking data (versatility is especially bad in that regard, because I’m still not used to it), so for a beginner I’d recommend erring on the side of caution.

Data Collection

The next question we need to answer is how much data we need to collect. This will vary somewhat from experiment to experiment. For example, if we just want to know whether Judgment procs Seal of Truth, we might only need a single cast.
But more often, we’ll need to invoke some statistics. In this section, we’ll give a brief overview of two common ways to use statistics to determine just how much data we need.

Unknown Proc Rate

For example, let’s say we’re trying to accurately determine the proc rate of Seal of Insight. We expect we’ll need to record a lot of auto-attacks and count the number that generate a proc. We can use statistics to figure out how many swings we need to make, at minimum, to feel confident in our result. That amount could be a few hundred swings or even several thousand depending on how accurately we want to know the proc rate.

Proc-based effects are usually modeled by a binomial distribution because they’re discrete events with two potential outcomes (proc or no proc), every proc chance is independent (usually), and the proc rate is constant (again, usually). Most of the time, we can use something called the Normal Approximation Interval to estimate the possible error of our measurements, which we can reverse-engineer to figure out the number of swings we need.

In short, thanks to the Central Limit Theorem we can approximate the error in our measurement of a proc chance $p$ with the following formula:

$$p \pm z\sqrt{\frac{p(1-p)}{N}}$$

where $N$ is the number of trials (in our case, swings) and $z$ is a constant that depends on how confident we want to be in the result. If you want to know how to calculate $z$, read up on the standard normal distribution, but most of us just use one of several precalculated values. The most common one is $z=1.96$, which corresponds to a 95% confidence interval. Other common values are $z=2.58$ for a 99% confidence interval and $z=3.29$ for a 99.9% confidence interval. If you’re lazy (like me) or don’t feel like memorizing or looking up those numbers, you can use $z=2$ and $z=3$ as rough approximations of the 95% and 99+% confidence intervals.

The way this is normally used, you’d first run an experiment to collect data.
So maybe we perform 100 swings and get a proc on 25 of them. We would then have $p=25/100=0.25$ and $N=100$, and our 95% confidence interval is $0.25 \pm 0.0849$, in other words from 16.51% to 33.49% – a pretty wide range. We’d get a narrower range if we performed $N=1000$ swings and got 250 procs; $p$ is still 0.25, but the 95% confidence interval shrinks to $\pm 0.0268$, or from 22.32% to 27.68%. Since we’re dividing by $\sqrt{N}$ in the formula, to increase the precision by another decimal place (factor of 10) we need to use 100 times as many trials.

And of course, we can also use this formula to figure out how many iterations we need to reach a certain precision. Let’s say we want to know the value to precision $P=\pm 0.001$. We can set $P$ equal to the term that describes the interval:

$$P = z \sqrt{\frac{p(1-p)}{N}}$$

and solve for $N$:

$$N = \frac{z^2 p(1-p)}{P^2}$$

So for example, let’s say we suspect the proc rate is $0.25$ (this would be our hypothesis). If we want to know the proc rate to a precision of $\pm 0.001$ with 95% confidence, we need $N=720300$ melee swings.

There are two caveats here. First, this formula is only an approximation, which means it’s got a range of validity. In particular, it becomes poor if $p$ is very close to zero or one and breaks down entirely if it’s exactly zero or one (though in those cases, the behavior is usually clear enough that we don’t need this method anyway). The rule of thumb is that it gives good results as long as $pN>5$ and $(1-p)N>5$. Since this rule includes $N$, you can still use this approximation when $p$ is very small by increasing the number of trials $N$ to keep the product over 5.

The second caveat is very subtle. Technically speaking, if we find the 95% confidence interval to be $0.25 \pm 0.05$, or 20% to 30%, that does not mean that the true value of the proc rate (let’s call it $\mu$) has a 95% chance to be between 20% and 30%.
Instead, it means that if we repeat the experiment 100 times, and calculate confidence intervals for each of them, on average about 95 of those confidence intervals will contain the true value $\mu$. The Wikipedia article for confidence interval makes this distinction as clear as mud (though to be fair, it’s better than any other treatment of it I’ve read; $1-\alpha$ is their representation of confidence, so for a 95% confidence interval $\alpha=0.05$):

> This does not mean there is 0.95 probability that the value of parameter μ is in the interval obtained by using the currently computed value of the sample mean. Instead, every time the measurements are repeated, there will be another value for the mean X of the sample. In 95% of the cases μ will be between the endpoints calculated from this mean, but in 5% of the cases it will not be. The calculated interval has fixed endpoints, where μ might be in between (or not). Thus this event has probability either 0 or 1. One cannot say: “with probability (1 − α) the parameter μ lies in the confidence interval.” One only knows that by repetition in 100(1 − α) % of the cases, μ will be in the calculated interval. In 100α% of the cases however it does not. And unfortunately one does not know in which of the cases this happens. That is (instead of using the term “probability”) why one can say: “with confidence level 100(1 − α) %, μ lies in the confidence interval.”

Got all that? In practice, the distinction isn’t that important for us – it’s mostly a matter of semantics. If we were submitting our work to a scientific journal, we’d care, but for theorycrafting we can play a little fast and loose with our statistics. Just don’t tell the real statisticians.

The key is to remember that you can run an experiment and get a confidence interval that doesn’t contain the value you’re looking for. In fact, it’s almost certain that you will see that happen if you’re using 95% confidence intervals, just because a 5% chance is pretty high.
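To make the interval formula and the repeated-experiment interpretation concrete, here is a minimal Python sketch using only the standard library. The function names, the seed, and the simulated "swings" are all illustrative assumptions rather than anything from the game; the simulation simply draws random numbers against an assumed true proc rate.

```python
import math
import random


def normal_approx_interval(procs, trials, z=1.96):
    """Return (low, high) for the proc rate using p +/- z*sqrt(p(1-p)/N)."""
    p = procs / trials
    half_width = z * math.sqrt(p * (1 - p) / trials)
    return p - half_width, p + half_width


def trials_needed(p_guess, precision, z=1.96):
    """Solve the half-width formula for N, given a guessed proc rate."""
    return math.ceil(z**2 * p_guess * (1 - p_guess) / precision**2)


def coverage(true_p, trials, experiments, z=1.96, seed=0):
    """Fraction of simulated experiments whose interval contains true_p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(experiments):
        # One simulated experiment: count procs over `trials` swings.
        procs = sum(rng.random() < true_p for _ in range(trials))
        low, high = normal_approx_interval(procs, trials, z)
        hits += low <= true_p <= high
    return hits / experiments


print(normal_approx_interval(25, 100))  # the 100-swing example from the text
print(trials_needed(0.25, 0.001))       # swings needed for +/-0.001 at 95%
print(coverage(0.25, 100, 2000))        # fraction of intervals containing 0.25
```

The last line repeats the 100-swing experiment 2000 times; the fraction it prints should sit in the neighborhood of 0.95, which is the sense in which the interval is a "95%" one.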
That’s one in every 20 experiments. When that happens, you may need to take further measures. That may mean repeating the experiment, or it may mean using a tighter confidence interval. Sometimes it means that you reject your hypothesis, because the value really isn’t inside the confidence interval. This is where the critical thinking aspect comes in – you have to interpret the data and determine what it’s really telling you.

Remember that you can always increase $z$ to calculate a more inclusive confidence interval, which doesn’t require taking extra data. Sometimes that will answer the question for you (“The result is outside the 99.9% confidence interval, the hypothesis is probably wrong, Seal of Insight’s proc rate is not 25%”). And on the other extreme, you can increase $N$ to reduce the size of the confidence interval if you’re trying to increase the precision of an estimate. Though that obviously means taking more data!

Unknown Proc Trigger

Sometimes, we just want to know if an effect can occur. For example, on beta I was doing some testing to see exactly how Sacred Shield benefits from multistrike. To test that, I just kept re-applying Sacred Shield until I saw it generate bubbles that didn’t match the baseline or crit values. A single multistrike should generate a shield that is 30% larger than the baseline one, so I just kept going until I observed one. Likewise, I wanted to know if a double-multistrike proc would generate a shield that was 60% larger, so I kept casting until I observed one of those as well.

This type of test, where a single positive result proves the hypothesis, can be very easy to perform if the chance is high. If I get a positive result very quickly, the test won’t take very long at all. But if the chance is low, you could be at it all day. And if the chance is actually zero (because your hypothesis is wrong), you could go forever without seeing the event you’re looking for.
Again, statistics help us make the determination as to how many times we need to repeat the test before we can say with reasonable certainty that the event can’t happen.

Generally in this type of test, you already know the proc rate. For example, if I have 10% crit and I want to know if a particular ability can crit, I know that it should have either a 0% chance (if it can’t crit) or a 10% chance (if it can crit) to do so. So my $p$ here is 10%, or $p=0.10$. According to binomial statistics, if I perform the test $N$ times, the probability of getting exactly $k$ successes is calculated by:

$$Pr(k; N, p) = {N \choose k} p^k (1-p)^{N-k},$$

where

$${N \choose k} = \frac{N!}{k!(N-k)!}.$$

This is relatively easy to evaluate in a calculator or computer for known values of $p$, $N$, and $k$. Many calculators have a “binopdf” function that will do the entire calculation; in others you may need to calculate the whole thing by hand.

So let’s say we perform the experiment. We cast the spell 100 times and don’t observe a single crit. The probability of that, according to our formula, is:

$$Pr(0; N, p) = {N \choose 0} p^0 (1-p)^N = (1-p)^N$$

Plugging in $N=100$ and $p=0.1$ we find that the probability is around $2.66\times 10^{-5}$, or around 0.00266%. Pretty unlikely, so we can probably safely assume that the ability can’t crit. Though again, there’s a 0.00266% chance we could be wrong!

A related problem is if we want to know the probability of getting up to a certain number of procs or crits. For example, if we have 10% crit, what is the probability of getting five or fewer crits in 100 casts? To do that, we’d have to calculate the separate probabilities of getting exactly 0, 1, 2, 3, 4, and 5 crits and sum them all up. Mathematically, that would be:

$$Pr(k\leq 5; N, p) = \sum_{k=0}^{5} Pr(k; N, p) = \sum_{k=0}^{5} {N \choose k} p^k (1-p)^{N-k},$$

and at this point we’d probably want to employ a calculator or computer to do the heavy lifting for us.
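If you don’t have a binopdf-style function handy, the two formulas above are short enough to write out yourself. The following is a minimal Python sketch using only the standard library (`math.comb` needs Python 3.8 or later); the function names `binom_pmf` and `binom_cdf` are my own labels, not anything built in.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_cdf(k, n, p):
    """Probability of k or fewer successes in n independent trials."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))


print(binom_pmf(0, 100, 0.1))  # chance of zero crits in 100 casts at 10% crit
print(binom_cdf(5, 100, 0.1))  # chance of five or fewer crits in 100 casts
```

The first value is the “no crits at all” probability from the 100-cast example, and the second is the five-or-fewer sum.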
For our example, MATLAB gives us:

    >> sum(binopdf(0:5,100,0.1))
    ans =
        0.0576

So we’d have a 5.76% chance of getting five or fewer crits in 100 casts with a 10% crit chance.

Ability Damage Tests

Since we started this post discussing a test related to the damage formula of Judgment, I want to make one final note about testing ability damage formulas. Normally, you can get this information from tooltips or datamining. But tooltips can lie, especially during a beta, and sometimes even on live servers. Word of Glory’s tooltip was wrong for almost the first half of Mists of Pandaria, for example. So it can be useful to perform tests to double check them. One of my first tasks every beta is to take data on every spell in our arsenal and attempt to fit the results to confirm whether the tooltips are correct.

Your first instinct when performing this type of experiment may be to cast the spell a few hundred times and record the average damage, and then repeat at various different attack power (or spell power) values. However, while that works, it’s not always the most accurate (or efficient) way. Hamlet wrote an excellent article about this topic earlier this week, and you should really go read it if you want to understand why an alternative method (which I’ll briefly outline below, since I already had it written) is advantageous.

Spells in WoW traditionally have had a base damage range (either by default, or based on weapon damage) and then some constant scaling with attack power and/or spell power. The base damage range was fixed and obeyed a uniform distribution, and accounted for all damage variation in the spell. So it was often more accurate to record the maximum, minimum, and average for each set of casts, and then attempt to fit the maximum and the minimum values separately. This was especially useful for abilities that did some percentage of weapon damage, because one could equip a weapon with a very small damage range (i.e.
certain low-level common-quality weapons), at which point it might only take a handful of casts to cover the entire range.

I’ve used the past tense here, because they’ve changed how abilities work in Warlords. They no longer have any base damage values, which means that they’ve had to change the method they use to make spell damage vary from cast to cast. I don’t know what they’ve chosen to do about that, because I haven’t had time to test it thoroughly. In my limited time on beta, I’ve noted that some spells don’t appear to vary at all anymore, while others do. For example, Judgment and Exorcism both do the same damage every time they connect, while the healing done by Flash of Light and Word of Glory still varies from cast to cast. Abilities based on weapon damage, like Crusader Strike and Templar’s Verdict, vary according to their weapon damage range.

The ones that vary could just use a flat multiplicative effect, such that the spell always does $X \pm \alpha X$ for some value of $\alpha$. In other words, maybe it always does $\pm 10\%$ of the base damage. But it could also be some other method. I’m sure we’ll figure this out as beta goes on (if nobody else has yet), but just keep in mind that this slightly changes the procedure above. You’d still be matching the min and max values, of course, but you’d potentially be looking for scale factors that are, say, 10% larger or smaller than the expected mean value.

Coming Soon

That wraps up our primer on performing in-game experiments. We could talk in a lot more detail about any of these examples and identify other nuances, tips, and tricks. But this post, while dense, covers the basics one would need to set up and perform an in-game experiment. Most of what we would gain by going into more depth is improvements in accuracy and efficiency. Obviously both of those are good things, but they’re not necessary for your first few attempts at in-game experimentation.
In the future, I might write a few shorter articles that are more focused on the nuances involved in a particular kind of measurement, provided there’s interest in the topic. If you have something in particular you’d like me to write about in depth, please mention it in the comments.

In the next post, I want to look at how we can use the results of in-game experiments to check Simulationcraft results. Which also means designing and executing “experiments” in Simulationcraft that we can use for comparison. Many of the same basic ideas will apply, of course; for example, eliminating as many variables as possible and making sure you’ve collected enough data. But in this case, we’ll be applying those principles to designing action priority lists and interpreting reports.

## TC101: Intro to Theorycrafting

On more than a few occasions I’ve been asked some variation of the question, “How do I get started in theorycrafting?” Which is a tough question to answer, since there’s a variety of ways to get started depending on what you’re interested in and what talents or tools you have at your disposal. Someone proficient with spreadsheets might try to write one to model a rotation, for example. But one’s first foray into theorycrafting could be as simple as doing some “napkin math” to compare two talents.

For example, my own entry into the world of theorycrafting happened when I took somebody’s prot paladin spreadsheet and translated it into MATLAB code. I wanted to analyze variation with several different input variables (i.e. the oft-misused term “scaling”), which is something that spreadsheets are traditionally poor at doing. Translating the formulas in the spreadsheet into MATLAB code provided two advantages: full text code is generally easier to debug than spreadsheet formulas are, and MATLAB is designed to work with flexible arrays of data in ways that spreadsheets simply aren’t.
In the process of performing that translation I learned a lot about the way different spells interact, how some of the different game systems worked, and so on. In a lot of cases I corrected formulas that I discovered were in error, often because I explicitly tested the formula in-game to see if it was right. It was a slow but steady process of learning, testing, and refinement. And once it was done, the learning continued as I started to expand the sorts of questions that I wanted to answer with my code.

But when somebody asks, “How do I get started,” they’re not usually thinking about a specific problem. They’re thinking about making the transition from being a person that reads guides and follows the advice given to someone who discovers and creates that advice.

Sometimes, the person asking only has a vague understanding of what it means to “theorycraft.” Most players already know that theorycrafting produces numbers that can be used to evaluate performance and answer questions, of course. But what most players don’t know is how those numbers are produced from beginning to end. That’s what I hope to clear up with this series of blog posts. And the first step is to make it clear exactly what the term “theorycrafting” means.

What IS Theorycrafting?

At its root, theorycrafting is a process called mathematical modeling. We’re trying to take some sort of system – in this case game mechanics – and describe it mathematically so that we can generate predictive results.

As with most mathematical modeling, it’s also somewhat directed. In other words, we’re not just doing this for the hell of it; we’re trying to answer specific questions, so our model is built around having the versatility to be able to answer those questions.

Generally, that doesn’t happen by spontaneously creating a very complicated model that covers everything. It happens by creating a very simple model and then slowly refining it to include all of the complications necessary to make it accurate.
In other words, you don’t start with a BMW. You start with a wheel, and maybe an axle. You put those together and start adding things, one by one, until you do have a BMW.

I had a very interesting conversation with Steve Chick a few weeks ago, during which he provided a great flow chart that more or less describes this process:

[Figure: Flow chart that describes the process of learning. Source unknown.]

This is the basic process of problem solving (and a core part of the scientific method), and it applies equally well to theorycrafting and model creation. Steve and I have differing opinions about the best advice for how the “learn more things somehow” part should be accomplished, but we agree completely on the process. We have a question we’re trying to answer, like “How much DPS does Judgment provide,” and we’re attempting to break it down into smaller pieces that we can answer. The goal is to then put those pieces back together and come up with the answer to our original question.

Which means that in a broader sense, theorycrafting is also an exercise in problem solving. Even though they may not quite realize it, the player asking how to get started with theorycrafting is really asking how to obtain the tools necessary to start solving problems on their own. What I hope to provide with this series of blog posts is a little guidance on exactly how to develop and use those tools.

Theorycrafting 101

As an example, let’s say that is our question: “How much DPS does Judgment provide?” Let’s break that down as if we were complete newcomers to theorycrafting.

First, do we know what “DPS” stands for, and how to calculate it? You, as a seasoned WoW player, laugh at that question. But in reality, it’s not something I’d expect a random WoW player to know. Even if they knew it meant “damage per second,” knowing how to properly calculate it wouldn’t be guaranteed.
You’d be surprised how many college-age students struggle with simple ratio metrics like velocity (“meters per second”), current (“charge per second” or “mass per second” depending on whether we’re talking about electricity or fluid flow), or efficiency measures (“miles per gallon”).

Let’s say we know the general concept – that we know we want to add up the damage we do in some period of time and then divide the total amount of damage by the length of time. How long a period do we use? Ten seconds? A minute? Ten minutes? An hour?

The answer to that depends on not just accuracy, but the details of our rotation. If our rotation is a fixed, repeatable cycle (like CS-J-X-CS-X-J-CS-X-X) then we could plan on using one full cycle to give us the same precision as an infinite amount of time. But if it isn’t, we might have to decide what the cutoff is. Maybe we want to simulate 300 seconds of continuous combat, or maybe we only care about a 20-second window of a fight.

Once we decide on the time, we need to figure out how to calculate the total amount of damage done by Judgment in that time. Intuition tells us that will be the average damage of each cast times the average number of casts in our time window. Again, some of that depends on rotation (number of casts). But we also need to know how we calculate the damage done per cast.

So we’ve broken the problem down into two smaller problems:

1. How much damage does Judgment do per cast?
2. What’s our rotation?
   • Determines number of casts and time interval (or equivalently, cast rate)

And we’ve come up with an equation:

$$\text{DPS} = (\text{Damage Per Cast} \times \text{Number of Casts}) / \text{Time}$$

And note that we haven’t gotten any farther than deciding how to calculate a relatively simple metric like DPS!

So now we try to answer each of those questions, and break them down further if we can’t. Let’s take #1 since it’s simpler – how much damage does Judgment do per cast?
If we’re a complete newcomer to theorycrafting, we may not know any more than “we press a button and it does some damage.” So we need to figure out how to quantify that.

This is the part where Steve and I disagree, by the way. He suggests that you should test it and figure it out yourself. In other words, go into the lab (i.e. in game) and set up an “experiment” to measure that damage and figure out how the game is calculating it. And there are definitely advantages to this approach. Learning is often significantly aided by firsthand experience, which is why laboratory exercises are so common in the sciences. This is, in fact, the approach we’ll use for our example.

However, my first instinct is to look things up and see if someone’s done the hard work for me before. I know that I may learn something from the process of designing and carrying out an experiment, especially if I screw something up and have to re-do it (nothing aids learning like painful and/or time-consuming mistakes!). But I also know that it’ll probably be a lot faster to spend a few minutes googling. That may also be a generous way of saying “I’m lazy.”

So let’s say we want to set up this experiment. What are we going to test? Or, put another way, what factors change the damage of Judgment?

First, we might already know (or guess) that it changes when our attack power changes. We might also wonder if it varies with spellpower. Maybe we’re not sure if it depends on weapon damage, or if it has a base damage value. We do know from experience that it does more damage when we get a critical strike, and that there are a few effects that boost its damage (Glyph of Double Jeopardy, Avenging Wrath, Holy Avenger). So we need to test all of those things, and in some cases how they interact (for example, is Avenging Wrath’s 20% boost multiplicative or additive with Holy Avenger?) before we can put them together.

In other words, we’ve just created a bunch of smaller questions to answer:

1. How much damage does Judgment do per cast?
   A. Does it vary with attack power (and if so, how)?
   B. Does it vary with spell power (and if so, how)?
   C. Does it have a base damage value?
   D. Does it depend on weapon damage (and if so, how)?
   E. How much more damage does it do on a critical strike?
   F. How often do we get a critical strike?
   G. How does Avenging Wrath affect the damage?
   H. How does the Glyph of Double Jeopardy affect the damage?
   I. How does Holy Avenger affect the damage?
   J. How do G, H, and I interact?

I’ve cheated a bit here and added (F) because I know there’s a hidden crit suppression against higher-level targets, but a new theorycrafter might not be aware of that fact. Similarly, they might skip test J because they’ve assumed (knowingly or not) that everything is multiplicative (it might be… or it might not be – Blizzard can be inconsistent on that from one effect to another). Both of those are errors that might not show up until a lot later (and with a lot more testing), which is one of the reasons I advocate doing a little reading first.

I’ve separated these out because each of these is going to require its own experiment (or at least, its own calculations). So we’ve now got a long list of things to test, each of which is a small component of how Judgment’s damage is calculated. Pretty much all of these are as low-level as we can get, so there’s no point in breaking them down further. They’re each things we can either answer directly (i.e. “A critical strike does 2x the damage”) or measure through experiment and analysis.

In the next blog post, I’ll talk in more detail about how we go about designing each of these experiments and putting the pieces together. For now though, I want to go back to the more abstract concept of putting the results together. Let’s say we perform some of these experiments and determine that (note that these are completely made up):

1. Judgment does 1000 base damage.
2. Weapon damage has no effect.
3. Every point of attack power adds 2 damage.
4. Every point of spell power adds 1 damage.
5. Crits do 2x damage, and
6. Crits occur with a probability equal to our character sheet crit chance.

So we have several small pieces we can put together. We know that ignoring crits, a Judgment will do on average about $1000 + 2\times AP + 1\times SP$ damage. To apply the crits, we note that when we don’t crit (a probability of $1-C$, where $C$ is our crit chance) we do 100% damage, and when we do crit (probability $C$) we do 200% damage. That gives us a factor of $1.00 \times (1-C)+2.00 \times C = 1 - C + 2C = 1 + C$. So our average damage per Judgment is then

$${\rm Judgment~damage} = (1000 + 2\times AP + 1\times SP) \times (1+C).$$

And there we have it: our first model for Judgment damage. It’s not a complete model, obviously – we’d need to continue to refine it to account for all of the other effects that affect Judgment’s damage. And then we’d repeat this entire process for the rotation tree, and combine those results to create a model for DPS.

But that’s the essence of theorycrafting. Start with a simple model, and eventually add more detail and complexity until the model is as accurate as you need it to be. We’ll talk a little more about determining accuracy and tolerances in the next two installments.

Simulationcraft

Simulationcraft is, as you might expect, just a really big, complex numerical model. And it’s built up in exactly the same way that we built our model for Judgment damage. There are literally thousands of small moving parts within SimC taking care of each of the details that one might care about.

For example, there’s an entire system of functions to accurately calculate your hit, miss, dodge, parry, block, and crit chances against a target based on your combat ratings, the target’s base avoidance, block, and crit suppression values, and the level difference between the two of you.
Another function takes all of that information and constructs attack tables and performs the rolls that determine whether you hit or miss, whether your attack is a critical strike or not, and whether the attack is blocked (provided it can be blocked at all!). All of that is done with pinpoint accuracy because we have a good understanding of how combat rolls work thanks to years of theorycrafting.

Likewise, in the paladin class module, there are special functions that handle things like Hand of Light damage, seal procs, Grand Crusader, and so on. Lots of little moving pieces that each handle one small detail, each one improving the accuracy of the model bit by bit.

Which brings us to another statement I see fairly frequently: “I’d like to contribute to Simulationcraft, but I don’t know C++.”

It’s true that Simulationcraft is written in C++, and while the intent is that you don’t really need to know it to maintain a class module, in my experience our class modules simply aren’t user-friendly enough for that to be realistic.

However, not all contributions to Simulationcraft require coding knowledge. The great part about SimC is that it outputs a report that doesn’t require any programming experience to read and interpret. There are plenty of things that someone can do just by tweaking an action priority list and looking at how the output changes.

One way to think about it is that Simulationcraft has several layers. At the top, there’s the “theorycrafting layer,” where you only need the basic knowledge of how to manipulate action priority lists and read the reports the simulation generates. I call it the theorycrafting layer because this is where you try out new ideas for optimizing a character or compare simulation results to in-game testing to check for errors.

In the middle, there’s the mechanics layer. This is where the class module developers (i.e. coders) come in, because it’s the layer where the mechanics that we discover in-game get coded into the simulation. But even here, there’s room for non-coders, because we don’t always have class developers that are experts on each class. We have quite a few talented people writing code, but none of them may be experts on your class or spec. But if someone who is an expert on that spec can explain how the mechanics work to a developer, we can support that spec anyway.

At the bottom is the core layer, which is all of the under-the-hood subsystems that run the simulation. Things like how events are scheduled and executed and how (and what) data is stored. This layer really does require C++ knowledge, but we have several really dedicated devs that already take care of most of this stuff. While I’m sure they would love help, realistically the greater need is in the top two layers, since that’s the bulk of the work when we’re staring down a new expansion.

The point of all of this is that we don’t need a host of C++ gurus to help make SimC better for everyone. We need more people that can properly test and describe the mechanics to a coder, so that the coder can implement those features. In other words, we need theorycrafters more than we need code monkeys.

Coming Soon

One goal of this series of blog posts is to give prospective theorycrafters a better idea of what they’re getting into. Another is to help them put together the basic toolbox they’ll need to actually start solving problems. Both of those aims are well served by showing actual examples of theorycrafting, like we did with Judgment in today’s post.

Not coincidentally, this is exactly the same approach that most introductory textbooks take. As you may have guessed, theorycrafting employs many of the basic techniques that any scientist would learn before going into a laboratory. So the next two blog posts will be focused on developing and understanding common experimental methods.

In the second part of this series, we’ll talk more about how to properly design in-game experiments to test and verify mechanics.
Then, in the third part, we’ll focus on methods for comparing those in-game results to Simulationcraft results to check for consistency.

## Velvet Resolver

On Monday, Celestalon kicked off the official Alpha Theorycrafting season by posting a Theorycrafting Discussion thread on the forums. And he was kind enough to toss a meaty chunk of information our way about Resolve, the replacement for Vengeance.

Resolve: Increases your healing and absorption done to yourself, based on Stamina and damage taken (before avoidance and mitigation) in the last 10 sec.

In today’s post, I want to go over the mathy details about how Resolve works, how it differs from Vengeance, and how it may (or may not) fix some of the problems we’ve discussed in previous blog posts.

Mathemagic

Celestalon broke the formula up into two components: one from stamina and one from damage taken. But for completeness, I’m going to bolt them together into one formula for Resolve $R$:

$$R =\frac{\rm Stamina}{250~\alpha} + 0.25\sum_i \frac{D_i}{\rm MaxHealth}\left ( \frac{2 ( 10-\Delta t_i )}{10} \right )$$

where $D_i$ is an individual damage event that occurred $\Delta t_i$ seconds ago, and $\alpha$ is a level-dependent constant, with $\alpha(100)=261$. The sum is carried out over all damaging events that have happened in the last 10 seconds.

The first term in the equation is the stamina-based contribution, which is always active, even when out of combat. There’s a helpful buff in-game to alert you to this:

In-game tooltip for Resolve, out of combat.

My premade character has 1294 character sheet stamina, which after dividing by 250 and $\alpha(90)=67$, gives me 0.07725, or about 7.725% Resolve. It’s not clear at this point whether the tooltip is misleadingly rounding down to 7% (i.e. using floor instead of round) or whether Resolve is only affected by the stamina from gear. The Alpha servers went down as I was attempting to test this, so we’ll have to revisit it later. We’ve already been told that this will update dynamically with stamina buffs, so having Power Word: Fortitude buffed on you mid-combat will raise your Resolve.

Once you’re in combat and taking damage, the second term makes a contribution:

In-game tooltip for Resolve, during combat.

I’ve left this term in roughly the form Celestalon gave, even though it can obviously be simplified considerably by combining all of the constants, because this form does a better job of illustrating the behavior of the mechanic. Let’s ignore the sum for now, and just consider an isolated damage event that does $D$ damage:

$$0.25\times\frac{D}{\rm MaxHealth}\left ( \frac{2 ( 10-\Delta t )}{10} \right )$$

The 0.25 just moderates the amount of Resolve you get from damaging attacks. It’s a constant multiplicative factor that they will likely tweak to achieve the desired balance between baseline (stamina-based) Resolve and dynamic (damage-based) Resolve.

The factor of $D/{\rm MaxHealth}$ means that we’re normalizing the damage by our max health. So if we have 1000 health and take an attack that deals 1000 damage (remember, this is before mitigation), this term gives us a factor of 1. Avoided auto-attacks also count here, though instead of performing an actual damage roll the game just uses the mean value of the boss’s auto-attack damage. Again, nothing particularly complicated here, it just makes Resolve depend on the percentage of your health the attack would have removed rather than the raw damage amount. Also note that we’ve been told that dynamic health effects from temporary multipliers (e.g. Last Stand) aren’t included here, so we’re not punished for using temporary health-increasing cooldowns.

The term in parentheses is the most important part, though. In the instant the attack lands, $\Delta t=0$ and the term in parentheses evaluates to $2(10-0)/10 = 2$. So that attack dealing 1000 damage to our 1000-health tank would give $0.25\times 1 \times 2 = 0.5$, or 50% Resolve. However, one second later, $\Delta t = 1$, so the term in parentheses is only $2(10-1)/10 = 1.8$, and the amount of Resolve it grants is reduced to 45%. The amount of Resolve granted continues to linearly decrease as time passes, and by the time ten seconds have elapsed it’s reduced to zero. Each attack is treated independently, so to get our total Resolve from all damage taken we just have to add up the Resolve granted by every attack we’ve taken, hence the sum in my equation.

You may note that the time-average of the term in parentheses is 1, which is how we get the advertised “averages to ~Damage/MaxHealth” that Celestalon mentioned in the post. In that regard, he’s specifically referring to just the part within the sum, not the constant factor of 0.25 outside of it. So in total, your average Resolve contribution from damage is 25% of Damage/MaxHealth.
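
For readers who prefer code to summation signs, here is a minimal Python sketch of the Resolve formula as written above. The function and argument names are mine, not Blizzard’s, and the event list format (timestamp, raw damage) is an assumption made for illustration. The numbers in the examples are the ones used in the text.

```python
def resolve(stamina, alpha, max_health, events, now):
    """Resolve as a fraction (0.5 means 50%).

    events: iterable of (timestamp, raw_damage) pairs, where raw damage is
    the amount before avoidance and mitigation. alpha is the level-dependent
    constant from the formula.
    """
    baseline = stamina / (250.0 * alpha)
    dynamic = 0.0
    for t, damage in events:
        dt = now - t
        if 0.0 <= dt < 10.0:  # only the last 10 seconds count
            dynamic += (damage / max_health) * (2.0 * (10.0 - dt) / 10.0)
    return baseline + 0.25 * dynamic


# Stamina-only baseline: 1294 stamina with alpha(90) = 67.
print(resolve(1294, 67, 1000, [], now=0.0))        # about 0.07725

# One 1000-damage hit on a 1000-health tank, stamina term set to zero.
hit = [(0.0, 1000.0)]
print(resolve(0, 67, 1000, hit, now=0.0))          # 0.5
print(resolve(0, 67, 1000, hit, now=1.0))          # 0.45
print(resolve(0, 67, 1000, hit, now=10.0))         # 0.0
```

Note that nothing in the function depends on a previous Resolve value; it is recomputed from the raw event list every time it is called.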

Comparing to Vengeance

Mathematically speaking, there’s a world of difference between Resolve and Vengeance. First and foremost is the part we already knew: Resolve doesn’t grant any offensive benefit. We’ve talked about that a lot before, though, so it’s not territory worth re-treading.

Even in the defensive component though, there are major differences. Vengeance’s difference equation, if solved analytically, gives solutions that are exponentials. In other words, provided you were continuously taking damage (such that it didn’t fall off entirely), Vengeance would decay and adjust to your new damage intake rather smoothly. It also meant that damage taken at the very beginning of an encounter was still contributing some amount of Vengeance at the very end, again, assuming there was no interruption. And since it was only recalculated on a damage event, you could play some tricks with it, like taking a giant attack that gave you millions of Vengeance and then riding that wave for 20 seconds while your co-tank takes the boss.

Resolve does away with all of that. It flat-out says “look, the only thing that matters is the last 10 seconds.” The calculation doesn’t rely on a difference equation, meaning that when recalculating, it doesn’t care what your Resolve was at any time previously. And it forces a recalculation at fixed intervals, not just when you take damage. As a result, it’s much harder to game than Vengeance was.

Celestalon’s post also outlines a few other significant differences:

• No more ramp-up mechanism
• No taunt-transfer mechanism
• Resolve persists through shapeshifts
• Resolve only affects self-healing and self-absorbs

The lack of ramp-up and taunt-transfer mechanisms may at first seem like a problem. But in practice, I don’t think we’ll miss either of them. Both of these effects served offensive (i.e. threat) and defensive purposes, and it’s pretty clear that the offensive purposes are made irrelevant by definition here since Resolve won’t affect DPS/threat. The defensive purpose they served was to make sure you had some Vengeance to counter the boss’s first few hits, since Vengeance had a relatively slow ramp-up time but the boss’s attacks did not.

However, Resolve ramps up a lot faster than Vengeance does. Again, this is in part thanks to the fact that it isn’t governed by a difference equation. The other part is because it only cares about the last ten seconds.

To give you a visual representation of that, here’s a plot showing both Vengeance and Resolve for a player being attacked by a boss. The tank has 100 health and the boss swings for 30 raw damage every 1.5 seconds. Vengeance is shown in arbitrary units here since we’re not interested in the exact magnitude of the effect, just in its dynamic properties. I’ve also ignored the baseline (stamina-based) contribution to Resolve for the same reason.

As a final note, while the blog post says that Resolve is recalculated every second, it seemed like it was updating closer to every half-second when I fooled with it on alpha, so these plots use 0.5-second update intervals. Changing to 1-second intervals doesn’t significantly change the results (they just look a little more fragmented).

Vengeance and Resolve timelines. Boss hits for 30% of tank health every 1.5 seconds, no variation.

The plot very clearly shows the 50% ramp-up mechanism and slow decay-like behavior of Vengeance. Note that while the ramp-up mechanism gets you to 50% of Vengeance’s overall value at the first hit (at t=2.5 seconds), Resolve hits this mark as soon as the second hit lands (at 4.0 seconds) despite not having any ramp-up mechanism.

Resolve also hits its steady-state value much more quickly than Vengeance does. By definition, Resolve gets there after about 10 seconds of combat (t=12.5 seconds). But with Vengeance, it takes upwards of 30-40 seconds to even approach the steady-state value thanks to the decay effect (again, a result of the difference equation used to calculate Vengeance). Since most fights involve tank swaps more frequently than this, it meant that you were consistently getting stronger the longer you tanked a boss. This in turn helped encourage the sort of “solo-tank things that should not be solo-tanked” behavior we saw in Mists.
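
The Resolve half of that plot is easy to reproduce. The sketch below uses the same assumptions as the plot (100 health, 30 raw damage every 1.5 seconds, 0.5-second updates, no stamina baseline), with the first swing placed at t=2.5 seconds. Vengeance is left out because its formula isn’t given in this post, and the function and variable names are just ones I picked for the example.

```python
def damage_resolve(events, max_health, now):
    """Damage-based Resolve term only; the stamina baseline is ignored."""
    total = 0.0
    for t, damage in events:
        dt = now - t
        if 0.0 <= dt < 10.0:  # events older than 10 s (or in the future) drop out
            total += (damage / max_health) * 2.0 * (10.0 - dt) / 10.0
    return 0.25 * total


MAX_HEALTH = 100.0

# First swing at t = 2.5 s, then one every 1.5 s, 30 raw damage each.
swings = [(2.5 + 1.5 * n, 30.0) for n in range(40)]

# Recalculate every 0.5 s from t = 0.0 to t = 40.0.
timeline = [(0.5 * k, damage_resolve(swings, MAX_HEALTH, 0.5 * k))
            for k in range(81)]

for t, r in timeline:
    if t in (2.5, 4.0, 12.5, 13.0, 40.0):
        print(f"t={t:5.1f}s  Resolve from damage = {r:.4f}")
```

The values at t=13.0 and t=40.0 come out identical, which is the sense in which Resolve has fully settled once the 10-second window is full.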

This plot assumes a boss who does exactly 30 damage per swing, but in real encounters the boss’s damage varies. Both Vengeance and Resolve adapt to mimic that change in the tank’s damage intake, but as you could guess, Resolve adapts much more quickly. If we allow the boss to hit for a random amount between 20 and 40 damage:

Vengeance and Resolve timelines. Boss hits for 20%-40% of the tank’s hit points every 1.5 seconds.

You can certainly see the similar changes in both curves, but Resolve reacts quickly to each change while Vengeance changes rather slowly.

One thing you’ve probably noticed by now is that the Resolve plot looks very jagged (in physics, we might call this a “sawtooth wave”). This happens because of the linear decay built into the formula. It peaks in the instant you take the attack – or more accurately, in the instant that Resolve is recalculated after that attack. But then every time it’s recalculated it linearly decreases by a fixed percent. If the boss swings in 1.5-second intervals, then Resolve will zig-zag between its max value and 85% of its max value in the manner shown.

The more frequently the boss attacks, the smoother that zig-zag becomes; conversely, a boss with a long swing timer will cause a larger variation in Resolve. This is apparent if we adjust the boss’s swing timer in either direction:

Vengeance and Resolve timelines. Boss hits for 20-40 damage every 1.0 seconds.

Vengeance and Resolve timelines. Boss hits for 20-40 damage every 2.0 seconds.

It’s worth noting that every plot here has a new randomly-generated sequence of attacks, so don’t be surprised that the plots don’t have the same profile as the original. The key difference is the size of the zig-zag on the Resolve curve.
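
To isolate the effect of the swing timer from the random damage rolls, here is a small Python sketch that holds the boss’s damage fixed at 30 and compares the steady-state trough and peak of the damage-based Resolve term for three swing timers. The fixed damage, the 0.5-second update interval, and the helper names are assumptions for illustration; the exact trough-to-peak ratio it prints depends on the update interval chosen.

```python
def damage_resolve(hit_times, damage, max_health, now):
    """Damage-based Resolve term for identical hits at the given times."""
    total = 0.0
    for t in hit_times:
        dt = now - t
        if 0.0 <= dt < 10.0:
            total += (damage / max_health) * 2.0 * (10.0 - dt) / 10.0
    return 0.25 * total


def zigzag(swing_timer, damage=30.0, max_health=100.0, step=0.5):
    """Return (trough, peak) of Resolve once the 10 s window has filled."""
    hits = [swing_timer * n for n in range(200)]
    # Sample 12 seconds of updates starting at t = 20 s, well into steady state.
    samples = [damage_resolve(hits, damage, max_health, 20.0 + step * k)
               for k in range(int(12.0 / step))]
    return min(samples), max(samples)


for timer in (1.0, 1.5, 2.0):
    low, high = zigzag(timer)
    print(f"{timer:.1f}s swing timer: trough/peak = {low / high:.2f}")
```

The ratio shrinks as the swing timer grows, which is the larger zig-zag visible in the 2.0-second plot.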

I’ve also run simulations where the boss’s base damage is 50 rather than 30, but apart from the y-axis having larger numbers there’s no real difference:

Vengeance and Resolve timelines. Boss hits for 40-60 damage every 1.5 seconds.

Note that even a raw damage of 50% is pretty conservative for a boss – heroic bosses in Siege have frequently had raw damages that were larger than the player’s health. But it’s not clear if that will still be the case with the new tanking and healing paradigm that’s been unveiled for Warlords.

If we make the assumption that raw damage will be lower, then these rough estimates give us an idea of how large an effect Resolve will be. If we guess at a 5%-10% baseline value from stamina, these plots suggest that Resolve will end up being anywhere from a 50% to 200% modifier on our healing. In other words, it has the potential to double or triple our healing output with the current tuning numbers. Of course, it’s anyone’s guess as to whether those numbers are even remotely close to what they’ll end up being by the end of beta.

Is It Fixed Yet?

If you look back over our old blog posts, the vast majority of our criticisms of Vengeance had to do with its tie-in to damage output. Those have obviously been addressed, which leaves me worrying that I’ll have nothing to rant about for the next two or three years.

But regarding everything else, I think Resolve stands a fair chance of addressing our concerns. One of the major issues with Vengeance was the sheer magnitude of the effect – you could go from having 50k AP to 600k AP on certain bosses, which meant your abilities got up to 10x more effective. Even though that’s an extreme case, I regularly noted having over 300k AP during progression bosses, a factor of around 6x improvement. Resolve looks like it’ll tamp down on that some. Reasonable bosses are unlikely to grant a multiplier larger than 2x, which will be easier to balance around.

It hasn’t been mentioned specifically in Celestalon’s post, but I think it’s a reasonable guess that they will continue to disable Resolve gains from damage that could be avoided through better play (i.e. intentionally “standing in the bad”). If so, there will be little (if any) incentive to take excess damage to get more Resolve. Our sheer AP scaling on certain effects created situations where this was a net survivability gain with Vengeance, but the lower multiplier should make that impossible with Resolve.

While I still don’t think it needs to affect anything other than active mitigation abilities, the fact that it’s a multiplier affecting everything equally rather than a flat AP boost should make it easier to keep talents with different AP coefficients balanced (Eternal Flame and Sacred Shield, specifically). And we already know that Eternal Flame is losing its Bastion of Glory interaction, another change which will facilitate making both talents acceptable choices.

All in all, I think it’s a really good system, if slightly less transparent. It’s too soon to tell whether we’ll see any unexpected problems, of course, but the mechanic doesn’t have any glaring issues that stand out upon first examination (unlike Vengeance). I still have a few lingering concerns about steady-state threat stability between tanks (ironically, due to the removal of Vengeance), but that is the sort of thing which will become apparent fairly quickly during beta testing, and at any rate shouldn’t reflect on the performance of Resolve.

## Cumulative Loot

Earlier this week Blizzard published a Dev Watercooler describing the changes in raiding in Warlords of Draenor. I don’t think anything in this article was news, in that all of these changes had been announced at Blizzcon. The major addition was a detailed discussion of the rationale behind the changes.

But this post isn’t about dissecting that discussion – I agree with pretty much everything Ion wrote in regards to the “why” of the changes. Instead, I want to revisit a topic that we’ve touched on before: raiding, burnout, and loot.

The key points of the watercooler article that are relevant to us are these:

• LFR, normal, heroic, and mythic raids are on separate lockouts. In other words, you can run each one for loot each week.
• LFR, normal, and heroic are flexible-size loot-based lockouts, which means you can run them as many times per week as you like, but you’ll only get loot from the boss the first time you kill it on each difficulty.
• Mythic is a fixed-size boss-based lockout, meaning that it works just like MoP normal/heroic raid lockouts do. Once you kill a boss, you get an instance ID and you’re stuck with that instance ID all week.
• LFR will likely not contain set items and specific highly-sought-after trinkets in order to prevent heroic/mythic raiders from feeling like they need to run LFR.

Again, most of this is not news – the last bit is the only tidbit we didn’t already know last November. However, the watercooler triggered a lot of the same negative reactions that were elicited after the announcement at BlizzCon.

In particular, raiders complained that in order to remain competitive, they would feel pressure to clear the same instance several times a week on different difficulty levels to maximize loot income. This in turn contributes to higher burnout rates amongst those raiders and a less fun experience. Our own Anafielle has been one of the more vocal people involved in this debate, even as far back as the early days of LFR.

Why Should Blizzard Do Anything?

You could argue (and many people have) that this is a self-inflicted problem. That these hardcore players are victims of their own inability to set boundaries, and that they just have to learn to manage their time better. I don’t think that’s a reasonable response, because it glosses over a lot of subtleties about the differing motivations gamers have, how we approach games, and the behavioral psychology involved in playing a game. I also think it incorrectly assumes that this is an issue which only affects mythic raiders.

Some players simply cannot enjoy a game unless they feel they’re doing everything they can to advance their character. This isn’t a new phenomenon, and it isn’t limited to mythic raiders. I’ve known players who never set foot in a heroic (MoP) raid, but still felt this way about their character. It’s sort of the “type A personality” equivalent in gaming, and I think every raider has a little bit of that tendency in them. For some people, it’s the source of most of the satisfaction they get from a game.

You may recall that I’ve covered this topic once before, when flex raiding came out, so I won’t re-hash all of the arguments for why raider burnout is a legitimate concern. It’s also got strong similarities to the issues raiders had with the valor point grind before the introduction of heroic scenarios. Each of these activities adds a chunk of time that a raider can spend to further their character, raising the bar a little bit higher. And there’s a strong social incentive to do so in most cases. Perhaps your guild explicitly states that they expect it of you, or maybe peer pressure is enough because you don’t want to be “that guy” that’s letting down the team.

So rather than brushing the issue aside with an “it’s not my problem” response, it’s worth considering the situation with a critical eye and asking, “is there a good way to fix this?”

Cumulative Loot

The last time I touched on this topic, I laid out several potential systems that removed or reduced the incentive raiders had to run lower difficulty levels of the same raid. Some of them, like the increased ilvl gap between LFR and Normal, have already become a reality. But the one I want to dwell on today is a system I called the “Cumulative Loot System.”

The idea I’ve liked the most so far is one proposed by Thels. …. In short, when you kill a normal or heroic boss, you also automatically get your personal loot rolls for LFR and/or Flex. You could imagine various permutations of how this would work; maybe a normal kill gives you your LFR roll, while a heroic kill gives you both LFR and Flex rolls. But the simplest case is just that you get both rolls on any normal or heroic kill.

The basic premise behind this system is that if you can kill a boss on heroic mode, then the normal and LFR versions are obviously beneath your skill and gear level. There’s no challenge involved in doing so for your raid group anymore, it’s just an arbitrary time sink that’s probably not very much fun. But due to the way loot drops are structured, there may be a significant benefit to doing so thanks to set items and trinkets.

So instead of asking you to dump that time into the drudgery of another instance clear, it just gives you that loot when you kill the boss on heroic in addition to your usual heroic loot.

In other words, the system accepts that it is the game’s fault that it is providing an incentive for you to do busywork. It’s sort of like your professor writing an exam problem that’s a little too hard, and then giving you a bit of a curve to compensate. Not that I’ve ever done that. I’m just saying… some professors might have. At some point in history. Definitely not me though.

When I mentioned this idea on Twitter yesterday, it set off a flurry of retweets, favorites, and responses. So I felt it was worth clarifying some of the details in a place where I’m not limited to 140 characters at a time.

The idea is most succinctly explained via an example. Let’s say my raid group kills the new boss Ogre McOgreton on mythic difficulty. He drops mythic-quality loot just like usual for my raid leader to distribute. However, at the same time, I get the option to automatically get the results of my personal loot rolls for that boss from heroic, normal, and LFR difficulties. Doing so “consumes” my loot lockout for that boss on each of those difficulties that week.

Note that this isn’t guaranteed loot, because it’s a personal roll. I’m not suggesting the boss drops X additional heroic-quality items for your raid leader to distribute. I’m suggesting that the game makes up to three extra loot rolls for you, using the personal loot system, for each of the lower difficulty levels. Sometimes you might get 3 items from those three rolls (one heroic quality, one normal quality, one LFR quality). Other times you’ll get no items (use your best sarcastic Pat Krane voice and say “Triple Gold! Thanks Blizz!”). But no matter what, your loot lockout for that boss is flagged so that you don’t need to run the lower difficulty levels.

Recall that in Warlords, LFR, normal, and heroic all use loot-based lockouts. So being locked to a certain boss doesn’t prohibit you from joining a new group and killing that boss on that same difficulty again, it just prevents you from getting loot from the boss a second time. So this system doesn’t prevent a player in a guild clearing mythic difficulty from joining his friend’s raid and helping out. It just removes the loot-based incentive to do so, provided that player opted to get their loot during their mythic raid.

It also doesn’t penalize guilds that want to clear a lower difficulty early in the week and attempt a harder difficulty later in the week like a shared lockout system (i.e. MoP normal/heroic) does. If you clear normal (WoD) quickly and decide to give heroic (WoD) a try, great – you’ll get better loot when you kill that first heroic boss. One way to think of it is as a one-way lockout system – it only locks you out of lower difficulties (after giving you the loot, of course!).
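
To make the “one-way lockout” idea concrete, here is a toy Python sketch of the bookkeeping. Everything in it (the class, the method name, the `claim_lower` flag) is hypothetical and only meant to illustrate the proposal; it says nothing about how the game actually stores lockouts.

```python
DIFFICULTIES = ["LFR", "normal", "heroic", "mythic"]  # easiest to hardest


class LootLockouts:
    """Tracks which (boss, difficulty) loot lockouts are used this week."""

    def __init__(self):
        self.used = set()

    def record_kill(self, boss, difficulty, claim_lower=True):
        """Return the difficulties whose loot this kill pays out.

        With claim_lower=True the kill also consumes the unused lockouts
        for every lower difficulty, but never a higher one.
        """
        rank = DIFFICULTIES.index(difficulty)
        candidates = DIFFICULTIES[: rank + 1] if claim_lower else [difficulty]
        granted = [d for d in candidates if (boss, d) not in self.used]
        self.used.update((boss, d) for d in granted)
        return granted


week = LootLockouts()
print(week.record_kill("Ogre McOgreton", "normal"))  # LFR and normal loot
print(week.record_kill("Ogre McOgreton", "heroic"))  # heroic loot only
print(week.record_kill("Ogre McOgreton", "LFR"))     # empty: already consumed
```

The second call is the “clear normal early, try heroic later” case: the heroic kill still pays out heroic loot, because a lower-difficulty kill never consumes a higher lockout. The player can still join the LFR run in the third call, they just don’t get loot from it.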

There’s one significant modification suggested by Brian Packer that I think really makes the idea shine. He suggested that this system be integrated into garrisons via a follower mission. In other words, if I kill Ogre McOgreton on mythic, it unlocks a follower mission to “retrieve” my extra loot from LFR, normal and heroic. The next time I go back to my garrison, I can tell my follower to go “loot the body” or some such, and the next day he’ll return with my extra loot rolls.

This solves pretty much all of the major problems with the idea, most of which involved UI concerns like “how does this work with loot spec” and “how do I tell the game whether I want to use each roll or not.” It codifies the system as an optional thing rather than automatic, and the follower mission interface can handle the choice of a different quest for each difficulty level and loot spec. It also puts a nice linkage between raiding and garrisons without relying on raw power boosts or buffs, so it’s not in any way mandatory.

I’ll also note that it still leaves open the possibility of setting up multiple runs combining mains and alts to more effectively funnel loot to a group of main raiders. In theory, you might still get more efficient loot allocation pulling those sorts of tricks, because you can funnel everybody’s loot “rolls” to the people who need it. But Cumulative Loot does severely reduce the benefit of doing that, simply because the personal loot rolls are guaranteed to be for the spec you want. If you go the “funnel via alts” route, the boss could drop three bows in a raid with no hunters. It basically reduces the reward-to-time-spent ratio of having multiple alt runs to the point that it’s not even worth considering for guilds outside the top 10 or 20.

Summa Cumulative Laude

In the eight or so months since that last blog post I’ve discussed the idea with a fair number of people. The criticisms generally fall into one of two arguments, neither of which holds much merit in the Warlords raiding system.

The first criticism is that it means fewer mythic and heroic raiders will participate in LFR, and that those players are necessary to carry LFR groups kicking and screaming to their eventual loot drops. While this may be at least partially true in the Mists of Pandaria LFR design, the blog post by Watcher explicitly states that it is not the case in Warlords. LFR is being tuned around the expectation that those players are not present. Which also means it’s no longer a limitation to this sort of loot system.

The second and most common response has been, “but that just gives mythic raiders loot they didn’t earn.” But that argument is fundamentally flawed because it’s built on an incorrect assumption.

When you kill a Mythic boss, what do you “earn” exactly? What’s the appropriate reward for doing that? Higher-ilvl loot, obviously, but how much higher and how much more? We’ve seen various iterations of this in WoW’s history, where killing hard-mode bosses rewarded more loot (Ulduar) and/or higher-ilvl loot (Ulduar and everything since). But the amount of extra loot has changed, as has the ilvl gap.

The truth is, the “amount” of those extra rewards is completely arbitrary. It’s whatever Blizzard decides it’s worth. They have an incentive to make it worth enough that people want to engage in all levels of content, of course. But whether you take the idealistic stance that they’re doing it to make the best game possible or the pessimistic stance that they just want to maximize subscriber numbers, either way, their choice is pretty much arbitrary. It is based more on relative ilvl gaps and power increases than on some nebulous idea of “mythic boss A is X% harder than heroic boss A, thus should give Y additional loot.” And in fact, as we’ve seen, one of the factors that goes into the determination of those ilvl values is how much incentive it gives players to run lower difficulty tiers!

When people suggest that Mythic raiders would be getting gear they “didn’t earn,” they’re making the implicit assumption that such a nebulous connection exists, when in reality, it’s an arbitrary reward. It’s a lot like complaining about having 10 levels per expansion instead of 5 because it’ll take so much longer to level. There’s an implicit (and incorrect) assumption in there that a “level” is some well-defined quantity of experience or time, rather than an amount set arbitrarily by Blizzard to ensure that reaching max level takes around 20 hours (or whatever their target is). And that incorrect assumption means the whole argument topples over under scrutiny.

Cumulat-usions?

I’m not suggesting that this is the only way to address raiders’ concerns about multiple loot lockouts. But out of all of the solutions I’ve seen, this one seems to have the most positives and fewest negatives. As I mentioned in August, it’s got plenty of additional benefits:

• Since everyone gets extra loot, it feels good. It feels like a bonus, whereas the traditional shared difficulty lockout feels punitive and restrictive.
• It makes it clear that the real reward is time – specifically, time you don’t have to spend mindlessly clearing the same instance and can spend on other things.
• It eliminates worries about LFR or normal mode loot being attractive to mythic raiders, which means that LFR and normal loot can be significantly better (i.e. a smaller ilvl gap). That makes LFR and normal raiders happy.

It’s pretty rare to stumble across a system that works this well without any major downsides. And yet, here it is. I’d also like to point out that I shouldn’t get most of the credit for the idea. It first came up in discussions with Thels on maintankadin, and of course Brian Packer gets all the credit for the exceptional idea of tying it into garrisons.

What I like most about the system is that it’s intuitive. If I clear a challenge mode in time to get the gold achievement, I don’t need to go back and clear it again to get the silver one. It’s clear from my accomplishment that I can do that, so the game doesn’t ask me to go back and spend another 20 minutes proving it. There’s no reason raiding can’t work the same way.

From a skill perspective, it’s sort of like performing a track and field event. If I can clear a 4′ hurdle track, it’s pretty clear I can clear a 3′ one, or a 2′ one. There’s little point in making me re-run those to “prove” anything – it’s not a test of my skill at all at that point, it’s just another chunk of time I need to spend to clear a trivial hurdle (yes, I just used a hurdle metaphor in a hurdle analogy). Cumulative Loot just builds that into the reward structure. It says “sure, here’s the loot from all the lower difficulty levels that we know you can clear, great job on the mythic kill.”