## Updated Diminishing Returns Coefficients – All Tanks

A few weeks ago, Taser contacted me about calculating the avoidance diminishing returns coefficients for monks to greater accuracy.  As you might remember, we managed to do this for paladins a little less than a year ago by using an addon called Statslog.  Prior to that post, we had already determined coefficients for Paladins, Druids, and Brewmaster Monks, but via a less rigorous data collection method (the character sheet) that gave less accurate estimates of the true values of these constants.

For example, we were able to peg the k-factor for monks at $k=1.422$, but the dodge and parry caps had a much wider range of possible values.  The best we could do for those were:
$C_p = 91 \pm 7$
$C_d = 505 \pm 25$

That’s not super-accurate, and while they give reasonable enough results, it’s always nice to have better accuracy.  We’re also aided by the fact that we now know the exact conversion rates of strength to parry and agility to dodge for all five tanking classes, which eliminates some unknowns.

I asked Taser to collect a data set for me using Statslog so that I could perform a more accurate analysis on monks, and he graciously agreed.  After calculating the results and updating Simcraft, I realized that we may as well cover the other three tanks, so I collected data sets for druids, death knights, and warriors.

I also spent some time updating my methods for performing these fits.  In the past, calculating these fits generally meant spending a bunch of time in MATLAB hand-arranging the data and then working with the curve fitting toolbox GUI.  It also meant that every time I sat down to do these calculations, I ended up with slightly different methods.  Thus, I had a folder full of MATLAB files from different months, all doing the same thing in slightly different ways.

I decided it was time to streamline, so I cleaned up all of that mess and built an efficient system that I can use for all five classes.  Now all I need to do is change the text file containing the data, and the various functions I’ve written to parse, arrange, and fit the data do all the hard work for me.
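
To give a flavor of what those functions do, here is a minimal sketch of the core of one of these surface fits using MATLAB’s Curve Fitting Toolbox.  The base values (3 and 111) match the monk dodge model you’ll see below, and the variable names (along with my shorthand of x as bonus agility and y as pre-DR dodge from rating) are just placeholders; the real scripts wrap this in a bunch of parsing and bookkeeping:

% Minimal sketch of a dodge surface fit (placeholder variable names).
% x = bonus agility, y = pre-DR dodge from rating, z = character-sheet dodge (%)
ft = fittype('3+111/951.158596+(x/951.158596+y)/((x/951.158596+y)/C+k)', ...
    'independent', {'x','y'}, 'coefficients', {'C','k'});
[dfit, dgof] = fit([x(:) y(:)], z(:), ft, 'StartPoint', [500 1.4])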

Monk

This is Taser’s data, which was rather extensive.  He was very clever and used the Steadfast Talisman of the Shado-Pan Assault to achieve incredible coverage of the dodge surface, which led to an excellent fit.  Below are the plots, followed by the fitting data.  After the goodness-of-fit information, I’ve given the values for $C$ and $k$ to roughly the accuracy that the fit allows.

Monk dodge surface fit generated from Taser’s 5.4 PTR data

Monk dodge surface residuals (deviation of fit from measured values)

##### Dodge Fit #####

General model:
dfit(x,y) = 3+111/951.158596+(x/951.158596+y)/((x/951.158596+y)/C+k)
Coefficients (with 95% confidence bounds):
C =       501.3  (501.3, 501.3)
k =       1.422  (1.422, 1.422)

sse: 1.0609e-010
rsquare: 1.0000
dfe: 151
rmse: 8.3822e-007

C = 501.25348 +/- 0.00032
k = 1.422000108 +/- 0.000000038

Monk parry surface fit using Taser’s 5.4 PTR data.

Monk parry surface residuals (deviation of fit from measured values)

##### Parry Fit #####

General model:
pfit(x,y) = 8+95/10000.000000+(x/10000.000000+y)/((x/10000.000000+y)/C+k)
Coefficients (with 95% confidence bounds):
C =       90.64  (90.64, 90.64)
k =       1.422  (1.422, 1.422)

sse: 4.0761e-011
rsquare: 1.0000
dfe: 151
rmse: 5.1956e-007

C = 90.64244 +/- 0.00014
k = 1.42200013 +/- 0.00000018

As you can see, this method does a LOT better.  We’ve confirmed our $k$ value to more decimal places and have narrowed down the range on $C_p$ and $C_d$ to almost 4 decimal places.

To summarize the results:

$k = 1.422000(13) \pm 0.00000018$
$C_d = 501.253(48) \pm 0.00032$
$C_p =90.642(44) \pm 0.00014$

Monks also get 1% parry from 10000 strength and 1% dodge from 951.158596 agility.
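
For reference, the model being fit is just the standard diminishing-returns form.  Writing it out explicitly (with $c = 951.158596$ for dodge, and reading the fit’s x as bonus agility and y as the pre-DR dodge contribution from rating):

$\displaystyle \text{dodge} = \text{base} + \frac{A_{\text{base}}}{c} + \frac{d}{d/C_d + k}, \qquad d = \frac{A_{\text{bonus}}}{c} + d_{\text{rating}}$

Parry works the same way, using $C_p$, the strength conversion, and the pre-DR parry contribution from rating.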

Druid

This is data that I collected using a pre-made character on the PTR.  Thus, I didn’t have access to the valor trinket, but I made up for it by using the free T16 gear (thanks Flaskataur!) and trying lots of reforge and gemming configurations to cover as much of the surface as possible.  We also don’t need to worry about parry for druids, since that’s identically zero.

This time I’ll link the residuals plots, but not show them explicitly.

Druid dodge surface using 5.4 PTR data.

residuals

##### Dodge Fit #####

General model:
dfit(x,y) = 5+99/951.158596+(x/951.158596+y)/((x/951.158596+y)/C+k)
Coefficients (with 95% confidence bounds):
C =       150.4  (150.4, 150.4)
k =       1.222  (1.222, 1.222)

dgof =

sse: 6.4926e-011
rsquare: 1.0000
dfe: 115
rmse: 7.5138e-007

C = 150.375938 +/- 0.000041
k = 1.222000009 +/- 0.000000045

Again, this gives a better confidence interval than our previous attempt.  Last time, we were only certain of the dodge cap to $\pm 0.2$.  This time, we’re accurate to $\pm 0.000041$.  To summarize

$k = 1.2220000(09) \pm 0.000000045$
$C_d =150.3759(38) \pm 0.000041$

Druids also get 1% dodge from 951.158596 agility, and obviously gain no parry from strength.

Death Knight

Again, this is my own data set, using a pre-made character and buying all of the T16 gear.  This time I used both DPS and Tanking sets to try and get wider coverage of both surfaces.

Death Knight dodge surface using 5.4 PTR data.

residuals

##### Dodge Fit #####

General model:
dfit(x,y) = 5+131/10000.000000+(x/10000.000000+y)/((x/10000.000000+y)/C+k)
Coefficients (with 95% confidence bounds):
C =       90.64  (90.64, 90.64)
k =       0.956  (0.956, 0.956)

dgof =

sse: 9.8586e-012
rsquare: 1.0000
dfe: 129
rmse: 2.7645e-007

C = 90.642574 +/- 0.000010
k = 0.956000090 +/- 0.000000018

Death Knight parry surface using 5.4 PTR data.

residuals

##### Parry Fit #####

General model:
pfit(x,y) = 3+209/951.158596+(x/951.158596+y)/((x/951.158596+y)/C+k)
Coefficients (with 95% confidence bounds):
C =       237.2  (237.2, 237.2)
k =       0.956  (0.956, 0.956)

pgof =

sse: 1.9542e-010
rsquare: 1.0000
dfe: 129
rmse: 1.2308e-006

C = 237.18614 +/- 0.00015
k = 0.956000019 +/- 0.000000055

Again, excellent accuracy.  Here’s the summary:

$k = 0.9560000(90) \pm 0.000000018$
$C_d = 90.6425(74) \pm 0.000010$
$C_p = 237.186(14) \pm 0.00015$

I could probably improve the parry fit a little by grabbing more points in the high-parry-rating, low-strength region, but this is good enough to give exact character sheet results.

Death Knights also get 1% parry from 951.158596 strength and 1% dodge from 10000 agility.

Warrior

Warriors took the longest because there’s block to consider.  I had to gem/gear for strength, then convert to dodge, then parry, then mastery, and then write another fitting function for the block fit.  However, I’m quite pleased with the results.

Warrior dodge surface using 5.4 PTR data

residuals

##### Dodge Fit #####

General model:
dfit(x,y) = 5+133/10000.000000+(x/10000.000000+y)/((x/10000.000000+y)/C+k)
Coefficients (with 95% confidence bounds):
C =       90.64  (90.64, 90.64)
k =       0.956  (0.956, 0.956)

dgof =

sse: 8.2286e-012
rsquare: 1.0000
dfe: 164
rmse: 2.2400e-007

C = 90.6425465 +/- 0.0000052
k = 0.956000078 +/- 0.000000011

Warrior parry surface using 5.4 PTR data

residuals

##### Parry Fit #####

General model:
pfit(x,y) = 3+206/951.158596+(x/951.158596+y)/((x/951.158596+y)/C+k)
Coefficients (with 95% confidence bounds):
C =       237.2  (237.2, 237.2)
k =       0.956  (0.956, 0.956)

pgof =

sse: 7.6114e-011
rsquare: 1.0000
dfe: 164
rmse: 6.8126e-007

C = 237.186091 +/- 0.000057
k = 0.956000014 +/- 0.000000022

Warrior block curve using 5.4 PTR data. Block only depends on one independent variable (mastery), so it’s not a surface.

residuals

##### Block Fit #####

General model:
pfit(x) = 13+1/(1/C+k/round(128*x)*128)
Coefficients (with 95% confidence bounds):
C =       150.4  (150.4, 150.4)
k =       0.956  (0.956, 0.956)

pgof =

sse: 9.6280e-011
rsquare: 1.0000
dfe: 164
rmse: 7.6621e-007

C = 150.37568 +/- 0.00015
k = 0.955999849 +/- 0.000000067
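
Incidentally, the fitted block model is a little easier to read if you regroup the quantization term.  The line below is algebraically identical to the fit output; the round(128*x)/128 factor just reflects that the game stores the mastery-based block input in multiples of 1/128:

% Block chance from the fitted model above (same expression, just regrouped);
% x is the same mastery-based input that appears in the fit.
block = @(x) 13 + 1./(1/150.37568 + 0.956./(round(128*x)/128));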

The block fit was sort of irritating, because I found out during data processing that the game doesn’t re-calculate block immediately. I had some data points with exactly the same mastery but different block values. After looking at the timestamps and the stat changes, it was clear that this was just a reporting error on the game’s part. For example, I’d find two time-adjacent data points that had different mastery rating values but identical block chances; the next data point would have the same mastery rating as the previous one, but the block chance had finally updated.

It’s curious because I didn’t see this effect with any of the other stats – dodge and parry always updated immediately.  It may be that block is calculated less frequently, or done server-side, or some other oddity.  I’m not really sure.  I ended up omitting these obviously-errant data points before performing the fit.  They were easy to find, since they were all extremely far off of the curve created by the rest of the points.

So, to summarize for Warriors:

$k = 0.9560000(78) \pm 0.000000011$
$C_d = 90.64254(65) \pm 0.0000052$
$C_p=237.1860(91) \pm 0.000057$
$C_b =150.375(68) \pm 0.00015$

Warriors also get 1% parry from 951.158596 strength and 1% dodge from 10000 agility.

Summary Table For All Classes

Since it might be convenient to have everything in one place, here’s a table listing the different coefficients for each class.  The paladin data is from last year’s post.

Since it’s pretty clear that the $k$ values are nearly exact to three digits, I’m going to make the assumption that they are, as it has no significant effect on the results.

| Class | $k$ | $C_d$ | $C_p$ | $C_b$ |
| Death Knight | $0.956$ | $90.6425(74) \pm 0.000010$ | $237.186(14) \pm 0.00015$ | - |
| Druid | $1.222$ | $150.3759(38) \pm 0.000041$ | - | - |
| Monk | $1.422$ | $501.253(48) \pm 0.00032$ | $90.642(44) \pm 0.00014$ | - |
| Paladin | $0.886$ | $66.56744(62) \pm 0.0000060$ | $237.1860(40) \pm 0.000055$ | $150.3759(469) \pm 0.0000094$ |
| Warrior | $0.956$ | $90.64254(65) \pm 0.0000052$ | $237.1860(91) \pm 0.000057$ | $150.375(68) \pm 0.00015$ |

In this arrangement, it’s easy to see that the plate tanks all share the same parry cap $C_p$.  Death Knights and warriors have the same dodge cap $C_d$ and $k$ value, but paladins differ slightly in both departments.  The block cap $C_b$ is the same for both blocking classes.  Druids and monks both do their own thing, though the monk parry cap is identical to the warrior/DK dodge cap.

Here’s a second table listing the strength-to-parry and agility-to-dodge conversions for each class.  This is sort of obvious, since you get 1% avoidance from 951.158596 of your primary stat, and 1% avoidance from 10000 of your non-primary stat, but I’m including it for completeness.

| Class | Str -> Parry | Agi -> Dodge |
| Death Knight | 951.158596 | 10000 |
| Druid | 0 | 951.158596 |
| Monk | 10000 | 951.158596 |
| Warrior | 951.158596 | 10000 |
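
If you’d rather have all of this in code form, here’s the same information as a MATLAB struct (values copied from the fits above; NaN marks caps that don’t exist for that class):

% Diminishing-returns constants for each tanking class (from the fits above)
dr.death_knight = struct('k', 0.956, 'Cd',  90.642574,  'Cp', 237.18614,  'Cb', NaN);
dr.druid        = struct('k', 1.222, 'Cd', 150.375938,  'Cp', NaN,        'Cb', NaN);
dr.monk         = struct('k', 1.422, 'Cd', 501.25348,   'Cp',  90.64244,  'Cb', NaN);
dr.paladin      = struct('k', 0.886, 'Cd',  66.5674462, 'Cp', 237.186040, 'Cb', 150.3759469);
dr.warrior      = struct('k', 0.956, 'Cd',  90.6425465, 'Cp', 237.186091, 'Cb', 150.37568);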

## Simulationcraft v530-6

Simcraft version 530-6 was released the other day, and it has a whole host of improvements.  You can get it here.  And you can check out my Getting Started guide here.

Bugfixes and New Features

There have been a number of bugfixes since 530-5.  Here are the ones that apply to all tanking classes:

• Vengeance calculations revamped to be more accurate
• Stat weights are now normalized to Stamina for tanks (instead of Strength)
• Fixed base avoidance for all tanking classes (many were too high)
• Fixed level-dependent avoidance modifiers to correct dodge/parry/miss calculations (bosses shouldn’t miss us now)
• Major corrections to attack table calculations, specifically regarding blocks.
• Implemented an “incoming_damage_X” conditional – more on that shortly
• New “Health Gains” pie chart shows your healing breakdown from all sources
• Damage and Healing abilities are now split into separate tables in the Abilities section of the html report
• Command-line option for TMI bosses
• Command-line option for disabling external healing for TMI calculations

And of course, there are quite a few paladin-specific changes:

• Selfless Healer talent implemented
• Sanctified Wrath’s +20% healing bonus for protection implemented
• Shield of Glory duration scales properly with holy power spent
• Holy Prism now triggers its 20-second cooldown and costs mana
• Pre-combat Sacred Shield casting is now supported
• Action Priority List improvements
• Sacred Shield precast before combat
• Shield of the Righteous now uses a shifting queue (SH1)
• HotR removed from PTR single-target APL
• Divine Protection added once again
• Devotion Aura implemented
• T16 set bonus detection implemented
• All PTR changes through build 17252 are implemented

The paladin module should be basically feature-complete for live servers now.

There have also been a lot of fixes to the warrior module.  Below are the ones that apply to protection, most of which have been motivated by Tengenstein‘s feedback.  Two of the other devs (Max and Alex) have been making lots of other changes as well, mostly implementing 5.4 mechanics (which you can enable using the “ptr=1” option).

• Fixed several bugs with Impending Victory / Victory Rush and Bloodthirst  heals
• Fixed a major bug with Shield Barrier that was causing it to ignore AP scaling
• Second Wind talent implemented
• Deep Wounds damage calculation fixed

And while I haven’t been keeping careful notes on the DK module, it too has seen fairly significant upgrades, thanks in part to Mendenbarr, who’s been interacting closely with one of the other developers (Navv).

I want to briefly go over some of these changes in a little more detail.

Incoming_Damage_X

This is a new conditional for the action priority list that lets you use abilities after taking spike damage.  For example, the line

/shield_of_the_righteous,if=incoming_damage_1500ms>health.max*0.3

will use Shield of the Righteous if you’ve taken more than 30% of your health worth of damage in the last 1.5 seconds.  The time X can be specified in seconds or milliseconds, but has to be an integer.  In other words, incoming_damage_5s and incoming_damage_5000ms both work and will give identical results, but incoming_damage_4.5s will not. If you want to use fractions of a second, you need to specify it in milliseconds (i.e. 4500ms for 4.5 seconds).
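
For example, both of these lines are valid ways of writing that sort of condition in a text profile; the thresholds here are purely illustrative, not a recommendation:

actions+=/shield_of_the_righteous,if=incoming_damage_1500ms>health.max*0.3
actions+=/shield_of_the_righteous,if=incoming_damage_6s>health.max*0.6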

The new default action priority list for Protection uses this condition on Shield of the Righteous to produce the SH1 shifting queue we’ve been using in the MATLAB simulations.

Improved Reporting

The first thing you might notice is that damage and healing spells now each have their own table in the Abilities section. This should make it a lot easier to read, especially for abilities like Light’s Hammer or Holy Prism that do both simultaneously.  It should also be easier to detect bugs, like an ability incorrectly doing damage rather than healing.

Damaging and healing abilities are now in two separate tables in version 530-6.

There’s also a new pie chart in the report that shows your health gains. This chart will show healing received and absorption effects consumed from all sources, including external healers. So not only can you see your own healing breakdown, you can also see how it changes when you add a healer.

The new “Health Gains” pie chart shows you how much healing you received from each source.

Finally, when you simulate stat weights the report will include a link to AskMrRobot that will automatically load your character from the armory and import the stat weights that SimC has generated for the spec that you simmed. This should make it much easier to go back and forth between the two tools to fine-tune optimizations. For example, I’ll often optimize my Ret spec in AMR, then copy the new gear setup into SimC and simulate stat weights. I then transfer those new stat weights back into AMR by hand, re-run the optimization, and repeat the process. This link saves the hand-copying so that each step is only a few clicks.

TMI Options

To make it easier to perform standardized TMI measures, I’ve added a command-line option for TMI bosses. The option tmi_boss=T15H will automatically load the T15H standard TMI boss as your enemy. Just swap T15H with T15N or T15LFR to change which boss you’re up against.

In addition, there are two new options for calculating TMI while healers are present.  The player option tmi_self_only=1 will ignore heals and absorbs from external sources while calculating your TMI, so you can sim with healers and still calculate a non-trivial TMI value.  Note that it will count effects from your own pets (i.e. Bloodworms). The global option tmi_actor_only=1 will enable this mode for all players and bosses in the simulation.
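
Put together, a minimal text-input version of a standardized TMI sim might look something like this (the armory line is just a placeholder for your own region/server/character):

armory=us,llane,theck
iterations=10000
tmi_boss=T15H
tmi_actor_only=1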

A word of warning: due to the way absorption effects are calculated, this can give some funky results if you have a Disc Priest healing you.  In my own testing, I noticed that often their absorbs would be consumed before Sacred Shield, which can lead to some weird spike behavior.  For example, if several attacks in a row are fully absorbed by Power Word:Shield and other absorption effects, those will be treated as full hits for TMI calculations even if you had an unused Sacred Shield bubble active.  This mode also ignores overhealing, treating all of your Seal of Insight ticks (and all other self-heals) as if they always heal for the full amount.  So don’t be surprised if the results differ when you add a healer, and I’d suggest avoiding the use of Disc Priests.

APL improvements

I’ve made several improvements to the default action priority list. In addition to  implementing the shifting queue, I’ve also set it up to pre-cast Sacred Shield and added Divine Protection back into the rotation. In a later patch I’ll probably add GAnK and Ardent Defender and set up conditionals so that we chain cooldowns intelligently instead of blindly stacking them all at once.

Bug Reports

I think that the prot module is fully-functional for 5.3 mechanics now, but it’s not feasible for me to test every possible combination of glyphs, talents, actions, and gear. So I’m sure there are still bugs, though hopefully far fewer than the previous builds. That’s where you come in. The more people actively using SimC to test their character, the more likely we’ll stumble across those bugs and fix them.

While it’s fine to discuss potential bugs in the comments here, it’s actually far easier for me and the other devs to manage the process of verifying and correcting bugs if they’re submitted through the Issues system.  You’ll need a Google code or Gmail account (I think), but other than that it’s fairly painless. If possible, including the .simc and/or .html files demonstrating the problem is a big help too.

So if you find some spare time this week, please try importing your tank and running some simulations, and let me know if you see anything funny.  I’m sort of curious to see whether warriors and death knights are able to achieve much more competitive TMI scores now that their modules have been improved.

## Slinging Shields in Slo-Mo

Today’s post is just a quick one, since I’ve been really busy with SimC work this week.  However, it’s something that is a little more immediate, namely the Sacred Shield global cooldown (GCD) bug we’ve been struggling with all expansion.

For those that don’t stack haste, or are otherwise unaware of the bug: Sacred Shield is ostensibly a spell (it would be tough to argue that it’s a melee attack). Most spells in the game trigger a hasted GCD – that is to say, the GCD length is reduced by your spell haste.  However, on live servers this isn’t the case, as Sacred Shield incurs a full 1.5-second global cooldown.  The same bug affects two of our level 90 talents as well: Execution Sentence and Light’s Hammer.  I would have guessed that it’s an issue with spells that are granted through talents, but curiously neither Holy Prism nor Hand of Purity exhibit the behavior.

While it’s mostly a quality-of-life issue, it’s extremely jarring to be cruising along with nearly 1-second GCDs and then all of a sudden hit a 1.5-second GCD with one of these three skills.  It’s like a sudden, poorly-marked speed bump on your rotational highway.

Unfortunately, the bug still hasn’t been fixed on the PTR, at least as of build 17227.

Testing

To demonstrate, I performed the following tests on the PTR.  I used my own character in protection spec, but wearing high-haste ret gear.  In that gear I was able to get 32.32% melee haste and 45.55% spell haste with Seal of Insight active.  At those haste levels, the global cooldown should be:

Melee GCD: 1.50/1.3232 = 1.134 seconds
Spell GCD: 1.50/1.4555 = 1.031 seconds
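
(Both of those are just the base 1.5-second GCD divided by one plus the relevant haste, i.e. $\text{GCD} = 1.5/(1+\text{haste})$.)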

To get a numerical measurement of the GCD, I used Gnosis Castbars, which has a GCD monitor showing the remaining duration on the GCD.  I tweaked the settings to make the text large so that we could easily see it.  Then I captured a video of me wailing on a dummy with Fraps set to 60 frames per second so that I could go back and step frame-by-frame to see the maximum number Gnosis displays.

This is the annotated video showing the effect, running at 1/4 speed (15 FPS):

Since it’s hard to see the instant that the GCD starts even when played back at 1/4 speed, here are the observed GCD times in the video by stepping frame-by-frame:

| Ability | GCD |
| CS | 0.94 |
| J | 0.92 |
| HW | 0.90 |
| CS | 0.76 |
| J | 0.78 |
| SS | 1.27 |
| CS | 0.90 |
| J | 0.90 |
| HW | 0.78 |
| CS | 0.83 |
| SS | 1.28 |
| J | 0.90 |

There’s clearly some delay involved between the GCD being registered and Gnosis displaying the number, as the CS casts are only being shown almost 200 ms after they’re happening (i.e. the countdown timer should start at 1.134, but Gnosis doesn’t get around to displaying them until 0.94).  In extreme cases, it’s delaying as much as 400ms (the 0.76 CS cast).  But note that this is an asymmetrically distributed error – it can only make the numbers smaller, not larger.  So we can be confident that the largest number we see is a minimum bound for the length of the GCD.

And it’s clear from the data that Sacred Shield is an outlier.  We never see a GCD time above 0.95 seconds for Crusader Strike or Judgment (melee), or above 0.90 seconds for Holy Wrath (spell).  Those are consistent with what we expect after accounting for ~200 ms of display lag, give or take.  But Sacred Shield is clocking in at about ~1.3 seconds.  Once you include the display lag, that gives us our full 1.5-second GCD.

Also note that this cannot be a display bug, as the GCD for Sacred Shield should never be above 1.03 seconds if it were using the hasted spell GCD, or above 1.134 seconds if it were affected by Sanctity of Battle.  It would only ever show a time shorter than those two, not >100 ms longer.

I’ve also tested Execution Sentence and Light’s Hammer on the PTR, and both are giving full 1.5-second GCDs:

Execution Sentence triggers a >1.30 second GCD.

Light’s Hammer triggers a >1.30 second GCD

Again, Holy Prism and Hand of Purity seem to be working properly.  The highest I was able to achieve with either on the PTR was about 0.85 seconds, which puts them both solidly within “hasted GCD” territory.  Less quantitatively, both of them felt like hasted GCDs when I tried to perform a full rotation.

Conclusion

It’s clear from this testing that these three abilities aren’t exhibiting hasted GCDs yet.  What’s not clear is, “why?”  I don’t think it’s the result of a conscious choice on the part of Ghostcrawler & co.  For one thing, treating these abilities differently than most other spells in the game doesn’t make much sense.  And doing so doesn’t solve a significant balance problem, nor does it have a significant impact on the value of haste.  And it’s certainly not a deciding factor in choosing our level 45 talents.  It’s mostly just a minor annoyance, not a creative way to fight back against the haste machine.

No, more than likely it’s probably just an oversight, especially given that Holy Prism and Hand of Purity are properly affected by haste.  My guess is that the three affected abilities went through more (or fewer) iterations in the Mists of Pandaria beta, and along the way someone just forgot to flip the “GCD affected by spell haste” switch on them.

But with more paladins reaching the 50% haste mark in 5.4, this minor annoyance will become even more noticeable.  So it would be a nice quality-of-life buff if it were to be fixed.  Hopefully there’s still time to influence that change before 5.4.

## Simulationcraft 101: Getting Started

As I mentioned on twitter last week, version 530-5 of Simulationcraft has been released.  This is the first version to include the Theck-Meloree Index, a damage smoothing metric we developed in a series of previous blog posts.

However, Simcraft is a bit daunting to some players.  The program is very versatile, but that also means there are lots of options, and it can be confusing to understand exactly what’s going on.  Sometimes, it helps to have someone guide you through the process.

This is the first in a series of “Simulationcraft 101” blog posts designed to do exactly that.  The hope is that by the end of this blog post, a new user will be able to:

• Import their character from the armory
• Run a quick test simulation and calculate their TMI
• Generate stat weights using the new TMI metric

In future installments, we’ll break down some of the options in more detail and talk about how to interpret the myriad of statistics Simcraft provides in its reports.

Disclaimers

First, I want to start off with some disclaimers.  Simulationcraft is not perfect, and the paladin module especially so.  While I have most things implemented, there are a number of bugs that have slipped through into 530-5.  Some of them have already been fixed for 530-6, but I’m sure there are some I haven’t even discovered yet.  One of the goals of this blog post is to get more people running the program and simulating their characters so that the remaining bugs can be identified and corrected.  So if you get funny results, please share them with me.  Either post in the comments here, upload the output HTML files and send me a link, or give me the contents of the Simulate window text via a pastebin link so that I can verify the results.

Here are the paladin-specific bugs that I am aware of in 530-5, as well as a few talents that aren’t properly implemented:

• Selfless Healer isn’t implemented
• Sanctified Wrath’s +20% healing received bonus isn’t implemented
• Shield of Glory (T15 2-piece bonus) duration is fixed at 5 seconds rather than scaling with Holy Power spent
• Holy Prism’s cooldown is not being invoked, allowing it to be spammed every GCD
• Vengeance gain is a little wonky, though it should have little effect on TMI results

Outside of those, I think everything is working.  The only other thing that’s missing is the ability to perform shifting queues (i.e. SH1), which is something I’m working on for 530-6.  For now, we’re limited to a simple SotR spam queue.

The first step is to obtain and install the program.  To do that, we go to Simulationcraft’s Google Code page and click on the Downloads tab.  Pick the appropriate Windows, Linux, or Mac download for your system.  If you’re not sure about whether you have 32-bit or 64-bit Windows, just grab the 32-bit one to be safe.  Save this file somewhere convenient.

After downloading the file, you’ll need to unzip it to a location.  This can be anywhere you like, but since the program doesn’t need to be installed in the traditional sense, you may as well choose the final location you want to put it.  In our example, I unzip it into D:\Simcraft\

Unzipping the files to D:\Simcraft\. I have several earlier versions in this folder, you’ll probably only have simc-530-5-win32.

If you’re using Linux, you’ll have to build the program yourself.  I’m not going to provide instructions for that here; the Google code wiki has fairly clear instructions on how to do this if you need them.

To run the program, we open the \simc-530-5-win32\ folder and run SimulationCraft.exe:

The SimulationCraft.exe file. Run this.

which brings up the SimulationCraft GUI.  If you’re on Linux or Mac you’re on your own for this step, as I’m not sure what the files are called off the top of my head.

You navigate the GUI by moving between tabs.  The “Welcome” tab has a pretty good introduction to the overall layout if you’re interested, but we’re going to skip around to quickly get our Sim on.  From the top tab menu, choose “Import.”  This opens a set of sub-tabs with different options for importing.  You can import directly from the battle.net armory in addition to other sources.  For this example we’re going to use the armory.

This interface should look fairly familiar – it’s literally the armory webpage loaded in a browser, complete with an address bar at the very bottom.  You can navigate it as usual to find your character (if you’re EU, change the URL in the address bar first).  Once you do, hit the “Import” button at the bottom right.  Make sure you’re in protection spec, though!

The Import screen. Click the “Import” button on the bottom right after finding your character.

When you click Import, Simcraft will grab your character information and generate a simulation file from it.  This is displayed in the Simulate tab.  SimC will automatically use the default action priority list that I’ve programmed into it, so you don’t need to tweak this tab at all.

The Simulate screen.

There is a big “Simulate” button at the bottom right of this screen.  Don’t push it.

….

You pushed it, didn’t you.  All right then, let’s just see what it spits out.

Simulation Reports

This is the html file Simcraft produces when I sim Theck:

Let’s briefly look at a few features of this report.  First, this is the section containing the broad overview of the results:

Results section of the report.

The first line gives the character name and a bunch of information: our DPS, DTPS, and TMI score.  The tables under “Results, Spec, and Gear” give us a more detailed breakdown of these quantities, including error estimates.  This sim was only 1000 iterations, which is the default size, but in a few minutes we’ll see how we can increase that to improve accuracy.

The next section contains a bunch of charts showing damage per execute time (DPET), DPS and Vengeance timelines, damage source breakdown, and a slew of other statistics in chart form.

The Charts section. This is the part that always makes the ladies swoon. Ladies love graphs.

Further down the report are breakdowns of ability usage, buff uptimes and details, resource gains and losses, even more charts, proc counters, and then a bunch of statistics.  We’re going to skip over the rest of that for now, because for today we’re only interested in calculating TMI and smoothness scale factors.  To do that, we need to change some of the options.

The Options Screen

Go back up to the top tab bar and choose “Options.”  It should bring up a set of sub-tabs, with the “Globals” sub-tab displayed:

The Options tab, Globals section

There are a lot of choices here, some of which are obvious and some of which aren’t.  We’ll explore all of the choices here at a later date, but for now, we want to make the following changes:

• Iterations – increase to 10k or higher.  Larger numbers of iterations give more accuracy, but also take longer.
• Threads – This increases the number of threads SimC can use to run the simulation, which increases simulation speed.  If you have a quad-core processor or higher, set this to 4.  If you have a dual-core, set it to 2.  If you’re not sure, leave it at 1 and plan on grabbing a drink while the program simulates.
• TMI Standard Boss – this drop-down lets you select one of the standardized TMI boss configurations.  Pick the option that’s most appropriate to the content level you usually play at.  All of the standard bosses assume 25-man raiding (i.e. T15H hits as hard as Lei Shen does on 25-man heroic mode), so you may want to drop back one category if you’re a 10-man raider.  The “custom” option uses the SimC default.

Next, we want to shift over to the “Scaling” sub-tab.  This has all of the different options for testing scaling:

The Scaling Options tab.

As you can see, I’ve checked the boxes for Strength, Stamina, Expertise, Hit, Crit, Haste, Mastery, Armor, Dodge, and Parry.  Most importantly though, at the very bottom, I’ve changed the Scale Over option to “tmi” to tell Simcraft that I want scale factors based on the Theck-Meloree Index.

Ok, now we’re ready.  Hit the simulate button again.  Note that with this many stats, it may take a while unless you’re using 4+ threads.  With 10k iterations and 4 threads, it takes about a minute on my i7-2600k.  Here’s the result:

In addition to a more accurate estimate of my TMI (because we used more iterations), I now have a new chart in the Charts section:

Scale factors generated for Theck using 10k iterations.

These are my smoothness scale factors, complete with handy error bars to tell us how confident SimC is about those values.  If we increased the number of iterations to 25k or 50k, we’d get even smaller error bars and a better estimate of each scale factor.

Note that these scale factors are all negative.  That’s because TMI uses “golf rules,” meaning that a lower score is better.  So the scale factors are negative because each of these stats reduces TMI; each point of haste, for example, lowers my TMI score by about 1, giving it a weight of roughly -1.  On the other hand, critical strike rating has almost no effect on TMI, which is reassuring because it shouldn’t have any effect on damage smoothing.

Note that unlike what we usually do here on Sacred Duty, these scale factors are not normalized for itemization.  In other words, this is directly comparing 1 stamina to 1 haste to 1 armor, and so forth.  So for example, with these stat weights a stamina trinket would hold more value than a haste trinket because the weights are pretty close but you get 50% more stamina on a trinket than you get haste.  On the other hand, a haste gem would be worth more smoothing than a stamina gem, because you get 33% more haste than stamina on gems.

Saving and Exporting

You may have noticed that there’s a nice “Save!” button at the bottom right of the report tab, so that you can save these results for future reference.  Unfortunately, the button doesn’t do anything.  Oops.  This bug should be fixed in 530-6.  For now, though, you’ll have to save the results manually.  You can do that by going to the \simc-530-5-win32\ folder and finding the “simc_report.html” file, which is your latest simulation result.  You can rename that file to something memorable (like “theck_10k_scale.html”) to save the results.

You can also automatically export the results to a number of websites.  If you check under the “Results, Spec, and Gear” section of the report, you’ll notice a new table containing scale factor information:

The scale factors table, which contains normalized scale factors as well as links to export these scale factors to various websites.

As you can see, this table contains both the unnormalized scale factors as well as a normalized (i.e. positive) set that you can use in gear ranking websites.  In 530-5 these are normalized to strength, but in 530-6 they’ll be normalized to stamina (since it’s sort of silly to normalize scale factors to strength for the agility tanks).

Rough Rules of Thumb

Without a little context, it’s tough to make heads or tails of the TMI number that Simcraft puts out.  For example, is a TMI of 10k good or bad?

The answer to that is somewhat relative, of course.  It doesn’t matter what your TMI is if all you care about are scale factors.  But the intent is that, when using the standard boss of the appropriate content type for your gear level, you *should* get a TMI value of around 5k-10k.  As you start to overgear a tier of content, your TMI should go down.

For example, the T15H boss is expecting an ilvl of about 535.  But I have an ilvl of 546, so I already overgear the T15H boss.  Thus, my TMI is relatively low at around 2-3k.  If I compared myself to the T15N boss, it would be even lower (around 700), which would make the stat weight calculations a lot more sensitive to noise.  So it’s generally good to sim against the boss that most closely approximates your gear level, and when in doubt, aim high.

If I sim Rhidach, who has an ilvl of 527, I’d probably want to use the T15N boss, because that boss expects an ilvl of ~522.  If I do that, I get a TMI of around 5300, which is about right because he’s starting to overgear normal content.  If I sim him against the T15H boss, though, I get a TMI of around 67k, about an order of magnitude worse!

The reason the difference is so large is that the metric is normalized to the paladin’s health.  Poor Rhidach only has about 720k hit points fully buffed, so if you pit him against a boss that can melee for 340k after armor mitigation, he’s in danger of death from 2 full melees plus a stiff gust of wind, or even a full melee plus some blocked/mitigated attacks.  The 6-second moving average is going to include a fair number of these events clocking in at 120% or more of his health, and since the weight function is exponential in percentage health, they cause a significant increase in score.

This is by design of course – the point is to heavily penalize large spikes that put you in danger of death, and a 120% health spike certainly fits that description.  Note that the scale factors are still going to be pretty similar, though.  The relative rankings will be the same, though the values may shift around some because not all stats scale similarly with boss hit size.  So even if you get a TMI score in the millions, the stat weights will still be very reliable.

Finally, note that I’ve only tested this extensively for paladins.  A warrior, DK, druid, or monk tank at a similar ilvl may not fall into the 5k-10k range that we do.  Again, that’s by design, because we don’t want to normalize across tanking classes.  If a DK takes spikier damage intake than a paladin, that’s something we want to know, and TMI should properly reflect that by giving us a larger value.

This was a quick-and-dirty introduction to Simcraft.  I plan on going more in-depth about many of the options and reported statistics in later blog posts, but if you don’t feel like waiting, the Simulationcraft Wiki has lots of information on how to get started, tweak options, and interpret results.

Also note that the simulation output is only as good as what you put into it.  I think the prot warrior module is fairly complete, but I’m not sure about the DK, druid, or monk modules.  The DK module in particular suffers from the lack of a good “recent damage taken” conditional, because it limits them to spamming Death Strike rather than reacting to large health changes.  Since that’s very similar to the sort of information we would like to have for shifting queues, I’m working with another dev to get an action priority list option implemented for that in the near future.

In addition, for the more technically-minded, I’ve written up a TMI Standard Reference Document.  This outlines the official calculation method and specifies standard conditions for comparing TMI between different tanks.  Since this is the initial version of the SRD, feedback on the details is greatly appreciated.

## Blood, Toil, Tears, and Threat

A few days ago, my friend Llarold asked me if I had done any calculation about the new threat bonus on taunts in 5.4.  In the past, I had worked out a rough formula for how long it took to lose aggro after taunting (roughly the duration of the encounter divided by five).  So we were both curious how the threat buff affects that estimate.

The first thing to do was to verify how the threat buff works.  It claims to “increase threat that you generate against the target by 200% for 3 seconds,” but doesn’t make it clear whether that’s multiplicative or additive with the +400% threat modifier granted by Righteous Fury.  In other words, if we normally get $\text{damage}\times 5$ threat, will the buff give us $\text{damage}\times 7$ or $\text{damage}\times 15$?

So, I hopped on PTR to test.  Below are the test results from two consecutive Holy Wrath casts, which do a fixed amount of damage.  The first is before taunting and the second immediately after taunting:

| Damage | Threat | Threat_Diff | Threat_Diff/Dmg |
| 35k | 174k | 174k | 5.0 |
| 35k | 698k | 524k | 15.0 |

There’s no ambiguity in that data: we get $\text{damage}\times 15$ during the buff.

Next, we want a mathematical model.  To model threat on taunts, we make a few simple assumptions.  First, we assume that threat generation is continuous and uniform so that we can easily integrate.  In reality it’s discrete and uneven, but if we averaged over all possible swing timer offsets it would be roughly continuous, and it makes the math easier.

We’ll also ignore base threat output and assume all output comes from Vengeance AP.  Again, this makes the math simpler, and doesn’t really break much since both tanks are presumably generating roughly the same amount of threat in the absence of Vengeance anyway, so it would mostly cancel out.

We’ll let $T_1(t)$ describe the threat of tank #1 as a function of time, and $T_2(t)$ will describe the threat of tank #2.  At time $t=0$, tank #2 taunts off of tank #1.  At that instant, both tanks have $T_0$ threat, and we’ll let $G_1=G$ represent the rate at which tank #1 is generating threat.  Tank #2 generates threat at a rate $G_2$, which we’ll define shortly.

Under those general circumstances, the threat as a function of time for each tank can be represented as follows:

$\displaystyle T_1(t) = T_0 + \int_0^t G dt’$
$\displaystyle T_2(t)= T_0 + \int_0^t G_2 dt’$

This has all the makings of a basic kinematics problem.  Normally you’d be concerned with velocity and acceleration; here we’re concerned with constant threat generation rates (velocity) and time-dependent threat generation rates (acceleration).  Though as we’ll see shortly, the form of the acceleration is very different here than it is in kinematics.

There are two ways we can go about solving this problem.  The first is to ignore acceleration entirely.  What that means is that after taunting, tank #2 gains no more Vengeance from the boss.  This models the “worst-case” scenario, where you taunt and the boss decides to turn and cast something, or tank #1 gets a number of lucky crits at exactly the wrong time.

The other way is to try and model the Vengeance ramp-up you get after taunting, which massively complicates the problem.  However, the results are sort of interesting, so we’ll tackle that problem as well.  First, though, let’s do the easy version, starting with the “before” case (i.e. 5.3 mechanics) and then the “after” case (5.4 mechanics).

Without Acceleration – Before

In the “no acceleration” case, the equations are very easy.  When tank #2 taunts, he gets 50% of tank #1′s Vengeance, and thus is generating threat at a rate $G_2 = G/2$.  So our “equations of motion” for this threat system are:

$\displaystyle T_1(t) = T_0 + G t$
$\displaystyle T_2(t)= T_0 + G_2 t = T_0 + \frac{G}{2} t$

Tank #1 will pull threat if he exceeds 110% of tank #2′s threat, which is mathematically expressed like this:

$T_1(t) \geq 1.1 \times T_2(t)$

If we plug in our expressions for $T_1(t)$ and $T_2(t)$, we can solve this equation to find the time $t$ at which tank 1 pulls threat:

\begin{align} T_0 + G t &\geq 1.1 \left ( T_0 + \frac{G}{2} t \right ) \\ 0.45 G t &\geq \frac{T_0 }{10} \vphantom{\frac{G}{2}} \\ t &\geq \frac{T_0 }{4.5 G} \vphantom{\frac{G}{2}} \end{align}

Let’s make another simplifying assumption, namely that the initial threat at the time of taunting ($T_0$) is linearly dependent on how long the fight has been going on.  In other words, tank #1 started generating threat at rate $G$ from the very beginning of the pull, which is the same as assuming there was no ramp-up time on their Vengeance.  This is actually an over-estimate, which means our $T_0$ value will be a little higher than it would be in reality; thus our time $t$ will be a slight over-estimate as well.  But given that assumption, $T_0 \approx G \tau$, and we get the final expression for $t$:

$\displaystyle \large t \geq \frac{\tau}{ 4.5}$

Now let’s see what happens after the 5.4 buff.

Without Acceleration – After

In 5.4, you will generate +200% threat for the first three seconds after you taunt.  Thus, our threat generation rate $G_2$ changes to a piecewise function:

$\displaystyle G_2 =\cases{\frac{3G}{2} & \text{if }t \leq 3 \cr \frac{G}{2} & \text{if } t > 3 }$

The threat for tank #2 can still be expressed by $T_2(t) = T_0 + \int_0^t G_2 dt’$, but now we have to split that integral up when we cross over the $t=3$ boundary.

First, let’s consider what happens if $t<3$.  Our threat equation for tank #2 is then

$\displaystyle T_2(t) = T_0 + \frac{3G}{2} t$

And if we solve our inequality we find:

\begin{align} T_0 + G t &\geq 1.1 \left ( T_0 + \frac{3G}{2}t \right ) \\ G t \left (1 - \frac{3.3}{2} \right ) &\geq \frac{T_0}{10}\end{align}

We can actually stop there, because this inequality literally cannot be satisfied.  $G$, $t$, and $T_0$ must be positive values, so we have an inequality that requires a negative number be greater than a positive number.  Thus, even without the fixate effect, you can’t lose aggro after a taunt in this continuous model.

What if $t>3$?  Then we split our integral up into two parts as follows:

\begin{align} T_2 (t) &= T_0 + \int_0^3 \frac{3G}{2} dt’ + \int_3^t \frac{G}{2} dt’ \\ &= T_0 + \frac{9G}{2} + \frac{G}{2}\left ( t-3 \right ) \\ &= T_0 + 3 G + \frac{G}{2} t \end{align}

and when we solve our inequality we find:

\begin{align} T_0 + G t & \geq 1.1 \left ( T_0 + 3 G + \frac{G}{2} t \right ) \\ T_0 + G t & \geq 1.1 T_0 + 3.3 G + 0.55 G t \vphantom{\frac{G}{2}}\\ G t \left ( 1 - 0.55 \right ) & \geq \frac{T_0}{10} + 3.3 G \\ 0.45 t & \geq \frac{\tau}{10} + 3.3 \\ t & \geq \frac{\tau}{4.5} + 7.\bar{3} \vphantom{\frac{G}{2}} \end{align}

In other words, the +200% threat buff adds $7\frac{1}{3}$ seconds to the $\tau/4.5$ seconds we normally have before tank #1 pulls aggro back.  I’ve illustrated that graphically below by plotting tank threat vs. time for a situation where $\tau=20$.  In the “before” model, tank #2 loses threat at a little under 5 seconds (green line).  In the “after” model, the tank doesn’t lose aggro until almost 12 seconds have passed (red line).

Tank threat vs. time for tank #1 (blue), tank #2 before 5.4 (green), and tank #2 after 5.4 (red).  The +200% threat buff adds over 7 seconds to the time tank #2 has to establish aggro.

This has two important consequences.  First, it guarantees that for any nontrivial $\tau$ you’ll have at least 8 seconds of aggro, which means your taunt cooldown will be back up before you’re at risk of losing threat.  Second, it makes the “no acceleration” model incredibly unlikely, because within 10 seconds of taunting the boss should be hitting you and generating more Vengeance.

Really, that’s enough of a calculation to satisfy ourselves that the new threat buff will really eliminate threat problems.  If you have 10+ seconds without acceleration, then clearly you’ll have even longer if we do include a Vengeance ramp-up.  However, it was an interesting calculation, so I’ll share it with you below.

A word of warning though: it involves some calculus.

With Acceleration – Before

First, we need to decide on how to model the “acceleration” term that describes Vengeance ramp-up.  It seemed to me that a fairly standard decay model would apply here, so I chose to use the form:

$\displaystyle \large r(t) =\left ( 1-0.1^{t/20} \right ) = \left ( 1-e^{\ln(0.1) t / 20} \right )$

That looks something like this:

Threat ramp function $r(t)$.

In other words, it acts like an exponential decay in reverse.  It starts at $r(0)=0$ and rises fairly quickly, eventually reaching $r(20)=0.9$ after $t=20$ seconds have passed, asymptotically approaching $r(\infty)=1$.  The choice of 20 and $\ln ( 0.1)$ as our rise-time is somewhat arbitrary, but should model how Vengeance actually builds up fairly well. Remember that this is only going to affect the acceleration term, which is 50% of tank #2′s overall threat generation, so after 20 seconds have passed they will be at $0.95G$.

There’s one more variable I want to introduce into this expression.  We may want to know what happens if the boss delays its attacks by a few seconds – for example, if it’s casting something when you taunt.  So I want to introduce an offset into that ramp-up function, which is accomplished by replacing $t$ with $(t-a)$, where $a$ is the time at which the boss resumes melee attacks.

With those conventions, our threat generation rate function looks like this:

$\displaystyle G_2 = \cases{ \frac{G}{2} & \text{if } t \leq a \cr \frac{G}{2}+ \frac{G}{2}\left ( 1 - e^{\ln (0.1) (t-a) / 20 } \right ) & \text{if }t > a }$

For $t<a$, this works exactly the way our “no acceleration” model does.  So we only need to consider $t>a$.  We perform the usual integral of $G_2$ from $0$ to $t$ to find our expression for $T_2(t)$:

\begin{align} T_2(t) &= T_0 + \int_0^t G_2 dt’ \\ &= T_0 + \int_0^t \frac{G}{2}dt’ + \int_a^t \frac{G}{2}\left ( 1-e^{\ln (0.1) (t’-a) / 20} \right ) dt’ \end{align}

As you can see, we’ve split up the integrals, and the second one is only evaluated for $t>a$ since the threat ramp is inactive before that point.  The first integration is easy, so we can trivially perform that:

$\displaystyle T_2(t) = T_0 + \frac{G}{2}t+ \frac{G}{2}\int_a^t \left ( 1 – e^{\ln(0.1)(t’-a)/20} \right ) dt’$

The second one is trickier, but not that bad.  Part of the reason I’ve used the form $e^{\ln (0.1) t/ 20}$ rather than $0.1^{t/20}$ is that every first-year calculus student knows how to integrate $e^x dx$.  So we could use a technique commonly called “u-substitution” to show that (proof left as an exercise for the reader)

$\displaystyle \int e^{\ln (0.1) (t-a)/20}dt = \frac{20}{\ln (0.1)}e^{\ln (0.1)(t-a)/20}$

And with that, we can perform the second integral:

\displaystyle \begin{align} T_2(t) &= T_0 + \frac{G}{2}t + \frac{G}{2} \left [ t' - \frac{20}{\ln (0.1)} e^{\ln (0.1) (t'-a) / 20} \right ]_a^t \\ T_2(t) & = T_0 + \frac{G}{2}t + \frac{G}{2} \left [ t - a - \frac{20}{\ln ( 0.1 )} \left ( 0.1^{(t-a)/20} - 0.1^{(a-a)/20} \right ) \right ] \\ T_2(t) & = T_0 + \frac{G}{2}t + \frac{G}{2} \left [ t - a + \frac{20}{\ln ( 0.1 )} \left ( 1- 0.1^{(t-a)/20} \right ) \right ] \end{align}
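
If you want to sanity-check that closed form, it only takes a couple of lines of MATLAB to compare it against brute-force numerical integration (the test values of $a$, $t$, and $G$ are arbitrary):

% Numerical sanity check of T2(t) - T0 from the expression above (arbitrary test values)
a = 2; t = 10; G = 1;
closed  = (G/2)*t + (G/2)*(t - a + (20/log(0.1))*(1 - 0.1^((t-a)/20)))
numeric = integral(@(s) G/2 + (G/2)*(1 - exp(log(0.1)*(s-a)/20)).*(s>a), 0, t)

Both lines should print the same number.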

The form of this equation makes it difficult to solve for $t$ in the inequality, so rather than trying to do that we’ll use graphs to interpret how this works.  But first we’ll consider the “after” case and put them all on the same plot for easier comparison.

With Acceleration – After

This version is a bit ugly.  Because we’ve left $a$ as an arbitrary turn-on time, we don’t know whether that happens before or after the threat buff expires at $t=3$.  So we have to take that into account in our expression for $G_2$.  Here is the complete version of $G_2$ for all four possible situations:

$\displaystyle G_2 = \cases{ \frac{3G}{2} & \text{if } t\leq 3, t \leq a \cr \frac{3G}{2}+\frac{3G}{2}\left ( 1 - e^{\ln (0.1) (t-a) / 20 }\right ) & \text{if } t<3, t>a \cr \frac{G}{2} & \text{if } t>3, t<a \cr \frac{G}{2}+\frac{G}{2}\left ( 1 - e^{\ln (0.1) (t-a) / 20 }\right ) & \text{if } t>3, t>a }$

Ew.  This also makes the limits of integration particularly ugly, as you’ll frequently run into a limit that would have to be described like “the smaller of $x$ or $y$,” or $\text{min}(x,y)$ (and in other places, the corresponding $\text{max}(x,y)$).  However, there is a particular combination of these constraints that simplifies the limits a lot.  We’ll define that combination $b$ as follows:

$b = \text{max}(\text{min}(t,3),a)$

In plain words, “compare $t$ to $3$ and take whichever is smaller, then compare that to $a$ and take whichever is larger.”  With that definition, the representation of $T_2(t)$ looks like this:

\begin{align} T_2(t) = T_0 & + \int_0^{t\leq 3} \frac{3G}{2}dt’ + \int_3^{t>3}\frac{G}{2}dt’ \\ &+ \int_a^b \frac{3G}{2}\left ( 1 - e^{\ln (0.1) (t’-a)/20} \right )dt’ + \int_b^{t>a}\frac{G}{2} \left ( 1-e^{\ln (0.1) (t’-a)/20} \right ) dt’ \end{align}

Still quite a mouthful.  The first and second terms are the continuous threat contribution from our “no acceleration” model.  I’ve put slightly more specific upper limits on the integrals just to make it clear how the behavior below $t=3$ occurs; if $t < 3$, then the first integral is $\int_0^t$ but the second integral is $\int_3^3$, which goes to zero identically.  And if $t>3$, the first integral is just $\int_0^3$ and the second is $\int_3^t$.  We already know that $t<3$ isn’t going to be interesting because we can’t lose threat even in the “no acceleration” model, but it matters a bit if you want the plots to look correct.  For example, when we perform the first integration, rather than $\frac{3G}{2}t$ we would express it $\frac{3G}{2}\text{min}(t,3)$ in MATLAB.

The last two terms describe threat generation from our Vengeance ramp function.  The first one, with limits $\int_a^b$ describes threat due to that ramp function between the turn-on time of the ramp ($a$) and the turn-off time of the threat buff ($t=3$).  Our definition of $b$ automatically collapses this function in certain cases: if $a>3$ or $t<a$, the upper limit becomes $a$ and the integral $\int_a^a$ is identically zero.

The second term describes threat generation due to the ramp after the threat buff turns off.  Again, our definition of $b$ ensures that if $t<3$, the lower limit is $t$ such that $\int_t^t$ goes to zero.  If $t<a$, both upper and lower limits are $a$ and the integral also goes to zero.

Again, we can perform these integrations pretty easily, so we’ll do so.  The result is a complicated piecewise function thanks to all of the conditionals, but with liberal use of the $\text{min}()$ and $\text{max}()$ functions it can be condensed into a single expression:

\begin{align} T_2(t) = T_0 &+ \frac{3G}{2}\text{min}(t,3) + \frac{G}{2}\text{max}(t-3,0) \\ &+ \frac{3G}{2}(b-a) + \frac{60G}{2\ln (0.1)} \left ( 1-0.1^{(b-a)/20} \right ) \\ &+ \frac{G}{2}\text{max}(t-b,0) - \frac{20G}{2\ln (0.1)} \left ( 0.1^{\text{max}(t-a,0)/20} - 0.1^{(b-a)/20} \right ) \end{align}
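
For reference, that expression translates directly into a couple of MATLAB anonymous functions.  This is just a transcription of the equation above, with $T_0$ and $G$ passed in explicitly (the plotting code is omitted):

% T2(t) for the "with acceleration, after 5.4" case, transcribed from the expression above
b  = @(t,a) max(min(t,3), a);
T2 = @(t,a,T0,G) T0 + (3*G/2)*min(t,3) + (G/2)*max(t-3,0) ...
     + (3*G/2)*(b(t,a)-a) + (60*G/(2*log(0.1)))*(1 - 0.1.^((b(t,a)-a)/20)) ...
     + (G/2)*max(t-b(t,a),0) - (20*G/(2*log(0.1)))*(0.1.^(max(t-a,0)/20) - 0.1.^((b(t,a)-a)/20));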

Again, it may be hard to get much intuition from the expression, but we can plot the results to see how this works for various values of $a$:

Tank threat vs. time for a variety of situations. Tank #1 is shown as a solid black line T1.  Tank #2′s threat before 5.4 is shown with dashed lines for various values of a (0, 1, and 2).  Tank #2′s threat after 5.4 is shown with solid lines for a=0, 1, 2, and 5.

First, let’s consider the “before” curves, which are shown in dashed lines.  For $a=0$, which is the case where the ramp starts immediately upon taunting, we still lose threat somewhere around $t=7$.  However, each second of delay we add to the ramp function lowers that crossover point by about a second.  If the boss doesn’t start meleeing tank #2 again until $t=2$, they lose threat five seconds after taunting.

And keep in mind that this is still the idealized, continuous case.  In reality, any time the threat curves for tanks #1 and #2 come close to one another, tank #2 is in danger of losing aggro thanks to a lucky (or unlucky, depending on your point of view) crit by tank #1.

The “after” curves are much more forgiving.  Just as before, tank #2 has great threat generation for the first three seconds.  And even with a five-second delay on the threat ramp function, they still manage to maintain aggro indefinitely.  That huge, discontinuous threat boost at the beginning gives tank #2 the wiggle room they need to deal with unlucky boss behavior.  The curves start to get close after 14-16 seconds, but by that point tank #1′s Vengeance should be ready to decay as well.

So in short, even when threat ramp-up is included, threat can be a problem in the “before” model, but will rarely be an issue in the “after” model.

Summary

The entire point of this post wasn’t really to “prove” anything.  Pretty much every tank knew that threat was dicey after a taunt early in an encounter, the only question was how they dealt with that problem.  Some players use a Righteous Fury /cancelaura macro to make their co-tank’s lives easier (protip: Righteous Fury is on the GCD, but does not incur one, so you can cast RF and then immediately cast something else, making the cost of doing this very small).  Other tanks, especially ones taunting off of classes that can’t turn off their 500% threat multiplier buff, just shrug their shoulders and play through it, taunting back as soon as it’s available.

Instead, the goal of this post was twofold.  First, I wanted to illustrate mathematically why the problem exists in the first place.  Second, I wanted to determine whether the 5.4 threat buff fixes that problem, and if so how much wiggle room it gives you.

The short version is that it adds about 7 seconds to the expected “time to threat loss,” which should make it much easier to maintain aggro after a taunt within the first 30-40 seconds of the encounter.  Note that this is also applicable to any new mob – it’s not really encounter time that matters, it’s time spent building threat on the current target by both tanks.

Of course, the model is fairly limited.  We’re only looking at continuous threat generation, when in practice everything is discrete.  It’s still going to be possible to lose aggro after a taunt, but it should be rare.  It will almost require that you stop pressing buttons, or that the tank you’re taunting off of suddenly crits with several big abilities in a row while you only connect with weak attacks.

It will still be good practice to try and time your taunts such that you can follow them with Judgment, Avenger’s Shield, or Holy Wrath to make sure you have a heavy-hitter landing in that 3-second window.  But whereas before 5.4 that would only reduce your chance of having aggro ripped away, after 5.4 it should more or less guarantee that you keep aggro.

## The Making of a Metric: Part 3

In our last installment, we nailed down the weight factors we’ll use for our smoothness metric.  Today we’re going to wrap it up by specifying normalization conditions for the histogram and formally defining the metric. To refresh your memory a bit (or if you’ve joined us mid-stream), to get to this point we’ve done the following:

1) Recorded a damage and healing (or “tank health change”) timeline during a simulation.  This is basically a list of every time you take damage or healing along with the timestamp at which that event occurred.

2) Calculated a moving sum of that timeline over 4 boss attacks, or equivalently over 6 seconds of real-time.  This gives us a new array representing all of the potential 4-attack damage spikes we could take, and is the source data we use for the smoothness analysis tables we’ve been using for the past 6 months or more.

3) Generated a histogram of that moving sum data.  Again, this is just like what we’ve presented in the smoothness analysis tables, just done graphically and with finer bins.

4) Developed a weight function that we can use with the histogram.  Multiplying the histogram by the weight function will preferentially value high-damage spikes and devalue weak spikes.

5) Roughly defined the metric as this multiplication of histogram and weight function.

Now we want to refine the metric by considering the appropriate normalization conditions.

Normalization

There are a few reasons for normalizing the histogram before computing TMI.  The first and foremost is that it makes the number you get more consistent between different experimental setups. In the previous two posts, the data I provided was normalized only by player health (i.e., along the x-axis).  Everything else was left as-is for a 4-attack moving sum.  Note that I said sum, not average – I wasn’t even normalizing with respect to time.  That’s why we got numbers that looked like this:

|    Set |    TMI |
|   C/Ha |  18332 |
|   C/St |   7895 |
|   C/Sg |  16102 |
|  C/Shm |  22631 |
|   C/Ma |  41994 |
|   C/Av |  63949 |
|  C/Bal |  40468 |
|   C/HM |  23096 |
|     Ha |  49835 |
|  Avoid | 231586 |
| Av/Mas | 229068 |
| Mas/Av | 190308 |
|   Ha/h |  31126 |
|  Ha/he |  27795 |
|  C/Str |  66023 |

However, that was with 10k minutes of simulation.  If we had run for 20k minutes, they would be roughly double those values, and if we had run for 5k they would be half as large. It would be ideal if they all gave roughly the same ballpark TMI value within error since they’re all simming the same setup, just to different levels of precision.  So one additional variable we want to normalize with respect to is simulation length.

Similarly, if we decided to calculate TMI with a 5- or 6-attack moving average instead of a 4-attack moving average, it would be nice if the values came out relatively close.  As we’ll see, we can’t make them perfect, but we can get them in the right ballpark.  So that’s another variable we want to include in our normalization: the time window over which we perform our moving average.  In essence, this is really just saying that we want to perform a true moving average of the damage timeline rather than a moving sum.

The first of those two is very easy, so we’ll save it for later.  Let’s instead look at the time window normalization.  To illustrate the point a little more clearly, here are the histograms that you get if you perform a moving sum of the damage timeline from the repeatability data set I used in the last blog post for different numbers of attacks $N$ ranging from 2 to 7:

Histogram after only health normalization. Each panel shows the histogram for a different number of attacks being considered.

It should be obvious that the distribution is shifting upwards roughly linearly, because we’re adding successively more attacks together in our moving sum.  If we were to apply the weight function at this point, we’d get weighted histograms that look like this:

Weighted histogram after only health normalization. Each panel shows the histogram for a different number of attacks being considered.

Of course this skews the TMI values you get pretty heavily.  This is what the TMI looks like for those different plots:

| # attacks |   2 |    3 |     4 |      5 |      6 |       7 |
|       TMI | 468 | 2041 | 18071 | 109300 | 504459 | 4723791 |

Remember that this is the same data, just averaged differently.  Ideally we want this to be a little more stable.

The first step is the obvious one: use a moving average instead of a moving sum.  In other words, divide each moving sum by the appropriate $N$.  I’m going to add one wrinkle to that procedure: I’m also going to multiply by 4.  Why?  Because so far, we’ve been designing the metric around a 4-attack moving average, which nicely puts the bulk of the distribution’s value around the 100% of our health mark.  I wouldn’t need to do this, of course – I’m just multiplying by an arbitrary constant, so it won’t change the relative values of anything.  But it will make the plots look nicer and keep consistency with what we’ve done already.
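In code, that rescaling is a one-liner per window size.  Here is a minimal MATLAB sketch using a placeholder damage timeline (one entry per boss attack) standing in for the real simulation output:

```matlab
% Minimal sketch of the time-window normalization, using a placeholder
% damage timeline D with one entry per boss attack.
D = 350e3 * rand(1, 1000);                 % placeholder per-attack damage
for N = 2:7
    MS = conv(D, ones(1, N), 'valid');     % N-attack moving sum
    MA = (4/N) * MS;                       % rescale to a 4-attack equivalent
    % MA (divided by player health) is what gets histogrammed and weighted
end
```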

So if we multiply each moving sum by $4/N$, we get unweighted histograms that look like this:

Raw histogram after health and time normalization. Each panel shows the histogram for a different number of attacks being considered.

That looks a lot better.  The distributions all have the same mean value now (a little less than 0.5, or 50% player health), so that should fix up our TMI weightings, right?  Well, not quite.  Here’s what you get for TMI in this case:

| # attacks |      2 |     3 |     4 |     5 |    6 |    7 |
|       TMI | 369918 | 28821 | 18071 | 10168 | 6349 | 5884 |

We now have the opposite problem: TMI is going down as $N$ goes up.  What’s going on here? The answer lies in the histogram plots above.  But as a hint, here are the associated weighted histograms.  See if you can figure out what’s wrong:

Weighted histogram after health and time normalization. Each panel shows the histogram for a different number of attacks being considered.

The problem we’re seeing is actually caused by two factors.  The first is that while the distribution may be centered at the same value, it’s not the same width.  A 7-attack moving average gives a much narrower distribution than a 2- or 3-attack moving average.  The second factor is our exponential weight function, which magnifies that difference.  Wider distributions include more values at higher percent-health values, which get weighted exponentially more.

If we wanted to model this exactly, we’d estimate the distribution as a Gaussian function of the form $e^{-a(x-1)^2}$ and then multiply by our weight function $w(x)=e^{10\ln(3)(x-1)}$.  Treating these as continuous functions and making the change of variables $y=x-1$, we get the following integral:

$$TMI \propto \int e^{-ay^2+by}dy$$

where I’ve used $b=10\ln(3)$ to make it simpler. By completing the square we can show that this expression evaluates to

$$TMI \propto \sqrt{\frac{\pi}{a}}e^{b^2/4a}$$
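For anyone who wants the intermediate step, it is just the usual Gaussian trick: complete the square in the exponent and pull the constant factor out of the integral,

$$-ay^2+by = -a\left(y-\frac{b}{2a}\right)^2+\frac{b^2}{4a} \;\;\Longrightarrow\;\; \int_{-\infty}^{\infty}e^{-ay^2+by}\,dy = e^{b^2/4a}\int_{-\infty}^{\infty}e^{-a(y-b/2a)^2}\,dy = \sqrt{\frac{\pi}{a}}\,e^{b^2/4a}$$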

Now, here’s the important part.  The constant $a$ is related to the width of the distribution – it’s actually inversely proportional to the square of that width.  And since the width shrinks as $N$ goes up, that means $a$ grows with $N$.  Technically it’s proportional to some function of $N$, because we don’t know exactly how the two are related, but we can estimate it as a power-law effect.  So given that $a \propto N^{2k}$ and throwing away all unnecessary constants, we have:

$$TMI \propto \frac{e^{c/N^{2k}}}{N^k}$$

where $c$ is a constant determined by the exact composition of $b$ and $a$.  Thus, if we want to normalize our data properly, we’d want to multiply our current TMI metric by the inverse of this, namely $N^k e^{-c/N^{2k}}$.  We could try to fit our data to this form and get a value for $c$ and $k$ (and I did), but in practice that’s not so useful.  First, because our histogram isn’t really Gaussian to begin with, especially for lower $N$ values.  Second, because the histogram shape changes from gear set to gear set, so even if we could nail down $c$ and $k$ for one gear set it may differ for another.

Instead, I’m going to take a less accurate but simpler approach.  The point of this normalization was not to make the numbers uniform across all moving average lengths, just to bring them closer together.  So we’ll drop the exponential factor and just try multiplying by $N^k$ while calculating TMI.  Fooling around a bit, $k=2$ seemed to be fairly effective; here’s what we get if we do that:

| # attacks |       2 |      3 |      4 |      5 |      6 |      7 |
|       TMI | 1479670 | 259389 | 289132 | 254188 | 228567 | 288322 |

Much better! Now they’re all within a moderately small range, from roughly 230k to 290k, except for the 2-attack case.  The 2-attack moving average is too far gone to fix, to be honest.  That part of the curve is where the exponential factor we dropped makes a big difference, and it’s also the distribution that deviates most from Gaussian.  Since a 2-attack moving average isn’t something we ever worry about much anyway, it’s reasonable to exclude it as irrelevant and focus on making the 3- to 7-attack moving averages better.

There’s one more normalization step, which is the one I mentioned at the beginning: simulation length.  This one is easy though, because we just end up dividing by a constant value.  In this case, it’s the number of attacks we’ve received, which is 400k.  So we do that, which gives us:

| # attacks |       3 |       4 |       5 |       6 |       7 |
|       TMI | 0.64847 | 0.72283 | 0.63547 | 0.57142 | 0.72081 |

Pretty nice!

There’s one final step I want to include though.  While it doesn’t make any difference in the results, I want to multiply by a constant factor of 10000.  Why?  Most people will have more trouble remembering and interpreting a decimal like 0.7208 than they will a rough estimate like 7000.  Keep in mind that we expect to see smaller TMI values, and it will get unwieldy to try and describe TMIs of 0.02 vs. 0.03 vs. 0.04 when we could just be talking about 200, 300, and 400.  It also gives a clearer impression of the amount of change, because going from 0.02 to 0.04 doesn’t seem like a big difference, but 200 to 400 does.

That gives us values that look like this:

| # attacks |    3 |    4 |    5 |    6 |    7 |
|       TMI | 6485 | 7228 | 6355 | 5714 | 7208 |

We now have a complete definition of TMI. We’re not quite done yet though, as we can make a fairly significant simplification.

Cutting out the middle man

Up until now I’ve framed everything in terms of analyzing histograms because that’s what we do when we make our qualitative assessments.  But it’s not actually necessary for the numerical version – in fact, it decreases accuracy to use it in the process.

To illustrate why, here’s a simple example.  Let’s say we have the data set:

{ 2, 3, 4, 5, 6, 7, 8, 9 }

Let’s also assume that we use coarse bins for the histogram of this data, perhaps 3 units wide centered at 0, 3, 6, 9, 12, etc.  Our histogram would look like this:

0: 0
3: 3
6: 3
9: 2
12: 0

Now let’s say we perform the weighted average of the histogram, but for simplicity we use a flat weighting rather than an exponential one.  Averaging the histogram gives us:

$$\frac{0\cdot 0 + 3\cdot 3 + 6\cdot 3 + 9\cdot 2 + 12\cdot 0}{3+3+2} = \frac{45}{8} = 5.625$$

But if we just took the average of the source data, we’d get:

$$\frac{2 + 3 + 4 + 5 + 6 + 7 + 8 + 9}{8} = \frac{44}{8} = 5.5$$

Now of course, the result gets closer the more bins we use.  If we used a bin width of one centered at 0, 1, 2, 3, etc., we’d get identical results.  But at that point you’re not really accomplishing anything with the histogram at all, because every data point has its own bin.

The same is true in our case.  We have an array of moving average values, and while it’s convenient to bin them and show them as a histogram for plots, it’s not at all necessary for calculations.  Rather than calculating a weight factor based on the bin center and multiplying by the number of elements in the bin to get our weighted result, we could just calculate the weight function based on each data point itself and sum the result.
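To make that concrete, here is a minimal MATLAB sketch comparing the two approaches on some placeholder data.  The binned version carries the binning error from the example above; the direct version does not:

```matlab
% Minimal sketch: histogram-weighted sum vs. weighting every sample directly.
% MA is placeholder (health-normalized) moving-average data; h is the HDF.
h  = 3;
MA = 0.5 + 0.15*randn(1, 1e5);                           % placeholder data

[counts, centers] = hist(MA, 100);                       % binned version
tmi_binned = sum(counts .* exp(10*log(h)*(centers - 1)));

tmi_direct = sum(exp(10*log(h)*(MA - 1)));               % per-sample version
```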

So we can completely cut out the histogram and go directly from the moving average array to the final TMI calculation.  With that simplification, we have the final process we’ll use to calculate TMI.  I’ll reiterate that entire process below so that we have it all in one place.

Formal definition of the Theck-Meloree Index

Note: While I’ve done everything so far in MATLAB, eventually we’ll want to do this all in Simcraft.  So even though I’ve used boss melee attacks as my default time window (i.e. a 4-attack moving average), we’re going to define the metric in terms of seconds here instead.

To calculate TMI from a damage (and healing) timeline $D$ with time bins of width $dt$, we perform the following operations:

1) Calculate the 6-second equivalent moving average array of damage over $T$ seconds for the entire simulation length $\tau$ (also expressed in seconds).  We will use $T_0=6$ to represent this standardized window size.  This step can then be formally expressed for the $i^{th}$ element of the resulting moving average array as:

$$MA_i = \frac{T_0}{T}\sum_{j=1}^{T / dt} D_{i+j-1}$$

which produces an array $MA$ of length $M=(\tau - T)/dt$.  It is also acceptable to use an apodized moving average that produces an array of length $M=\tau/dt$.  Note that this step includes normalization for the time window we’re considering ($T_0/T$).

2)  Calculate the exponentially-weighted average of the moving average array as follows

$$\Large {\rm TMI} = \frac{10000 T^2}{M} \sum_{i=1}^M e^{10\ln(h)(MA_i/PH-1)}$$

where $PH$ is the player’s health.  Note that this step normalizes for player health, fight duration (through $M$), and includes the normalization factor for moving average length ($T^2$).

And that’s it!  At some point in the future I’ll post a complete standardization reference that includes this information and more (including things like the standard boss settings), probably as a separate post.  But for now that should work for us.  Note that this is really a proto-definition; we’ve found numbers that work well in MATLAB with this particular standard boss, but when we implement this in Simcraft we may need to tweak the normalization factors slightly.  For example, I changed $N^2$ to $T^2$ when it should really be $(T/1.5)^2$, which adds a multiplicative factor of 2.25.  I’m not worrying about that level of detail just yet though, as we can just change 10000 to whatever we want to soak up those flat multiplicative variations.
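For concreteness, here is a minimal MATLAB sketch of that two-step recipe.  The damage timeline is a placeholder (a 350k swing every 1.5 seconds); in practice $D$ comes out of the simulation:

```matlab
% Minimal sketch of the two-step TMI definition above, with placeholder inputs.
dt = 0.1;                                         % timeline bin width (s)
T  = 6;                                           % moving-average window (s)
T0 = 6;                                           % standardized window size (s)
h  = 3;                                           % health decade factor
PH = 755e3;                                       % player health
D  = 350e3 * (mod(0:5999, 15) == 0);              % placeholder: 350k swing every 1.5 s

n  = round(T / dt);                               % number of timeline bins per window
MA = (T0/T) * conv(D, ones(1, n), 'valid');       % step 1: moving average array
M  = numel(MA);
TMI = (10000 * T^2 / M) * sum(exp(10*log(h)*(MA/PH - 1)));   % step 2: weighted average
```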

The only thing I want to add at this point, because I’ll be using it in the next section, is a nomenclature detail.  The term TMI will properly refer to the metric as calculated using a 6-second (or 4-attack) moving average (i.e. $T=T_0 = 6$).  If we want to refer to the metric as calculated using a different $T$, we will make that clear by calling it TMI-T, such as TMI-9 for a number calculated using a 9-second moving average.  Note that it’s still normalized to $T_0=6$, just like we’ve done in the histogram figures above.  That also means that TMI-6 is the same thing as just saying TMI.

Comparing gear sets

Now that we’ve got the final form of TMI, let’s see how this works for the gear sets we investigated in Part 2.  Here’s the full TMI matrix for TMI-4.5 (3 attacks) through TMI-10.5 (7 attacks) for all of the gear sets:

pct=100.00, N=200, vary hdf

|    Set | TMI-4.5|   TMI | TMI-7.5| TMI-9 | TMI-10.5|
|   C/Ha |   6510 |  7333 |   6448 |  5782 |    7308 |
|   C/St |   2579 |  3158 |   3186 |  3133 |    4032 |
|   C/Sg |   5595 |  6441 |   5956 |  5487 |    7027 |
|  C/Shm |  12329 |  9053 |   9317 |  8583 |    8681 |
|   C/Ma |  35402 | 16798 |  14027 | 12395 |   12189 |
|   C/Av |  69139 | 25579 |  23279 | 22235 |   21745 |
|  C/Bal |  26551 | 16187 |  15309 | 13198 |   12505 |
|   C/HM |   9400 |  9238 |   7226 |  6235 |    7706 |
|     Ha |  23534 | 19934 |  13139 | 12983 |   13479 |
|  Avoid | 293086 | 92635 |  74288 | 60576 |   49226 |
| Av/Mas | 289307 | 91627 |  70596 | 53821 |   42384 |
| Mas/Av | 198467 | 76123 |  58722 | 43421 |   34818 |
|   Ha/h |  14302 | 12451 |   9569 |  9155 |    9841 |
|  Ha/he |  10460 | 11118 |   9084 |  7801 |    9395 |
|  C/Str |  65271 | 26409 |  25203 | 22577 |   20106 |

Not bad.  We no longer have crazy TMI values in the 200 thousands, though the avoidance sets do get up near 100k.  But of the useful gear sets, the numbers are pretty reasonable.  C/Ha comes in a little over 7k, and it’s clear that C/St is a significant improvement at 3k while C/Sg is only a small improvement at a little over 6k.  Just like our qualitative assessments suggested.

Comparing bosses

Before we quit for the day, I want to demonstrate another quirk of TMI.  I mentioned a few paragraphs ago that I’ll be defining a “standard boss” in a future post.  I mentioned the reason why in passing in Parts 1 and 2, but now I want to formally explain why we need to do this.

First, consider what happens if we shrink the size of the boss’s melee attacks.  The entire histogram shifts to the left because each spike just became that much smaller than it was before (if not more, thanks to absorb effects).  That looks something like this, where we’ve reduced the boss’s melees from 350k (after mitigation) to 200k:

Health-normalized histogram for a boss that swings for 200k after mitigation.

And of course, if we then perform our weighted-average calculation on the histogram we get a much smaller number.  For example, here are the TMI values we get with the above histogram:

pct=100.00, N=200, vary hdf

|    Set |   TMI |
|   C/Ha | 113.2 |
|   C/St |  81.8 |
|   C/Sg | 109.9 |
|  C/Shm | 126.8 |
|   C/Ma | 176.4 |
|   C/Av | 217.0 |
|  C/Bal | 156.8 |
|   C/HM | 118.2 |
|     Ha | 154.8 |
|  Avoid | 301.1 |
| Av/Mas | 286.0 |
| Mas/Av | 280.2 |
|   Ha/h | 132.3 |
|  Ha/he | 129.3 |
|  C/Str | 153.3 |

The relative ordering is the same, of course – that’s courtesy of our smart choice of an exponential weight function.  But the values are much lower than they were in the first table.  Now consider what happens if you compare the value for Av/Mas from this table to C/Ha from the first table.  It looks like Av/Mas wins, doesn’t it? But that’s only because we cheated, and weren’t comparing apples to apples.

In some sense, we were measuring using different scales.  The lower table could be in feet and the upper table in centimeters, for example, and there’s no doubt that 3 cm is less than 1 foot.  But if you leave off the units, it looks like it’s just 3 vs. 1.

We could normalize for this if we had to, just like we normalized out other factors.  But this one is more complicated and less useful.  First of all, what do we normalize by?  Boss melee attack size?  That’s all well and good until we start introducing magic damage into the mix, at which point our normalization doesn’t work correctly anyway.

We could normalize based on raw boss DPS from all sources before mitigation.  That seems like it should work, but introduces another wrinkle.  What if player A calculates their TMI with a boss doing 1 million raw DPS with only melee attacks and player B calculates their TMI with a boss that does 1 million raw DPS with only spells?  Will it be valid to compare the two?  Well, no, not really, because physical and spell damage function differently.  So even this normalization doesn’t make it any easier to compare TMI values across different situations.  We’d still need to specify what the boss details are, so we may as well have not bothered with the normalization in the first place.

There’s another reason I prefer not to normalize by boss DPS.  If we compare the bosses doing 350k and 200k damage per swing, the differences between gear sets are much smaller.  While the relative ordering is the same, it’s clear that the impact of changing from a C/Ha gear set to Av/Mas is not that big.  And that’s actually useful information!  It’s telling you that for this boss, there isn’t a huge advantage to any of the gear sets in terms of survivability.  In other words, it suggests that you significantly overgear the boss, which is a hint that you can start shifting to DPS stats.

So that’s why we have to specify a standard boss.  Rather than doing that now, I’m going to wait until I can collect all of the relevant specifications into a single blog post (along with the SimC implementation, which I haven’t finished yet at the time of this blog post’s writing).  But it will likely be primarily physical damage with a sprinkling of magic damage via a DoT effect.

Summary

There aren’t really any “conclusions” to draw from today’s post.  We were mostly fine-tuning the details of the metric we’ve developed in the last two blog posts.  But we can briefly summarize what we’ve done.

First, we discussed and developed normalization conditions for the metric.  This doesn’t actually change the results any, it just scales them to be more convenient.  Rather than comparing numbers like 0.012 to 0.024, we can use normalization to turn those into more easily-interpretable numbers, like 1200 and 2400.  The normalizations we’ve applied attempt to keep the value semi-reasonable under a wide range of simulation variables.

We also made note of the fact that the histograms we’ve been showing were an unnecessary middle-man in the calculation process, and removed them from the process when we provided the formal definition of the TMI metric.

Then, we tested the normalized function with a bunch of gear sets just to make sure the results still made sense and agreed with our previous results.  Of course, since normalization factors don’t change the relative values, there’s no way anything we did could have changed the results unless we screwed something up (spoiler: we didn’t).

Finally, we briefly touched on the reason that we need to define a “standard boss” for use with the metric.  The metric will certainly work with any boss definition you like, but the values you get out will depend heavily on how that boss is configured.  So if you want to be able to make comparisons (like between two different classes, for example), having a standard is really useful.

The next post on this topic probably won’t be until next week.  It will discuss the Simulationcraft implementation of the metric and any modifications we’ve had to make to get it working well.  It will also formally define the metric, including details on the standard boss, and provide brief instructions on how to load your character in Simcraft and calculate your own scale factors.

## The Making of a Metric: Part 2

In our last post, we decided on a functional form for our metric.  And while I didn’t write it out in full mathematical formalism, the pieces were all there.  In short, we start with a damage histogram $H(x)$ computed by taking the moving average of our damage (and healing) taken timeline.  $H(x)$ looks like this:

Histogram of the 4-attack damage string data.

From that histogram we can calculate the unnormalized Theck-Meloree Index or TMI as follows:

$\displaystyle \Large {\rm TMI} = \int_{-\infty}^\infty H(x)e^{10\ln(h)(x-1)}\,dx$

where $H(x)$ is the continuous damage intake histogram and $h$ is the health decade factor, or HDF.  That equation assumes you have a continuous histogram, which generally isn’t going to be the case.  Normally we’ll have a data set like the one in the figure, where the data is divided into individual bins.  Each bin is centered at a value $x_i$ (for the $i^{\rm th}$ bin) at which the histogram has a value $H_i$.  As a result, we can write the discrete form of the TMI as:

$\displaystyle\Large {\rm TMI} = \sum_{i=1}^M H_i e^{10\ln(h)(x_i-1)}$

where $M$ is the total number of bins we’re considering.  So far, I’ve left out normalization conditions.  We won’t tackle that topic today, so for those of you who care about such things, just assume for the moment that there is a specified normalization condition on $H(x)$.  When we do tackle normalization, that’s likely the way we’re going to approach it anyway.
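In code, this working definition is essentially a one-liner.  Here is a minimal MATLAB sketch with a placeholder histogram standing in for the real one:

```matlab
% Minimal sketch of the working (unnormalized) TMI definition.
% H and x are placeholder histogram counts and bin centers (in fractions
% of player health); h is the health decade factor.
x   = linspace(0, 1.5, 151);              % bin centers
H   = exp(-((x - 0.5)/0.15).^2);          % placeholder histogram shape
h   = 3;
TMI = sum(H .* exp(10*log(h)*(x - 1)));
```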

I also want to be very clear that this is our “working definition” for the metric.  As in, it’s not the final version.  We still want to fool around with it some, normalize it, and so on.  And in fact, in the next post I’ll show that we can significantly simplify the process.  But for now, this is the clearest way of saying “here’s the definition we’re thinking about” while we test it to see if we’re happy with the definition.

We decided last time that we were likely to focus on an HDF of around 3.  That determination was based on a few factors, but mostly based on the relative stat weights it produced for a Control/Haste gear set.  However, we want to make sure that this metric works well for a variety of different gear configurations, so we need to check them.  In this post, we’ll look at a variety of gear sets and see whether the TMI metric matches our qualitative observations.

I’m including all of the gear sets we’ve used in the past in this post.  We’ll be looking at a couple in detail first to see how stat weights vary from gear set to gear set.  Then we’ll look at the raw TMI score for all of the gear sets together.  Below is the exhaustive list of all the gear sets we’ll consider in this post. Note that I’ve transposed the usual table format because we’re considering so many different gear sets, and this arrangement is just easier to read in blog format.

|   Set: |   Str |   Sta | Parry | Dodge | Mastery |  Hit |  Exp | Haste |
|   C/Ha | 15000 | 28000 |  1500 |  1500 |    1500 | 2550 | 5100 | 12000 |
|   C/St | 15000 | 34000 |  1500 |  1500 |    1500 | 2550 | 5100 |  8000 |
|   C/Sg | 15000 | 31000 |  1500 |  1500 |    1500 | 2550 | 5100 |  8000 |
|  C/Shm | 15000 | 31000 |  1500 |  1500 |    4750 | 2550 | 5100 |  4750 |
|   C/Ma | 15000 | 28000 |  1500 |  1500 |   13500 | 2550 | 5100 |     0 |
|   C/Av | 15000 | 28000 |  7500 |  7500 |    1500 | 2550 | 5100 |     0 |
|  C/Bal | 15000 | 28000 |  4125 |  4125 |    4125 | 2550 | 5100 |  4125 |
|   C/HM | 15000 | 28000 |  1500 |  1500 |    6750 | 2550 | 5100 |  6750 |
|     Ha | 15000 | 28000 |  1500 |  1500 |    1500 |  500 |  500 | 18650 |
|  Avoid | 15000 | 28000 | 10825 | 10825 |    1500 |  500 |  500 |     0 |
| Av/Mas | 15000 | 28000 |  7717 |  7717 |    7716 |  500 |  500 |     0 |
| Mas/Av | 15000 | 28000 |  4000 |  4000 |   15150 |  500 |  500 |     0 |
|   Ha/h | 15000 | 28000 |  1500 |  1500 |    1500 | 2550 |  500 | 16600 |
|  Ha/he | 15000 | 28000 |  1500 |  1500 |    1500 | 2550 | 2550 | 14550 |
|  C/Str | 27600 | 28000 |  1500 |  1500 |    1500 | 2550 | 5100 |     0 |

In no particular order, let’s consider a few of these gear sets.  Fair warning: the rest of this post is very number-crunchy and not very visually aesthetic.  If numbers aren’t really your thing, I’ve provided a pretty good summary of what’s going on in the tables, but it’s still probably going to bore you to death.  You may want to skip to the conclusions in that case.

Control/Balance

This gear set was a compromise originally, and an attempt to model a character that’s using a mix of different gear rather than strictly following a particular philosophy.  But we’ve only ever compared this to other gear sets; we’ve never really looked at stat weights in this configuration.  This time, we’ll do that by adding 1000 stamina, haste, mastery, dodge, or parry, or subtracting 1000 hit or expertise.  Here’s the data we get if we do that:

Finisher = SH1, Boss Attack = 350k, SoI model=nooverheal data set metric-10000-0

| Set: |  C/Bal |   Stam |    Hit |    Exp |  Haste |   Mast |  Dodge |  Parry |
| mean |  0.275 |  0.275 |  0.282 |  0.281 |  0.267 |  0.268 |  0.268 |  0.269 |
|  std |  0.113 |  0.113 |  0.116 |  0.116 |  0.111 |  0.113 |  0.114 |  0.113 |
|   S% |  0.452 |  0.452 |  0.440 |  0.445 |  0.462 |  0.453 |  0.453 |  0.453 |
|   HP |   755k |   775k |   755k |   755k |   755k |   755k |   755k |   755k |
|  nHP |  2.158 |  2.215 |  2.158 |  2.158 |  2.158 |  2.158 |  2.158 |  2.158 |
| ---- | ------ |  --- 4 | Attack | Moving | Avg.-- | ------ | ------ | ------ |
|  50% | 51.610 | 49.807 | 53.683 | 53.190 | 49.052 | 49.120 | 49.260 | 49.574 |
|  60% | 36.136 | 32.477 | 38.333 | 37.721 | 33.432 | 34.597 | 33.936 | 34.209 |
|  70% | 21.892 | 16.894 | 24.227 | 23.770 | 19.467 | 18.576 | 20.289 | 20.507 |
|  80% |  9.240 |  8.283 | 11.012 | 10.594 |  7.714 |  8.272 |  8.391 |  8.527 |
|  90% |  4.066 |  3.338 |  5.147 |  4.842 |  3.160 |  3.583 |  3.572 |  3.669 |
| 100% |  1.440 |  0.907 |  2.033 |  1.871 |  1.068 |  1.303 |  1.257 |  1.295 |
| 110% |  0.414 |  0.356 |  0.645 |  0.584 |  0.288 |  0.371 |  0.353 |  0.366 |
| 120% |  0.169 |  0.112 |  0.274 |  0.251 |  0.112 |  0.160 |  0.144 |  0.147 |
| 130% |  0.030 |  0.016 |  0.065 |  0.056 |  0.015 |  0.033 |  0.024 |  0.022 |
| 140% |  0.004 |  0.002 |  0.010 |  0.007 |  0.001 |  0.004 |  0.002 |  0.002 |

This is actually sort of curious.  Most of the old standbys seem to hold true – stamina and haste are both strong, though in this case haste is actually neck and neck with stamina.  Hit and expertise are still the strongest.  But dodge and parry are actually slightly beating out mastery here.  This is probably because of the interactions I talked about long ago, in beta – haste and mastery make each other better, while dodge and parry make mastery worse.  Here we have the perfect storm: relatively low haste and high dodge/parry, both of which keep mastery weak.  Another way to look at it is this: in the low-haste regime, the peak events are going to be strings of attacks that occur without much SotR coverage.  Mastery may help soften one of those attacks, at best, but does nothing for the heavy-hitters.  Avoidance helps break up those strings a little more effectively because it can eliminate one of the unmitigated hits.

In any event, for this data set we would probably argue that hit>exp>haste>stamina>dodge>parry>mastery for the purposes of smoothing.  So our metric ought to give the same results.  Let’s see if it does.

First, let’s fix $h$ at 3 and see how the result varies with the percentage of attacks we consider.  Just like last time, we’ll cherry-pick the top 1% to 10% of the histogram and calculate the metric that way, and then finish the table with the 100% (all-inclusive) calculation.
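As an aside, the mechanics of that percentile-limited calculation look something like the sketch below.  This is placeholder data and it is only meant to show the cut itself, not to reproduce the stat-weight numbers in the table; I am assuming the cut is taken on the sorted moving-average values:

```matlab
% Minimal sketch of a percentile-limited calculation: keep only the top pct
% of the (health-normalized) moving-average values before weighting.
h   = 3;
MA  = 0.5 + 0.15*randn(1, 1e5);                 % placeholder data
MAs = sort(MA, 'descend');
for pct = [0.01 0.05 0.10 1.00]
    top = MAs(1 : ceil(pct*numel(MAs)));        % top pct of the distribution
    fprintf('pct = %.2f : %g\n', pct, sum(exp(10*log(h)*(top - 1))));
end
```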

hdf=3.00, N=200, vary pct

|   pct |  Stam |   Hit |   Exp | Haste |  Mast | Dodge | Parry |
| 0.010 |  9250 | 18218 | 12760 |  8793 |  1840 |  4128 |  3425 |
| 0.020 | 10715 | 19426 | 13648 |  9816 |  2616 |  4605 |  3852 |
| 0.030 | 11742 | 19714 | 13846 | 10277 |  3107 |  4943 |  4069 |
| 0.040 | 10994 | 19872 | 13964 | 10395 |  2612 |  5004 |  4134 |
| 0.050 | 12034 | 20191 | 14239 | 10683 |  3977 |  5110 |  4233 |
| 0.060 | 12151 | 20226 | 14244 | 10736 |  3398 |  5146 |  4264 |
| 0.070 | 11668 | 20353 | 14364 | 10834 |  2851 |  5197 |  4306 |
| 0.080 | 11860 | 20359 | 14365 | 10865 |  3062 |  5266 |  4373 |
| 0.090 | 11529 | 20446 | 14439 | 10916 |  3124 |  5287 |  4385 |
| 0.100 | 11928 | 20469 | 14466 | 10963 |  3433 |  5319 |  4416 |
| 1.000 | 12135 | 20578 | 14559 | 11144 |  3517 |  5529 |  4599 |

Even with an HDF of 3, we’re getting edge effects in the mastery column.  The reason mastery always seems to be suffering more than the other stats is due to how mastery affects the histogram.  Increasing it slightly tends to shift the entire distribution to the left a little, especially at the top end.  As a result, it will frequently shift a chunk of the histogram over that arbitrary percentile cutoff, making the stat weight more volatile.  Of course, this effect is suppressed in the row that accounts for 100% of all events, which is one of the main reasons I think it will make for a better metric.

In any event, the general trends are all here.  Hit and expertise dominate, followed by stamina and haste.  Stamina uniformly leads haste in these weights, but not by much. Our qualitative assessment put them pretty close to one another, and I think that stamina is inching ahead due to its gains in the 100% and 80% categories.  Also keep in mind that the data table has a very coarse binning, which obscures things somewhat.  If haste’s entries tend to be near the top of a bin while stamina’s entries tend to be lower, for example, they’d look similar on the table but stamina would generate a better stat weight.  That seems to be what’s happening here, and it’s one of the reasons a more finely-grained numerical metric is more reliable than our coarse-binned qualitative method.

Let’s vary the HDF and see what happens:

pct=100.00, N=200, vary hdf

| hdf |  Stam |   Hit |   Exp | Haste |  Mast | Dodge | Parry |
| 1.5 |  6621 |  7204 |  5395 |  6343 |  4716 |  4264 |  3572 |
| 1.6 |  6695 |  7592 |  5665 |  6367 |  4530 |  4114 |  3450 |
| 1.7 |  6768 |  7993 |  5943 |  6405 |  4348 |  3994 |  3349 |
| 1.8 |  6868 |  8435 |  6249 |  6476 |  4182 |  3910 |  3279 |
| 1.9 |  7006 |  8935 |  6594 |  6588 |  4038 |  3864 |  3240 |
| 2.0 |  7189 |  9503 |  6988 |  6747 |  3915 |  3855 |  3231 |
| 2.1 |  7423 | 10148 |  7435 |  6952 |  3812 |  3882 |  3252 |
| 2.2 |  7709 | 10876 |  7939 |  7206 |  3728 |  3941 |  3301 |
| 2.3 |  8049 | 11695 |  8505 |  7510 |  3660 |  4033 |  3376 |
| 2.4 |  8446 | 12612 |  9138 |  7864 |  3606 |  4155 |  3477 |
| 2.5 |  8900 | 13633 |  9840 |  8270 |  3566 |  4308 |  3602 |
| 2.6 |  9416 | 14766 | 10616 |  8729 |  3537 |  4490 |  3753 |
| 2.7 |  9994 | 16018 | 11471 |  9244 |  3518 |  4703 |  3928 |
| 2.8 | 10638 | 17398 | 12410 |  9816 |  3510 |  4947 |  4127 |
| 2.9 | 11350 | 18915 | 13438 | 10448 |  3509 |  5222 |  4351 |
| 3.0 | 12135 | 20578 | 14559 | 11144 |  3517 |  5529 |  4599 |
| 3.1 | 12994 | 22396 | 15781 | 11905 |  3533 |  5870 |  4874 |
| 3.2 | 13932 | 24382 | 17108 | 12735 |  3556 |  6246 |  5174 |
| 3.3 | 14953 | 26545 | 18548 | 13637 |  3586 |  6658 |  5500 |
| 3.4 | 16061 | 28897 | 20107 | 14616 |  3623 |  7108 |  5854 |
| 3.5 | 17260 | 31452 | 21792 | 15674 |  3667 |  7597 |  6236 |
| 3.6 | 18555 | 34221 | 23611 | 16816 |  3717 |  8128 |  6646 |
| 3.7 | 19949 | 37219 | 25572 | 18047 |  3775 |  8702 |  7085 |
| 3.8 | 21449 | 40460 | 27682 | 19370 |  3839 |  9321 |  7554 |
| 3.9 | 23060 | 43959 | 29951 | 20790 |  3911 |  9987 |  8053 |
| 4.0 | 24785 | 47731 | 32387 | 22313 |  3990 | 10703 |  8584 |
| 4.1 | 26632 | 51795 | 35000 | 23942 |  4076 | 11471 |  9147 |
| 4.2 | 28606 | 56166 | 37800 | 25683 |  4171 | 12293 |  9741 |
| 4.3 | 30711 | 60862 | 40797 | 27542 |  4274 | 13171 | 10369 |
| 4.4 | 32956 | 65904 | 44002 | 29523 |  4385 | 14109 | 11031 |
| 4.5 | 35345 | 71310 | 47426 | 31633 |  4505 | 15109 | 11726 |

An HDF of 2.0 is clearly too low here, as mastery starts pulling ahead of dodge and parry, and haste catches up to expertise.  2.5 is still a little low as well, since parry is barely edging ahead of mastery.  But 3.0 looks pretty solid; dodge and parry are ahead of mastery by about 1/3, and expertise pulls ahead of haste by a reasonable margin.  By the time we hit 3.5 we’ve definitely gone too far the other way, though.  Avoidance isn’t twice as good as mastery, nor is haste about 5x better than mastery.

I think anything between about 2.8 and 3.3 seems to give reasonable values that sync up pretty well with our qualitative conclusions.  The choice within that band is somewhat arbitrary, but with any luck we’ll be able to narrow it down some more by looking at other data sets.  So let’s do that.

Control/Haste+Mastery

The Control/HM set was added to explore the synergy between haste and mastery.  Back before Seal of Insight and Sacred Shield were added to the model, haste and mastery were fairly close in value.  They also had a feedback-like effect on one another, in that each improved the other.  So there was speculation that splitting itemization between the two might be beneficial.  When SoI and SS were added to the simulation, haste went up in value, but mastery didn’t change significantly, which put the nail in the haste+mastery coffin.  Haste was just flat-out better at that point.  Still, this set is a useful test bench because it has low avoidance but a sizable amount of both haste and mastery.

Here’s how the simulation data turns out when we start with C/HM as a baseline and add (or subtract in the case of hit/exp) 1000 of each stat.

Finisher = SH1, Boss Attack = 350k, SoI model=nooverheal data set metric-10000-1

| Set: |   C/HM |   Stam |    Hit |    Exp |  Haste |   Mast |  Dodge |  Parry |
| mean |  0.259 |  0.259 |  0.268 |  0.265 |  0.252 |  0.253 |  0.253 |  0.253 |
|  std |  0.104 |  0.104 |  0.107 |  0.107 |  0.102 |  0.104 |  0.104 |  0.105 |
|   S% |  0.472 |  0.471 |  0.458 |  0.463 |  0.483 |  0.472 |  0.472 |  0.472 |
|   HP |   755k |   775k |   755k |   755k |   755k |   755k |   755k |   755k |
|  nHP |  2.158 |  2.215 |  2.158 |  2.158 |  2.158 |  2.158 |  2.158 |  2.158 |
| ---- | ------ |  --- 4 | Attack | Moving | Avg.-- | ------ | ------ | ------ |
|  50% | 46.061 | 45.590 | 49.014 | 48.251 | 43.967 | 45.005 | 44.244 | 44.295 |
|  60% | 29.677 | 26.089 | 32.732 | 31.949 | 27.267 | 27.601 | 27.981 | 28.078 |
|  70% | 14.807 | 13.454 | 17.204 | 16.728 | 13.011 | 13.599 | 13.664 | 13.856 |
|  80% |  6.978 |  6.225 |  8.656 |  8.181 |  5.906 |  6.644 |  6.380 |  6.496 |
|  90% |  3.061 |  1.813 |  4.178 |  3.907 |  2.468 |  2.432 |  2.773 |  2.883 |
| 100% |  0.830 |  0.554 |  1.319 |  1.190 |  0.627 |  0.681 |  0.734 |  0.786 |
| 110% |  0.198 |  0.194 |  0.361 |  0.293 |  0.130 |  0.174 |  0.145 |  0.178 |
| 120% |  0.068 |  0.040 |  0.152 |  0.110 |  0.041 |  0.065 |  0.054 |  0.056 |
| 130% |  0.006 |  0.003 |  0.030 |  0.019 |  0.002 |  0.006 |  0.005 |  0.006 |
| 140% |  0.001 |  0.000 |  0.006 |  0.002 |  0.001 |  0.000 |  0.001 |  0.000 |

Hit and expertise are still going to be clear winners here, with haste and stamina both coming in at a distant third.  Curiously, parry and mastery seem pretty well-matched here, though mastery seems to have a slight edge.  Dodge, on the other hand, is still a little ahead of mastery in this set.  I think what we’re seeing here is a sort of innate diminishing returns on mastery – if you don’t have a lot, then it’s pretty strong.  But as you get more of it, the benefit to SotR is less and less important because the hits it mitigates already fall in the middle of the histogram.  The peaks we’re seeing at the top here are mostly affected by the block chance contribution, which is fairly weak.

In any event, the results seem pretty clear from this data:
hit>exp>(haste/stamina)>dodge>mastery>parry
Let’s see if the metric agrees.

hdf=3.00, N=200, vary pct

|   pct |  Stam |   Hit |  Exp | Haste |  Mast | Dodge | Parry |
| 0.010 |  4663 | 14455 | 7289 |  3799 |  1959 |  2239 |  1271 |
| 0.020 |  4828 | 15236 | 7857 |  4365 |  1745 |  2533 |  1488 |
| 0.030 |  5500 | 15445 | 8035 |  4458 |  2361 |  2570 |  1543 |
| 0.040 |  4629 | 15718 | 8227 |  4619 |  2082 |  2663 |  1614 |
| 0.050 |  5237 | 15818 | 8285 |  4763 |  2229 |  2745 |  1679 |
| 0.060 |  4984 | 15884 | 8316 |  4784 |  2045 |  2766 |  1700 |
| 0.070 |  4578 | 15900 | 8330 |  4845 |  2068 |  2808 |  1767 |
| 0.080 |  4840 | 15976 | 8377 |  4910 |  2159 |  2832 |  1794 |
| 0.090 |  5303 | 16036 | 8445 |  4978 |  2449 |  2889 |  1832 |
| 0.100 |  5303 | 16036 | 8445 |  4978 |  2449 |  2889 |  1832 |
| 1.000 |  5150 | 16122 | 8525 |  5070 |  2382 |  3002 |  1955 |

I’m not sure these “top X% of histogram” rows are going to be very useful.  They just keep re-affirming that edge effects matter a lot, and we really can’t have that degree of volatility in the metric.  Mastery is uniformly behind dodge in each row, but the value fluctuates a lot more than I’m comfortable with.  That’s going to introduce noise, which makes our estimates less accurate, and that’s no good.  Luckily, the 100% row seems to match our expectations: hit and expertise far ahead, stamina and haste very close, dodge beating mastery, and parry bringing up the rear.

Let’s see how HDF affects that ordering.

pct=100.00, N=200, vary hdf

| hdf |  Stam |   Hit |   Exp | Haste |  Mast | Dodge | Parry |
| 1.5 |  4336 |  7316 |  5227 |  4571 |  3162 |  3086 |  2607 |
| 1.6 |  4257 |  7457 |  5260 |  4464 |  2963 |  2933 |  2404 |
| 1.7 |  4178 |  7618 |  5294 |  4365 |  2783 |  2803 |  2231 |
| 1.8 |  4114 |  7824 |  5349 |  4286 |  2630 |  2701 |  2090 |
| 1.9 |  4073 |  8087 |  5431 |  4234 |  2505 |  2627 |  1978 |
| 2.0 |  4057 |  8414 |  5544 |  4207 |  2406 |  2578 |  1893 |
| 2.1 |  4065 |  8810 |  5690 |  4206 |  2330 |  2552 |  1831 |
| 2.2 |  4097 |  9279 |  5868 |  4229 |  2274 |  2545 |  1788 |
| 2.3 |  4152 |  9823 |  6080 |  4274 |  2238 |  2556 |  1764 |
| 2.4 |  4230 | 10446 |  6325 |  4338 |  2218 |  2582 |  1755 |
| 2.5 |  4330 | 11154 |  6604 |  4422 |  2213 |  2623 |  1760 |
| 2.6 |  4452 | 11949 |  6917 |  4522 |  2222 |  2677 |  1777 |
| 2.7 |  4594 | 12838 |  7265 |  4638 |  2244 |  2742 |  1806 |
| 2.8 |  4758 | 13826 |  7648 |  4769 |  2279 |  2819 |  1845 |
| 2.9 |  4943 | 14919 |  8068 |  4913 |  2325 |  2906 |  1895 |
| 3.0 |  5150 | 16122 |  8525 |  5070 |  2382 |  3002 |  1955 |
| 3.1 |  5378 | 17445 |  9021 |  5238 |  2450 |  3108 |  2023 |
| 3.2 |  5628 | 18892 |  9556 |  5416 |  2530 |  3223 |  2100 |
| 3.3 |  5901 | 20474 | 10132 |  5604 |  2620 |  3347 |  2187 |
| 3.4 |  6197 | 22197 | 10751 |  5799 |  2721 |  3478 |  2281 |
| 3.5 |  6517 | 24071 | 11414 |  6002 |  2833 |  3618 |  2385 |
| 3.6 |  6862 | 26104 | 12123 |  6210 |  2956 |  3766 |  2496 |
| 3.7 |  7232 | 28308 | 12879 |  6423 |  3090 |  3922 |  2616 |
| 3.8 |  7629 | 30691 | 13684 |  6640 |  3236 |  4086 |  2745 |
| 3.9 |  8054 | 33265 | 14540 |  6858 |  3394 |  4257 |  2881 |
| 4.0 |  8506 | 36041 | 15449 |  7077 |  3565 |  4436 |  3027 |
| 4.1 |  8989 | 39030 | 16412 |  7295 |  3748 |  4623 |  3181 |
| 4.2 |  9502 | 42245 | 17433 |  7509 |  3944 |  4817 |  3343 |
| 4.3 | 10047 | 45697 | 18512 |  7719 |  4154 |  5019 |  3514 |
| 4.4 | 10625 | 49401 | 19652 |  7923 |  4378 |  5228 |  3694 |
| 4.5 | 11238 | 53370 | 20855 |  8117 |  4616 |  5444 |  3883 |

The lower HDF values perform a little better here than they did with the last set. Even $h=2.5$ looks reasonable, though I think it undervalues dodge a little bit.  3.0 still looks fine, but by 3.3 we’re seeing a little more of a gap between stamina and haste than should probably exist.  I’d probably put the upper limit on this data set at 3.2 or 3.3, which narrows our range a little bit more.

Avoidance

Finally, let’s try an avoidance set.  This is the set we wouldn’t generally use because it fares poorly in smoothness tests.  You can see on the table below that it permits spikes up to 160% of your health while the other sets generally cap out at around 140%.  Nonetheless, it will be a good thing to check this gear set as well because it represents another extreme that the metric may have to deal with.

Finisher = SH1, Boss Attack = 350k, SoI model=nooverheal data set metric-10000-2

| Set: |  Avoid |   Stam |    Hit |    Exp |  Haste |   Mast |  Dodge |  Parry |
| mean |  0.277 |  0.278 |  0.283 |  0.283 |  0.271 |  0.273 |  0.271 |  0.272 |
|  std |  0.138 |  0.138 |  0.141 |  0.139 |  0.136 |  0.136 |  0.137 |  0.138 |
|   S% |  0.366 |  0.367 |  0.355 |  0.359 |  0.374 |  0.366 |  0.367 |  0.367 |
|   HP |   755k |   775k |   755k |   755k |   755k |   755k |   755k |   755k |
|  nHP |  2.158 |  2.215 |  2.158 |  2.158 |  2.158 |  2.158 |  2.158 |  2.158 |
| ---- | ------ |  --- 4 | Attack | Moving | Avg.-- | ------ | ------ | ------ |
|  50% | 51.268 | 51.107 | 52.768 | 52.609 | 49.450 | 50.602 | 49.814 | 49.760 |
|  60% | 38.659 | 37.795 | 40.442 | 40.063 | 36.801 | 37.879 | 37.169 | 37.206 |
|  70% | 26.734 | 24.499 | 28.408 | 28.149 | 25.071 | 25.590 | 25.468 | 25.477 |
|  80% | 16.868 | 14.017 | 18.432 | 18.075 | 15.631 | 14.414 | 15.962 | 15.998 |
|  90% |  8.755 |  7.228 |  9.910 |  9.537 |  7.979 |  7.960 |  8.154 |  8.225 |
| 100% |  3.687 |  3.517 |  4.398 |  4.205 |  3.312 |  3.586 |  3.453 |  3.465 |
| 110% |  1.665 |  1.526 |  2.110 |  1.949 |  1.470 |  1.573 |  1.558 |  1.556 |
| 120% |  0.791 |  0.699 |  1.045 |  0.955 |  0.696 |  0.730 |  0.728 |  0.748 |
| 130% |  0.347 |  0.259 |  0.483 |  0.452 |  0.296 |  0.330 |  0.324 |  0.316 |
| 140% |  0.130 |  0.096 |  0.186 |  0.179 |  0.101 |  0.125 |  0.116 |  0.119 |
| 150% |  0.025 |  0.030 |  0.041 |  0.039 |  0.020 |  0.025 |  0.020 |  0.024 |
| 160% |  0.002 |  0.000 |  0.006 |  0.006 |  0.003 |  0.003 |  0.002 |  0.002 |

Stamina fares very well in the avoidance set, though it’s a little hard to tell because of the coarse binning.  Emptying the top category is very strong, as are the gains in the 130% and 140% categories, though the representation in the 150% category actually goes up.  Haste lags stamina slightly, followed by dodge and parry, and mastery brings up the rear once again.  So unsurprisingly, we expect the usual hit>exp>stamina>haste>dodge>parry>mastery ordering.  Just for completeness, let’s look at the percentile table:

hdf=3.00, N=200, vary pct

|   pct |  Stam |   Hit |   Exp | Haste |  Mast | Dodge | Parry |
| 0.010 | 43283 | 77820 | 60492 | 24904 |  5891 | 17175 |  9204 |
| 0.020 | 51380 | 81516 | 63535 | 27400 | 11212 | 18623 | 10511 |
| 0.030 | 47283 | 81644 | 63646 | 27787 |  8947 | 18860 | 10753 |
| 0.040 | 52137 | 82979 | 64524 | 28396 | 11561 | 19467 | 11190 |
| 0.050 | 52137 | 82979 | 64524 | 28396 | 11561 | 19467 | 11190 |
| 0.060 | 50710 | 83411 | 64838 | 28795 |  9867 | 19706 | 11514 |
| 0.070 | 49724 | 83484 | 64878 | 28779 |  9012 | 19719 | 11556 |
| 0.080 | 50326 | 83484 | 64824 | 28896 | 10539 | 19831 | 11641 |
| 0.090 | 51530 | 83674 | 65085 | 29072 | 10647 | 19890 | 11726 |
| 0.100 | 50357 | 83695 | 65116 | 29072 |  9548 | 19884 | 11720 |
| 1.000 | 50785 | 83883 | 65288 | 29371 | 10093 | 20198 | 12017 |

Nothing too surprising here.  The lead stamina has is a little larger than I would have expected, but is probably due to the complete elimination of the top category.  The slight loss in the 150% category doesn’t seem to have hurt it too much.  Otherwise though, we get exactly the ordering we expected, and we see mastery struggling with edge effects throughout the upper regions of the table.  Every one of these tables is throwing more fuel on the fire for the percentile-limited calculation; we’ll be burning it in effigy soon enough, I think.

On to the more interesting table, where we vary HDF for the all-inclusive calculation.

pct=100.00, N=200, vary hdf

| hdf |   Stam |    Hit |    Exp |  Haste |  Mast |  Dodge | Parry |
| 1.5 |   8205 |   8966 |   6782 |   5935 |  4121 |   4408 |  3988 |
| 1.6 |   9072 |  10325 |   7746 |   6384 |  4330 |   4617 |  4090 |
| 1.7 |  10048 |  11896 |   8883 |   6908 |  4547 |   4884 |  4229 |
| 1.8 |  11178 |  13743 |  10241 |   7532 |  4785 |   5226 |  4416 |
| 1.9 |  12497 |  15921 |  11865 |   8275 |  5052 |   5656 |  4658 |
| 2.0 |  14041 |  18493 |  13805 |   9154 |  5349 |   6185 |  4958 |
| 2.1 |  15843 |  21523 |  16111 |  10183 |  5679 |   6824 |  5320 |
| 2.2 |  17940 |  25081 |  18842 |  11381 |  6042 |   7587 |  5747 |
| 2.3 |  20371 |  29245 |  22061 |  12763 |  6439 |   8487 |  6242 |
| 2.4 |  23178 |  34101 |  25839 |  14349 |  6870 |   9540 |  6810 |
| 2.5 |  26406 |  39746 |  30256 |  16159 |  7333 |  10765 |  7455 |
| 2.6 |  30107 |  46285 |  35397 |  18214 |  7828 |  12179 |  8182 |
| 2.7 |  34333 |  53834 |  41359 |  20537 |  8354 |  13805 |  8996 |
| 2.8 |  39145 |  62521 |  48249 |  23154 |  8908 |  15666 |  9903 |
| 2.9 |  44606 |  72486 |  56183 |  26089 |  9489 |  17788 | 10907 |
| 3.0 |  50785 |  83883 |  65288 |  29371 | 10093 |  20198 | 12017 |
| 3.1 |  57757 |  96879 |  75705 |  33028 | 10717 |  22925 | 13237 |
| 3.2 |  65601 | 111656 |  87584 |  37093 | 11357 |  26004 | 14574 |
| 3.3 |  74406 | 128413 | 101091 |  41597 | 12009 |  29468 | 16035 |
| 3.4 |  84262 | 147364 | 116407 |  46575 | 12666 |  33355 | 17628 |
| 3.5 |  95271 | 168741 | 133725 |  52063 | 13323 |  37705 | 19359 |
| 3.6 | 107538 | 192796 | 153256 |  58099 | 13973 |  42562 | 21237 |
| 3.7 | 121178 | 219800 | 175227 |  64721 | 14607 |  47972 | 23268 |
| 3.8 | 136313 | 250042 | 199882 |  71971 | 15216 |  53984 | 25461 |
| 3.9 | 153071 | 283837 | 227485 |  79892 | 15790 |  60651 | 27824 |
| 4.0 | 171591 | 321520 | 258318 |  88529 | 16318 |  68028 | 30364 |
| 4.1 | 192021 | 363451 | 292684 |  97927 | 16786 |  76176 | 33091 |
| 4.2 | 214517 | 410013 | 330906 | 108135 | 17181 |  85156 | 36013 |
| 4.3 | 239244 | 461619 | 373333 | 119204 | 17488 |  95037 | 39139 |
| 4.4 | 266378 | 518705 | 420333 | 131183 | 17688 | 105889 | 42476 |
| 4.5 | 296105 | 581740 | 472301 | 144128 | 17764 | 117788 | 46036 |

This table raises our lower bound a little bit.  Even 2.7 is problematic here, as mastery and parry are still neck and neck despite a strong lead by parry in the data.  I’d go as far as to say that 2.9 is still off in the relative weighting of mastery and parry, though it’s passable.  On the upper end, we have a nearly 2:1 lead of stamina over haste by the time we reach $h=3.3$, which seems excessive.  I like the 5:3 ratio that we get around 3.0, even though I think that still inflates stamina a bit.  But 3.0 seems like the best compromise on this table.

Gear sets and TMI

Now let’s shift gears and take a look at entire gear sets rather than stat weights.  While stat weights are probably the most common way people will use the metric, it’s also important that it holds up when comparing entire gear sets.  In fact, that’s how we usually perform our qualitative analysis – pick a bunch of representative gear sets and compare the histograms.

This is necessary because of the many interactions different stats have with one another.  We’ve already seen that mastery’s value depends on the rest of your stats – in some gear sets it rises ahead of avoidance, in others it trails avoidance.  So it’s not unreasonable to expect that you might get different stat weights in highly-disparate gear sets.  But by comparing those gear sets directly, we can tell which one does a better job of smoothing.

Just to reiterate, here are the stats of each gear set:

|   Set: |   Str |   Sta | Parry | Dodge | Mastery |  Hit |  Exp | Haste |
|   C/Ha | 15000 | 28000 |  1500 |  1500 |    1500 | 2550 | 5100 | 12000 |
|   C/St | 15000 | 34000 |  1500 |  1500 |    1500 | 2550 | 5100 |  8000 |
|   C/Sg | 15000 | 31000 |  1500 |  1500 |    1500 | 2550 | 5100 |  8000 |
|  C/Shm | 15000 | 31000 |  1500 |  1500 |    4750 | 2550 | 5100 |  4750 |
|   C/Ma | 15000 | 28000 |  1500 |  1500 |   13500 | 2550 | 5100 |     0 |
|   C/Av | 15000 | 28000 |  7500 |  7500 |    1500 | 2550 | 5100 |     0 |
|  C/Bal | 15000 | 28000 |  4125 |  4125 |    4125 | 2550 | 5100 |  4125 |
|   C/HM | 15000 | 28000 |  1500 |  1500 |    6750 | 2550 | 5100 |  6750 |
|     Ha | 15000 | 28000 |  1500 |  1500 |    1500 |  500 |  500 | 18650 |
|  Avoid | 15000 | 28000 | 10825 | 10825 |    1500 |  500 |  500 |     0 |
| Av/Mas | 15000 | 28000 |  7717 |  7717 |    7716 |  500 |  500 |     0 |
| Mas/Av | 15000 | 28000 |  4000 |  4000 |   15150 |  500 |  500 |     0 |
|   Ha/h | 15000 | 28000 |  1500 |  1500 |    1500 | 2550 |  500 | 16600 |
|  Ha/he | 15000 | 28000 |  1500 |  1500 |    1500 | 2550 | 2550 | 14550 |
|  C/Str | 27600 | 28000 |  1500 |  1500 |    1500 | 2550 | 5100 |     0 |

And here’s the data set we generate with them.  This table isn’t transposed because it’s unwieldy no matter which way we format it, and it’ll be easier to read if we keep things consistent with earlier tables.

Finisher = SH1, Boss Attack = 350k, SoI model=nooverheal data set smooth-10000-0

| Set: |   C/Ha |   C/St |   C/Sg |  C/Shm |   C/Ma |   C/Av |  C/Bal |   C/HM |     Ha |  Avoid | Av/Mas | Mas/Av |   Ha/h |  Ha/he |  C/Str |
| mean |  0.262 |  0.285 |  0.286 |  0.297 |  0.278 |  0.280 |  0.274 |  0.259 |  0.269 |  0.278 |  0.283 |  0.287 |  0.263 |  0.264 |  0.270 |
|  std |  0.103 |  0.106 |  0.106 |  0.109 |  0.107 |  0.123 |  0.114 |  0.105 |  0.110 |  0.138 |  0.132 |  0.124 |  0.108 |  0.108 |  0.127 |
|   S% |  0.522 |  0.483 |  0.482 |  0.455 |  0.410 |  0.420 |  0.452 |  0.472 |  0.499 |  0.366 |  0.362 |  0.358 |  0.519 |  0.520 |  0.419 |
|   HP |   755k |   876k |   816k |   816k |   755k |   755k |   755k |   755k |   755k |   755k |   755k |   755k |   755k |   755k |   755k |
|  nHP |  2.158 |  2.504 |  2.331 |  2.331 |  2.158 |  2.158 |  2.158 |  2.158 |  2.158 |  2.158 |  2.158 |  2.158 |  2.158 |  2.158 |  2.158 |
| ---- | ------ |  --- 4 | Attack | Moving | Avg.-- | ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ |
|  50% | 46.138 | 41.369 | 48.197 | 53.342 | 53.604 | 52.688 | 51.264 | 46.218 | 48.634 | 51.389 | 53.786 | 54.764 | 46.584 | 46.965 | 47.688 |
|  60% | 29.956 | 22.394 | 29.557 | 31.895 | 35.512 | 38.205 | 35.731 | 29.593 | 33.093 | 38.767 | 40.584 | 38.962 | 31.072 | 31.242 | 33.733 |
|  70% | 16.220 | 10.126 | 15.830 | 17.885 | 19.880 | 24.647 | 21.671 | 14.841 | 19.265 | 26.914 | 25.468 | 25.584 | 17.745 | 17.761 | 22.163 |
|  80% |  7.390 |  3.221 |  5.890 |  6.899 |  9.238 | 14.172 |  9.097 |  7.025 | 10.267 | 17.109 | 15.024 | 13.988 |  9.010 |  8.805 | 13.043 |
|  90% |  2.529 |  0.773 |  1.954 |  2.387 |  3.480 |  6.556 |  3.996 |  3.120 |  4.449 |  8.868 |  8.461 |  7.155 |  3.517 |  3.430 |  6.549 |
| 100% |  0.635 |  0.141 |  0.545 |  0.779 |  1.579 |  2.012 |  1.434 |  0.848 |  1.463 |  3.752 |  3.719 |  3.684 |  0.992 |  0.972 |  2.500 |
| 110% |  0.104 |  0.025 |  0.101 |  0.235 |  0.506 |  0.713 |  0.398 |  0.194 |  0.483 |  1.707 |  1.654 |  1.706 |  0.257 |  0.234 |  0.882 |
| 120% |  0.023 |  0.001 |  0.010 |  0.035 |  0.169 |  0.280 |  0.163 |  0.067 |  0.168 |  0.824 |  0.883 |  0.806 |  0.084 |  0.074 |  0.282 |
| 130% |  0.002 |  0.000 |  0.000 |  0.005 |  0.029 |  0.050 |  0.029 |  0.008 |  0.043 |  0.373 |  0.418 |  0.341 |  0.017 |  0.010 |  0.066 |
| 140% |  0.000 |  0.000 |  0.000 |  0.000 |  0.003 |  0.005 |  0.003 |  0.000 |  0.010 |  0.131 |  0.136 |  0.111 |  0.003 |  0.001 |  0.009 |
| 150% |  0.000 |  0.000 |  0.000 |  0.000 |  0.000 |  0.000 |  0.000 |  0.000 |  0.002 |  0.028 |  0.024 |  0.013 |  0.000 |  0.000 |  0.001 |
| 160% |  0.000 |  0.000 |  0.000 |  0.000 |  0.000 |  0.000 |  0.000 |  0.000 |  0.000 |  0.003 |  0.005 |  0.002 |  0.000 |  0.000 |  0.000 |

Not too much we didn’t expect here.  The Control/Stamina sets beat out Control/Haste.  Control/Shm and Control/HM both sacrifice haste for mastery, and thus give slightly lower performance.  Then we have Control/Balance and Control/Mastery lagging a fair bit, with Control/Avoidance and Control/Strength bringing up the rear of the control subset.

We also have a few sets that focus on Haste and forego hit/exp caps.  Pure haste (Ha) is a little better than Control/Avoidance, while the hit-capped only (Ha/h) and soft-expertise capped (Ha/he) sets fall somewhere between C/HM and C/Bal.

Finally, the avoidance sets perform poorly as usual.  Out of the three, we see the same trends we saw with the avoidance gear set stat weights: shifting a little value into mastery doesn’t tend to make a lot of difference (Av/Mas), but shifting to a heavy mastery focus does (Mas/Av).

So in general, we expect the following rough ordering, where near-ties are combined in parentheses:

C/St > C/Sg > C/Ha > ( C/Shm , C/HM ) > Ha/he > Ha/h > ( C/Bal , C/Ma ) > Ha > C/Av > C/Str > Mas/Av > ( Av/Mas , Avoid )

Let’s see if those predictions hold up in the percentile table.  What I’m showing here is raw TMI score, not a relative (i.e. differential) comparison.  So the lower the number, the smoother the damage intake.  I’m also showing only a few partial percents, partly for readability and partly because by now we know they’re not that useful anyway.  Note that I’ve transposed this from the usual format to make it easier to read.
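
For concreteness, here’s a minimal sketch of what that percentile cutoff means (my own illustration, not the actual analysis code): we keep only the top fraction of events, sorted by damage, before doing anything else with them.

```python
# Hypothetical helper illustrating the "top X%" cutoff: keep only the
# pct highest-damage events before weighting them.
def top_fraction(spikes, pct):
    k = max(1, int(round(pct * len(spikes))))
    return sorted(spikes)[-k:]

print(top_fraction([0.4, 1.3, 0.9, 0.7, 1.1], 0.4))  # [1.1, 1.3]
```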

hdf=3.00, N=200, vary pct

|  pct-> |     1% |     5% |    10% |   100% |
|   C/Ha |   8038 |  12804 |  15018 |  18332 |
|   C/St |   1575 |   3581 |   4846 |   7895 |
|   C/Sg |   6702 |  10846 |  12398 |  16102 |
|  C/Shm |  12242 |  16914 |  18980 |  22631 |
|   C/Ma |  31660 |  36092 |  37874 |  41994 |
|   C/Av |  49031 |  57363 |  60248 |  63949 |
|  C/Bal |  28342 |  34727 |  36801 |  40468 |
|   C/HM |  13505 |  18234 |  19490 |  23096 |
|     Ha |  37936 |  44175 |  46514 |  49835 |
|  Avoid | 216443 | 225819 | 228283 | 231586 |
| Av/Mas | 214205 | 223790 | 225309 | 229068 |
| Mas/Av | 176964 | 184269 | 186410 | 190308 |
|   Ha/h |  19947 |  25642 |  27870 |  31126 |
|  Ha/he |  16537 |  22207 |  24496 |  27795 |
|  C/Str |  53352 |  60420 |  62517 |  66023 |

Just to reiterate: I’m going to be looking at the last column, primarily.  C/St is the clear leader by a pretty big margin, just as it is on the data table.  C/Sg and C/Ha come in next, fairly close to one another (but with the stamina gear set still holding a slight lead).  The next two sets are C/Shm and C/HM, again clustered together.  We then have another medium-sized gap before reaching Ha/he and Ha/h, and a slightly larger gap before we reach C/Bal and C/Ma.

As we round out the bottom, our predictions look pretty stable.  C/Av and C/Str lag the previous sets by a large chunk, and then we have a huge jump (nearly a factor of 4) from C/Str to the closest avoidance-based set, which is Mas/Av.  Bringing up the rear, Av/Mas and Avoid are in a dead heat for last place in a race that’s nearly too close to call, mimicking the qualitative assessment.

As a final consideration, let’s look at how these results vary with HDF:

pct=100.00, N=200, vary hdf

|  hdf-> |    2.5 |    2.6 |    2.7 |    2.8 |    2.9 |    3.0 |    3.1 |    3.2 |    3.3 |    3.4 |    3.5 |
|   C/Ha |  22029 |  20960 |  20086 |  19374 |  18796 |  18332 |  17965 |  17681 |  17471 |  17326 |  17239 |
|   C/St |  11603 |  10592 |   9745 |   9029 |   8418 |   7895 |   7442 |   7050 |   6707 |   6406 |   6142 |
|   C/Sg |  20089 |  18967 |  18040 |  17271 |  16632 |  16102 |  15665 |  15305 |  15013 |  14778 |  14595 |
|  C/Shm |  25431 |  24500 |  23787 |  23256 |  22879 |  22631 |  22497 |  22461 |  22513 |  22644 |  22845 |
|   C/Ma |  39112 |  39167 |  39513 |  40115 |  40948 |  41994 |  43239 |  44673 |  46287 |  48077 |  50040 |
|   C/Av |  55920 |  56821 |  58097 |  59722 |  61677 |  63949 |  66529 |  69415 |  72604 |  76099 |  79901 |
|  C/Bal |  38329 |  38228 |  38419 |  38870 |  39559 |  40468 |  41584 |  42898 |  44405 |  46101 |  47984 |
|   C/HM |  24980 |  24249 |  23725 |  23374 |  23171 |  23096 |  23134 |  23274 |  23506 |  23822 |  24217 |
|     Ha |  42774 |  43458 |  44522 |  45945 |  47717 |  49835 |  52302 |  55122 |  58308 |  61873 |  65833 |
|  Avoid | 138960 | 152806 | 168802 | 187126 | 207980 | 231586 | 258195 | 288075 | 321520 | 358849 | 400403 |
| Av/Mas | 136285 | 150110 | 166106 | 184456 | 205366 | 229068 | 255817 | 285896 | 319612 | 357297 | 399312 |
| Mas/Av | 119574 | 130257 | 142561 | 156587 | 172457 | 190308 | 210295 | 232586 | 257366 | 284835 | 315207 |
|   Ha/h |  30841 |  30433 |  30283 |  30362 |  30647 |  31126 |  31788 |  32627 |  33639 |  34824 |  36184 |
|  Ha/he |  29173 |  28501 |  28056 |  27806 |  27725 |  27795 |  28001 |  28332 |  28780 |  29339 |  30003 |
|  C/Str |  57433 |  58491 |  59901 |  61636 |  63681 |  66023 |  68655 |  71573 |  74778 |  78271 |  82057 |

There isn’t a lot of variation here to consider, as the relative ordering of entire gear sets seems pretty stable with HDF.  The main thing that I notice is that the gap between C/Sg and C/Ha grows (in a relative sense) as $h$ increases.  This isn’t a huge surprise, since stamina performs a slight “shift” of the entire histogram to the left, and will thus benefit more from a high HDF.  What this tells us is that the higher our HDF, the more valuable stamina becomes. We also have to be careful that we don’t over-value stamina by overshooting our HDF.

Repeatability

Finally, there’s one more test we want to run.  It’s all well and good to have one set of data, but how reliable is the result?  If one simulation of 10k-minute duration varies significantly from the next, it will be hard to trust the number that the metric spits out.

To test this, we’ll take the C/Ha gear set and run it through the simulation 20 times.  Then we can use the metric to analyze each trial and see how much variation we get.  Again, apologies for the inconveniently-sized table, but here’s all the raw data:

Finisher = SH1, Boss Attack = 350k, SoI model=nooverheal data set metric-10000-6

| Set: |    #1  |    #2  |    #3  |    #4  |    #5  |    #6  |    #7  |    #8  |    #9  |    #10 |    #11 |    #12 |    #13 |    #14 |    #15 |    #16 |    #17 |    #18 |    #19 |    #20 |
| mean |  0.261 |  0.261 |  0.261 |  0.261 |  0.261 |  0.261 |  0.262 |  0.261 |  0.261 |  0.261 |  0.262 |  0.261 |  0.261 |  0.261 |  0.261 |  0.262 |  0.261 |  0.261 |  0.261 |  0.262 |
|  std |  0.103 |  0.103 |  0.103 |  0.103 |  0.103 |  0.104 |  0.103 |  0.103 |  0.103 |  0.103 |  0.103 |  0.103 |  0.103 |  0.103 |  0.103 |  0.103 |  0.103 |  0.103 |  0.103 |  0.103 |
|   S% |  0.523 |  0.522 |  0.522 |  0.523 |  0.523 |  0.522 |  0.522 |  0.523 |  0.523 |  0.522 |  0.522 |  0.522 |  0.523 |  0.522 |  0.523 |  0.523 |  0.523 |  0.522 |  0.523 |  0.522 |
|   HP |   755k |   755k |   755k |   755k |   755k |   755k |   755k |   755k |   755k |   755k |   755k |   755k |   755k |   755k |   755k |   755k |   755k |   755k |   755k |   755k |
|  nHP |  2.158 |  2.158 |  2.158 |  2.158 |  2.158 |  2.158 |  2.158 |  2.158 |  2.158 |  2.158 |  2.158 |  2.158 |  2.158 |  2.158 |  2.158 |  2.158 |  2.158 |  2.158 |  2.158 |  2.158 |
| ---- 4-Attack Moving Avg. ---- |
|  50% | 45.745 | 46.052 | 45.868 | 46.034 | 45.850 | 46.032 | 46.098 | 45.972 | 45.910 | 46.047 | 46.304 | 45.975 | 45.798 | 45.858 | 45.905 | 46.151 | 45.889 | 46.049 | 45.925 | 46.084 |
|  60% | 29.662 | 29.905 | 29.794 | 29.974 | 29.726 | 29.819 | 29.994 | 29.904 | 29.822 | 29.923 | 30.122 | 29.874 | 29.657 | 29.663 | 29.776 | 30.094 | 29.805 | 29.927 | 29.791 | 29.966 |
|  70% | 15.974 | 16.225 | 16.162 | 16.226 | 16.026 | 16.180 | 16.231 | 16.312 | 16.124 | 16.177 | 16.312 | 16.058 | 15.975 | 15.999 | 16.103 | 16.291 | 16.122 | 16.190 | 16.131 | 16.288 |
|  80% |  7.133 |  7.372 |  7.297 |  7.282 |  7.309 |  7.353 |  7.356 |  7.379 |  7.320 |  7.312 |  7.365 |  7.242 |  7.188 |  7.236 |  7.301 |  7.369 |  7.289 |  7.348 |  7.279 |  7.334 |
|  90% |  2.455 |  2.495 |  2.502 |  2.499 |  2.493 |  2.523 |  2.552 |  2.574 |  2.512 |  2.502 |  2.530 |  2.469 |  2.460 |  2.470 |  2.472 |  2.533 |  2.482 |  2.517 |  2.490 |  2.535 |
| 100% |  0.600 |  0.618 |  0.623 |  0.604 |  0.615 |  0.628 |  0.645 |  0.645 |  0.622 |  0.608 |  0.628 |  0.625 |  0.589 |  0.588 |  0.607 |  0.613 |  0.618 |  0.626 |  0.621 |  0.625 |
| 110% |  0.095 |  0.096 |  0.096 |  0.101 |  0.095 |  0.102 |  0.106 |  0.103 |  0.099 |  0.097 |  0.088 |  0.100 |  0.096 |  0.096 |  0.092 |  0.112 |  0.109 |  0.101 |  0.102 |  0.103 |
| 120% |  0.023 |  0.021 |  0.024 |  0.023 |  0.024 |  0.024 |  0.025 |  0.021 |  0.021 |  0.023 |  0.021 |  0.025 |  0.019 |  0.023 |  0.020 |  0.027 |  0.026 |  0.023 |  0.026 |  0.026 |
| 130% |  0.001 |  0.002 |  0.001 |  0.001 |  0.001 |  0.001 |  0.001 |  0.001 |  0.000 |  0.002 |  0.001 |  0.003 |  0.002 |  0.002 |  0.001 |  0.002 |  0.001 |  0.002 |  0.003 |  0.002 |

As I suggested earlier, we’re seeing changes on the order of +/- 0.001 to 0.002 in the data, which is just simulation noise.  Sometimes you get lucky and don’t see a dangerous string, sometimes you get unlucky and see several.  We want to make sure that the metric isn’t too sensitive to this, but that it’s still sensitive to small changes that are meaningful (like small changes in gear set).

Let’s look at the percentile breakdown for a moment and calculate the mean, standard deviation, and standard deviation of the mean across the 20 trials for each percentile cutoff.

hdf=3.00, N=200, vary pct

|    trial |     1% |     5% |    10% |   100% |
|       #1 |   7437 |  12922 |  14369 |  17573 |
|       #2 |   7654 |  13282 |  14735 |  17955 |
|       #3 |   7668 |  13278 |  14728 |  17938 |
|       #4 |   7578 |  13171 |  14628 |  17852 |
|       #5 |   7661 |  13273 |  14710 |  17902 |
|       #6 |   7935 |  13584 |  15041 |  18249 |
|       #7 |   8116 |  13736 |  15187 |  18417 |
|       #8 |   7922 |  13569 |  15033 |  18251 |
|       #9 |   7740 |  13313 |  14759 |  17968 |
|      #10 |   7858 |  13447 |  14901 |  18120 |
|      #11 |   7553 |  13197 |  14653 |  17892 |
|      #12 |   8642 |  14200 |  15657 |  18867 |
|      #13 |   7408 |  12945 |  14380 |  17583 |
|      #14 |   7491 |  13068 |  14521 |  17719 |
|      #15 |   7370 |  12949 |  14394 |  17603 |
|      #16 |   8135 |  13766 |  15243 |  18471 |
|      #17 |   7898 |  13458 |  14932 |  18132 |
|      #18 |   7932 |  13549 |  15005 |  18217 |
|      #19 |   8152 |  13707 |  15168 |  18372 |
|      #20 |   8023 |  13651 |  15105 |  18335 |
|     mean |   7809 |  13403 |  14857 |  18071 |
|      std | 315.72 | 326.88 | 332.50 | 336.03 |
| std_mean |  15.79 |  16.34 |  16.63 |  16.80 |
|  pct_var |   0.20 |   0.12 |   0.11 |   0.09 |

This looks pretty good.  The standard deviation is pretty reasonable at around 2% of the mean value once we consider more than a few percent of the data.  The standard deviation of the mean is a factor of 20 smaller, which puts it at about 0.1% of the mean value (this is the pct_var column).

In other words, if we do 20 runs of 10k minutes each, we expect to get a mean value that is 18071 +/- 16.8.  That’s really good news.  It means that we can get meaningful, reliable results out of this metric without having to simulate for ages.  Keep in mind that 200k minutes of combat is still less than what Simcraft can do; a standard high-precision Simcraft run is 50k iterations of ~450 seconds of combat each, which is 375k minutes of combat.

You might ask how HDF choice affects the repeatability.  Ask and ye shall receive.  (Actually, you’re getting it whether you asked for it or not).

pct=100.00, N=200, vary hdf

| hdf |  mean |     std | std_mean | pct_var |
| 1.5 | 71712 |  237.18 |    11.86 |    0.02 |
| 1.6 | 58274 |  227.64 |    11.38 |    0.02 |
| 1.7 | 48710 |  219.53 |    10.98 |    0.02 |
| 1.8 | 41699 |  213.32 |    10.67 |    0.03 |
| 1.9 | 36430 |  209.14 |    10.46 |    0.03 |
| 2.0 | 32388 |  206.99 |    10.35 |    0.03 |
| 2.1 | 29234 |  206.90 |    10.34 |    0.04 |
| 2.2 | 26736 |  208.94 |    10.45 |    0.04 |
| 2.3 | 24736 |  213.26 |    10.66 |    0.04 |
| 2.4 | 23119 |  220.08 |    11.00 |    0.05 |
| 2.5 | 21804 |  229.70 |    11.49 |    0.05 |
| 2.6 | 20730 |  242.52 |    12.13 |    0.06 |
| 2.7 | 19850 |  259.00 |    12.95 |    0.07 |
| 2.8 | 19130 |  279.67 |    13.98 |    0.07 |
| 2.9 | 18544 |  305.13 |    15.26 |    0.08 |
| 3.0 | 18071 |  336.03 |    16.80 |    0.09 |
| 3.1 | 17693 |  373.07 |    18.65 |    0.11 |
| 3.2 | 17399 |  417.01 |    20.85 |    0.12 |
| 3.3 | 17178 |  468.69 |    23.43 |    0.14 |
| 3.4 | 17021 |  528.96 |    26.45 |    0.16 |
| 3.5 | 16921 |  598.80 |    29.94 |    0.18 |
| 3.6 | 16874 |  679.24 |    33.96 |    0.20 |
| 3.7 | 16874 |  771.38 |    38.57 |    0.23 |
| 3.8 | 16918 |  876.46 |    43.82 |    0.26 |
| 3.9 | 17003 |  995.77 |    49.79 |    0.29 |
| 4.0 | 17126 | 1130.74 |    56.54 |    0.33 |
| 4.1 | 17287 | 1282.90 |    64.14 |    0.37 |
| 4.2 | 17482 | 1453.89 |    72.69 |    0.42 |
| 4.3 | 17711 | 1645.50 |    82.28 |    0.46 |
| 4.4 | 17973 | 1859.63 |    92.98 |    0.52 |
| 4.5 | 18268 | 2098.31 |   104.92 |    0.57 |

Perhaps predictably, reducing the HDF also reduces the variation we get in the metric.  I say “predictably” because the HDF essentially represents “how much we value outliers.”  We already know that we’re susceptible to noise at high HDF because it exaggerates stray high-lying spikes, and this is exactly the same principle at work. A higher HDF makes those small statistical fluctuations at the very top of the distribution more important, and thus increases the variation in our mean value.

This is another reason we want to keep our HDF as low as possible while still maintaining good discrimination between different data sets.  The lower the HDF, the more quickly the metric will converge and the less integration time we need to get a suitable estimate.

Conclusions

So far in this post I’ve mostly been repeating the same things over and over.  That’s to be expected, because the point of this post wasn’t to discover so much as to validate the metric.  The ideal case was that it performed exactly the way we wanted and mirrored our qualitative results.  It more or less did that with few surprises.

The other purpose was to narrow down our value for $h$.  Based on its performance in various gear sets, I would bound the “acceptable” range between $h=2.9$ and $h=3.3$.  Pretty much any value in there gives us reasonable results, and the differences are fairly minor.  The high end of that range tends to overvalue stamina and increases our sensitivity to noise, while the low end tends to undervalue avoidance and reduce our discrimination threshold for meaningful differences.

But our choice here is fairly arbitrary.  The metric will “work” with any value in this range.  As I said earlier, the actual numeric values are a bit arbitrary; what’s important is that they give us the correct relative relationships between gear sets and stats.  We can’t definitively say that “haste is 1.534 times better than mastery,” so it’s not critical that we pick an HDF that precisely.

In fact, the reverse is more worrisome – that someone will see an HDF of 3.2, get results that suggest that haste is exactly 1.534 times better than mastery, and assume that it’s gospel.  Obviously there will be some exact relationship given by the metric, because that’s how numbers work.  But it really shouldn’t be interpreted as if we’re getting exact relationships, because that’s not how the index was conceptualized or defined.

Likewise, you shouldn’t look at two gear sets with TMIs of 10k and 250k and decide that the latter is 25x worse than the former, or that you’re 25x more likely to die while wearing it.  That’s not what the index tells you.  It tells you that one gear set is a lot worse than the other, of course, but you can’t extrapolate a likelihood to die out of it.

Going back to the Dow Jones Industrial Average (DJIA) example: if the DJIA goes down 10 points one day, it does not mean that the entire market went down uniformly by a proportional amount.  Some stocks may have done well, others not.  The DJIA just gives you a general sense of “how the market is doing.”  You will generally be able to say that a day when the Dow goes up is better than a day when the Dow goes down, and you might even be able to make rough estimates of the magnitude from that (i.e. going up 20 points vs going down 10 points).  But it’s important to keep in mind that the Dow is an arbitrary indicator.  You might get different statistics if you chose a completely different 30-stock basis set, but if you only swapped one stock out it would probably be fairly similar.

In our case, we get very different results for $h=2$ than we do for $h=4.5$, but fairly similar results for anything in the $h=2.9$ to $h=3.3$ range we’re considering.  So it’s up to us to make a final decision about “which stocks to pick” for our index.

I’m going to make an arbitrary decision and go with 3.0 for a few reasons.  First, it keeps the value towards the low end of the range to help combat statistical noise and artificial stamina inflation.  Second, we’ve already got loads of data presented here for $h=3.0$, which is convenient.  I also think that by choosing an integer, we make it a little more clear that the choice is arbitrary rather than some sort of exact, precise value.  Most of all, it seems to be a good compromise – I rarely saw values in the 3.0 line of an HDF table that disagreed with the intuition I got from the data.

That’s it for today.  Now that we’ve finished deciding on an HDF, the next step will be to clean up the data representation with a normalization scheme.  That’s what we’ll tackle in the next blog post.  We’ll also show why you can’t compare TMI values generated by different bosses.

## The Making of a Metric: Part 1

Simulationcraft now has full support for protection paladin mechanics, and hopefully in a week or two I’ll get around to writing a how-to blog post on using it and interpreting the data.  Once I write a few batch files, it should produce all of the DPS results I could generate from the MATLAB FSM code and more.

What it doesn’t have yet is a good smoothness metric with which we can assess our survivability. Of course, that’s not really Simcraft’s fault, because a good smoothness metric doesn’t exist yet.

So if we want a metric, we have to build it ourselves.

Up until now, we’ve looked at data and made qualitative assessments about the results to come up with statements like “X smooths better than Y.”  But now we want to  quantify that thought, which is a lot more difficult.  We want a numerical estimate of how much better X is than Y.  And to do that, we need to get a little introspective and think about what we’re doing when we make those qualitative assessments.  We need to analyze our process and figure out how to translate it into numbers.

Data

So let’s start with some data.  Below are the results of a 10k-minute sim like I usually show on the blog.  We’ll use this sample data set for all of the following analysis. The gear sets are variants on the Control/Haste setup – the first is just C/Ha, followed by sets where I add 1000 of a given stat.  In the case of hit and expertise, I subtract 1000 since we start at the cap. We’ll use a boss that swings for 350k after mitigation every 1.5 seconds, the standard SH1 finisher priority, back-calculated Seal of Insight with no overhealing apart from inherent, and Sacred Shield enabled.

Here are the gear sets:

|    Set: |  C/Ha |  Stam |   Hit |   Exp | Haste |  Mast | Dodge | Parry |
|     Str | 15000 | 15000 | 15000 | 15000 | 15000 | 15000 | 15000 | 15000 |
|     Sta | 28000 | 29000 | 28000 | 28000 | 28000 | 28000 | 28000 | 28000 |
|   Parry |  1500 |  1500 |  1500 |  1500 |  1500 |  1500 |  1500 |  2500 |
|   Dodge |  1500 |  1500 |  1500 |  1500 |  1500 |  1500 |  2500 |  1500 |
| Mastery |  1500 |  1500 |  1500 |  1500 |  1500 |  2500 |  1500 |  1500 |
|     Hit |  2550 |  2550 |  1550 |  2550 |  2550 |  2550 |  2550 |  2550 |
|     Exp |  5100 |  5100 |  5100 |  4100 |  5100 |  5100 |  5100 |  5100 |
|   Haste | 12000 | 12000 | 12000 | 12000 | 13000 | 12000 | 12000 | 12000 |

And here are the simulation results:

Finisher = SH1, Boss Attack = 350k, SoI model=nooverheal data set metric-10000-3

| Set: |   C/Ha |   Stam |    Hit |    Exp |  Haste |   Mast |  Dodge |  Parry |
| mean |  0.261 |  0.261 |  0.270 |  0.269 |  0.255 |  0.255 |  0.255 |  0.255 |
|  std |  0.103 |  0.103 |  0.106 |  0.106 |  0.102 |  0.102 |  0.104 |  0.104 |
|   S% |  0.522 |  0.522 |  0.507 |  0.513 |  0.531 |  0.522 |  0.523 |  0.523 |
|   HP |   755k |   775k |   755k |   755k |   755k |   755k |   755k |   755k |
|  nHP |  2.158 |  2.215 |  2.158 |  2.158 |  2.158 |  2.158 |  2.158 |  2.158 |
| ---- 4-Attack Moving Avg. ---- |
|  50% | 46.002 | 45.122 | 48.989 | 48.683 | 43.896 | 44.417 | 43.900 | 43.963 |
|  60% | 29.901 | 27.764 | 32.974 | 32.571 | 27.952 | 27.702 | 28.195 | 28.208 |
|  70% | 16.203 | 15.421 | 18.813 | 18.474 | 14.560 | 15.093 | 14.995 | 15.120 |
|  80% |  7.291 |  6.228 |  9.267 |  9.050 |  6.320 |  6.214 |  6.705 |  6.811 |
|  90% |  2.526 |  1.744 |  3.623 |  3.422 |  2.056 |  2.137 |  2.264 |  2.311 |
| 100% |  0.617 |  0.394 |  1.046 |  0.966 |  0.443 |  0.571 |  0.543 |  0.544 |
| 110% |  0.103 |  0.060 |  0.232 |  0.196 |  0.066 |  0.080 |  0.088 |  0.088 |
| 120% |  0.024 |  0.013 |  0.070 |  0.055 |  0.016 |  0.018 |  0.023 |  0.021 |
| 130% |  0.002 |  0.001 |  0.009 |  0.007 |  0.001 |  0.001 |  0.002 |  0.001 |
| 140% |  0.000 |  0.000 |  0.002 |  0.001 |  0.000 |  0.000 |  0.001 |  0.001 |

For now, I’ve narrowed our focus to 4-attack moving averages.  In theory this will be equally applicable to any string size, but there’s little point in providing data that we’re not going to use at the moment.  And for Simcraft, we’re going to have to choose a default time window for our moving average.  The 3- and 4-attack moving averages are the ones we focus on the most, and 4 attacks is closest to the time window I had in mind (5-6 seconds).
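
As a quick illustration of what a “4-attack moving average” means here, this is a minimal sketch (my reconstruction, assuming each string’s value is the window’s total damage expressed as a fraction of maximum health):

```python
# Slide a 4-attack window across the damage log and express each
# window's total as a fraction of max health (hypothetical helper).
def spike_strings(damage_log, max_hp, window=4):
    return [sum(damage_log[i:i + window]) / max_hp
            for i in range(len(damage_log) - window + 1)]

# Three full 350k hits plus one avoid, against 755k health:
print(spike_strings([350e3, 350e3, 0, 350e3], 755e3))  # [~1.39]
```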

This is some Inception-level shit right here…

Now, let’s think about how we analyze this data.  Normally we look at the top few categories and draw qualitative conclusions from that. For example, comparing the +1000 haste and +1000 mastery sets, we see that haste comes in lower (or equal) in spike representation for all rows above 90% player health.  On the other hand, we pay less attention to rows that contain a large percentage of attacks across the board, because those are very likely to happen, and reducing the amount is rarely meaningful.  If something happens 8% of the time rather than 9%, that’s not a huge change because it’s still going to happen a lot during an encounter, so you need to plan for it happening anyway.

Within those top few categories that we consider, we tend to put heavier emphasis on the larger spikes than the smaller ones.  If we can significantly reduce spikes that are 130% of our health, then that’s perceived as a lot more important than an equal reduction in spikes that are 110% of our health, especially if there’s still a sizable percentage of those events.

So according to this data, we would conclude that haste is better than mastery, though not by a huge amount. Dodge and parry are both worse than haste, but stamina is a little better. It’s hard to say how hit/exp fare since we subtracted 1000 points instead of adding 1000 points, so ignore them for now.

Analyzing the analysis

Our qualitative assessment primarily looked at two factors:

Spike magnitude – A 130% spike is more important than a 120% spike, which in turn is more important than a 110% spike, and so on. We mentally assigned more importance to the largest spikes than the smaller ones.  This has to be accounted for in our numerical analysis.

Spike frequency – This one is more complicated, because it’s not as straightforward as “bigger is worse.”  We care about how frequently things happen, and we care about changes in that frequency.  But not all changes are created equal.  Some examples:

• If we have 5%-10% representation in a category, those are going to happen unless we eliminate them. Going from 7% to 6% (as we do in the 80% spike category when we add 1000 stamina to C/Ha) isn’t really all that meaningful a change, and shouldn’t be worth a whole lot.
• But a representation of 0.002% isn’t that likely at all, and may not even be worth worrying about. Going from 0.002% to 0.001% is probably not that meaningful either even though you’re halving the number of spikes, because those spikes weren’t very likely in the first place.
• Reducing a 1% chance to 0.1% would be a meaningful change, because that’s a pretty noticeable reduction from a non-trivial amount.  You’re taking something that was very likely and making it fairly unlikely.  Similarly for 0.1% to 0.01% – something unlikely to something really unlikely.
• Going from 0.01% to 0.001% is the same change (a factor of 10), but maybe not as important because it wasn’t very likely to begin with.  Going from 0.001% to 0.0001% is almost irrelevant, because both are so unlikely.
• That said, there’s always some comfort in the certainty that an event can’t happen, so there’s almost always some (admittedly nebulous) value in reducing a representation to 0.000%.  But it’s obviously more valuable when you reduce a non-trivial chance to 0%.

It’s easier to describe how to do this with some pictures. Below is the overall spike histogram for the haste and mastery gear sets. This is exactly the same data that’s in the table, just in bar plot format, and instead of using bins that are 10% of your health wide, I’m using 2% wide bins.  The x-axis is the percentage of your health (expressed in decimal form, so 1.00 = 100%) and the y-axis is the number of events. The distribution is roughly centered around 50% health, with a large spike at 0% due to 4-attack avoidance or full absorb strings. This is pretty standard for these sorts of histograms.

Histogram of the 4-attack damage string data.
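
In code, building that histogram is nearly a one-liner; here’s a sketch using NumPy (the `spikes` array is a dummy stand-in for the string data described above):

```python
import numpy as np

spikes = np.random.normal(0.5, 0.2, 100_000).clip(0, 2)  # dummy stand-in
counts, edges = np.histogram(spikes, bins=100, range=(0.0, 2.0))  # 2%-wide bins
centers = 0.5 * (edges[:-1] + edges[1:])  # x-axis: fraction of max health
```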

However, we don’t look at the whole histogram (in fact, the table only ever shows the top half of it). We look only at the highest-damage parts, which you can’t really see on that plot because the bars are tiny compared to the bulk of events in the middle. So in the plot below I’ve zoomed in on the very top end of the distribution. This figure shows the top 5% of all events – i.e., the 5% of events that have the highest damage value. Relating this to the table, it’s cutting off every row that has a number greater than 5.000 in the C/Ha column (around 82% player health is the cutoff).

A close-up of the top 5% of the data.

We see that haste and mastery are pretty similar here, which isn’t surprising since their data isn’t that different. Haste is a little better, but it’s easier to see that on the table than in this plot because the table uses a coarser binning. Nonetheless, we want to use this plot (or rather, the data it’s showing) to generate a quantitative metric.

The first thought is to subtract the two bar plots from one another. In other words, do something like this:

Histogram showing the difference of the haste and mastery histograms.

We could just sum all of the bins and get a number that represents the difference between haste and mastery. However, that’s ignoring the fact that these events are not equal: the events at 1.3 (130% of your health) are much, much worse than the ones at 0.85 (85% of your health). So what we really want to do is apply a weight function. Basically, we multiply each bar by a number representing how important that bin is, and then perform the sum. You may be familiar with the term “weighted average,” which is exactly the operation we’re performing.

For a simple example, consider the table below.  The first two data columns show the representations for the +1000 haste and +1000 mastery gear sets.  The third data column shows the difference between these values, just like we’re showing in the differential histogram above.  The next column is the weight factor, which represents how much we care about a certain category.  The final column is the product of the difference and the weight factor, which gives us a numerical representation of how much “value” we’ve gained or lost in that spike category.  And if we sum that column we get the weighted average, which is an overall “score” that tells us how much better or worse haste performs than mastery at smoothing.

| Percentile | Haste Set | Mastery Set |   Diff | Weight | Weighted Value |
|        80% |     6.320 |       6.214 |  0.106 |   0.25 |         0.0265 |
|        90% |     2.056 |       2.137 | -0.081 |    0.5 |        -0.0405 |
|       100% |     0.443 |       0.571 | -0.128 |    1.0 |         -0.128 |
|       110% |     0.066 |       0.080 | -0.014 |    2.0 |         -0.028 |
|       120% |     0.016 |       0.018 | -0.002 |    4.0 |         -0.008 |
|       130% |     0.001 |       0.001 |  0.000 |    8.0 |          0.000 |
|       140% |     0.000 |       0.000 |  0.000 |   16.0 |          0.000 |
|       Sum: |           |             |        |        |         -0.178 |
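
To make the bookkeeping explicit, here’s the same toy calculation in a few lines of Python (numbers lifted straight from the table, with weights that double for each bin):

```python
haste   = [6.320, 2.056, 0.443, 0.066, 0.016, 0.001, 0.000]  # 80%..140%
mastery = [6.214, 2.137, 0.571, 0.080, 0.018, 0.001, 0.000]
weights = [0.25, 0.5, 1.0, 2.0, 4.0, 8.0, 16.0]

score = sum(w * (h - m) for w, h, m in zip(weights, haste, mastery))
print(round(score, 3))  # -0.178: negative means haste has less spike presence
```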

In practice we’d do things slightly differently to get scale factors.  We would start the same way, by calculating the histograms for a baseline (C/Ha) and for new configurations with 1000 of each stat added.  But then we would subtract all of the +1000 sets from the baseline (rather than from each other), and then perform the same weighted average sum to get the scale factors describing how well each stat “smoothed” our damage intake.

Of course, we’ve glossed over the most important part of the problem: what weight function do we use?  This is a critical consideration, because the weight function is everything.  Get it right and you have a robust metric that works well; get it wrong and you have garbage.  This is where our breakdown of the factors we used in the qualitative assessment comes into play.  We want to weight the higher spike categories more heavily than the lower spike categories, and we want larger changes to be more valuable than smaller changes.

There are lots of functions we could choose – a simple linear function, the Fermi function we explored when modeling Seal of Insight, and dozens of others. But what felt natural to me was an exponential function. For example, $w(x) = e^{a(x-1)}$, where $a$ is some constant that determines how quickly the function changes and $x$ is the spike size (again, in decimal form, so 100% of your health is 1.0, 90% of your health is 0.9, and so on). For those who aren’t familiar with what an exponential function looks like, here it is:

Exponential weight function with $a=10\ln(2)$.

So for example, the bin corresponding to 100% of your health is worth exactly $w(1)=e^{a(1-1)}=e^{0}=1$. The bin corresponding to 90% of your health is worth $w(0.9)=e^{a(-0.1)}=e^{-a/10}$. This form is a bit unwieldy because it’s hard to translate between the constant $a$ and the behavior of the function.  We know that a larger $a$ will make the function steeper and increase the value change between one bin and the next, and that a smaller $a$ will reduce that value change.  But it’s not obvious from looking at it what happens to the relative valuation of one spike size to another with an arbitrary change in $a$.

To make this more intuitive, let’s make a substitution.  Let $a=10 \ln(h)$. That makes the equation $w(x)=e^{10 \ln(h) (x-1)} = h^{10(x-1)}$. Why does this make it easier?  Well, consider what happens if we evaluate $w(x)$ for increments of 10% of your health, as I’ve done on the table below:

| $x$ (spike size) |  50% |  60% | 70% | 80% | 90% | 100% | 110% | 120% | 130% | 140% |
|   $w(x)$ ($h=2$) | 1/32 | 1/16 | 1/8 | 1/4 | 1/2 |    1 |    2 |    4 |    8 |   16 |

In other words, for every 10% of your health you get a factor of $h$.  A spike that’s 10% larger is $h$ times more important, and a spike that’s 10% smaller is $1/h$ times as important.  The single variable $h$ controls how much weight a bin gains or loses if it’s one health decade away from 100% health. So we could call it the health decade factor, or HDF for short. If our top events are at 140% health and the smallest event we want to consider is at 90%, the top events will be approximately $h^5$ more valuable than the bottom ones.  Very straightforward.
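
In code form, the weight function is trivial; here’s a sketch that reproduces the $h=2$ row of the table above:

```python
def weight(x, h=2.0):
    # Each 10% of health (one "health decade") away from 100%
    # multiplies the weight by a factor of h.
    return h ** (10.0 * (x - 1.0))

for x in (0.5, 0.8, 0.9, 1.0, 1.1, 1.4):
    print(f"{x:.0%} -> {weight(x):g}")  # 1/32, 1/4, 1/2, 1, 2, 16
```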

The other reason this is good is that it works no matter where you are on the x-axis.  If our top events were between 50% and 100% of our health, each bin would get multiplied by a smaller factor, but you still have the same relative $h^5$ weight between the top and the bottom.  If we used a different function this might not be the case, and the metric wouldn’t work as well for an arbitrary distribution.

Of course, we still need a good value for $h$ that represents the effects we want.  There’s no obvious choice here, either.  It is, by its nature, sort of arbitrary.  We’re not attempting to model an exact number we can expect to see in game, like DPS or DTPS.  We’re trying to come up with a number that represents smoothness in damage intake, but the actual value isn’t that important.  What is important is that the value mimics a thorough qualitative assessment.  It doesn’t matter whether the value we get is 10 or 100 or 1000 as long as a similar amount of improvement from another stat gives a similar value and a larger or smaller amount of improvement gives a larger or smaller value, respectively.

Meloree and I discussed this at some length and guessed at a value of $h = 2$, which is also what I used for the weight function plot above. The idea here is that a 90% health spike is about half as important as a 100% spike, while a 110% spike is twice as important.  To see how this weight function affects the histogram, let’s multiply the entire histogram by the weight function.   That gives you the plot below.

Weighted histogram generated by multiplying the raw histogram data by an exponential weight function with $h=2$.

If you compare this to the un-weighted distribution provided earlier, you can clearly see that the weighted distribution gets shifted up by the nonlinear weighting function.  The events near 0 are practically worthless, while the events near the top get more valuable.  If we zoom in on the top 5% of events again, we get the following plot:

The top 5% of the raw histogram data after multiplication by the exponential weight function.

This has clearly increased the value of the higher-magnitude spikes compared to the lower-lying spikes. Mission accomplished, perhaps?

Well, no, not quite.  There are a few problems with this.

First, note that the few events occurring near 1.35 on the x-axis are still not worth very much – far less than the thousands of events that occur at ~0.85. Those events at 0.85 are going to happen; that’s around 1% of all events just in that bin. I’m not sure they should have so much more weight than the far more dangerous ones at the top.

Second, if we calculate the weighted average of the difference between the haste and mastery data, we get a number that suggests that mastery is better than haste. But our qualitative assessment gave us exactly the opposite answer!  What’s going on here? Did we screw something up?

The result seems paradoxical at first until you look a little deeper and realize what’s happening. With this value of $h$ we’re still very, very sensitive to “edge effects.” You get a different answer if you look at the top 5% of events than if you look at the top 4%, top 3%, top 7%, etc.  To illustrate that, here are the differential figures for the top 3%, 4%, 5%, 7%, and 10% of all events:

Differential histogram of the top 3% of the raw haste and mastery data.

Differential histogram of the top 4% of the raw haste and mastery data.

Differential histogram of the top 5% of the raw haste and mastery data.

Differential histogram of the top 7% of the raw haste and mastery data.

Differential histogram of the top 10% of the raw haste and mastery data.

You can see that the last bin on the left is generally the largest one, and while these plots are un-weighted, the problem is that the number of events in that last bin tends to increase in size faster than the weight function dies off.  To quantify that further, let’s actually calculate some stat weights.  Below is a table of the stat weights calculated this way for different cutoffs, starting from the top 1% and going to the top 10% most damaging attacks, along with a final row where we include 100% of attacks (i.e. all-inclusive) for a reference. Note that since we’re considering stat weights, a bigger number means it’s a better smoothing stat.  I’ve also properly adjusted for the fact that we’re subtracting hit and expertise instead of adding them, which just requires an inversion (e.g. -5922 becomes 5922).

hdf=2.00, N=200, vary pct

|   pct |  Stam |  Hit |  Exp | Haste |  Mast | Dodge | Parry |
| 0.010 |  1885 | 4481 | 3586 |  1507 |   712 |   688 |   680 |
| 0.020 |  3471 | 5448 | 4361 |  1986 |  2127 |   977 |   877 |
| 0.030 |  2947 | 5922 | 4710 |  2206 |  1724 |  1129 |   998 |
| 0.040 |  4013 | 6309 | 5070 |  2388 |  1547 |  1233 |  1034 |
| 0.050 |  4290 | 6652 | 5406 |  2597 |  2884 |  1358 |  1149 |
| 0.060 |  4290 | 6652 | 5406 |  2597 |  2884 |  1358 |  1149 |
| 0.070 |  3636 | 6843 | 5651 |  2715 |  2409 |  1427 |  1230 |
| 0.080 |  4907 | 6946 | 5643 |  2886 |  3173 |  1617 |  1433 |
| 0.090 |  4907 | 6946 | 5643 |  2886 |  3173 |  1617 |  1433 |
| 0.100 |  4814 | 7055 | 5740 |  2921 |  2188 |  1630 |  1436 |
| 1.000 |  4627 | 7309 | 6001 |  3257 |  2923 |  2096 |  1906 |

As you can see, the values fluctuate a lot as we change the percentage of attacks we consider.  If you go down the haste and mastery columns, you see that they’re swapping places depending on which row you look at.  That’s not good. Ideally they would be fairly consistent from row to row. Even if the values change (which is fine), the relative value of the two shouldn’t. But because that last bin is so important, our results depend heavily on what exactly we choose as a cutoff, even though the events near that cutoff are the least important!

Apart from the oddity with haste/mastery, the order is generally Hit>Exp>Stam>(Haste/Mast)>Dodge>Parry. This is about what we expected, so that part at least is good.  Dodge beats out parry because of diminishing returns, as in these gear sets dodge is diminished much less.  Hit and expertise are both very strong, just as we know they are.

The bottom row includes 100% of events, so it sums the entire histogram.  This tends to inflate the value of dodge and parry more. In general, the more exclusive you make the metric, the worse dodge and parry do because they trade the presence of worse spikes for lower average damage taken. Why? Well, when you restrict your view to just the top X% of events, those worse spikes really hurt dodge and parry.  As you increase the percentage of events being considered though, dodge and parry start to perform better because it starts adding value to the large mass of events in the middle of the unweighted distribution.

This all leads to the third problem: the choice of bin size matters. I’ve made the plots with 100 bins (between 0% and 200% health, so bins 2% of health wide), but the data table above uses 200 (bins of 1% health). You get slightly different answers with the exclusive metrics because changing the number of bins sometimes shifts a chunk of events into or out of the part of the data you’re considering. That’s not good because, again, the events near the bottom of the region we’re considering are supposed to be the least valuable ones, and thus have the smallest influence on the distribution.  Being so sensitive to edge effects introduces this artificial dependence on bin size.  And we don’t want to get significantly different results just because we altered the bin size slightly.

There are a few ways to try and fix this, but the most obvious to me was to increase the HDF.  That increases the weight discrepancy between lower-damage bins and higher-damage bins, making the lower bins less valuable and the higher bins more valuable. So I repeated the calculation for an HDF of 3:

hdf=3.00, N=200, vary pct

|   pct |  Stam |   Hit |  Exp | Haste |  Mast | Dodge | Parry |
| 0.010 |  3126 |  8488 | 6824 |  2259 |  1318 |   537 |   781 |
| 0.020 |  4320 |  9221 | 7414 |  2622 |  2383 |   752 |   926 |
| 0.030 |  3954 |  9529 | 7642 |  2762 |  2086 |   847 |  1001 |
| 0.040 |  4548 |  9747 | 7845 |  2864 |  1984 |   905 |  1021 |
| 0.050 |  4670 |  9919 | 8014 |  2970 |  2647 |   968 |  1080 |
| 0.060 |  4670 |  9919 | 8014 |  2970 |  2647 |   968 |  1080 |
| 0.070 |  4373 | 10006 | 8126 |  3024 |  2418 |   999 |  1117 |
| 0.080 |  4879 | 10048 | 8122 |  3094 |  2711 |  1077 |  1200 |
| 0.090 |  4879 | 10048 | 8122 |  3094 |  2711 |  1077 |  1200 |
| 0.100 |  4844 | 10091 | 8159 |  3108 |  2331 |  1082 |  1201 |
| 1.000 |  4772 | 10194 | 8260 |  3226 |  2590 |  1221 |  1342 |

Much better-looking. The variation with percentage is smaller, though we’re still seeing edge effects.  If you go down the haste and mastery columns you’ll see mastery jump around relative to haste a good bit. But the increased HDF has created a larger ramp on the weight function, making those edge effects smaller overall. The other interesting thing here is that the 100% (all-inclusive) version is a pretty good average – we’re still seeing the approximate 3-to-1 ratio of haste-to-avoidance value in the 100% row that we’re seeing in the 5% row. This all-inclusive version has another perk – it isn’t subject to edge effects at all, so it’s quite a bit more robust.

Just for completeness, here are the weighted histograms we get with $h=3$:

Weighted histogram with $h=3$.

The top 5% of the weighted histogram with $h=3$.

Of course, 2 and 3 are pretty arbitrary choices for the HDF.  To get the best metric, we really need to nail down the best value for $h$.  So the next step is to explore what happens when we change $h$ while keeping the percentage of events fixed. When we do this for the top 5%, 10%, and 100% of events, we get the results below:

pct=0.05, N=200, vary hdf

| hdf |  Stam |   Hit |   Exp | Haste |  Mast | Dodge | Parry |
| 1.5 |  4836 |  6256 |  5193 |  2735 |  3668 |  1566 |  1261 |
| 1.6 |  4660 |  6258 |  5170 |  2680 |  3443 |  1511 |  1227 |
| 1.7 |  4523 |  6301 |  5183 |  2640 |  3259 |  1465 |  1201 |
| 1.8 |  4419 |  6384 |  5228 |  2614 |  3108 |  1425 |  1179 |
| 1.9 |  4343 |  6501 |  5303 |  2600 |  2984 |  1390 |  1163 |
| 2.0 |  4290 |  6652 |  5406 |  2597 |  2884 |  1358 |  1149 |
| 2.1 |  4259 |  6836 |  5537 |  2602 |  2803 |  1328 |  1139 |
| 2.2 |  4247 |  7051 |  5695 |  2616 |  2738 |  1299 |  1130 |
| 2.3 |  4251 |  7297 |  5881 |  2637 |  2689 |  1270 |  1123 |
| 2.4 |  4271 |  7574 |  6094 |  2666 |  2652 |  1240 |  1118 |
| 2.5 |  4305 |  7883 |  6335 |  2701 |  2627 |  1207 |  1113 |
| 2.6 |  4353 |  8223 |  6607 |  2743 |  2613 |  1170 |  1108 |
| 2.7 |  4414 |  8596 |  6908 |  2790 |  2608 |  1130 |  1103 |
| 2.8 |  4488 |  9003 |  7242 |  2844 |  2613 |  1083 |  1097 |
| 2.9 |  4573 |  9443 |  7610 |  2904 |  2626 |  1030 |  1089 |
| 3.0 |  4670 |  9919 |  8014 |  2970 |  2647 |   968 |  1080 |
| 3.1 |  4779 | 10431 |  8455 |  3041 |  2676 |   897 |  1069 |
| 3.2 |  4900 | 10982 |  8936 |  3119 |  2713 |   815 |  1056 |
| 3.3 |  5032 | 11571 |  9460 |  3202 |  2757 |   720 |  1039 |
| 3.4 |  5177 | 12201 | 10029 |  3292 |  2809 |   611 |  1018 |
| 3.5 |  5333 | 12873 | 10645 |  3388 |  2868 |   486 |   994 |
| 3.6 |  5501 | 13588 | 11312 |  3490 |  2935 |   344 |   964 |
| 3.7 |  5681 | 14349 | 12033 |  3599 |  3009 |   182 |   929 |
| 3.8 |  5874 | 15157 | 12812 |  3715 |  3090 |    -2 |   887 |
| 3.9 |  6080 | 16014 | 13651 |  3837 |  3179 |  -210 |   839 |
| 4.0 |  6299 | 16922 | 14554 |  3967 |  3276 |  -444 |   783 |
| 4.1 |  6531 | 17883 | 15526 |  4105 |  3382 |  -707 |   719 |
| 4.2 |  6777 | 18899 | 16571 |  4251 |  3495 | -1002 |   646 |
| 4.3 |  7038 | 19971 | 17692 |  4404 |  3617 | -1331 |   562 |
| 4.4 |  7314 | 21103 | 18895 |  4566 |  3748 | -1698 |   468 |
| 4.5 |  7605 | 22295 | 20184 |  4737 |  3888 | -2105 |   362 |

pct=0.10, N=200, vary hdf

| hdf |  Stam |   Hit |   Exp | Haste |  Mast | Dodge | Parry |
| 1.5 |  5949 |  6996 |  5796 |  3333 |  2441 |  2068 |  1790 |
| 1.6 |  5601 |  6903 |  5698 |  3201 |  2363 |  1949 |  1689 |
| 1.7 |  5327 |  6869 |  5649 |  3098 |  2301 |  1849 |  1606 |
| 1.8 |  5111 |  6887 |  5642 |  3020 |  2252 |  1765 |  1538 |
| 1.9 |  4943 |  6950 |  5673 |  2962 |  2215 |  1693 |  1482 |
| 2.0 |  4814 |  7055 |  5740 |  2921 |  2188 |  1630 |  1436 |
| 2.1 |  4719 |  7199 |  5839 |  2895 |  2170 |  1573 |  1397 |
| 2.2 |  4653 |  7380 |  5969 |  2881 |  2161 |  1521 |  1364 |
| 2.3 |  4611 |  7597 |  6131 |  2879 |  2159 |  1472 |  1336 |
| 2.4 |  4592 |  7848 |  6323 |  2887 |  2165 |  1424 |  1312 |
| 2.5 |  4593 |  8134 |  6546 |  2904 |  2177 |  1375 |  1291 |
| 2.6 |  4611 |  8455 |  6801 |  2929 |  2195 |  1326 |  1272 |
| 2.7 |  4647 |  8810 |  7088 |  2963 |  2220 |  1273 |  1254 |
| 2.8 |  4698 |  9201 |  7409 |  3004 |  2251 |  1215 |  1236 |
| 2.9 |  4764 |  9627 |  7766 |  3052 |  2288 |  1152 |  1219 |
| 3.0 |  4844 | 10091 |  8159 |  3108 |  2331 |  1082 |  1201 |
| 3.1 |  4937 | 10592 |  8591 |  3170 |  2379 |  1003 |  1182 |
| 3.2 |  5045 | 11131 |  9064 |  3239 |  2433 |   914 |  1161 |
| 3.3 |  5165 | 11711 |  9580 |  3315 |  2494 |   813 |  1138 |
| 3.4 |  5298 | 12333 | 10141 |  3398 |  2560 |   699 |  1111 |
| 3.5 |  5444 | 12997 | 10751 |  3487 |  2633 |   569 |  1081 |
| 3.6 |  5604 | 13705 | 11412 |  3584 |  2711 |   422 |  1046 |
| 3.7 |  5776 | 14460 | 12128 |  3688 |  2797 |   255 |  1006 |
| 3.8 |  5962 | 15262 | 12901 |  3799 |  2889 |    67 |   961 |
| 3.9 |  6161 | 16114 | 13736 |  3917 |  2988 |  -144 |   908 |
| 4.0 |  6374 | 17016 | 14635 |  4043 |  3094 |  -382 |   849 |
| 4.1 |  6601 | 17972 | 15603 |  4177 |  3208 |  -648 |   781 |
| 4.2 |  6843 | 18984 | 16644 |  4319 |  3329 |  -946 |   705 |
| 4.3 |  7099 | 20052 | 17762 |  4469 |  3458 | -1278 |   619 |
| 4.4 |  7370 | 21180 | 18962 |  4628 |  3596 | -1647 |   522 |
| 4.5 |  7658 | 22369 | 20248 |  4796 |  3742 | -2056 |   413 |

pct=100.00, N=200, vary hdf

| hdf |  Stam |   Hit |   Exp | Haste |  Mast | Dodge | Parry |
| 1.5 |  5269 |  7122 |  6009 |  3834 |  3753 |  3033 |  2764 |
| 1.6 |  5122 |  7133 |  5982 |  3692 |  3552 |  2793 |  2538 |
| 1.7 |  4971 |  7139 |  5953 |  3557 |  3361 |  2578 |  2339 |
| 1.8 |  4834 |  7165 |  5941 |  3437 |  3191 |  2393 |  2169 |
| 1.9 |  4719 |  7220 |  5956 |  3337 |  3045 |  2233 |  2026 |
| 2.0 |  4627 |  7309 |  6001 |  3257 |  2923 |  2096 |  1906 |
| 2.1 |  4558 |  7434 |  6077 |  3194 |  2823 |  1978 |  1805 |
| 2.2 |  4511 |  7596 |  6186 |  3149 |  2742 |  1874 |  1720 |
| 2.3 |  4485 |  7793 |  6328 |  3118 |  2678 |  1781 |  1648 |
| 2.4 |  4478 |  8027 |  6501 |  3101 |  2630 |  1696 |  1587 |
| 2.5 |  4489 |  8297 |  6708 |  3097 |  2595 |  1616 |  1534 |
| 2.6 |  4516 |  8603 |  6948 |  3103 |  2573 |  1540 |  1488 |
| 2.7 |  4558 |  8946 |  7222 |  3120 |  2563 |  1464 |  1447 |
| 2.8 |  4616 |  9324 |  7531 |  3147 |  2562 |  1387 |  1410 |
| 2.9 |  4687 |  9740 |  7876 |  3182 |  2572 |  1307 |  1375 |
| 3.0 |  4772 | 10194 |  8260 |  3226 |  2590 |  1221 |  1342 |
| 3.1 |  4870 | 10686 |  8683 |  3278 |  2617 |  1130 |  1310 |
| 3.2 |  4981 | 11218 |  9148 |  3339 |  2652 |  1029 |  1277 |
| 3.3 |  5105 | 11791 |  9657 |  3406 |  2695 |   918 |  1243 |
| 3.4 |  5241 | 12406 | 10213 |  3482 |  2746 |   794 |  1208 |
| 3.5 |  5390 | 13065 | 10817 |  3565 |  2805 |   657 |  1169 |
| 3.6 |  5552 | 13768 | 11474 |  3656 |  2872 |   502 |  1128 |
| 3.7 |  5727 | 14518 | 12185 |  3754 |  2946 |   330 |  1081 |
| 3.8 |  5916 | 15316 | 12954 |  3861 |  3028 |   136 |  1030 |
| 3.9 |  6117 | 16164 | 13785 |  3975 |  3118 |   -81 |   973 |
| 4.0 |  6332 | 17064 | 14681 |  4097 |  3215 |  -323 |   909 |
| 4.1 |  6561 | 18017 | 15646 |  4227 |  3321 |  -593 |   837 |
| 4.2 |  6804 | 19025 | 16684 |  4366 |  3436 |  -895 |   757 |
| 4.3 |  7062 | 20091 | 17799 |  4513 |  3559 | -1230 |   667 |
| 4.4 |  7335 | 21216 | 18997 |  4670 |  3690 | -1602 |   567 |
| 4.5 |  7624 | 22403 | 20281 |  4835 |  3831 | -2014 |   455 |

There are a few effects happening here.

First, a small HDF tends to keep the scaling between stats small, which means they’re more vulnerable to edge effects.  We can see this from the first two tables, as haste and mastery swap positions from one table to the other. We already sort of knew that, but it’s good that the data reaffirms that expectation.

A really high HDF tends to increase that gap dramatically, which reduces edge noise.  On the other hand, a high HDF does something strange to dodge and parry. As you can see, the stat weights of dodge and parry both plummet, and the dodge value even becomes negative at one point.

Your first thought might be to rationalize these results.  For example, dodge and parry tend to give a wider distribution, allowing for larger spikes but reducing overall damage taken.  But remember, we’re not comparing a dodge/parry gear set to a control gear set here, we’re adding 1000 dodge or parry to it.  There should be absolutely no circumstance where adding 1000 dodge actually makes your smoothness metric worse!  It should always make it better, just not necessarily as good as other stats.

In fact, it’s a different issue entirely.  If you look at the source data, adding 1000 dodge reduces spike presence in every category except one: the very top one, 140%, where all of a sudden we have a 0.001 instead of a pure zero. This isn’t because adding dodge suddenly created higher-damage spikes; it’s simulation noise. That’s what’s causing the huge drop in dodge’s value here, and a similar (though less pronounced) effect is happening in the parry data. This is bad: it essentially means that our HDF is so large that it’s becoming super-sensitive to simulation noise.

So we’re stuck between a rock and a hard place.  If the HDF is too high it causes simulation noise problems, but reduces our edge-effect issues.  But if it’s too low we have the reverse: edge-effect noise problems, but low sensitivity to simulation noise in the higher categories. One solution is to just simulate longer and reduce the noise, but that’s not a great solution either. Ideally, we want to pick an HDF somewhere in the middle, giving us good data fidelity and low enough sensitivity to noise that we don’t have to simulate for hours at a time.

For the tables where we look at only the top 5% and 10% of spike events, it’s clear that $h=2$ is too low.  2.5 is on the border of what I’d deem acceptable, but even that is a little volatile.  On the high end, 3.5 is pretty good, but is right on the borderline of the region where simulation noise is a problem.  But probably our ideal value lies somewhere between 2.5 and 3.5.  Nominally, I’m going to say it’s around 3.0, because that data seems pretty consistent between both tables.  But I have to do a more detailed analysis of this range to really feel comfortable picking a final value.

However, there’s also another solution to consider.  The table that uses 100% of the data eliminates boundary issues entirely because it includes all of the data, and the weight function makes sure that each category is worth less as we work our way from higher to lower damage ranges.  That makes it more stable at low $h$-values than either of the percentile-based versions. I’d want an HDF larger than 2, because if it’s too low it risks watering down the strength of small changes near the top end and over-valuing avoidance.  But it’s even passable at $h=2$ in this table, unlike the percentile versions.

From considering all of this data, I’m leaning towards defining the metric using an all-inclusive histogram with a moderately high HDF, probably around $h=3$. However, we have a lot of further testing to do before we settle on final values.  And of course, we’ll want to implement some sort of normalization scheme such that the values are independent of the number of iterations we use.  I’ll detail all of that in the next two blog posts.

What’s in a name?

However, before we end, I want to offer a parting thought about the greater applicability of this metric, and suggest a name for it.  While I’ve framed this entire discussion in terms of subtracting histograms from one another, the way you would likely do this in a computer is a little different.  The distributive property tells us that $a(b-c) = ab - ac$. That means it doesn’t matter whether we subtract the two histograms first and multiply by the weight function afterwards or vice versa.

This is actually really good news, because it means you could multiply the weight function by the histogram for one experimental configuration (e.g. gear set, talent combination, glyphs, etc.) to get a single number. Then you could repeat that process for any other gear set or configuration you want and get a new number. And you’ll get a unique number for each configuration that describes the smoothness of your damage intake under those particular conditions.
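
As a sketch of that idea (my illustration of the concept, not Simcraft’s actual implementation), the whole index collapses to one weighted sum over the histogram bins, and scale factors fall out of the distributive property:

```python
def smoothness_index(counts, centers, h=3.0):
    # Weight each bin by h**(10*(x-1)) and sum; lower is smoother.
    return sum(c * h ** (10.0 * (x - 1.0)) for c, x in zip(counts, centers))

# Scale factor for a stat = index(baseline) - index(baseline + 1000 stat).
# Hypothetical numbers, just to show the mechanics:
baseline   = smoothness_index([120, 30, 4], [0.9, 1.0, 1.1])
plus_haste = smoothness_index([110, 25, 3], [0.9, 1.0, 1.1])
print(baseline - plus_haste)  # positive -> adding haste smoothed intake
```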

There’s a bit of arbitrariness to this, in that the weight function is whatever we decided to come up with. However, that’s OK, since we’re not trying to give a measurable in-game number like DPS or DTPS. As an analogy, think about the stock market. The Dow Jones Industrial Average (DJIA) is a stock market index, which is just a weighted average of the stocks of 30 large companies. It’s used to reflect the performance of the entire stock market, but the companies chosen are pretty much arbitrary. Even if some sort of formula is used to choose them, that formula is essentially arbitrary, and there are other indexes that use different formulas (S&P 500, Russell 2000, etc.) and give slightly different impressions of the overall health of the market.

Another good analogy is your credit score.  Your credit score is calculated by an algorithm, which is somewhat arbitrary and chosen to model credit-worthiness.  But your credit score isn’t an estimate of some measurable value.  Having a credit score of 750 doesn’t tell you exactly what APR you’ll get on your mortgage or how large a loan you can take out, even though it affects those things.  But it’s also very clear that a score of 750 is better than a score of 600.  And there are multiple ways of calculating credit score with different ranges, all trying to convey the same rough information about your borrowing risk.

And that’s really what we’re doing if we come up with a unique number for each simulation configuration: producing an index.  You sim your gear set and you get a number out that tells you how smooth your damage intake is. You can re-sim with a different gear set to see if it improves.  In this case, that means the number goes down, because just like DTPS a lower number is better with this type of smoothness metric. And you can calculate scale factors, which is just a fancy way of saying “sim once with a baseline gear set, then sim again with +1000 haste, then with +1000 mast, etc., and subtract each of those indices from the baseline value.”

In short, it’s exactly what an index should be: a solid number with a very clearly-defined calculation method that you could compare to any other configuration that uses the same boss mechanics.  That’s really the only constraint here: the boss’ attacks need to be identical for two different indices to be comparable.  You can’t compare an index generated with Hogger to one generated by Lei Shen and expect to get anything useful out of it.  We’ll go into more detail on that point in the third post in this series, later this week.

But it will be clear that a result of 100 is a lot smoother than a result of 1000.  It may not be clear exactly why until you look at the details – maybe one boss was Hogger and the other was Lei Shen, or maybe it was a gearing difference, or who knows what.  Either way, it will be clear that the tank with the result of 100 wasn’t in much danger, while the tank clocking in at 1000 was.  If we choose our overall normalization factor properly, we’ll be able to do this very accurately.  For example, a smoothness value larger than 1000 would clearly be dangerous, while one below 500 wouldn’t be, or something like that.  Again, we’ll talk about the details of that normalization process later this week.

The concept of the DJIA came up while Mel and I were discussing this, so my first thought was that we should call this the Theck-Meloree Industrial Smoothness Average as a bit of a joke. However, it struck me that there was a much more natural name that was less of a mouthful: the Theck-Meloree Index.  Which, of course, would be abbreviated TMI. Which is not only amusing, but also eerily accurate based on the amount of background work we’re presenting here to develop it.

So yes, you heard it here first. We are going to get tanks to compare their TMIs. And it will be glorious.

## SotRdämmerung

As was pointed out in the comments on a previous blog post, the protection 2-piece-turned-4-piece that returns holy power when Bastion of Glory (BoG) stacks are consumed will allow us to keep SotR up 100% of the time.  That comes with a small disclaimer, namely that it takes about 40-45% haste to pull it off.  But many players are already at or above that threshold, and more of them will reach it with T16 gear.

The more I think about it the less I think it was intended.  To illustrate why, first let’s look at how the process works.  Let’s assume a player has T16 4-piece and barely enough haste to pull this trick off.  His finisher cast sequence looks something like this:

1: Build 3 HP, cast SotR, repeat 5 times to build 5 BoG stacks
2: Build 1 HP, cast WoG, cash in on 5 free Holy Power from set bonus
3: Cast SotR
4: Build 1 HP, cast SotR
5: Build 3 HP, cast SotR, repeat until you’re back at 5 BoG stacks (three casts in this step)
6: Build 1 HP, cast WoG, cash in on 5 free Holy Power from set bonus.  This takes you back to step 3.

The player then repeats 3-6 indefinitely, since that forms a closed loop.  Let’s call a single traversal of that loop a “cycle.”

For that cycle to maintain 100% SotR uptime, you need to generate 11 HP every 15 seconds.  There’s 16 HP total expenditure per cycle, but you get 5 back each cycle as well.  Eleven HP every 15 seconds is 0.733 HP/sec, which is easily achieved with ~40% haste and any of the T75 talents. Rather than re-do the math myself, I’ll just quote Thels, who worked it out:
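For anyone who wants to check that bookkeeping, here’s a quick sketch of the cycle accounting.  It assumes a 3-HP, 3-second SotR, a 1-HP WoG, and the 5-HP refund from the set bonus at 5 BoG stacks, which is all this trick needs.

```python
# Back-of-the-envelope accounting for the steady-state cycle (steps 3-6),
# assuming: SotR costs 3 HP and covers 3 seconds, WoG costs 1 HP, and the
# set bonus refunds 1 HP per Bastion of Glory stack consumed (5 at cap).

SOTR_COST, SOTR_DURATION = 3, 3.0
WOG_COST, BOG_REFUND = 1, 5

sotr_casts = 5                                   # steps 3, 4, and 5 combined
hp_spent = sotr_casts * SOTR_COST + WOG_COST     # 16 HP per cycle
hp_needed = hp_spent - BOG_REFUND                # 11 HP you must generate
coverage = sotr_casts * SOTR_DURATION            # 15 seconds of SotR uptime

print(f"HP spent per cycle:     {hp_spent}")
print(f"HP generated per cycle: {hp_needed}")
print(f"Required generation:    {hp_needed / coverage:.3f} HP/sec")  # ~0.733
```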

Let’s take 2 minutes from a fight, so we can pop HA exactly once. We need 8 cycles of 15 seconds of SotR uptime, so 8 cycles of 11 HoPo, for a total of 88 HoPo every 2 minutes.

During 2 minutes, at 50% haste, we press CS/HotR exactly 40 times, and Judgment 26 times and a bit, for a total of 66 HoPo.

Our HA covers 6 CS/HotR presses and 4 Judgment presses, for a total of 20 extra HoPo, which brings us to 86 HoPo.

That would mean that we need 1 Grand Crusader proc per minute to indeed cap on SotR. Each additional proc allows us to underperform slightly.

The only thing I want to add is that each additional proc doesn’t just allow for under-performance, it also reduces the haste threshold.  My conservative estimate is between 40% and 45% haste, but it will vary a lot based on the number of mobs you’re tanking, the amount of time you spend tanking those mobs, and the average melee swing timer of the bosses (Ji-Kun, for example, spends so much time casting that his effective swing timer is absurdly long).  If you consider any real encounter with tank swaps, banking to 5 HP before starting the cycle essentially ensures 100% uptime while you’re tanking even at lower haste levels.

There are a few reasons I don’t think this was entirely intended.  The first concern is the “block cap” issue, because that’s really the same problem being recreated by this effect.  We stack a particular stat (haste now, mastery in Cataclysm) so that we can reach a point where we take a lot less damage from physical attacks (>40% now, 30% in Cataclysm).  That certainly feels like it could be a balance problem on any hard-hitting boss, just like the Cataclysm-era block cap was.  How do you challenge all five different tanks when one of them takes almost half the damage from the largest attacks?

In Dragon Soul, they simply made most attacks bypass block, but that solution doesn’t work with SotR.  And an entire tier of Lei Shi doesn’t sound appealing, nor is it likely that the next tier is being designed with this in mind.  Block cap was a persistent problem for the entire expansion, and they could (and did) plan for it when designing encounters.  I doubt the same is true of this set bonus.

Second, the fact that they swapped the T16 set bonuses makes it clear that the developers thought the interaction between 2-piece tier 15 and this holy-power banking set bonus was a problem.  Maintaining a constant +40% block chance buff was bad enough to swap the set bonuses so that it couldn’t happen.  This effect is even worse, because it’s maintaining a constant 40% damage reduction.  We may have had a lot of that uptime already, but I’d argue it’s still a much stronger effect than an extra 40% block chance.

Third, there are some rotational concerns.  On these I’m a little more mixed.  The set bonus means that players will literally be able to macro SotR to every ability in their rotation and ignore it.  All they need to worry about is timing that single 1-holy-power WoG every ~15 seconds to keep their holy power income flowing.  This bothers me from a skill standpoint, obviously, but also from an annoyance standpoint.  I don’t really want to write 7 or 8 new macros for my rotational abilities just so I can be lazy with SotR presses.  But I probably would, because not having to pay attention to SotR would open up some attention bandwidth that would probably be beneficial.

That said, to pull this trick off you give up WoG as your emergency button, so you are making a sacrifice.  Losing that big self-heal is certainly one fewer reactive tool we have at our disposal.  But it’s almost certainly a sacrifice worth making if you ensure that you never take a full-sized hit.  It’s sort of like giving up sunscreen to become immune to sunburn.

I’m also not certain that it was entirely intended that we could use a 1-holy-power WoG to return all five of our reserve holy power.  That’s the key to this trick, after all: use a 1-HP WoG to refund 5 Holy Power, netting you 4 HP.  If you can do that every 15 seconds, that’s 0.267 HP/sec.  For comparison, you get an equivalent HP generation rate from 67% haste, which is ludicrous.  The set bonus literally gives us about 67% haste worth of holy power generation, or about 28k haste rating.  Now admittedly, that’s assuming that you already have about 40% haste, and the overall value of the set bonus will reduce somewhat if you’re below that point already.  But no matter how you slice it, it’s absurdly good.
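As a sanity check on that equivalence, here’s a rough sketch.  The baseline assumptions (CS/HotR on a 4.5-second cooldown and Judgment on a 6-second cooldown, 1 HP each, both scaling with haste, and roughly 425 haste rating per 1% at level 90) are mine rather than anything stated above, so treat the output as ballpark only.

```python
# Rough check of the haste-equivalence claim, assuming a zero-haste baseline of
# CS/HotR on a 4.5 s cooldown and Judgment on a 6 s cooldown (1 HP each), both
# scaling with haste, and ~425 haste rating per 1% at level 90.  Those baseline
# numbers are assumptions, not figures from the post.

refund_rate = 4 / 15                      # 4 net HP every 15 s from the trick
base_gen = 1 / 4.5 + 1 / 6.0              # ~0.389 HP/sec with zero haste

haste_equiv = refund_rate / base_gen      # extra haste that matches the refund
rating_equiv = haste_equiv * 100 * 425    # convert % haste to haste rating

print(f"Set bonus refund:  {refund_rate:.3f} HP/sec")
print(f"Haste equivalent:  {haste_equiv:.0%} (~{rating_equiv / 1000:.0f}k rating)")
```

That lands a couple of points above the ~67% figure, which is well within the slop of those assumptions.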

In fact, it feels a lot like using one-HP WoGs to game Divine Purpose back in Cataclysm.  When that trick became fairly widespread, the developers modified the proc chance so that it scaled with holy power spent.  The parallels between the two are somewhat uncanny, which leads me to believe that this bonus will be tweaked eventually as well.

I like the general idea of the set bonus, though.  Removing the opportunity cost of WoG is neat.  Having a Holy Power reserve is sort of neat as well.  I can see interesting situational uses for the bank, like getting into real danger and saving the day with a clutch sequence of SotR – WoG – SotR – SotR.  That would be pretty fun, and I feel like that’s the sort of thing the developers had in mind with this set bonus.

But I think I would like the “banking” idea a lot better if we were not able to use it to reach 100% uptime on SotR.  Being able to reach that threshold changes the entire feeling of the bonus.  It will feel not like a neat tool at our disposal, but like a maintenance buff.  And a cheesy maintenance buff at that – something we have to do because it’s just so broken that we’re at a disadvantage if we don’t.

The ability to reach 100% SotR uptime with this removes the interesting aspects of the banking, like being able to choose when you want to use that HP reserve, and replaces it with a rote, “macro SotR to everything” style that feels really bland and boring.  Instead of thinking about timing SotR on short time scales, we phone in our 1-HP WoG every 15 seconds and call it a day.

In more technical language, for the banking idea to be fun it needs to be more reactive than proactive.  Just as WoG is a tool you use to react to a big hit, this set bonus needs to be a tool we use primarily to react to a dangerous situation. There’s certainly still going to be proactive potential here, like using the bank to cram a few extra SotRs in during a period where you expect to be in danger.  But that’s still somewhere in the middle of the reactive-proactive continuum, because you’re reacting to a potential threat.  When it becomes purely proactive, as it does if we’re just withdrawing from the bank every 15 seconds for the steady-state HP gain, it stops being nearly as fun.

I do think that it’s possible to tweak the set bonus slightly to keep the interesting aspects while preventing abuse.  But to do that, the holy power refund we get needs to be scaled or limited somehow, either by the amount of HP spent (as Divine Purpose was) or by some other constraint.  I can think of a few fairly simple systems for this:

1) Return 1 holy power per stack of BoG consumed, capped by HP spent.
In other words, you can only get as much HP back as you spend.  That still removes the opportunity cost of WoG, but also removes the banking potential entirely. That’s a bit of a bummer, but at least it prevents abuse.

2) Let’s assume that solution is too severe.  Instead you could cap the returns by (HP spent + 1).  For example, with a 5-stack of BoG:

3 HP WoG grants 4 HP back
2 HP WoG grants 3 HP back
1 HP WoG grants 2 HP back

This has the downside of not being very intuitive.  It also encourages you to use WoG at less than 5 stacks of BoG to avoid “wasting” potential gains.  For example, the 1-HP WoG cycle would turn into two consecutive SotRs followed by a 1-HP WoG.  That requires 5 HP every 6 seconds if we’re aiming for 100% uptime on SotR, which is a higher threshold (0.8333 HP/sec), but probably still attainable with L75 talents and ~50% haste.
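To put the two thresholds side by side, here’s a small sketch, again assuming a 3-HP, 3-second SotR, a 1-HP WoG, and (under this fix) a refund capped at HP spent plus one.

```python
# Sketch of the generation threshold under fix #2 (refund capped at HP spent + 1),
# assuming SotR costs 3 HP and covers 3 s, and WoG costs 1 HP.

def cycle_threshold(sotr_casts, wog_cost, refund_cap):
    """Net HP/sec you must generate to sustain the cycle at 100% SotR uptime."""
    spent = sotr_casts * 3 + wog_cost
    refund = min(sotr_casts, refund_cap)   # BoG stacks built this cycle vs. the cap
    coverage = sotr_casts * 3.0            # seconds of SotR uptime per cycle
    return (spent - refund) / coverage

# Current bonus: 1-HP WoG at 5 BoG stacks, full 5-HP refund
print(f"Current: {cycle_threshold(5, 1, 5):.3f} HP/sec")   # ~0.733
# Fix #2: two SotRs, then a 1-HP WoG with the refund capped at 1 + 1 = 2
print(f"Fix #2:  {cycle_threshold(2, 1, 2):.3f} HP/sec")   # ~0.833
```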

Another concern is that it doesn’t “fix” the problem, it simply raises the threshold.  And if one L75 talent turns out to be the best at HP generation (like the new version of Sanctified Wrath, or “Holy Judgment Spam” as I like to call it), it might be the only valid choice if it lets us reach that threshold.  That just ends up turning the set bonus into a constraint on talent choices, which isn’t much fun.

3) Internal cooldown. HP gains from the set bonus are limited to once every N seconds.  This doesn’t eliminate the banking feature, but it does limit the maximum HP generation rate one can achieve with it.  For a few points of reference, here are the rates one would achieve for several different values of N, assuming we use a 1-HP WoG at 5 stacks of BoG (a 4 HP gain); a quick sketch of the arithmetic follows the list:

N=30 gives 0.1333 HP/sec, or about 33% haste equivalent
N=45 gives 0.0889 HP/sec, or about 22% haste equivalent
N=60 gives 0.0667 HP/sec, or about 16.7% haste equivalent
N=75 gives 0.0533 HP/sec, or about 13.3% haste equivalent
N=90 gives 0.0444 HP/sec, or about 11.1% haste equivalent
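If you want to reproduce those numbers, something like the following will do it.  The haste conversion just scales off the ~67% haste per 0.267 HP/sec figure from the previous section, so expect small rounding differences from the list above.

```python
# One 4-HP refund every N seconds, converted to a rough haste equivalent by
# scaling off the earlier figure of ~67% haste for 0.267 HP/sec.

HP_PER_REFUND = 4
HASTE_PER_HPS = 67 / (4 / 15)        # % haste per 1 HP/sec, from the 67% figure

for n in (30, 45, 60, 75, 90):
    rate = HP_PER_REFUND / n
    print(f"N={n:2d}: {rate:.4f} HP/sec  ~{rate * HASTE_PER_HPS:.1f}% haste")
```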

Any longer N would probably be too weak, simply because we’re likely to give up about 10% haste just to wear the 4-piece unless the itemization team starts embracing our new Mists mechanics.  At that point the set bonus would be a wash in the HP generation department, and whether to use the set at all would come down to whether the short-term HP banking ability it provides is worth giving up the raw HP generation rate (and DPS!) of well-itemized off-set pieces.

That may still be an interesting choice, but I think that a 1.5-minute or longer effective cooldown will turn players off.  I’ve already heard complaints about the new version of Sacred Shield (30% absorb bubble every 2 minutes, triggered by low health) even though it’s rather powerful.  The complaints were always, “meh, once every 2 minutes is so rare it might as well be never.”  I completely disagree, of course.  How soon we forget the stupidly overpowered Wrath-era Ardent Defender talent.  But most players form their opinions based on gut feeling rather than rational analysis (you’d be surprised how many prot paladins still gear for dodge and parry!), so I think a long internal cooldown on the set bonus will generally end up being viewed as not worth it.

And it’s worth noting that this version still only raises the threshold, it doesn’t eliminate it.  It raises it by a lot more than the other methods because it severely reduces the steady-state holy power generation rate that the set bonus provides.  So by the time we have enough haste to hit 100% uptime with this extra income, perhaps we’d already be well above 90% uptime anyway.

Still, I think the internal cooldown idea is probably the most even-handed.  And likely the easiest to implement as well, given that it doesn’t involve a significant change to how the HP returns are calculated.  A 60- to 75-second ICD would put a cap on how effective the set bonus is, and if chosen correctly might make sure that the 100%-uptime SotR panacea is out of reach.

Though I feel obligated to mention that I’ve ignored the potential interactions with the revised version of Selfless Healer, which would return 5 HP for zero HP cost via Flash of Light.  I’m ignoring that because the version of Selfless Healer on the Public Test Realm doesn’t trigger the set bonus effect yet.  That may be intended, or it may be an oversight; it’s anyone’s guess as to which.  In fact, my guess is that it’s an oversight that will become intended once this blog post becomes common knowledge.

But more importantly, I don’t believe that the increase in effectiveness we would gain over the regular 1-HP WoG version is worth sacrificing Sacred Shield and tying yourself to the GCD with Flash of Light.  However, if the set bonus is “fixed” in a way that leaves the HP generation threshold in reach, it could become a concern, much like the level 75 talents might be.  It all depends on what, if anything, happens.  As much fun as it might be to be a stupidly-overpowered block-capped paladin tank, I hope something does.  If not for the sake of making the set bonus more interesting, then for the sake of the other tanks we’ll leave in the dust.

## Tiers Are The Silent Language Of Grief

As you may have noticed, the PTR went up recently.  And with it, we got some reveals of our set bonuses for T16.  Let’s take a brief look at them and ponder their usefulness.

2-piece bonus

The 2-piece bonus grants us one holy power for every stack of Bastion of Glory consumed.  This is a really interesting effect because it has a few potential uses.

First, it removes the opportunity cost of using Word of Glory as an emergency heal.  Using WoG means you’re giving up 3 seconds of Shield of the Righteous coverage, which can often mitigate more damage than WoG will heal.  Getting that balance point right can be tricky, and the set bonus makes that choice easier.  Since both WoG and SotR are off-GCD, you can WoG and SotR yourself in rapid succession during a danger period. This set bonus makes the WoG+SotR combination a very potent anti-spike technique, which should mean it’s a great damage-smoothing effect.  Once I have tank metrics properly implemented in SimC, we’ll be able to see exactly how effective it is for that purpose.

Second, and more subtly, it gives us a reserve bank of holy power.  Boundless Conviction already gives us a reserve of 2 holy power to work with, and smart paladins use that to great effectiveness.  This set bonus turns Bastion of Glory into a secondary reserve, such that if we’re in trouble we could fire off a WoG to gain up to 4 holy power instantly.  Pulling that off at max effectiveness will be a little tricky, as you’ll need to pick your order of operations properly to make sure you cast a 1-HP WoG.  But even if used inefficiently, this is a potent effect because it gives you extra SotR coverage whenever you deem it necessary.

But the third (and in my opinion game-breaking) use is the one that’s likely to get this set bonus nerfed.  And that’s the combination of 2-piece T15 and 2-piece T16.  The latter removes the opportunity cost of WoG, making it essentially free.  The former gives you 15 seconds of 40% block that isn’t affected by diminishing returns every time you cast WoG.  While HP generation rates are not quite to the point where we can keep that up 100% of the time, they’re not that far off.  You would need to generate 3 stacks of Bastion of Glory every 15 seconds, or one SotR every 5 seconds.  Right now, most players are casting SotR every 6-7 seconds.  As players approach the 50% melee haste soft-cap, 100% uptime of Shield of Glory should become a reality.
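To put some rough numbers on that, here’s a sketch of the per-window accounting.  It assumes a 15-second Shield of Glory buff refreshed by each WoG, a 3-HP SotR that grants 1 BoG stack, a 1-HP WoG, and the T16 2-piece refund of 1 HP per stack consumed.

```python
# Rough sketch of what keeping the T15 2-piece block buff rolling would take,
# assuming: the buff lasts 15 s and is refreshed by each WoG, SotR costs 3 HP
# and grants 1 BoG stack, WoG costs 1 HP, and 2T16 refunds 1 HP per stack.

BUFF_DURATION = 15.0
bog_stacks = 3                               # SotR casts (and stacks) per window

hp_spent = bog_stacks * 3 + 1                # three SotRs plus one 1-HP WoG
hp_net = hp_spent - bog_stacks               # 2T16 refunds 1 HP per stack consumed

print(f"SotR cadence: one every {BUFF_DURATION / bog_stacks:.0f} s")
print(f"Net HP required: {hp_net} per {BUFF_DURATION:.0f} s "
      f"({hp_net / BUFF_DURATION:.3f} HP/sec)")
```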

I think that this is probably an issue, because 40% block is almost certainly strong enough to justify sacrificing two slots of higher-ilvl T16 gear.  Especially if you can use two pieces of heroic, double-upgraded T15 tier.  I expect this interaction will trigger some sort of change as soon as Blizzard finds out it exists.  They could nerf this set bonus or swap it with the 4-piece to ensure that the 2T15+2T16 interaction is impossible.  They could also just nerf the T15 2-piece bonus and leave the T16 bonuses intact.  As we’ll see in a second, the 4-piece bonus is considerably weaker, so my vote would be to swap them.

4-piece bonus

The 4-piece bonus grants us a heal over time after Guardian of the Ancient Kings (GAnK) expires.  The amount of healing is determined by the damage we take during GAnK.  Blessing of the Guardians is the HoT spell, so we know that it ticks every second for 10 seconds.

Unfortunately, when I tested it on the PTR the HoT amount didn’t seem to be calculated at all, because the buff would appear and expire instantly.  In fact, the only reason I knew it was there (and could determine the spell name) was that it showed up in Recount/Skada as a heal that produced 0 healing.  So at this point, we don’t know what damage value it will use.  I’m assuming it will be post-mitigation (unlike Vengeance, which is pre-mitigation), but it’s anyone’s guess as to whether it will use damage taken before or after absorb effects are accounted for.

That said, it may not matter that much.  This set bonus is incredibly weak, so as a 4-piece we may just skip it entirely.  You’re generally going to use GAnK in one of two situations.  The first is a predictable period of high damage, and it’s rare for that period to last the entire duration of GAnK.  The second is a situation where something goes wrong and you use GAnK to buy your healers time to recover.  In both of those cases, the HoT is showing up after the danger period is passed, so it’s not terribly effective.  Your healers may be able to ignore you more than usual after GAnK ends, but apart from that the set bonus probably just creates a bunch of unnecessary overhealing.

There are also tank swaps to consider.  If you and your co-tank are taunting so that you can alternate cooldowns to survive an extended phase of intense damage, then it’s very likely that the HoT will end up ticking on you when you aren’t even tanking.  With some smart pre-planning you could arrange to chain GAnK and Divine Protection, and maybe the HoT will be enough of a buffer that DP will be a sufficient cooldown.  But that’s still a pretty niche application.  A set bonus that’s effective on only one or two fights in an entire raiding tier is a set bonus you’re probably safe skipping.

But the biggest Achilles heel of the 4-piece bonus isn’t the effect itself.  It’s what the effect has to compete with: the superior itemization of higher-ilvl thunderforged off-set pieces.  Which raises a more general point about paladin tier itemization.

Itemization

We don’t have any information about the itemization of tier 16 yet.  The T16 pieces have identical stat combinations to the T15 ones, so I’m assuming that they are all placeholders. That may be a good thing though – while it doesn’t give us any information, it means that maybe there’s still time to influence a change.

Our T15 set was a bit of a letdown for a few reasons, the first and most obvious of which is the set’s itemization.  It feels like a form of punishment to wear our tier gear, because we have to suffer a lot of dodge/parry itemization that we don’t want just in order to have fun set bonuses.  The incredibly low value of dodge and parry for us also meant that those fun set bonuses weren’t terribly effective either; as we’ve shown, the bonuses don’t even make up for the loss of haste and mastery incurred by skipping thunderforged haste plate gear.

I know I found myself in the situation where I had heroic thunderforged off-set pieces before I had access to heroic tier, which made the decision to skip the set bonuses fairly easy.  And that was a little disappointing, because I would much rather be excited about tier items than feel ambivalent about them.

But an even more important effect is psychological – it feels like our tier set is not being designed for us anymore.  Either because the itemization team doesn’t know what paladins like, or because the developers themselves aren’t sure what we should be wearing.  The game is giving us conflicting messages.  The speed, fluidity, and fun of haste-stacking gameplay that Sanctity of Battle gives us make it clear we want to gear for haste.  But our tier itemization is the disapproving nanny “tsk tsk”-ing, wagging her finger, and suggesting that we naughty haste-stacking paladins should cut our hair, get a job, and go back to wearing respectable tanking stats like dodge and parry.

Since it’s “our” set, it should really feel like it’s made for us.  Obviously the itemization doesn’t have to be perfect.  I don’t expect gobs of haste on every piece.  But it should be on some of them, and for T16 maybe even most of them.

To be more explicit, what I’m suggesting is a paradigm shift in how paladin gear is itemized.  As an example of what I have in mind, let’s consider the other classes for a second.

• Monk and druid tier has combinations of hit/expertise/haste/mastery/crit, because those are the core tanking stats the classes are designed to use.  They can certainly wear dodge and parry gear, but the benefit they get is small compared to haste, mastery, and crit, so they don’t want them.
• Warriors and DKs get combinations of hit/exp/mastery/dodge/parry, again because those are the core tanking stats the classes are designed to use.  DKs get a very small benefit from haste, and warriors get a small benefit from crit, but neither is large enough to warrant gearing for them.

Paladins are somewhere in the middle now.  We get warrior/DK itemization even though we have little interest in dodge or parry. We’ve moved to a more monk- and druid-like active mitigation model, where dodge and parry give us much smaller benefits than haste and mastery.  In a sense, we’ve moved on to a WoW 5.0 active mitigation scheme, yet we’re still being itemized as if we were WoW 4.0.

What I would like to see is an acknowledgement by the developers that they understand we’ve evolved as well.  And that acknowledgement would come in the form of a paradigm shift in our tier itemization.  Rather than being stuck with warrior/DK itemization on our tier, I’d like to see us get combinations of hit/exp/mastery/haste/parry.

This itemization scheme has several advantages:

• It would shift our tier itemization to be more in-line with what we actually value
• It would eliminate dodge+parry combination pieces, which literally feel worthless to many paladins.  While we may suffer one or the other on a piece with another strong stat, like a hit/dodge or expertise/dodge item, double-avoidance gear is the first thing we toss aside or disenchant.  A large part of the reason the T15 4-piece is not attractive is that we can replace a dodge/parry piece with well-itemized off-set, which provides a huge performance difference.
• It would send the message that Blizzard understands how prot paladins work
• More importantly, it also sends the message that Blizzard supports haste as a true tanking stat for us.
• Not only that, but putting haste on our tier does a much better job of informing the masses that “hey, haste-tanking is a thing” than any blog post or patch note ever will.  A large majority of players don’t do any research outside of the game, and may have literally no way of knowing that haste is a good stat for them.  Putting it on the tier will force them to consider that, and may even lead them to ask around to find out why.

Note that I’m only talking about our tier gear here.  I don’t think dodge/parry off-set items need to go away – in fact, I think off-set itemization can remain entirely unchanged.  Those pieces still have value for other tanks and can be gap-fillers for unlucky slots.  Though given that DKs don’t seem to care for dodge/parry either, it may be worth reconsidering dual-avoidance itemized pieces entirely.  However, that’s not a pressing issue.

It’s just the protection paladin tier itemization algorithm that really needs to be re-evaluated.  Otherwise we’re in for another tier of ambivalence towards (or outright avoidance of) set bonuses because they’re tied to gear that isn’t designed with a protection paladin in mind.  And I really hope that doesn’t happen, because I’m sort of tired of skipping set bonuses.  It takes a lot of fun out of the game to see all of your friends and teammates be excited about completing their 2- and 4-piece sets while you’re unable to muster up more than a “meh” because your spec’s bonuses just aren’t worth it.

TLDR

T16 2-piece bonus is very good, possibly to the point of being broken.
T16 4-piece bonus is very weak, probably to the point of being ignored.

Protection paladin tier itemization really needs to catch up with protection paladin gameplay, or else we’ll continue to ignore non-broken set bonuses in favor of gear that fits our play style.