The Making of a Metric: Part 2

In our last post, we decided on a functional form for our metric.  And while I didn’t write it out in full mathematical formalism, the pieces were all there.  In short, we start with a damage histogram $H(x)$ computed by taking the moving average of our timeline of damage taken (and healing received).  $H(x)$ looks like this:

[Figure: Histogram of the 4-attack damage string data.]
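Here is a minimal sketch of how a histogram like this could be built. It assumes the input is a per-attack list of net damage taken (damage minus healing). The function name, the 200-bin, 0% to 200%-of-health range, and the treatment of the 4-attack “moving average” as a sum over each window of four attacks are all our assumptions, not the post’s actual code. The windowed sum is what puts the values on the percent-of-health scale used in the tables below.

```python
import numpy as np


def damage_histogram(net_damage, health, window=4, n_bins=200, max_frac=2.0):
    """Hypothetical helper: histogram of windowed damage intake.

    net_damage: damage taken minus healing received, one entry per boss attack.
    health:     maximum health, used to express intake as a fraction of health.
    window:     number of consecutive attacks per string (4 in this post).
    Returns (bin_centers, counts); values outside [0, max_frac] are dropped.
    """
    net_damage = np.asarray(net_damage, dtype=float)
    # Sum over every run of `window` consecutive attacks, in units of max health.
    strings = np.convolve(net_damage, np.ones(window), mode="valid") / health
    counts, edges = np.histogram(strings, bins=n_bins, range=(0.0, max_frac))
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, counts
```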

From that histogram we can calculate the unnormalized Theck-Meloree Index or TMI as follows:

$\displaystyle \Large {\rm TMI} = \int_{-\infty}^\infty H(x)\,e^{10\ln(h)(x-1)}\,dx$

where $H(x)$ is the continuous damage intake histogram and $h$ is the health decade factor, or HDF.  That equation assumes you have a continuous histogram, which generally isn’t going to be the case.  Normally we’ll have a data set like the one in the figure, where the data is divided into individual bins.  Each bin is centered at a value $x_i$ (for the $i^{\rm th}$ bin) at which the histogram has a value $H_i$.  As a result, we can write the discrete form of the TMI as:

$\displaystyle\Large {\rm TMI} = \sum_{i=1}^M H_i e^{10\ln(h)(x_i-1)}$

where $M$ is the total number of bins we’re considering.  So far, I have left out normalization conditions.  We won’t tackle that topic today, so for those of you who care about such things, just assume for the moment that $H(x)$ obeys some specified normalization condition.  When we do tackle normalization, that’s likely the way we’re going to approach it anyway.
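The discrete sum translates directly into code. This is a sketch rather than the post’s implementation. The function name is hypothetical, and no normalization is applied to $H_i$, which matches the unnormalized definition above.

```python
import numpy as np


def tmi_discrete(centers, counts, hdf=3.0):
    """Unnormalized discrete TMI: sum over bins of H_i * exp(10 ln(h) (x_i - 1)).

    centers: bin centers x_i, in units of max health (1.0 = 100% of health).
    counts:  histogram values H_i; no normalization is applied here.
    hdf:     health decade factor h.
    """
    x = np.asarray(centers, dtype=float)
    H = np.asarray(counts, dtype=float)
    # The weight is 1 at x = 1 and is multiplied by `hdf` for every +0.1 in x.
    weights = np.exp(10.0 * np.log(hdf) * (x - 1.0))
    return float(np.sum(H * weights))
```

For example, with `hdf=3.0`, two bins at 90% and one each at 100% and 110% give $2/3 + 1 + 3 \approx 4.67$.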

I also want to be very clear that this is our “working definition” for the metric.  As in, it’s not the final version.  We still want to fool around with it some, normalize it, and so on.  And in fact, in the next post I’ll show that we can significantly simplify the process.  But for now, this is the clearest way of saying “here’s the definition we’re thinking about” while we test it to see if we’re happy with the definition.

We decided last time that we were likely to focus on an HDF of around 3.  That determination was based on a few factors, but mostly based on the relative stat weights it produced for a Control/Haste gear set.  However, we want to make sure that this metric works well for a variety of different gear configurations, so we need to check them.  In this post, we’ll look at a variety of gear sets and see whether the TMI metric matches our qualitative observations.
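To get a feel for what an HDF of 3 implies, we can rewrite the weighting factor as a power of $h$. This shows that $h$ is exactly the factor by which a damage string’s weight grows for every additional 10% of health:

$\displaystyle\Large e^{10\ln(h)(x-1)} = h^{10(x-1)}$

Plugging in $h=3$ at a few sample points:

$\displaystyle\Large 3^{10(0.5-1)} = 3^{-5} = \frac{1}{243},\quad 3^{10(1-1)} = 1,\quad 3^{10(1.2-1)} = 9,\quad 3^{10(1.4-1)} = 81$

So at $h=3$, one string worth 140% of health counts as much as 81 strings at exactly 100%. It also counts as much as $81 \times 243 = 19{,}683$ strings at 50%.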

I’m including all of the gear sets we’ve used in the past in this post.  We’ll be looking at a couple in detail first to see how stat weights vary from gear set to gear set.  Then we’ll look at the raw TMI score for all of the gear sets together.  Below is the exhaustive list of all the gear sets we’ll consider in this post. Note that I’ve transposed the usual table format because we’re considering so many different gear sets, and this arrangement is just easier to read in blog format.

|   Set: |   Str |   Sta | Parry | Dodge | Mastery |  Hit |  Exp | Haste |
|   C/Ha | 15000 | 28000 |  1500 |  1500 |    1500 | 2550 | 5100 | 12000 |
|   C/St | 15000 | 34000 |  1500 |  1500 |    1500 | 2550 | 5100 |  8000 |
|   C/Sg | 15000 | 31000 |  1500 |  1500 |    1500 | 2550 | 5100 |  8000 |
|  C/Shm | 15000 | 31000 |  1500 |  1500 |    4750 | 2550 | 5100 |  4750 |
|   C/Ma | 15000 | 28000 |  1500 |  1500 |   13500 | 2550 | 5100 |     0 |
|   C/Av | 15000 | 28000 |  7500 |  7500 |    1500 | 2550 | 5100 |     0 |
|  C/Bal | 15000 | 28000 |  4125 |  4125 |    4125 | 2550 | 5100 |  4125 |
|   C/HM | 15000 | 28000 |  1500 |  1500 |    6750 | 2550 | 5100 |  6750 |
|     Ha | 15000 | 28000 |  1500 |  1500 |    1500 |  500 |  500 | 18650 |
|  Avoid | 15000 | 28000 | 10825 | 10825 |    1500 |  500 |  500 |     0 |
| Av/Mas | 15000 | 28000 |  7717 |  7717 |    7716 |  500 |  500 |     0 |
| Mas/Av | 15000 | 28000 |  4000 |  4000 |   15150 |  500 |  500 |     0 |
|   Ha/h | 15000 | 28000 |  1500 |  1500 |    1500 | 2550 |  500 | 16600 |
|  Ha/he | 15000 | 28000 |  1500 |  1500 |    1500 | 2550 | 2550 | 14550 |
|  C/Str | 27600 | 28000 |  1500 |  1500 |    1500 | 2550 | 5100 |     0 |

In no particular order, let’s consider a few of these gear sets.  Fair warning: the rest of this post is very number-crunchy and not very visually appealing.  If numbers aren’t really your thing, I’ve provided a pretty good summary of what’s going on in the tables, but it’s still probably going to bore you to death.  You may want to skip to the conclusions in that case.

Control/Balance

This gear set was a compromise originally, and an attempt to model a character that’s using a mix of different gear rather than strictly following a particular philosophy.  But we’ve only ever compared this to other gear sets; we’ve never really looked at stat weights in this configuration.  This time, we’ll do that by adding 1000 stamina, haste, mastery, dodge, or parry, or subtracting 1000 hit or expertise.  Here’s the data we get if we do that:

Finisher = SH1, Boss Attack = 350k, SoI model=nooverheal data set metric-10000-0

| Set: |  C/Bal |   Stam |    Hit |    Exp |  Haste |   Mast |  Dodge |  Parry |
| mean |  0.275 |  0.275 |  0.282 |  0.281 |  0.267 |  0.268 |  0.268 |  0.269 |
|  std |  0.113 |  0.113 |  0.116 |  0.116 |  0.111 |  0.113 |  0.114 |  0.113 |
|   S% |  0.452 |  0.452 |  0.440 |  0.445 |  0.462 |  0.453 |  0.453 |  0.453 |
|   HP |   755k |   775k |   755k |   755k |   755k |   755k |   755k |   755k |
|  nHP |  2.158 |  2.215 |  2.158 |  2.158 |  2.158 |  2.158 |  2.158 |  2.158 |
| ---- | ------ |  --- 4 | Attack | Moving | Avg.-- | ------ | ------ | ------ |
|  50% | 51.610 | 49.807 | 53.683 | 53.190 | 49.052 | 49.120 | 49.260 | 49.574 |
|  60% | 36.136 | 32.477 | 38.333 | 37.721 | 33.432 | 34.597 | 33.936 | 34.209 |
|  70% | 21.892 | 16.894 | 24.227 | 23.770 | 19.467 | 18.576 | 20.289 | 20.507 |
|  80% |  9.240 |  8.283 | 11.012 | 10.594 |  7.714 |  8.272 |  8.391 |  8.527 |
|  90% |  4.066 |  3.338 |  5.147 |  4.842 |  3.160 |  3.583 |  3.572 |  3.669 |
| 100% |  1.440 |  0.907 |  2.033 |  1.871 |  1.068 |  1.303 |  1.257 |  1.295 |
| 110% |  0.414 |  0.356 |  0.645 |  0.584 |  0.288 |  0.371 |  0.353 |  0.366 |
| 120% |  0.169 |  0.112 |  0.274 |  0.251 |  0.112 |  0.160 |  0.144 |  0.147 |
| 130% |  0.030 |  0.016 |  0.065 |  0.056 |  0.015 |  0.033 |  0.024 |  0.022 |
| 140% |  0.004 |  0.002 |  0.010 |  0.007 |  0.001 |  0.004 |  0.002 |  0.002 |

This is actually sort of curious.  Most of the old standbys seem to hold true – stamina and haste are both strong, though in this case haste is actually neck and neck with stamina.  Hit and expertise are still the strongest.  But dodge and parry are actually slightly beating out mastery here.  This is probably because of the interactions I talked about long ago, in beta – haste and mastery make each other better, while dodge and parry make mastery worse.  Here we have the perfect storm: relatively low haste and high dodge/parry, both of which keep mastery weak.  Another way to look at it is this: in the low-haste regime, the peak events are going to be strings of attacks that occur without much SotR coverage.  Mastery may help soften one of those attacks, at best, but does nothing for the heavy-hitters.  Avoidance helps break up those strings a little more effectively because it can eliminate one of the unmitigated hits.

In any event, for this data set we would probably argue that hit>exp>haste>stamina>dodge>parry>mastery for the purposes of smoothing.  So our metric ought to give the same results.  Let’s see if it does.
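Each column in these tables comes from one modified copy of the baseline gear set. The bookkeeping can be sketched as follows. The C/Bal numbers are from the gear table above. The helper names and the weight definition are our assumptions: we use a plain TMI difference per 1000 rating, with the sign flipped for the subtracted hit and expertise so that a larger weight always means a more valuable stat. The post doesn’t state how its weight tables are scaled, so this won’t reproduce the exact figures.

```python
# C/Bal baseline from the gear table above (rating points).
C_BAL = {"Str": 15000, "Sta": 28000, "Parry": 4125, "Dodge": 4125,
         "Mastery": 4125, "Hit": 2550, "Exp": 5100, "Haste": 4125}

ADDED = ("Sta", "Haste", "Mastery", "Dodge", "Parry")  # +delta in the post
REMOVED = ("Hit", "Exp")                                  # -delta in the post


def perturbed_sets(base, delta=1000):
    """One modified copy of `base` per tested stat (hypothetical helper)."""
    sets = {}
    for stat in ADDED + REMOVED:
        modified = dict(base)  # copy so the baseline stays untouched
        modified[stat] += delta if stat in ADDED else -delta
        sets[stat] = modified
    return sets


def stat_weights(tmi_base, tmi_by_stat):
    """Assumed weight definition: TMI improvement per `delta` of each stat.

    Adding a useful stat should lower TMI, while removing hit/exp should raise it,
    so the sign is flipped for the removed stats; larger always means more valuable.
    """
    return {stat: (tmi - tmi_base) if stat in REMOVED else (tmi_base - tmi)
            for stat, tmi in tmi_by_stat.items()}
```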

First, let’s fix $h$ at 3 and see how the result varies with the percentage of attacks we consider.  Just like last time, we’ll cherry-pick the top 1% to 10% of the histogram and calculate the metric that way, and then finish the table with the 100% (all-inclusive) calculation.
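One way to read “the top X% of the histogram” is to keep only the largest fraction of moving-window samples and sum their weights. That interpretation, and the function below, are assumptions for illustration. Each sample counts once, which is equivalent to a histogram with one sample per bin.

```python
import numpy as np


def tmi_top_fraction(samples, hdf=3.0, pct=1.0):
    """Unnormalized TMI computed from only the largest `pct` fraction of samples.

    samples: moving-window damage values, in units of max health.
    pct:     fraction of samples to keep, in (0, 1]; 1.0 is the all-inclusive case.
    """
    x = np.sort(np.asarray(samples, dtype=float))[::-1]  # largest values first
    k = max(1, int(np.ceil(pct * x.size)))                # how many samples survive the cut
    top = x[:k]
    return float(np.sum(np.exp(10.0 * np.log(hdf) * (top - 1.0))))
```

Calling it with `pct=1.0` gives the all-inclusive sum over every sample.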

hdf=3.00, N=200, vary pct

|   pct |  Stam |   Hit |   Exp | Haste |  Mast | Dodge | Parry |
| 0.010 |  9250 | 18218 | 12760 |  8793 |  1840 |  4128 |  3425 |
| 0.020 | 10715 | 19426 | 13648 |  9816 |  2616 |  4605 |  3852 |
| 0.030 | 11742 | 19714 | 13846 | 10277 |  3107 |  4943 |  4069 |
| 0.040 | 10994 | 19872 | 13964 | 10395 |  2612 |  5004 |  4134 |
| 0.050 | 12034 | 20191 | 14239 | 10683 |  3977 |  5110 |  4233 |
| 0.060 | 12151 | 20226 | 14244 | 10736 |  3398 |  5146 |  4264 |
| 0.070 | 11668 | 20353 | 14364 | 10834 |  2851 |  5197 |  4306 |
| 0.080 | 11860 | 20359 | 14365 | 10865 |  3062 |  5266 |  4373 |
| 0.090 | 11529 | 20446 | 14439 | 10916 |  3124 |  5287 |  4385 |
| 0.100 | 11928 | 20469 | 14466 | 10963 |  3433 |  5319 |  4416 |
| 1.000 | 12135 | 20578 | 14559 | 11144 |  3517 |  5529 |  4599 |

Even with an HDF of 3, we’re getting edge effects in the mastery column.  Mastery seems to suffer more than the other stats because of how it affects the histogram.  Increasing it slightly tends to shift the entire distribution to the left a little, especially at the top end.  As a result, it will frequently shift a chunk of the histogram over that arbitrary percentile cutoff, making the stat weight more volatile.  Of course, this effect disappears in the row that accounts for 100% of all events, since there’s no cutoff to cross, which is one of the main reasons I think it will make for a better metric.

In any event, the general trends are all here.  Hit and expertise dominate, followed by stamina and haste.  Stamina uniformly leads haste in these weights, but not by much.  Our qualitative assessment put them pretty close to one another, and I think that stamina is inching ahead due to its gains in the 70% and 100% categories.  Also keep in mind that the data table has very coarse binning, which obscures things somewhat.  If haste’s entries tend to be near the top of a bin while stamina’s entries tend to be lower, for example, they’d look similar on the table but stamina would generate a better stat weight.  That seems to be what’s happening here, and it’s one of the reasons a more finely-grained numerical metric is more reliable than our coarse-binned qualitative method.

Let’s vary the HDF and see what happens:

pct=100.00, N=200, vary hdf

| hdf |  Stam |   Hit |   Exp | Haste |  Mast | Dodge | Parry |
| 1.5 |  6621 |  7204 |  5395 |  6343 |  4716 |  4264 |  3572 |
| 1.6 |  6695 |  7592 |  5665 |  6367 |  4530 |  4114 |  3450 |
| 1.7 |  6768 |  7993 |  5943 |  6405 |  4348 |  3994 |  3349 |
| 1.8 |  6868 |  8435 |  6249 |  6476 |  4182 |  3910 |  3279 |
| 1.9 |  7006 |  8935 |  6594 |  6588 |  4038 |  3864 |  3240 |
| 2.0 |  7189 |  9503 |  6988 |  6747 |  3915 |  3855 |  3231 |
| 2.1 |  7423 | 10148 |  7435 |  6952 |  3812 |  3882 |  3252 |
| 2.2 |  7709 | 10876 |  7939 |  7206 |  3728 |  3941 |  3301 |
| 2.3 |  8049 | 11695 |  8505 |  7510 |  3660 |  4033 |  3376 |
| 2.4 |  8446 | 12612 |  9138 |  7864 |  3606 |  4155 |  3477 |
| 2.5 |  8900 | 13633 |  9840 |  8270 |  3566 |  4308 |  3602 |
| 2.6 |  9416 | 14766 | 10616 |  8729 |  3537 |  4490 |  3753 |
| 2.7 |  9994 | 16018 | 11471 |  9244 |  3518 |  4703 |  3928 |
| 2.8 | 10638 | 17398 | 12410 |  9816 |  3510 |  4947 |  4127 |
| 2.9 | 11350 | 18915 | 13438 | 10448 |  3509 |  5222 |  4351 |
| 3.0 | 12135 | 20578 | 14559 | 11144 |  3517 |  5529 |  4599 |
| 3.1 | 12994 | 22396 | 15781 | 11905 |  3533 |  5870 |  4874 |
| 3.2 | 13932 | 24382 | 17108 | 12735 |  3556 |  6246 |  5174 |
| 3.3 | 14953 | 26545 | 18548 | 13637 |  3586 |  6658 |  5500 |
| 3.4 | 16061 | 28897 | 20107 | 14616 |  3623 |  7108 |  5854 |
| 3.5 | 17260 | 31452 | 21792 | 15674 |  3667 |  7597 |  6236 |
| 3.6 | 18555 | 34221 | 23611 | 16816 |  3717 |  8128 |  6646 |
| 3.7 | 19949 | 37219 | 25572 | 18047 |  3775 |  8702 |  7085 |
| 3.8 | 21449 | 40460 | 27682 | 19370 |  3839 |  9321 |  7554 |
| 3.9 | 23060 | 43959 | 29951 | 20790 |  3911 |  9987 |  8053 |
| 4.0 | 24785 | 47731 | 32387 | 22313 |  3990 | 10703 |  8584 |
| 4.1 | 26632 | 51795 | 35000 | 23942 |  4076 | 11471 |  9147 |
| 4.2 | 28606 | 56166 | 37800 | 25683 |  4171 | 12293 |  9741 |
| 4.3 | 30711 | 60862 | 40797 | 27542 |  4274 | 13171 | 10369 |
| 4.4 | 32956 | 65904 | 44002 | 29523 |  4385 | 14109 | 11031 |
| 4.5 | 35345 | 71310 | 47426 | 31633 |  4505 | 15109 | 11726 |

An HDF of 2.0 is clearly too low here, as mastery starts pulling ahead of dodge and parry, and haste catches up to expertise.  2.5 is still a little low as well, since parry is barely edging ahead of mastery.  But 3.0 looks pretty solid: parry is about a third ahead of mastery and dodge more than half ahead, and expertise pulls ahead of haste by a reasonable margin.  By the time we hit 3.5 we’ve definitely gone too far the other way, though.  Avoidance isn’t twice as good as mastery, nor is haste more than 4x better than mastery.
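The size of these gaps can be read straight off the table by dividing each weight by mastery’s. This sketch copies the 3.0 and 3.5 rows from the table above; the dictionary layout and helper name are ours.

```python
# C/Bal stat weights (pct = 100%) at two HDF values, copied from the table above.
C_BAL_WEIGHTS = {
    3.0: {"Stam": 12135, "Hit": 20578, "Exp": 14559, "Haste": 11144,
          "Mast": 3517, "Dodge": 5529, "Parry": 4599},
    3.5: {"Stam": 17260, "Hit": 31452, "Exp": 21792, "Haste": 15674,
          "Mast": 3667, "Dodge": 7597, "Parry": 6236},
}


def relative_to(weights, ref="Mast"):
    """Each stat's weight divided by the reference stat's weight, to 2 decimals."""
    return {stat: round(w / weights[ref], 2) for stat, w in weights.items()}


for hdf, row in C_BAL_WEIGHTS.items():
    r = relative_to(row)
    print(f"h={hdf}: dodge {r['Dodge']}x, parry {r['Parry']}x, haste {r['Haste']}x mastery")
```

At 3.0 this gives about 1.57 for dodge, 1.31 for parry and 3.17 for haste. At 3.5 those become about 2.07, 1.7 and 4.27.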

I think anything between about 2.8 and 3.3 gives reasonable values that sync up pretty well with our qualitative conclusions.  The choice within that band is somewhat arbitrary, but with any luck we’ll be able to narrow it down some more by looking at other data sets.  So let’s do that.

Control/Haste+Mastery

The Control/HM set was added to explore the synergy between haste and mastery.  Back before Seal of Insight and Sacred Shield were added to the model, haste and mastery were fairly close in value.  They also had a feedback-like effect on one another, in that each improved the other.  So there was speculation that splitting itemization between the two might be beneficial.  When SoI and SS were added to the simulation, haste went up in value, but mastery didn’t change significantly, which put the nail in the haste+mastery coffin.  Haste was just flat-out better at that point.  Still, this set is a useful test bench because it has low avoidance but a sizable amount of both haste and mastery.

Here’s how the simulation data turns out when we start with C/HM as a baseline and add (or subtract in the case of hit/exp) 1000 of each stat.

Finisher = SH1, Boss Attack = 350k, SoI model=nooverheal data set metric-10000-1

| Set: |   C/HM |   Stam |    Hit |    Exp |  Haste |   Mast |  Dodge |  Parry |
| mean |  0.259 |  0.259 |  0.268 |  0.265 |  0.252 |  0.253 |  0.253 |  0.253 |
|  std |  0.104 |  0.104 |  0.107 |  0.107 |  0.102 |  0.104 |  0.104 |  0.105 |
|   S% |  0.472 |  0.471 |  0.458 |  0.463 |  0.483 |  0.472 |  0.472 |  0.472 |
|   HP |   755k |   775k |   755k |   755k |   755k |   755k |   755k |   755k |
|  nHP |  2.158 |  2.215 |  2.158 |  2.158 |  2.158 |  2.158 |  2.158 |  2.158 |
| ---- | ------ |  --- 4 | Attack | Moving | Avg.-- | ------ | ------ | ------ |
|  50% | 46.061 | 45.590 | 49.014 | 48.251 | 43.967 | 45.005 | 44.244 | 44.295 |
|  60% | 29.677 | 26.089 | 32.732 | 31.949 | 27.267 | 27.601 | 27.981 | 28.078 |
|  70% | 14.807 | 13.454 | 17.204 | 16.728 | 13.011 | 13.599 | 13.664 | 13.856 |
|  80% |  6.978 |  6.225 |  8.656 |  8.181 |  5.906 |  6.644 |  6.380 |  6.496 |
|  90% |  3.061 |  1.813 |  4.178 |  3.907 |  2.468 |  2.432 |  2.773 |  2.883 |
| 100% |  0.830 |  0.554 |  1.319 |  1.190 |  0.627 |  0.681 |  0.734 |  0.786 |
| 110% |  0.198 |  0.194 |  0.361 |  0.293 |  0.130 |  0.174 |  0.145 |  0.178 |
| 120% |  0.068 |  0.040 |  0.152 |  0.110 |  0.041 |  0.065 |  0.054 |  0.056 |
| 130% |  0.006 |  0.003 |  0.030 |  0.019 |  0.002 |  0.006 |  0.005 |  0.006 |
| 140% |  0.001 |  0.000 |  0.006 |  0.002 |  0.001 |  0.000 |  0.001 |  0.000 |

Hit and expertise are still going to be clear winners here, with haste and stamina both coming in at a distant third.  Curiously, parry and mastery seem pretty well-matched here, though mastery seems to have a slight edge.  Dodge, on the other hand, is still a little ahead of mastery in this set.  I think what we’re seeing here is a sort of innate diminishing returns on mastery – if you don’t have a lot, then it’s pretty strong.  But as you get more of it, the benefit to SotR is less and less important because the hits it mitigates already fall in the middle of the histogram.  The peaks we’re seeing at the top here are mostly affected by the block chance contribution, which is fairly weak.

In any event, the results seem pretty clear from this data:

hit>exp>(haste/stamina)>dodge>mastery>parry

Let’s see if the metric agrees.

hdf=3.00, N=200, vary pct

|   pct |  Stam |   Hit |  Exp | Haste |  Mast | Dodge | Parry |
| 0.010 |  4663 | 14455 | 7289 |  3799 |  1959 |  2239 |  1271 |
| 0.020 |  4828 | 15236 | 7857 |  4365 |  1745 |  2533 |  1488 |
| 0.030 |  5500 | 15445 | 8035 |  4458 |  2361 |  2570 |  1543 |
| 0.040 |  4629 | 15718 | 8227 |  4619 |  2082 |  2663 |  1614 |
| 0.050 |  5237 | 15818 | 8285 |  4763 |  2229 |  2745 |  1679 |
| 0.060 |  4984 | 15884 | 8316 |  4784 |  2045 |  2766 |  1700 |
| 0.070 |  4578 | 15900 | 8330 |  4845 |  2068 |  2808 |  1767 |
| 0.080 |  4840 | 15976 | 8377 |  4910 |  2159 |  2832 |  1794 |
| 0.090 |  5303 | 16036 | 8445 |  4978 |  2449 |  2889 |  1832 |
| 0.100 |  5303 | 16036 | 8445 |  4978 |  2449 |  2889 |  1832 |
| 1.000 |  5150 | 16122 | 8525 |  5070 |  2382 |  3002 |  1955 |

I’m not sure these “top X% of histogram” rows are going to be very useful.  They just keep re-affirming that edge effects matter a lot, and we really can’t have that degree of volatility in the metric.  Mastery is uniformly behind dodge in each row, but the value fluctuates a lot more than I’m comfortable with.  That’s going to introduce noise, which makes our estimates less accurate, and that’s no good.  Luckily, the 100% row seems to match our expectations: hit and expertise far ahead, stamina and haste very close, dodge beating mastery, and parry bringing up the rear.
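Turning a row of weights into the “a>b>(c/d)” notation used above can be automated. This sketch sorts the 100% row of the C/HM table and groups neighbors whose weights are within a tolerance of each other. The 5% tolerance and the helper name are arbitrary choices of ours, not something the post defines.

```python
# 100% row (hdf = 3.0) of the C/HM percentile table above.
C_HM_100 = {"Stam": 5150, "Hit": 16122, "Exp": 8525, "Haste": 5070,
            "Mast": 2382, "Dodge": 3002, "Parry": 1955}


def ranking_with_ties(weights, tol=0.05):
    """Order stats by weight, grouping neighbors within `tol` (relative) as near-ties.

    Each stat is compared with the stat ranked just above it, so a tie group can
    chain across several stats if each step is small.
    """
    order = sorted(weights, key=weights.get, reverse=True)
    groups = [[order[0]]]
    for prev, stat in zip(order, order[1:]):
        if weights[prev] - weights[stat] <= tol * weights[prev]:
            groups[-1].append(stat)   # near-tie: same group
        else:
            groups.append([stat])     # clear gap: start a new group
    return ">".join(g[0] if len(g) == 1 else "(" + "/".join(g) + ")" for g in groups)


print(ranking_with_ties(C_HM_100))
```

With these numbers it yields `Hit>Exp>(Stam/Haste)>Dodge>Mast>Parry`.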

Let’s see how HDF affects that ordering.

pct=100.00, N=200, vary hdf

| hdf |  Stam |   Hit |   Exp | Haste |  Mast | Dodge | Parry |
| 1.5 |  4336 |  7316 |  5227 |  4571 |  3162 |  3086 |  2607 |
| 1.6 |  4257 |  7457 |  5260 |  4464 |  2963 |  2933 |  2404 |
| 1.7 |  4178 |  7618 |  5294 |  4365 |  2783 |  2803 |  2231 |
| 1.8 |  4114 |  7824 |  5349 |  4286 |  2630 |  2701 |  2090 |
| 1.9 |  4073 |  8087 |  5431 |  4234 |  2505 |  2627 |  1978 |
| 2.0 |  4057 |  8414 |  5544 |  4207 |  2406 |  2578 |  1893 |
| 2.1 |  4065 |  8810 |  5690 |  4206 |  2330 |  2552 |  1831 |
| 2.2 |  4097 |  9279 |  5868 |  4229 |  2274 |  2545 |  1788 |
| 2.3 |  4152 |  9823 |  6080 |  4274 |  2238 |  2556 |  1764 |
| 2.4 |  4230 | 10446 |  6325 |  4338 |  2218 |  2582 |  1755 |
| 2.5 |  4330 | 11154 |  6604 |  4422 |  2213 |  2623 |  1760 |
| 2.6 |  4452 | 11949 |  6917 |  4522 |  2222 |  2677 |  1777 |
| 2.7 |  4594 | 12838 |  7265 |  4638 |  2244 |  2742 |  1806 |
| 2.8 |  4758 | 13826 |  7648 |  4769 |  2279 |  2819 |  1845 |
| 2.9 |  4943 | 14919 |  8068 |  4913 |  2325 |  2906 |  1895 |
| 3.0 |  5150 | 16122 |  8525 |  5070 |  2382 |  3002 |  1955 |
| 3.1 |  5378 | 17445 |  9021 |  5238 |  2450 |  3108 |  2023 |
| 3.2 |  5628 | 18892 |  9556 |  5416 |  2530 |  3223 |  2100 |
| 3.3 |  5901 | 20474 | 10132 |  5604 |  2620 |  3347 |  2187 |
| 3.4 |  6197 | 22197 | 10751 |  5799 |  2721 |  3478 |  2281 |
| 3.5 |  6517 | 24071 | 11414 |  6002 |  2833 |  3618 |  2385 |
| 3.6 |  6862 | 26104 | 12123 |  6210 |  2956 |  3766 |  2496 |
| 3.7 |  7232 | 28308 | 12879 |  6423 |  3090 |  3922 |  2616 |
| 3.8 |  7629 | 30691 | 13684 |  6640 |  3236 |  4086 |  2745 |
| 3.9 |  8054 | 33265 | 14540 |  6858 |  3394 |  4257 |  2881 |
| 4.0 |  8506 | 36041 | 15449 |  7077 |  3565 |  4436 |  3027 |
| 4.1 |  8989 | 39030 | 16412 |  7295 |  3748 |  4623 |  3181 |
| 4.2 |  9502 | 42245 | 17433 |  7509 |  3944 |  4817 |  3343 |
| 4.3 | 10047 | 45697 | 18512 |  7719 |  4154 |  5019 |  3514 |
| 4.4 | 10625 | 49401 | 19652 |  7923 |  4378 |  5228 |  3694 |
| 4.5 | 11238 | 53370 | 20855 |  8117 |  4616 |  5444 |  3883 |

The lower HDF values perform a little better here than they did with the last set. Even $h=2.5$ looks reasonable, though I think it undervalues dodge a little bit.  3.0 still looks fine, but by 3.3 we’re seeing a little more of a gap between stamina and haste than should probably exist.  I’d probably put the upper limit on this data set at 3.2 or 3.3, which narrows our range a little bit more.

Avoidance

Finally, let’s try an avoidance set.  This is the set we wouldn’t generally use because it fares poorly in smoothness tests.  You can see on the table below that it permits spikes up to 160% of your health, while the other sets generally cap out at around 140%.  Nonetheless, it’s worth checking this gear set as well, because it represents another extreme that the metric may have to deal with.

Finisher = SH1, Boss Attack = 350k, SoI model=nooverheal data set metric-10000-2

| Set: |  Avoid |   Stam |    Hit |    Exp |  Haste |   Mast |  Dodge |  Parry |
| mean |  0.277 |  0.278 |  0.283 |  0.283 |  0.271 |  0.273 |  0.271 |  0.272 |
|  std |  0.138 |  0.138 |  0.141 |  0.139 |  0.136 |  0.136 |  0.137 |  0.138 |
|   S% |  0.366 |  0.367 |  0.355 |  0.359 |  0.374 |  0.366 |  0.367 |  0.367 |
|   HP |   755k |   775k |   755k |   755k |   755k |   755k |   755k |   755k |
|  nHP |  2.158 |  2.215 |  2.158 |  2.158 |  2.158 |  2.158 |  2.158 |  2.158 |
| ---- | ------ |  --- 4 | Attack | Moving | Avg.-- | ------ | ------ | ------ |
|  50% | 51.268 | 51.107 | 52.768 | 52.609 | 49.450 | 50.602 | 49.814 | 49.760 |
|  60% | 38.659 | 37.795 | 40.442 | 40.063 | 36.801 | 37.879 | 37.169 | 37.206 |
|  70% | 26.734 | 24.499 | 28.408 | 28.149 | 25.071 | 25.590 | 25.468 | 25.477 |
|  80% | 16.868 | 14.017 | 18.432 | 18.075 | 15.631 | 14.414 | 15.962 | 15.998 |
|  90% |  8.755 |  7.228 |  9.910 |  9.537 |  7.979 |  7.960 |  8.154 |  8.225 |
| 100% |  3.687 |  3.517 |  4.398 |  4.205 |  3.312 |  3.586 |  3.453 |  3.465 |
| 110% |  1.665 |  1.526 |  2.110 |  1.949 |  1.470 |  1.573 |  1.558 |  1.556 |
| 120% |  0.791 |  0.699 |  1.045 |  0.955 |  0.696 |  0.730 |  0.728 |  0.748 |
| 130% |  0.347 |  0.259 |  0.483 |  0.452 |  0.296 |  0.330 |  0.324 |  0.316 |
| 140% |  0.130 |  0.096 |  0.186 |  0.179 |  0.101 |  0.125 |  0.116 |  0.119 |
| 150% |  0.025 |  0.030 |  0.041 |  0.039 |  0.020 |  0.025 |  0.020 |  0.024 |
| 160% |  0.002 |  0.000 |  0.006 |  0.006 |  0.003 |  0.003 |  0.002 |  0.002 |

Stamina fares very well in the avoidance set, though it’s a little hard to tell because of the coarse binning.  Emptying the top category is very strong, as are the gains in the 130% and 140% categories, though the representation in the 150% category actually goes up.  Haste lags stamina slightly, followed by dodge and parry, and mastery brings up the rear once again.  So unsurprisingly, we expect the usual hit>exp>stamina>haste>dodge>parry>mastery ordering.  Just for completeness, let’s look at the percentile table:

hdf=3.00, N=200, vary pct

|   pct |  Stam |   Hit |   Exp | Haste |  Mast | Dodge | Parry |
| 0.010 | 43283 | 77820 | 60492 | 24904 |  5891 | 17175 |  9204 |
| 0.020 | 51380 | 81516 | 63535 | 27400 | 11212 | 18623 | 10511 |
| 0.030 | 47283 | 81644 | 63646 | 27787 |  8947 | 18860 | 10753 |
| 0.040 | 52137 | 82979 | 64524 | 28396 | 11561 | 19467 | 11190 |
| 0.050 | 52137 | 82979 | 64524 | 28396 | 11561 | 19467 | 11190 |
| 0.060 | 50710 | 83411 | 64838 | 28795 |  9867 | 19706 | 11514 |
| 0.070 | 49724 | 83484 | 64878 | 28779 |  9012 | 19719 | 11556 |
| 0.080 | 50326 | 83484 | 64824 | 28896 | 10539 | 19831 | 11641 |
| 0.090 | 51530 | 83674 | 65085 | 29072 | 10647 | 19890 | 11726 |
| 0.100 | 50357 | 83695 | 65116 | 29072 |  9548 | 19884 | 11720 |
| 1.000 | 50785 | 83883 | 65288 | 29371 | 10093 | 20198 | 12017 |

Nothing too surprising here.  Stamina’s lead is a little larger than I would have expected, but that’s probably due to its complete elimination of the top category.  The slight loss in the 150% category doesn’t seem to have hurt it too much.  Otherwise, we get exactly the ordering we expected, and we see mastery struggling with edge effects throughout the upper regions of the table.  Every one of these tables is throwing more fuel on the fire for the percentile-limited calculation; we’ll be burning it in effigy soon enough, I think.
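The edge effects can be quantified by looking at how erratically each column moves between percentile rows. This sketch copies the 1% to 10% rows for mastery and dodge from the table above and reports two simple measures of our choosing. The first is the relative spread (standard deviation over mean). The second is the number of times the weight drops from one row to the next.

```python
from statistics import mean, pstdev

# hdf = 3.0 weights for pct = 0.01 ... 0.10, copied from the avoidance-set table above.
MAST = [5891, 11212, 8947, 11561, 11561, 9867, 9012, 10539, 10647, 9548]
DODGE = [17175, 18623, 18860, 19467, 19467, 19706, 19719, 19831, 19890, 19884]


def relative_spread(col):
    """Population standard deviation divided by the mean (coefficient of variation)."""
    return pstdev(col) / mean(col)


def drops(col):
    """How many times the weight falls when moving to the next (larger) pct row."""
    return sum(1 for a, b in zip(col, col[1:]) if b < a)


for name, col in (("Mastery", MAST), ("Dodge", DODGE)):
    print(f"{name}: spread={relative_spread(col):.3f}, drops={drops(col)}")
```

Mastery drops four times across these rows, and dodge only once.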

On to the more interesting table, where we vary HDF for the all-inclusive calculation.

pct=100.00, N=200, vary hdf

| hdf |   Stam |    Hit |    Exp |  Haste |  Mast |  Dodge | Parry |
| 1.5 |   8205 |   8966 |   6782 |   5935 |  4121 |   4408 |  3988 |
| 1.6 |   9072 |  10325 |   7746 |   6384 |  4330 |   4617 |  4090 |
| 1.7 |  10048 |  11896 |   8883 |   6908 |  4547 |   4884 |  4229 |
| 1.8 |  11178 |  13743 |  10241 |   7532 |  4785 |   5226 |  4416 |
| 1.9 |  12497 |  15921 |  11865 |   8275 |  5052 |   5656 |  4658 |
| 2.0 |  14041 |  18493 |  13805 |   9154 |  5349 |   6185 |  4958 |
| 2.1 |  15843 |  21523 |  16111 |  10183 |  5679 |   6824 |  5320 |
| 2.2 |  17940 |  25081 |  18842 |  11381 |  6042 |   7587 |  5747 |
| 2.3 |  20371 |  29245 |  22061 |  12763 |  6439 |   8487 |  6242 |
| 2.4 |  23178 |  34101 |  25839 |  14349 |  6870 |   9540 |  6810 |
| 2.5 |  26406 |  39746 |  30256 |  16159 |  7333 |  10765 |  7455 |
| 2.6 |  30107 |  46285 |  35397 |  18214 |  7828 |  12179 |  8182 |
| 2.7 |  34333 |  53834 |  41359 |  20537 |  8354 |  13805 |  8996 |
| 2.8 |  39145 |  62521 |  48249 |  23154 |  8908 |  15666 |  9903 |
| 2.9 |  44606 |  72486 |  56183 |  26089 |  9489 |  17788 | 10907 |
| 3.0 |  50785 |  83883 |  65288 |  29371 | 10093 |  20198 | 12017 |
| 3.1 |  57757 |  96879 |  75705 |  33028 | 10717 |  22925 | 13237 |
| 3.2 |  65601 | 111656 |  87584 |  37093 | 11357 |  26004 | 14574 |
| 3.3 |  74406 | 128413 | 101091 |  41597 | 12009 |  29468 | 16035 |
| 3.4 |  84262 | 147364 | 116407 |  46575 | 12666 |  33355 | 17628 |
| 3.5 |  95271 | 168741 | 133725 |  52063 | 13323 |  37705 | 19359 |
| 3.6 | 107538 | 192796 | 153256 |  58099 | 13973 |  42562 | 21237 |
| 3.7 | 121178 | 219800 | 175227 |  64721 | 14607 |  47972 | 23268 |
| 3.8 | 136313 | 250042 | 199882 |  71971 | 15216 |  53984 | 25461 |
| 3.9 | 153071 | 283837 | 227485 |  79892 | 15790 |  60651 | 27824 |
| 4.0 | 171591 | 321520 | 258318 |  88529 | 16318 |  68028 | 30364 |
| 4.1 | 192021 | 363451 | 292684 |  97927 | 16786 |  76176 | 33091 |
| 4.2 | 214517 | 410013 | 330906 | 108135 | 17181 |  85156 | 36013 |
| 4.3 | 239244 | 461619 | 373333 | 119204 | 17488 |  95037 | 39139 |
| 4.4 | 266378 | 518705 | 420333 | 131183 | 17688 | 105889 | 42476 |
| 4.5 | 296105 | 581740 | 472301 | 144128 | 17764 | 117788 | 46036 |

This table raises our lower bound a little bit.  Even 2.7 is problematic here, as mastery and parry are still neck and neck despite a strong lead by parry in the data.  I’d go as far as to say that 2.9 is still off in the relative weighting of mastery and parry, though it’s passable.  On the upper end, we have a nearly 2:1 lead of stamina over haste by the time we reach $h=3.3$, which seems excessive.  I like the 5:3 ratio that we get around 3.0, even though I think that still inflates stamina a bit.  But 3.0 seems like the best compromise on this table.
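The qualitative data shows parry clearly ahead of mastery in this set. One way to put a number on “neck and neck” is to ask at which HDF the parry:mastery weight ratio first clears a chosen threshold. This sketch uses the 2.3 to 3.0 rows of the table above. The threshold values are judgment calls, not something the post specifies.

```python
# (hdf, Mast, Parry) from the avoidance-set HDF table above, hdf 2.3 to 3.0 only.
ROWS = [(2.3, 6439, 6242), (2.4, 6870, 6810), (2.5, 7333, 7455),
        (2.6, 7828, 8182), (2.7, 8354, 8996), (2.8, 8908, 9903),
        (2.9, 9489, 10907), (3.0, 10093, 12017)]


def first_hdf_where(rows, ratio):
    """Smallest hdf whose parry/mastery weight ratio is at least `ratio`, or None."""
    for hdf, mast, parry in rows:
        if parry / mast >= ratio:
            return hdf
    return None
```

Parry first overtakes mastery at 2.5. It needs 2.8 to lead by 10% and 3.0 to lead by 15%.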

Gear sets and TMI

Now let’s shift gears and take a look at entire gear sets rather than stat weights.  While stat weights are probably the most common way people will use the metric, it’s also important that it ranks entire gear sets sensibly.  In fact, that’s how we usually perform our qualitative analysis – pick a bunch of representative gear sets and compare the histograms.

This is necessary because of the many interactions different stats have with one another.  We’ve already seen that mastery’s value depends on the rest of your stats – in some gear sets it rises ahead of avoidance, while in others it falls behind.  So it’s not unreasonable to expect that you might get different stat weights in highly-disparate gear sets.  But by comparing those gear sets directly, we can tell which one does a better job of smoothing.

Just to reiterate, here are the stats of each gear set:

|   Set: |   Str |   Sta | Parry | Dodge | Mastery |  Hit |  Exp | Haste |
|   C/Ha | 15000 | 28000 |  1500 |  1500 |    1500 | 2550 | 5100 | 12000 |
|   C/St | 15000 | 34000 |  1500 |  1500 |    1500 | 2550 | 5100 |  8000 |
|   C/Sg | 15000 | 31000 |  1500 |  1500 |    1500 | 2550 | 5100 |  8000 |
|  C/Shm | 15000 | 31000 |  1500 |  1500 |    4750 | 2550 | 5100 |  4750 |
|   C/Ma | 15000 | 28000 |  1500 |  1500 |   13500 | 2550 | 5100 |     0 |
|   C/Av | 15000 | 28000 |  7500 |  7500 |    1500 | 2550 | 5100 |     0 |
|  C/Bal | 15000 | 28000 |  4125 |  4125 |    4125 | 2550 | 5100 |  4125 |
|   C/HM | 15000 | 28000 |  1500 |  1500 |    6750 | 2550 | 5100 |  6750 |
|     Ha | 15000 | 28000 |  1500 |  1500 |    1500 |  500 |  500 | 18650 |
|  Avoid | 15000 | 28000 | 10825 | 10825 |    1500 |  500 |  500 |     0 |
| Av/Mas | 15000 | 28000 |  7717 |  7717 |    7716 |  500 |  500 |     0 |
| Mas/Av | 15000 | 28000 |  4000 |  4000 |   15150 |  500 |  500 |     0 |
|   Ha/h | 15000 | 28000 |  1500 |  1500 |    1500 | 2550 |  500 | 16600 |
|  Ha/he | 15000 | 28000 |  1500 |  1500 |    1500 | 2550 | 2550 | 14550 |
|  C/Str | 27600 | 28000 |  1500 |  1500 |    1500 | 2550 | 5100 |     0 |

And here’s the data set we generate with them.  This table isn’t transposed because it’s unwieldy no matter which way we format it, and it’ll be easier to read if we keep things consistent with earlier tables.

Finisher = SH1, Boss Attack = 350k, SoI model=nooverheal data set smooth-10000-0

| Set: |   C/Ha |   C/St |   C/Sg |  C/Shm |   C/Ma |   C/Av |  C/Bal |   C/HM |     Ha |  Avoid | Av/Mas | Mas/Av |   Ha/h |  Ha/he |  C/Str |
| mean |  0.262 |  0.285 |  0.286 |  0.297 |  0.278 |  0.280 |  0.274 |  0.259 |  0.269 |  0.278 |  0.283 |  0.287 |  0.263 |  0.264 |  0.270 |
|  std |  0.103 |  0.106 |  0.106 |  0.109 |  0.107 |  0.123 |  0.114 |  0.105 |  0.110 |  0.138 |  0.132 |  0.124 |  0.108 |  0.108 |  0.127 |
|   S% |  0.522 |  0.483 |  0.482 |  0.455 |  0.410 |  0.420 |  0.452 |  0.472 |  0.499 |  0.366 |  0.362 |  0.358 |  0.519 |  0.520 |  0.419 |
|   HP |   755k |   876k |   816k |   816k |   755k |   755k |   755k |   755k |   755k |   755k |   755k |   755k |   755k |   755k |   755k |
|  nHP |  2.158 |  2.504 |  2.331 |  2.331 |  2.158 |  2.158 |  2.158 |  2.158 |  2.158 |  2.158 |  2.158 |  2.158 |  2.158 |  2.158 |  2.158 |
| ---- | ------ |  --- 4 | Attack | Moving | Avg.-- | ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ |
|  50% | 46.138 | 41.369 | 48.197 | 53.342 | 53.604 | 52.688 | 51.264 | 46.218 | 48.634 | 51.389 | 53.786 | 54.764 | 46.584 | 46.965 | 47.688 |
|  60% | 29.956 | 22.394 | 29.557 | 31.895 | 35.512 | 38.205 | 35.731 | 29.593 | 33.093 | 38.767 | 40.584 | 38.962 | 31.072 | 31.242 | 33.733 |
|  70% | 16.220 | 10.126 | 15.830 | 17.885 | 19.880 | 24.647 | 21.671 | 14.841 | 19.265 | 26.914 | 25.468 | 25.584 | 17.745 | 17.761 | 22.163 |
|  80% |  7.390 |  3.221 |  5.890 |  6.899 |  9.238 | 14.172 |  9.097 |  7.025 | 10.267 | 17.109 | 15.024 | 13.988 |  9.010 |  8.805 | 13.043 |
|  90% |  2.529 |  0.773 |  1.954 |  2.387 |  3.480 |  6.556 |  3.996 |  3.120 |  4.449 |  8.868 |  8.461 |  7.155 |  3.517 |  3.430 |  6.549 |
| 100% |  0.635 |  0.141 |  0.545 |  0.779 |  1.579 |  2.012 |  1.434 |  0.848 |  1.463 |  3.752 |  3.719 |  3.684 |  0.992 |  0.972 |  2.500 |
| 110% |  0.104 |  0.025 |  0.101 |  0.235 |  0.506 |  0.713 |  0.398 |  0.194 |  0.483 |  1.707 |  1.654 |  1.706 |  0.257 |  0.234 |  0.882 |
| 120% |  0.023 |  0.001 |  0.010 |  0.035 |  0.169 |  0.280 |  0.163 |  0.067 |  0.168 |  0.824 |  0.883 |  0.806 |  0.084 |  0.074 |  0.282 |
| 130% |  0.002 |  0.000 |  0.000 |  0.005 |  0.029 |  0.050 |  0.029 |  0.008 |  0.043 |  0.373 |  0.418 |  0.341 |  0.017 |  0.010 |  0.066 |
| 140% |  0.000 |  0.000 |  0.000 |  0.000 |  0.003 |  0.005 |  0.003 |  0.000 |  0.010 |  0.131 |  0.136 |  0.111 |  0.003 |  0.001 |  0.009 |
| 150% |  0.000 |  0.000 |  0.000 |  0.000 |  0.000 |  0.000 |  0.000 |  0.000 |  0.002 |  0.028 |  0.024 |  0.013 |  0.000 |  0.000 |  0.001 |
| 160% |  0.000 |  0.000 |  0.000 |  0.000 |  0.000 |  0.000 |  0.000 |  0.000 |  0.000 |  0.003 |  0.005 |  0.002 |  0.000 |  0.000 |  0.000 |

Not too much we didn’t expect here.  The Control/Stamina sets beat out Control/Haste.  Control/Shm and Control/HM both sacrifice haste for mastery, and thus give slightly lower performance.  Then we have Control/Balance and Control/Mastery lagging a fair bit, with Control/Avoidance and Control/Strength bringing up the rear of the control subset.

We also have a few sets that focus on Haste and forgo hit/exp caps.  Pure haste (Ha) is a little better than Control/Avoidance, while the hit-capped only (Ha/h) and soft-expertise capped (Ha/he) sets fall somewhere between C/HM and C/Bal.

Finally, the avoidance sets perform poorly as usual.  Out of the three, we see the same trends we saw with the avoidance gear set stat weights: shifting a little value into mastery doesn’t tend to make a lot of difference (Av/Mas), but shifting to a heavy mastery focus does (Mas/Av).

So in general, we expect the following rough ordering, where near-ties are combined in parentheses:

C/St > C/Sg > C/Ha > ( C/Shm , C/HM ) > Ha/he > Ha/h > ( C/Bal , C/Ma ) > Ha > C/Av > C/Str > Mas/Av > ( Av/Mas , Av )

Let’s see if those predictions hold up in the percentile table.  What I’m showing here is raw TMI score, not a relative (i.e. differential) comparison.  So the lower the number, the smoother the damage intake.  I’m also showing only a few partial percents, partly for readability and partly because by now we know they’re not that useful anyway.  Note that I’ve transposed this from the usual format to make it easier to read.
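For reference, the raw (unnormalized) score in these tables is the discrete form from the start of the post, ${\rm TMI} = \sum_i H_i e^{10\ln(h)(x_i-1)}$. Below is a minimal Python sketch of that sum. It assumes bin centers are given as fractions of max health (1.0 = 100%) and leaves normalization out entirely. The function name `tmi` is hypothetical and not part of any published tool.

```python
import math


def tmi(bin_centers, counts, hdf=3.0):
    """Unnormalized discrete TMI: sum_i H_i * exp(10 * ln(hdf) * (x_i - 1)).

    bin_centers -- x_i, moving-average damage as a fraction of max health
    counts      -- H_i, histogram value in each bin (no normalization applied)
    hdf         -- health decade factor h
    """
    if len(bin_centers) != len(counts):
        raise ValueError("bin_centers and counts must have the same length")
    k = 10.0 * math.log(hdf)
    return sum(h_i * math.exp(k * (x_i - 1.0))
               for x_i, h_i in zip(bin_centers, counts))


# Every +10% of max health multiplies a bin's weight by h (here h = 3).
print(tmi([1.0], [1.0]))  # a bin at 100% has weight 1
print(tmi([1.1], [1.0]))  # a bin at 110% has weight of about 3
print(tmi([1.2], [1.0]))  # a bin at 120% has weight of about 9
```

Because every term is positive and grows exponentially with $x_i$, a lower score always means less mass in the high-damage bins. That is why lower numbers correspond to smoother damage intake.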

hdf=3.00, N=200, vary pct

|  pct-> |     1% |     5% |    10% |   100% |
|   C/Ha |   8038 |  12804 |  15018 |  18332 |
|   C/St |   1575 |   3581 |   4846 |   7895 |
|   C/Sg |   6702 |  10846 |  12398 |  16102 |
|  C/Shm |  12242 |  16914 |  18980 |  22631 |
|   C/Ma |  31660 |  36092 |  37874 |  41994 |
|   C/Av |  49031 |  57363 |  60248 |  63949 |
|  C/Bal |  28342 |  34727 |  36801 |  40468 |
|   C/HM |  13505 |  18234 |  19490 |  23096 |
|     Ha |  37936 |  44175 |  46514 |  49835 |
|  Avoid | 216443 | 225819 | 228283 | 231586 |
| Av/Mas | 214205 | 223790 | 225309 | 229068 |
| Mas/Av | 176964 | 184269 | 186410 | 190308 |
|   Ha/h |  19947 |  25642 |  27870 |  31126 |
|  Ha/he |  16537 |  22207 |  24496 |  27795 |
|  C/Str |  53352 |  60420 |  62517 |  66023 |

Just to reiterate: I’m going to be looking at the last column, primarily.  C/St is the clear leader by a pretty big margin, just as it is on the data table.  C/Sg and C/Ha come in next, fairly close to one another (but with the stamina gear set still holding a slight lead).  The next two sets are C/Shm and C/HM, again clustered together.  We then have another medium-sized gap before reaching Ha/he and Ha/h, and a slightly larger gap before we reach C/Bal and C/Ma.

As we round out the bottom, our predictions look pretty stable.  Ha slots in between C/Ma and C/Av as expected.  C/Av and C/Str lag the previous sets by a large chunk.  Then there is a huge jump (nearly a factor of 3) from C/Str to the closest avoidance-based set, which is Mas/Av.  Bringing up the rear, Av/Mas and Avoid are in a dead heat for last place, nearly too close to call, mimicking the qualitative assessment.
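The comparison above can also be checked mechanically by sorting the gear sets on the 100% column of the percentile table. Here is a short Python sketch. The dictionary simply copies those table values.

```python
# 100% column of the hdf=3.00 percentile table (raw TMI; lower is smoother).
tmi_100 = {
    "C/Ha": 18332, "C/St": 7895, "C/Sg": 16102, "C/Shm": 22631,
    "C/Ma": 41994, "C/Av": 63949, "C/Bal": 40468, "C/HM": 23096,
    "Ha": 49835, "Avoid": 231586, "Av/Mas": 229068, "Mas/Av": 190308,
    "Ha/h": 31126, "Ha/he": 27795, "C/Str": 66023,
}

# Order from best (smoothest) to worst.
ranking = sorted(tmi_100, key=tmi_100.get)
print(" > ".join(ranking))

# Size of the jump from the worst control set to the best avoidance set.
print(round(tmi_100["Mas/Av"] / tmi_100["C/Str"], 2))
```

The sorted order reproduces the predicted ordering, and the near-tied pairs land next to each other. The Mas/Av to C/Str ratio comes out just under 3.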

As a final consideration, let’s look at how these results vary with HDF:

pct=100.00, N=200, vary hdf

|  hdf-> |    2.5 |    2.6 |    2.7 |    2.8 |    2.9 |    3.0 |    3.1 |    3.2 |    3.3 |    3.4 |    3.5 |
|   C/Ha |  22029 |  20960 |  20086 |  19374 |  18796 |  18332 |  17965 |  17681 |  17471 |  17326 |  17239 |
|   C/St |  11603 |  10592 |   9745 |   9029 |   8418 |   7895 |   7442 |   7050 |   6707 |   6406 |   6142 |
|   C/Sg |  20089 |  18967 |  18040 |  17271 |  16632 |  16102 |  15665 |  15305 |  15013 |  14778 |  14595 |
|  C/Shm |  25431 |  24500 |  23787 |  23256 |  22879 |  22631 |  22497 |  22461 |  22513 |  22644 |  22845 |
|   C/Ma |  39112 |  39167 |  39513 |  40115 |  40948 |  41994 |  43239 |  44673 |  46287 |  48077 |  50040 |
|   C/Av |  55920 |  56821 |  58097 |  59722 |  61677 |  63949 |  66529 |  69415 |  72604 |  76099 |  79901 |
|  C/Bal |  38329 |  38228 |  38419 |  38870 |  39559 |  40468 |  41584 |  42898 |  44405 |  46101 |  47984 |
|   C/HM |  24980 |  24249 |  23725 |  23374 |  23171 |  23096 |  23134 |  23274 |  23506 |  23822 |  24217 |
|     Ha |  42774 |  43458 |  44522 |  45945 |  47717 |  49835 |  52302 |  55122 |  58308 |  61873 |  65833 |
|  Avoid | 138960 | 152806 | 168802 | 187126 | 207980 | 231586 | 258195 | 288075 | 321520 | 358849 | 400403 |
| Av/Mas | 136285 | 150110 | 166106 | 184456 | 205366 | 229068 | 255817 | 285896 | 319612 | 357297 | 399312 |
| Mas/Av | 119574 | 130257 | 142561 | 156587 | 172457 | 190308 | 210295 | 232586 | 257366 | 284835 | 315207 |
|   Ha/h |  30841 |  30433 |  30283 |  30362 |  30647 |  31126 |  31788 |  32627 |  33639 |  34824 |  36184 |
|  Ha/he |  29173 |  28501 |  28056 |  27806 |  27725 |  27795 |  28001 |  28332 |  28780 |  29339 |  30003 |
|  C/Str |  57433 |  58491 |  59901 |  61636 |  63681 |  66023 |  68655 |  71573 |  74778 |  78271 |  82057 |

There isn’t a lot of variation here to consider.  The relative ordering of entire gear sets seems pretty stable with HDF, apart from near-tied pairs such as C/Shm and C/HM trading places.  The main thing I notice is that the gap between C/Sg and C/Ha grows (in a relative sense) as $h$ increases.  This isn’t a huge surprise, since stamina performs a slight “shift” of the entire histogram to the left, and will thus benefit more from a high HDF.  In other words, the higher our HDF, the more valuable stamina becomes, so we have to be careful not to overvalue stamina by overshooting our HDF.
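The stamina argument can be made a little more precise with the discrete TMI formula.  If a gear change shifts every bin center left by $\delta$ (in units of max health), each term picks up the same factor: $e^{10\ln(h)(x_i-\delta-1)} = h^{-10\delta}\,e^{10\ln(h)(x_i-1)}$.  The whole score therefore scales by $h^{-10\delta}$, which is a larger reduction at higher $h$.  Treating stamina as a pure uniform shift is an idealization for illustration.  The Python sketch below checks the identity on a made-up histogram, whose bin values are hypothetical and not simulation output.

```python
import math


def tmi(bin_centers, counts, hdf):
    # Unnormalized discrete TMI: sum_i H_i * exp(10 * ln(hdf) * (x_i - 1)).
    k = 10.0 * math.log(hdf)
    return sum(c * math.exp(k * (x - 1.0)) for x, c in zip(bin_centers, counts))


# Hypothetical histogram: bin centers (fraction of max health) and counts.
centers = [0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2]
counts = [46.0, 30.0, 16.0, 7.4, 2.5, 0.6, 0.1, 0.02]
delta = 0.02  # assumed uniform left shift of every bin center

for hdf in (2.5, 3.0, 3.5):
    shifted = [x - delta for x in centers]
    ratio = tmi(shifted, counts, hdf) / tmi(centers, counts, hdf)
    # The ratio equals hdf ** (-10 * delta) regardless of the histogram shape.
    print(hdf, round(ratio, 4), round(hdf ** (-10 * delta), 4))
```

The same shift buys a bigger fractional improvement as $h$ grows.  This matches the widening C/Sg vs. C/Ha gap in the table.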

Repeatability

Finally, there’s one more test we want to run.  It’s all well and good to have one set of data, but how reliable is the result?  If one simulation of 10k-minute duration varies significantly from the next, it will be hard to trust the number that the metric spits out.

To test this, we’ll take the C/Ha gear set and run it through the simulation 20 times.  Then we can use the metric to analyze each trial and see how much variation we get.  Again, apologies for the inconveniently sized table, but here’s all the raw data:

Finisher = SH1, Boss Attack = 350k, SoI model=nooverheal data set metric-10000-6

| Set: |    #1  |    #2  |    #3  |    #4  |    #5  |    #6  |    #7  |    #8  |    #9  |    #10 |    #11 |    #12 |    #13 |    #14 |    #15 |    #16 |    #17 |    #18 |    #19 |    #20 |
| mean |  0.261 |  0.261 |  0.261 |  0.261 |  0.261 |  0.261 |  0.262 |  0.261 |  0.261 |  0.261 |  0.262 |  0.261 |  0.261 |  0.261 |  0.261 |  0.262 |  0.261 |  0.261 |  0.261 |  0.262 |
|  std |  0.103 |  0.103 |  0.103 |  0.103 |  0.103 |  0.104 |  0.103 |  0.103 |  0.103 |  0.103 |  0.103 |  0.103 |  0.103 |  0.103 |  0.103 |  0.103 |  0.103 |  0.103 |  0.103 |  0.103 |
|   S% |  0.523 |  0.522 |  0.522 |  0.523 |  0.523 |  0.522 |  0.522 |  0.523 |  0.523 |  0.522 |  0.522 |  0.522 |  0.523 |  0.522 |  0.523 |  0.523 |  0.523 |  0.522 |  0.523 |  0.522 |
|   HP |   755k |   755k |   755k |   755k |   755k |   755k |   755k |   755k |   755k |   755k |   755k |   755k |   755k |   755k |   755k |   755k |   755k |   755k |   755k |   755k |
|  nHP |  2.158 |  2.158 |  2.158 |  2.158 |  2.158 |  2.158 |  2.158 |  2.158 |  2.158 |  2.158 |  2.158 |  2.158 |  2.158 |  2.158 |  2.158 |  2.158 |  2.158 |  2.158 |  2.158 |  2.158 |
| ---- | ------ |  --- 4 | Attack | Moving | Avg.-- | ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ |
|  50% | 45.745 | 46.052 | 45.868 | 46.034 | 45.850 | 46.032 | 46.098 | 45.972 | 45.910 | 46.047 | 46.304 | 45.975 | 45.798 | 45.858 | 45.905 | 46.151 | 45.889 | 46.049 | 45.925 | 46.084 |
|  60% | 29.662 | 29.905 | 29.794 | 29.974 | 29.726 | 29.819 | 29.994 | 29.904 | 29.822 | 29.923 | 30.122 | 29.874 | 29.657 | 29.663 | 29.776 | 30.094 | 29.805 | 29.927 | 29.791 | 29.966 |
|  70% | 15.974 | 16.225 | 16.162 | 16.226 | 16.026 | 16.180 | 16.231 | 16.312 | 16.124 | 16.177 | 16.312 | 16.058 | 15.975 | 15.999 | 16.103 | 16.291 | 16.122 | 16.190 | 16.131 | 16.288 |
|  80% |  7.133 |  7.372 |  7.297 |  7.282 |  7.309 |  7.353 |  7.356 |  7.379 |  7.320 |  7.312 |  7.365 |  7.242 |  7.188 |  7.236 |  7.301 |  7.369 |  7.289 |  7.348 |  7.279 |  7.334 |
|  90% |  2.455 |  2.495 |  2.502 |  2.499 |  2.493 |  2.523 |  2.552 |  2.574 |  2.512 |  2.502 |  2.530 |  2.469 |  2.460 |  2.470 |  2.472 |  2.533 |  2.482 |  2.517 |  2.490 |  2.535 |
| 100% |  0.600 |  0.618 |  0.623 |  0.604 |  0.615 |  0.628 |  0.645 |  0.645 |  0.622 |  0.608 |  0.628 |  0.625 |  0.589 |  0.588 |  0.607 |  0.613 |  0.618 |  0.626 |  0.621 |  0.625 |
| 110% |  0.095 |  0.096 |  0.096 |  0.101 |  0.095 |  0.102 |  0.106 |  0.103 |  0.099 |  0.097 |  0.088 |  0.100 |  0.096 |  0.096 |  0.092 |  0.112 |  0.109 |  0.101 |  0.102 |  0.103 |
| 120% |  0.023 |  0.021 |  0.024 |  0.023 |  0.024 |  0.024 |  0.025 |  0.021 |  0.021 |  0.023 |  0.021 |  0.025 |  0.019 |  0.023 |  0.020 |  0.027 |  0.026 |  0.023 |  0.026 |  0.026 |
| 130% |  0.001 |  0.002 |  0.001 |  0.001 |  0.001 |  0.001 |  0.001 |  0.001 |  0.000 |  0.002 |  0.001 |  0.003 |  0.002 |  0.002 |  0.001 |  0.002 |  0.001 |  0.002 |  0.003 |  0.002 |

As I suggested earlier, we’re seeing changes on the order of +/- 0.001 to 0.002 in the data, which is just simulation noise.  Sometimes you get lucky and don’t see a dangerous string, sometimes you get unlucky and see several.  We want to make sure that the metric isn’t too sensitive to this, but that it’s still sensitive to small changes that are meaningful (like small changes in gear set).

Let’s look at the percentile breakdown for a moment and calculate the mean, standard deviation, and standard deviation of the mean across the 20 trials.

hdf=3.00, N=200, vary pct

|    trial |     1% |     5% |    10% |   100% |
|       #1 |   7437 |  12922 |  14369 |  17573 |
|       #2 |   7654 |  13282 |  14735 |  17955 |
|       #3 |   7668 |  13278 |  14728 |  17938 |
|       #4 |   7578 |  13171 |  14628 |  17852 |
|       #5 |   7661 |  13273 |  14710 |  17902 |
|       #6 |   7935 |  13584 |  15041 |  18249 |
|       #7 |   8116 |  13736 |  15187 |  18417 |
|       #8 |   7922 |  13569 |  15033 |  18251 |
|       #9 |   7740 |  13313 |  14759 |  17968 |
|      #10 |   7858 |  13447 |  14901 |  18120 |
|      #11 |   7553 |  13197 |  14653 |  17892 |
|      #12 |   8642 |  14200 |  15657 |  18867 |
|      #13 |   7408 |  12945 |  14380 |  17583 |
|      #14 |   7491 |  13068 |  14521 |  17719 |
|      #15 |   7370 |  12949 |  14394 |  17603 |
|      #16 |   8135 |  13766 |  15243 |  18471 |
|      #17 |   7898 |  13458 |  14932 |  18132 |
|      #18 |   7932 |  13549 |  15005 |  18217 |
|      #19 |   8152 |  13707 |  15168 |  18372 |
|      #20 |   8023 |  13651 |  15105 |  18335 |
|     mean |   7809 |  13403 |  14857 |  18071 |
|      std | 315.72 | 326.88 | 332.50 | 336.03 |
| std_mean |  15.79 |  16.34 |  16.63 |  16.80 |
|  pct_var |   0.20 |   0.12 |   0.11 |   0.09 |

This looks pretty good.  The standard deviation is pretty reasonable at around 2% of the mean value once we consider more than a few percent of the data.  The std_mean row is the standard deviation divided by the number of trials (20), which puts it at about 0.1% of the mean value (this is the pct_var row).  Strictly speaking, the standard error of the mean divides by $\sqrt{N}$ rather than $N$.  It is therefore only a factor of $\sqrt{20}\approx 4.5$ smaller than the standard deviation: about 75, or roughly 0.4% of the mean.

In other words, if we do 20 runs of 10k minutes each, we expect to get a mean value of about 18071 +/- 75.  That’s really good news.  It means that we can get meaningful, reliable results out of this metric without having to simulate for ages.  Keep in mind that 200k minutes of combat is still less than what Simcraft can do.  A standard high-precision Simcraft run is 50k iterations of ~450 seconds of combat each, which is 375k minutes of combat.
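To reproduce the summary statistics, here is a small Python sketch using the standard library’s `statistics` module on the 100% column of the 20-trial table.  It computes the sample standard deviation and the conventional standard error of the mean, $\sigma/\sqrt{N}$.  The table’s std_mean row divides by $N$ instead, which is why the two differ by a factor of $\sqrt{20}\approx 4.5$.  Because the inputs are the rounded integers from the table, the results only agree with the table to within rounding.

```python
import math
import statistics

# 100% column of the 20-trial table for the C/Ha gear set (hdf = 3.0).
trials = [17573, 17955, 17938, 17852, 17902, 18249, 18417, 18251, 17968, 18120,
          17892, 18867, 17583, 17719, 17603, 18471, 18132, 18217, 18372, 18335]

n = len(trials)
mean = statistics.mean(trials)
std = statistics.stdev(trials)   # sample standard deviation (n - 1 denominator)
sem = std / math.sqrt(n)         # standard error of the mean
print(f"mean={mean:.0f} std={std:.1f} sem={sem:.1f} "
      f"sem/mean={100 * sem / mean:.2f}%")
```

Even with the $\sqrt{N}$ convention, the uncertainty on the mean stays well under 1% of its value after 200k minutes of combat.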

You might ask how HDF choice affects the repeatability.  Ask and ye shall receive.  (Actually, you’re getting it whether you asked for it or not.)

pct=100.00, N=200, vary hdf

| hdf |  mean |     std | std_mean | pct_var |
| 1.5 | 71712 |  237.18 |    11.86 |    0.02 |
| 1.6 | 58274 |  227.64 |    11.38 |    0.02 |
| 1.7 | 48710 |  219.53 |    10.98 |    0.02 |
| 1.8 | 41699 |  213.32 |    10.67 |    0.03 |
| 1.9 | 36430 |  209.14 |    10.46 |    0.03 |
| 2.0 | 32388 |  206.99 |    10.35 |    0.03 |
| 2.1 | 29234 |  206.90 |    10.34 |    0.04 |
| 2.2 | 26736 |  208.94 |    10.45 |    0.04 |
| 2.3 | 24736 |  213.26 |    10.66 |    0.04 |
| 2.4 | 23119 |  220.08 |    11.00 |    0.05 |
| 2.5 | 21804 |  229.70 |    11.49 |    0.05 |
| 2.6 | 20730 |  242.52 |    12.13 |    0.06 |
| 2.7 | 19850 |  259.00 |    12.95 |    0.07 |
| 2.8 | 19130 |  279.67 |    13.98 |    0.07 |
| 2.9 | 18544 |  305.13 |    15.26 |    0.08 |
| 3.0 | 18071 |  336.03 |    16.80 |    0.09 |
| 3.1 | 17693 |  373.07 |    18.65 |    0.11 |
| 3.2 | 17399 |  417.01 |    20.85 |    0.12 |
| 3.3 | 17178 |  468.69 |    23.43 |    0.14 |
| 3.4 | 17021 |  528.96 |    26.45 |    0.16 |
| 3.5 | 16921 |  598.80 |    29.94 |    0.18 |
| 3.6 | 16874 |  679.24 |    33.96 |    0.20 |
| 3.7 | 16874 |  771.38 |    38.57 |    0.23 |
| 3.8 | 16918 |  876.46 |    43.82 |    0.26 |
| 3.9 | 17003 |  995.77 |    49.79 |    0.29 |
| 4.0 | 17126 | 1130.74 |    56.54 |    0.33 |
| 4.1 | 17287 | 1282.90 |    64.14 |    0.37 |
| 4.2 | 17482 | 1453.89 |    72.69 |    0.42 |
| 4.3 | 17711 | 1645.50 |    82.28 |    0.46 |
| 4.4 | 17973 | 1859.63 |    92.98 |    0.52 |
| 4.5 | 18268 | 2098.31 |   104.92 |    0.57 |

Perhaps predictably, reducing the HDF also reduces the relative variation in the metric (the pct_var column).  I say “predictably” because the HDF essentially represents “how much we value outliers.”  We already know that we’re susceptible to noise at high HDF because it exaggerates stray high-lying spikes, and this is exactly the same principle at work.  A higher HDF makes the small statistical fluctuations at the very top of the distribution more important, and thus increases the variation in our mean value.

This is another reason we want to keep our HDF as low as possible while still maintaining good discrimination between different data sets.  The lower the HDF, the more quickly the metric will converge and the less integration time we need to get a suitable estimate.
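One way to see why a high HDF amplifies noise: under the discrete formula, a bin at $x_i$ is weighted by $e^{10\ln(h)(x_i-1)} = h^{10(x_i-1)}$.  A rare 150% bin therefore counts $h^5$ times as much as a 100% bin.  Here is a quick Python check across part of the HDF range in the table above.  The helper name `relative_weight` is illustrative only.

```python
def relative_weight(x, hdf):
    # Weight of a bin at x relative to a bin at x = 1.0 (100% of max health),
    # i.e. hdf ** (10 * (x - 1)) from the discrete TMI formula.
    return hdf ** (10 * (x - 1.0))


for hdf in (1.5, 2.0, 3.0, 4.5):
    print(f"h={hdf}: a 150% bin weighs {relative_weight(1.5, hdf):.0f}x a 100% bin")
```

Going from $h=2$ to $h=4.5$ raises that factor from 32 to about 1845.  Whether a handful of such windows shows up in a given run therefore matters far more at high HDF.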

Conclusions

So far in this post I’ve mostly been repeating the same things over and over.  That’s to be expected, because the point of this post wasn’t to discover so much as to validate the metric.  The ideal case was that it performed exactly the way we wanted and mirrored our qualitative results.  It more or less did that with few surprises.

The other purpose was to narrow down our value for $h$.  Based on its performance in various gear sets, I would bound the “acceptable” range between $h=2.9$ and $h=3.3$.  Pretty much any value in there gives us reasonable results, and the differences are fairly minor.  The high end of that range tends to overvalue stamina and increases our sensitivity to noise.  The low end tends to undervalue avoidance and reduces our ability to discriminate between meaningfully different gear sets.

But our choice here is fairly arbitrary.  The metric will “work” with any value in this range.  As I said earlier, the actual numeric values are a bit arbitrary; what’s important is that they give us the correct relative relationships between gear sets and stats.  We can’t definitively say that “haste is 1.534 times better than mastery,” so it’s not critical that we pick the HDF that precisely.

In fact, the reverse is more worrisome – that someone will see an HDF of 3.2, get results that suggest that haste is exactly 1.534 times better than mastery, and assume that it’s gospel.  Obviously there will be some exact relationship given by the metric, because that’s how numbers work.  But it really shouldn’t be interpreted as if we’re getting exact relationships, because that’s not how the index was conceptualized or defined.

Likewise, you shouldn’t look at two gear sets with TMIs of 10k and 250k and decide that the latter is 25x worse than the former, or that you’re 25x more likely to die while wearing it.  That’s not what the index tells you.  It tells you that one gear set is a lot worse than the other, of course, but you can’t extrapolate a likelihood to die out of it.

Going back to the Dow Jones Industrial Average (DJIA) example: if the DJIA goes down 10 points one day, it does not mean that the entire market went down uniformly by a proportional amount.  Some stocks may have done well, others not.  The DJIA just gives you a general sense of “how the market is doing.”  You will generally be able to say that a day when the Dow goes up is better than a day when the Dow goes down.  You might even be able to make rough estimates of the magnitude from that (e.g. going up 20 points vs going down 10 points).  But it’s important to keep in mind that the Dow is an arbitrary indicator.  You might get different statistics if you chose a completely different 30-stock basis set, but if you only swapped one stock out it would probably be fairly similar.

In our case, we get very different results for $h=2$ than we do for $h=4.5$, but fairly similar results for anything in the $h=2.9$ to $h=3.3$ range we’re considering.  So it’s up to us to make a final decision about “which stocks to pick” for our index.

I’m going to make an arbitrary decision and go with 3.0 for a few reasons.  First, it keeps the value towards the low end of the range to help combat statistical noise and artificial stamina inflation.  Second, we’ve already got loads of data presented here for $h=3.0$, which is convenient.  I also think that by choosing an integer, we make it a little more clear that the choice is arbitrary rather than some sort of exact, precise value.  Most of all, it seems to be a good compromise – I rarely saw values in the 3.0 line of an HDF table that disagreed with the intuition I got from the data.

That’s it for today.  Now that we’ve finished deciding on an HDF, the next step will be to clean up the data representation with a normalization scheme.  That’s what we’ll tackle in the next blog post.  We’ll also show why you can’t compare TMI values generated by different bosses.

This entry was posted in Tanking, Theck’s Pounding Headaches, Theorycrafting.

19 Responses to The Making of a Metric: Part 2

  1. Qaajn says:

    Perhaps I simply misunderstood you, but I have read in a couple of your blogs now that you add 1000 stamina or haste or… and compare the results. I was wondering why you use an equal amount of stamina, when 1000 stamina is equal to 1500 secondary stats in item budget.

    I also have a concern about the use of an arbitrary metric, specifically this:

    “But our choice here is fairly arbitrary. The metric will “work” with any value in this range. As I said earlier, the actual numeric values are a bit arbitrary, what’s important is that they give us the correct relative relationships between gear sets and stats. We can’t definitively say that “haste is 1.534 times better than mastery,” so it’s not critical that we pick HDF that precisely.”

    I apologize beforehand if this sounds patronizing. The intent is to get my concern across and is not any attack on you or your work.

    My problem with this is that you choose a metric that fits your -expected- finding of how gearing is best done. Of course most would agree that taking more, bigger chunks of damage is more likely to kill you, but if you want the TMI to be general it needs to be consistent for similar damage patterns regardless of tanking mechanics or specific damage. If the mitigated damage pattern of a paladin and a Death Knight is the same, then they should get the same index. Their way to reach it may be different, but the same pattern should result in the same index regardless of how it is attained. Similarly, a paladin that takes 3×50% health attacks should get the same index and consequently stat weights regardless of whether they are level 1 or 90 (assuming access to all relevant tanking tools). The approach decides how much better one way of doing things is than another, but it seems like all it does is say “this is better because that is how we think it should work”.

    I believe you can’t criticize something while not providing any input on improvements. In my eyes, what you should focus on is not as much the total damage taken, but the pattern it arrives in. For example, it would be much easier healing a damage pattern looking like 40%-40%-40%-40% than it would be 20%-20%-40%-80% even though they deal the same total damage in the same time. Correct me if I’m wrong, but the index right now treats both those scenarios equally. Perhaps a cumulative measure of “healing needed to make the events non-lethal” would prove the most useful. That would make my first example be 0%-0%-20%-60% and 0%-0%-0%-60% respectively, and the latter obviously “spikes” more than the first. This would evaluate avoidance greatly as it reduces total damage taken but not spikes, but you could probably get some sort of probability distribution in there to deal with it. It should also be consistent and work for all the tanking classes in various situations.

    My wish is that you find a neat system to apply to tanking, and that you can extend it to include (at least!) Death Knights soon! :)

    • Qaajn says:

      “This would evaluate avoidance” -devaluate-

    • Theck says:

      In earlier blog posts, I very specifically did compare, say, 1000 haste to 1500 (or 750) stamina to address the itemization issue. However, in this metric we do not want to do that because it will actually be generating stat weights. Whereas in earlier blog posts, we wanted to make a clear determination as to whether, e.g., a stam gem was better than a haste gem, here we want per-point stat weights.

      The reason should be pretty clear: when you put those stat weights into wowhead, AskMrRobot, or any other tool, the tool takes care of the itemization. For example, if it’s comparing a 320 haste gem to a 240 stamina gem, it’s going to multiply 320 by your haste stat weight and compare to 240 times your stamina weight; thus the itemization is already taken into account by the tool, and as such it’s expecting per-point stat weights.

      Regarding your concern: I think you are massively mis-interpreting the point of developing the metric. We already have smoothness data (see one of numerous blog posts over the last 6 months), and have interpreted it qualitatively to make determinations about how stats perform for smoothness. So yes, we’re trying to fit the metric to our expectations, but those expectations are not whimsical. They are strongly supported by a mountain of data, both from my simulations and from WoL analysis.

      More to the point, I think you need to re-read the development a little more carefully. You say that the metric should focus “not as much on total damage taken, but the pattern it arrives in.” But that is, in fact, EXACTLY what the metric does.

      In fact, this metric does not tell you ANYTHING about total damage taken. It is ENTIRELY determined by the “pattern” that damage arrives in, because it is based on a histogram of a short-term moving average of damage taken. So a 20%-20%-40%-80% string will be much worse than a 40%-40%-40%-40% string.

      This is also why it will automatically be applicable to other tanks. If a paladin and a DK both take strings of 20%-20%-40%-80%, that string will produce exactly the same value with the metric. It doesn’t care at all about the details of your class mechanics, just the observable changes in your health timeline.

      • Qaajn says:

        “The gear sets are variants on the Control/Haste setup – the first is just C/Ha, followed by sets where I add 1000 of a given stat.”
        and
        “So according to this data, we would conclude that haste is better than mastery, though not by a huge amount. Dodge and parry are both worse than haste, but stamina is a little better.”
        – From The Making of a Metric part 1

        Perhaps I’m reading this wrong then, but from adding 1k stamina to one set and 1k haste to another you seem to draw the conclusion that stamina is better than haste.

        I spent about 4 hours today reading a bunch of your different posts, so I’m not unfamiliar with them. What I’m saying is not that they don’t fit the current data, not at all; they seem to do so very well. My concern is that should -any- mechanic change you would need to completely redo every analysis. I also realise that I expressed myself slightly unclearly. By “Total damage taken” I meant the total damage for the string (here of 4 combined strikes).

        Correct me if I’m wrong, but what you do is add the damage of four consecutive strikes together. This number represents one of the events in your histogram. You then do this for all the strikes and from this form a probability distribution for what the damage of any four consecutive strikes would do combined. I assume this because if you took the average damage for each of the four strikes (what is called a four-point moving average at http://www.bbc.co.uk/schools/gcsebitesize/maths/statistics/representingdata3hirev7.shtml), it could never be higher than the maximum damage of any single strike, and if a single strike could reach more than 100% health at worst case then there is nothing we can do except get better gear to even have a remote chance of surviving. Going back to my assumption that you simply combine the damage for the last 4 events, you agree with me on what would more likely kill us, but this does not, however, address my point. Look at this string of events:

        20-20-40-80-20-20-40-80-20-20 and so on.

        The damage for the first 4 events would be 160%. For the 2-5 event it would also be 160%. Obviously this keeps repeating, and you would get 100% of the strings at 160%. Now compare this with this string:

        40-40-40-40-40-40-40-40-40-40 and so on.

        This would obviously -also- become 160% always. Yet as you seem to have agreed with, this is much less dangerous than the first string.

        Naturally, strings when we tank will not behave in set repeatable patterns and we won’t get distributions where all damage is always the same, but that is my point. A high, even pattern is far less likely to kill someone than an equally high but uneven pattern. Strings consisting of four events are fairly narrow but do not in my view address this enough, and since they don’t take into account the individual damage of each event, they could not reliably be used to compare different classes or ways to tank. It is, however, a vast improvement over just looking at DTPS. But why stop there…

        I hope that I made myself clearer and that I actually understood your data correctly.

        • Qaajn says:

          If it’s still unclear, by the numbers of the strings I mean the damage of each -individual- strike, not the combined damage of 1-4, 2-5, 3-6 and so on. That is, the damage pattern -before- you do your 4 strike moving average, if I understand your use of this correctly.

        • Theck says:

          “Perhaps I’m reading this wrong then, but from adding 1k stamina to one set and 1k haste to another you seem to draw the conclusion that stamina is better then haste.”

          Again, in the context of that post, it is, because we’re comparing 1:1 for the sake of generating stat weights. One then has to apply itemization considerations to that to compare the stats on an item-by-item basis (i.e. 2:1 on gems, 1:1.5 on trinkets, etc.).

          You have basically the correct interpretation of what we’re doing with the 4-attack moving average.

          However, I would step back a second and argue that while the 20-20-40-80 sequence is more dangerous than a 40-40-40-40 sequence, I think you’re splitting hairs a bit overmuch.

          For one thing, that flexibility is built into the metric, though it won’t be apparent until tomorrow’s blog post. If you really are dealing with bosses that can suddenly hit you for 80% of your health in one GCD, then you may want to consider using a 2-attack moving average rather than a 4-attack moving average. That will be covered in tomorrow’s post, in the section on Normalization.

          More importantly, that’s generally not the way deaths happen. A large, 80% of your health attack is almost always intended to be countered with active mitigation. Spike death from one of those events is rarely helped by any stat except Stamina in the first place, so the answer to that sequence is “stop playing badly and use AM correctly.”

          The more common case of tank death is a trickle-down effect caused by several attacks over a small time window. And at that point, the moving average of damage taken in that representative time window is the important metric. I’ve used 4 attacks here because generally it’s a 3- or 4-attack string that does you in, and for anything outside of 25H content it’s generally closer to 4. Bosses in 25H rarely hit for more than 300k after armor mitigation, so a tank with ~800-900k HP raid buffed should be able to easily survive about 4 hits assuming one attack is covered by a block/avoid or actively mitigated.

          So I would argue that more commonly, the distribution of damage within that 6-second period isn’t as critical as the total damage in the period, except in edge cases where we’re truly considering attacks that can nearly global the tank. Whether we’re looking at 30-30-30-30 or 50-20-30-20 isn’t really that important, because both are spikes that will kill you without some sort of healer intervention or reactive measure.

          Really, I think your disagreement comes down to an over-estimation of boss throughput. Boss melee attacks never reach 80% of your health; in fact, the more common case is 30%-40% from a full hit, and attacks covered by some sort of mitigation fall in the 15-25% range. In that scenario, you don’t care about 2-attack moving averages because you are definitely going to take 2 back-to-back boss melees at some point, so it becomes throughput damage. You make sure you have enough healers to handle it and just deal. Those 40-40-15-20 strings (or sometimes less, if you avoid one) aren’t actually that dangerous. It’s when you take sequences of 3 or 4 full attacks (40-40-40-40) that you’re in danger and your healers panic.

          If we really did see bosses meleeing for 80% of your health in one swing, then
          a) DKs would be royally fucked, and
          b) we’d certainly have to adjust the metric because we’d suddenly care about a shorter window, maybe 3-4 seconds instead of 5-6.
          But luckily, that’s the exception rather than the rule in raiding content. Since it’s an edge case, it doesn’t make sense to build the metric around it.

          However, please reserve judgment until tomorrow’s blog post goes up – I think you’ll be pleased when you see that the metric definition is versatile enough to handle those edge cases (i.e. using tomorrow’s nomenclature, you might want to calculate TMI-3 instead of TMI-6 for a particular boss that can produce your 20-20-40-80 example).

          • Qaajn says:

            Thanks for the detailed reply!

            I’m not so used to how stat weights are generated; nice to see that clarified.

            As you might have been able to tell from hints in some of my posts, I come at this with the eyes of a DK. While the attack sequence was just an example, I can see what you mean by 80% swing. It was meant as a way to show things like special (non-back-to-back) abilities, as only rarely would a tank die purely from melee in my experience. There is either something else distracting healers or something more going on to the tank.

            Eagerly awaiting tomorrow’s post… I will hold back my judgement until then :)

          • Theck says:

            Boss specials throw a wrench into the works, but not as badly as you may think. Certainly if the boss has a really hard-hitting special that you fail to actively mitigate, your relevant spike window shortens. Now you might care about Special+melee combinations.

            My experience is actually the opposite – tanks *usually* die from regular melees. The reason is sort of perverse, but those big special attacks end up being not so dangerous (to a Paladin, at least) because they are usually both predictable and mitigated by SotR. That combination is very powerful, because a smart paladin can generally have a 45%+ cooldown (SotR) up for every special attack, bringing them down to the size of a regular melee or smaller.

            In addition, since those spikes are predictable, your healers are also aware they’re coming, so I often get preemptive absorption shields and pre-cast heals.
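The SotR point above can be checked with the numbers already quoted in this thread: a full melee lands for up to about 40% of health, and SotR is described as a 45%+ cooldown. If a special would hit for a fraction $S$ of your health, then under SotR it lands for at most $0.55\,S$, so it is no larger than a full melee whenever

$\displaystyle 0.55\,S \le 0.40 \quad\Longrightarrow\quad S \le \frac{0.40}{0.55} \approx 0.73$

In other words, any special up to roughly 73% of health, if it is always met with SotR, is no more dangerous than an ordinary melee swing. This treats SotR as a flat 45% reduction of the whole hit, which is a simplifying assumption.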

            What usually causes death is something I can’t control. For example, a healer dying and getting battle-ressed without another healer stepping in to cover, or a healer that’s forced to move. Anything that causes a healer to interrupt their usual healing stream and produce lower throughput. It’s when those lower-throughput periods overlap with a damage spike that I run into trouble.

            You could probably categorize tank deaths into two categories:
            A) “Oops” deaths – a big boss special attack that wasn’t properly countered with active mitigation (AM)
            B) “Trickle-down” deaths – sequences of 3-4 melees mixed with DoT or environmental damage.

            I tend to treat A as “oops” because with proper play they become mostly irrelevant (though edge cases exist – Lei Shi, Dread Thrash, etc.). Modeling B is obviously easier, and what I’ve generally focused on. But the beauty of moving to Simcraft is that we can now model both.

            I think the TMI metric will end up being very relevant to both cases when applied properly, because it doesn’t make any assumptions about mechanics.

      • Dalmasca says:

        I am curious to see if different HDFs work better for representing the other tank models though. That might speak to how sensitive those models are to edge-effects themselves, and how they vary from the paladin model, right?

        • Theck says:

          I’m not sure that it will matter that much, to be honest. Edge effects are certainly better with a high HDF, but I think most tanks will have those edge effects (at the very least, armor will always cause it because it acts much like mastery does for paladins, so the armor scale factor would be very sensitive at lower HDF).

          In any event, if we go with the all-inclusive definition of TMI (which is what I have planned), edge effects no longer matter. And at that point, almost any HDF works, it just changes the relative value of stats. Most of this post was focused on finding an HDF that fit our expectations (i.e. didn’t tell us that stam was twice as good as haste when it clearly wasn’t in the data tables).

  2. Wrathblood says:

    Friggin brilliant, and not just the usual “oh wow, that’s a neat way of solving problem x”. I mean it’s a groundbreaking way of comparing tank survivability and making it broadly accessible.

    Also, and this is trivial but conceptually nice, I finally see in a modelling way why total damage taken is irrelevant. If one tank takes more damage but is less spiky, that means that tank is taking more damage in the lower-weighted, less dangerous situations but less damage in the higher-weighted ones. All damage isn’t created equal, and now that’s quantified.
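The “more damage but less spiky” point can be checked numerically with the discrete TMI formula from the top of the post. This is a minimal sketch under stated assumptions: HDF $h = 3$, each $x_i$ is the damage taken over a 2-swing window as a fraction of health (no healing), bins are 5% of health wide, and no normalization is applied. The two damage timelines are invented for illustration, not simulation output, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import List


def window_sums(hits: List[float], n: int) -> List[float]:
    """Damage in every run of n consecutive swings, as a fraction of health."""
    return [sum(hits[i:i + n]) for i in range(len(hits) - n + 1)]


def tmi(hits: List[float], n: int = 2, hdf: float = 3.0, bin_width: float = 0.05) -> float:
    """Unnormalized discrete TMI: sum over bins of H_i * exp(10 ln(h) (x_i - 1))."""
    # H_i: how many windows land in bin i. Bins are keyed by integer index k,
    # with bin centre x_i = k * bin_width.
    hist = Counter(round(x / bin_width) for x in window_sums(hits, n))
    return sum(
        count * math.exp(10 * math.log(hdf) * (k * bin_width - 1))
        for k, count in hist.items()
    )


# Hypothetical timelines (fraction of health per swing), not simulation output.
tank_a = [0.30] * 8                                        # steady, more total damage
tank_b = [0.10, 0.10, 0.10, 0.45, 0.45, 0.10, 0.10, 0.10]  # less total, one spike

for name, hits in [("A", tank_a), ("B", tank_b)]:
    print(name, "total:", round(sum(hits), 2), "TMI:", round(tmi(hits), 4))
```

Tank A takes 2.4 health-worth of damage against B’s 1.5, yet B’s single back-to-back pair of 45% hits gives it a TMI roughly four times higher (about 0.35 vs 0.086). Almost all of B’s score comes from the one window near 90% of health.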

    • Wrathblood says:

      What’s needed now is to evangelize the concept to strong theorycrafters of other classes, ideally to take over simcraft modules, but at a minimum to bring them into the discussion on the methodology. While fasc, zarko and others have created master spreadsheets to compare survivability, this is the first time that spike damage has been incorporated.

    • Theck says:

      I think I have the metric more or less working properly in the SimC trunk already, so encouraging other theorycrafters to help improve their SimC modules will definitely be the fastest way to start making meaningful comparisons between tanks. I want to do some fine-tuning yet, but as far as I can tell it’s working reasonably well.

  3. Schroom says:

    Just to get that clarified, are you considering talent builds in your sims? It sounds logical that stat weights for a tank using SS would be different than for one using Selfless Healer (warlocks using different Grimoires is a good example).

    For tanks in MoP, to get a real picture, one should also take more factors than survivability into account. I usually follow the rule: “If I am practically in no danger of dying in a certain content anyway (usually because of gear), there is no sense in going for more survivability. At that point, where my survivability is good enough anyway, I give more help and support to my raid by generating more DPS and healing support rather than concentrating on even more survivability I don’t need anyway.”
    You usually don’t see a tank in full heroic Thunderforged gear with pure stam gems exactly because of this.
    Whereas a tank in 505 ilvl gear tanking HM raids probably would go full stam, because he is more worried about surviving than about rocking the DPS and HPS meters.

    That is why, IMHO, FOR ME and MY character, haste is better than mastery and even stamina in every way (at least until reaching 50% melee haste), because it provides not only a lot of survivability but also much more support in DPS and in self- and raid HPS.

    This is IMHO why SimC is a lot better for DPS classes, as they ONLY worry about maximizing their DPS. We tanks, especially paladins, being full hybrids, have to worry about a lot more to really maximize our performance as players.

    A real representative model should consider all of those points.

    • Thels says:

      “We’ll use a boss that swings for 350k after mitigation every 1.5 seconds, the standard SH1 finisher priority, back-calculated Seal of Insight with no overhealing apart from inherent, and Sacred Shield enabled.”

      The simulation was run based on those parameters. Naturally, I assume you’re able to shift these values around, i.e., specify how hard the boss is hitting you.

      Remember that Theck ain’t calculating stat weights here! He’s providing the tools for us to calculate our own stat weights, based on our current gear set and boss opponents.

      There’s just one thing I’m wondering about, and I’m curious to see if it shows up in the next post. Would severely increasing or decreasing the boss damage, or our total amount of stats, affect the validity of the HDF?

      Also, I assume other classes would have to calculate their own optimal HDF? Or should they be fine with 3.0 as well? In the latter case, would that mean we could compare TMI between different classes?

      • Theck says:

        I don’t think independent calculations of HDF are necessary, but they’re welcome to try. HDF is essentially saying “an attack that is 10% larger is $h$ times worse.” That’s mildly arbitrary, and we’ve chosen 3 because it seems to do the best job of matching our qualitative analysis. But once you’ve set that bar (no matter *where* you set it), you could apply it equally to all tanks. Presumably if an attack that’s 10% larger is $h$ times worse for tank A, it’s also $h$ times worse for tank B.

        So yes, one of the long-term goals is to compare TMI between different classes, just like we could currently compare DPS or DTPS.
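The “an attack that is 10% larger is $h$ times worse” reading follows directly from the weighting factor in the discrete TMI formula. Writing the weight of a bin at $x$ (damage as a fraction of health) as $w(x)$ and stepping $x$ up by 0.1, i.e. ten percent of health:

$\displaystyle w(x) = e^{10\ln(h)(x-1)} = h^{10(x-1)}$

$\displaystyle \frac{w(x+0.1)}{w(x)} = e^{10\ln(h)(0.1)} = e^{\ln h} = h$

With $h = 3$, a window that deals 90% of your health counts one third as much as one that deals exactly 100%, and one that deals 110% counts three times as much. Because this ratio depends on neither $x$ nor the tank, the same $h$ sets the same scale for every class.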

    • Theck says:

      Schroom: SS is included, but obviously in SimC you’ll be able to tweak all of those sorts of configuration variables to your heart’s delight.

      As far as your “more factors than survivability,” I think you’re misinterpreting what this metric is “for.” TMI is not supposed to be an all-inclusive “best tank EVAR” measurement, any more than DTPS or DPS would be. All three of those (and potentially even raid HPS) are factors that a good tank has to consider, and the weight each tank puts on each of those factors will be different. Which, frankly speaking, would make any sort of universal tank metric that tried to mix all of those factors pretty useless in the first place.

      So, I’m not sure I really agree with your SimC assessment. It’s a tool that gives you information, and it’s up to you to interpret it. I fail to see how it isn’t very powerful to have a tool that can tell you exactly how your DPS, DTPS, and TMI change with a certain gear tweak or talent change. It’s providing all of the information you’re interested in so that you can make an educated decision based on your raid’s needs.

      Also worth noting: one of the other SimC devs and I are hoping to be able to tweak some things so that it can calculate all of those factors in one run. In other words, a single run will give you your DPS, DTPS, TMI, and scale factors for all three of those metrics.

  4. Schroom says:

    Having all of those factors calculated in one run would offer a lot more potential. Would it calculate different stat weights for each of them, so we can really compare TMI against DPS for, say, haste? This would be amazing tho, and make it much clearer.

    See, I am always concerned about the less skilled and new players who find a tool like this (“the good ones are using it, so I have to use it too”) but use it completely wrong (Ask MrRobot is a good example of this phenomenon). Having all 3 factors calculated at once could prevent some of this and help them understand.

    I still think TMI and HPS have to be explained in layman’s terms, as they are pretty complex, so that even non-theorycrafters can understand why a TMI of X is good and Y is bad, and can set their own goals, e.g. “I want to reach a TMI of X; after that I can start worrying about other stuff.”

    • Theck says:

      Yeah, the concept is that you’d run the sim once and be able to see scale factors for each of them. There’s no reason SimC couldn’t do that stuff already – it’s all the same data after all. It’s just that the scale factors subsystem is only designed to run on one of them at a time. It shouldn’t be that hard to tweak it to produce everything, though we may have to make the reports a little clearer so that they’re readable.
